How to Iterate Efficiently Through Lists, Files and Dictionaries

by: George El., January 2019, Reading time: 3 minutes

In this post I will show you how to iterate efficiently through lists, dictionaries and files in Python.

If you are familiar with other languages, you may be used to iterating through an array or list with an index, like this:

for(var i=0; i<arr.length; i++) {
    print(arr[i])
}

In Python this is not needed, because lists and dictionaries are iterables: they implement the __iter__ method, which returns an iterator that provides the __next__ method. This means that you can iterate through them like this:

for item in my_list:
    print(item)
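To make the iteration protocol a bit more concrete, here is a rough sketch of what a for loop does behind the scenes. The variable names (names, iterator, collected) are my own, chosen only for illustration.

```python
# Rough sketch of what "for item in names" does internally.
names = ["john", "nick", "pat"]

# A list is iterable (it has __iter__) but it is not itself an iterator
# (it has no __next__).
print(hasattr(names, "__iter__"))  # True
print(hasattr(names, "__next__"))  # False

iterator = iter(names)             # calls names.__iter__()
collected = []
while True:
    try:
        item = next(iterator)      # calls iterator.__next__()
    except StopIteration:          # raised when there are no items left
        break
    collected.append(item)

print(collected)
```

The for statement hides the iter(), next() and StopIteration handling, which is why no index variable is needed.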

Let's see some examples.

>>> l=["john","nick","pat","larry"]
>>> for name in l:
...     print(name)
...
john
nick
pat
larry

If you also want an index, you can use the enumerate function. It yields an (index, item) tuple for each element:

>>> for name in enumerate(l):
...     print(name)
...
(0, 'john')
(1, 'nick')
(2, 'pat')
(3, 'larry')

You can unpack the tuple like this:

>>> for index,name in enumerate(l):
...     print(index,name)
...
0 john
1 nick
2 pat
3 larry
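If you would rather count from 1 (for example to print a numbered list), enumerate accepts an optional start argument. A small sketch, with the variable numbered being just an illustrative name:

```python
names = ["john", "nick", "pat", "larry"]

numbered = []
# start=1 makes the counter begin at 1 instead of 0;
# the items themselves are not affected.
for position, name in enumerate(names, start=1):
    numbered.append(f"{position}. {name}")

print("\n".join(numbered))
```

This avoids writing index + 1 inside the loop body.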

Dictionaries can also be iterated with for, but you get the keys back. Please note that insertion order is guaranteed only in Python 3.7 and later; in older versions the order is not guaranteed.

>>> dict1 = { 'a':1,'b':2,'c':3 }
>>> for key in dict1:
...     print(key)
...
a
b
c

We can also get the values in the following way:

>>> for key in dict1:
...     print(key,dict1[key])
...
a 1
b 2
c 3

The above way is not efficient because it performs an extra dictionary lookup for every key. You should use dict1.values() or dict1.items() instead:

>>> for value in dict1.values():
...     print(value)
...
1
2
3

or

>>> for key,value in dict1.items():
...     print(key,value)
...
a 1
b 2
c 3
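Because values() and items() are ordinary iterables, you can also pass them straight to built-ins and comprehensions. The following is only a sketch; the dictionary prices and the other names are made up for the example.

```python
prices = {"a": 1, "b": 2, "c": 3}  # hypothetical data

# values() can be consumed directly by sum(), no explicit loop needed
total = sum(prices.values())

# items() yields (key, value) pairs; pick the pair with the largest value
top_key, top_value = max(prices.items(), key=lambda pair: pair[1])

# a dict comprehension over items() builds a new dictionary
doubled = {key: value * 2 for key, value in prices.items()}

print(total, top_key, top_value, doubled)
```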

If you use Python 2.x you should use iteritems(), which is more efficient there because items() builds a full list. In Python 3, iteritems() is gone and items() returns a lightweight view.

Let's say now that I have a file, and I want to read the lines and sort them. The file looks like this:

george
adam
patty
jenny
amanda

Open the file and read it into a list like this:

>>> with open("test.txt") as f:
...     l = f.readlines()
...
>>> for item in l:
...     print(item)
...

This method is not efficient: with a huge file, you would load all of it into memory. When you open a file, you get back a TextIOWrapper, which is both an iterable and an iterator.

>>> f=open("test.txt")
>>> f
<_io.TextIOWrapper name='test.txt' mode='r' encoding='cp1252'>

Let's check whether it has the __iter__ method, and whether __iter__ returns an iterator that has the __next__ method:

>>> '__iter__' in dir(f)
True
>>> '__next__' in dir(f)
True
>>> f.__iter__() is f
True
>>>

As we see, it returns itself, so it is both an iterable and an iterator. This means we can iterate over it like this:

>>> for item in open("test.txt"):
...     print(item, end='')
...
george
adam
patty
jenny
amanda
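One thing the short form above leaves open is when the file gets closed. A common variation, sketched here under my own assumptions (a throwaway file in a temporary directory, and the made-up names longest and line_count), is to combine the with statement with line-by-line iteration:

```python
import os
import tempfile

# Create a small throwaway file so the sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), "test.txt")
with open(path, "w") as f:
    f.write("george\nadam\npatty\njenny\namanda\n")

longest = ""
line_count = 0
with open(path) as f:            # the file is closed when the block exits
    for line in f:               # only one line is held in memory at a time
        name = line.rstrip("\n")
        line_count += 1
        if len(name) > len(longest):
            longest = name

print(line_count, longest)
```

The loop still never builds a list of all lines, so it works the same way on a very large file.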

The items already have a newline character at the end, which is why I used end=''; I could also use strip(). Let's sort the items. Note that sorted() has to read all the lines into a list before it can return them.

>>> for item in sorted(open("test.txt")):
...     print(item.strip())
...
adam
amanda
george
jenny
patty

What if I want to get it as a list?

>>> [item.strip() for item in sorted(open("test.txt"))]
['adam', 'amanda', 'george', 'jenny', 'patty']

Let's say now that I have some duplicates and I also want to remove them. To remove the duplicates we can create a set. Because sets are unordered, we first convert the lines to a set and then sort it. Let's say I have:

george
adam
patty
jenny
amanda
adam

Now I can do all this in one line:

>>> [item.strip() for item in sorted(set(open("test.txt")))]
['adam', 'amanda', 'george', 'jenny', 'patty']
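One caveat with the one-liner: it removes duplicates before stripping, so it relies on every line ending with a newline. If the last line of the file has no trailing newline, "adam\n" and "adam" count as different strings and both survive. A possible variation, sketched with a temporary file and illustrative names (naive, unique_sorted), is to strip first and deduplicate afterwards:

```python
import os
import tempfile

# Throwaway file whose last line has NO trailing newline.
path = os.path.join(tempfile.mkdtemp(), "test.txt")
with open(path, "w") as f:
    f.write("george\nadam\npatty\njenny\namanda\nadam")

# Deduplicate first, strip later: "adam\n" and "adam" are both kept.
with open(path) as f:
    naive = [item.strip() for item in sorted(set(f))]

# Strip first inside a set comprehension, then sort the unique names.
with open(path) as f:
    unique_sorted = sorted({line.strip() for line in f})

print(naive)
print(unique_sorted)
```

The second form also closes the file explicitly through the with statement.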