Before diving into the concept of generators, it’s crucial to differentiate between “iterables” and “iterators” in Python, as these are foundational to understanding how generators work.
Iterables and Iterators
In Python, data structures like lists, tuples, strings, and dictionaries are classified as iterables. An iterable is essentially an object that can return its elements one at a time, allowing it to be looped over in a for loop. Let’s illustrate this with an example:
test_list = [1, 2, 3]
for val in test_list:
    print(val)
The output will be:
1
2
3
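For an object to be considered an iterable, it must implement the __iter__() method (objects that only support the older sequence protocol through __getitem__() can also be looped over). We can verify whether our list object has this method using Python’s built-in dir() function: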
print(dir(test_list))
The output will include the following methods:
['__add__', '__class__', '__contains__', '__delattr__', '__delitem__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__iadd__', '__imul__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__reversed__', '__rmul__', '__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__', 'append', 'clear', 'copy', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort']
As shown, the list object includes the __iter__() method, confirming that it is an iterable.
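As a quicker check than scanning the dir() output, the following sketch tests for iterability directly. It assumes the built-in hasattr() and the standard-library collections.abc.Iterable abstract base class; the sample values are illustrative only.

```python
from collections.abc import Iterable

test_list = [1, 2, 3]

# hasattr() checks for the method by name
print(hasattr(test_list, "__iter__"))   # True
print(hasattr(42, "__iter__"))          # False

# isinstance() against the Iterable ABC also looks for __iter__()
print(isinstance(test_list, Iterable))  # True
print(isinstance("abc", Iterable))      # True
print(isinstance(42, Iterable))         # False
```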
However, an iterator is slightly different from an iterable. An iterator is an object that maintains a state and remembers where it is during the iteration. It knows how to get the next value in the sequence.
If we attempt to get the next value directly from our list without converting it into an iterator, Python will raise an error:
print(next(test_list))
This code will result in the following error:
TypeError: 'list' object is not an iterator
To successfully retrieve the next value, we must first convert the list into an iterator:
test_iter = iter(test_list)
print(test_iter)
This will output:
<list_iterator object at 0x7fd6d6f2de90>
Now that we have an iterator, we can use the next() function to iterate through its elements:
print(next(test_iter)) # Outputs: 1
print(next(test_iter)) # Outputs: 2
print(next(test_iter)) # Outputs: 3
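To make the idea of remembered state concrete, here is a small sketch (the variable names first and second are illustrative) showing that each call to iter() on a list returns a separate iterator with its own position, and that calling iter() on an iterator returns the iterator itself.

```python
test_list = [1, 2, 3]

first = iter(test_list)
second = iter(test_list)

print(next(first))   # 1
print(next(first))   # 2
print(next(second))  # 1 (second keeps its own position)

# An iterator is also an iterable: iter() on it returns the same object
print(iter(first) is first)  # True
```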
If we call next() again after exhausting all elements, it will raise a StopIteration exception:
print(next(test_iter))
This will output:
StopIteration
This exception signals that the iterator is exhausted. In a for loop, Python catches it internally and stops iterating when there are no more items:
test_iter = iter(test_list)  # fresh iterator, since the previous one is exhausted
for val in test_iter:
    print(val)
For a more detailed understanding, we can simulate the iteration process using a while loop:
test_iter = iter(test_list)  # start from a fresh iterator
while True:
    try:
        item = next(test_iter)
        print(item)
    except StopIteration:
        break
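One consequence of this protocol is that an iterator can be consumed only once, whereas the list it came from can be looped over repeatedly. The sketch below illustrates this and also shows the optional second argument of the built-in next(), which is returned instead of raising StopIteration. The variable names are illustrative.

```python
test_list = [1, 2, 3]
test_iter = iter(test_list)

first_pass = [val for val in test_iter]   # consumes the iterator
second_pass = [val for val in test_iter]  # nothing left to consume

print(first_pass)   # [1, 2, 3]
print(second_pass)  # []

# The list itself can be iterated again, because each loop gets a new iterator
print([val for val in test_list])  # [1, 2, 3]

# next() accepts a default that is returned once the iterator is exhausted
print(next(test_iter, None))  # None
```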
Creating Custom Iterators
Python also allows you to create custom iterator classes. Below is an example of a custom iterator similar to the built-in range function:
class RangeNew:
    def __init__(self, start, end, step=1):
        self.value = start
        self.end = end
        self.step = step

    def __iter__(self):
        return self

    def __next__(self):
        if self.value >= self.end:
            raise StopIteration
        current = self.value
        self.value += self.step
        return current

range_iter = RangeNew(1, 10, 2)
for val in range_iter:
    print(val)
The output will be:
1
3
5
7
9
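Because RangeNew returns itself from __iter__(), an instance can be looped over only once. A common alternative, sketched here under the assumption that the step is positive, is to separate the iterable from its iterator so that every loop starts fresh. The class names RangeIterable and RangeIterator are hypothetical and not part of the original example.

```python
class RangeIterator:
    """Holds the iteration state for a single pass."""

    def __init__(self, start, end, step):
        self.value = start
        self.end = end
        self.step = step

    def __iter__(self):
        return self

    def __next__(self):
        if self.value >= self.end:
            raise StopIteration
        current = self.value
        self.value += self.step
        return current


class RangeIterable:
    """Stores only the parameters and creates a new iterator per loop."""

    def __init__(self, start, end, step=1):
        self.start = start
        self.end = end
        self.step = step

    def __iter__(self):
        return RangeIterator(self.start, self.end, self.step)


odd_numbers = RangeIterable(1, 10, 2)
print(list(odd_numbers))  # [1, 3, 5, 7, 9]
print(list(odd_numbers))  # [1, 3, 5, 7, 9] (a second pass works)
```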
Generators
Generators in Python are a powerful tool for creating iterators in a more concise and readable manner. They differ from regular functions in that they don’t hold the entire result in memory at once but yield one result at a time.
Creating Generators
There are two main ways to create generators:
Using a Generator Function
A generator function uses the yield keyword to produce a sequence of values over time. Here’s an example:
def squares_gen(num):
    for i in num:
        yield i**2
Compare this to a regular function that returns a list:
def squares(num):
    results = []
    for i in num:
        results.append(i**2)
    return results
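To see the difference in behavior, the sketch below calls both functions on a small input. It repeats the two definitions so that it runs on its own, and it uses range(1, 6) as an illustrative input rather than the NumPy array used in the timings.

```python
def squares_gen(num):
    for i in num:
        yield i**2


def squares(num):
    results = []
    for i in num:
        results.append(i**2)
    return results


num = range(1, 6)

as_list = squares(num)     # all values are computed immediately
as_gen = squares_gen(num)  # nothing is computed yet

print(as_list)             # [1, 4, 9, 16, 25]
print(next(as_gen))        # 1 (computed on demand)
print(list(as_gen))        # [4, 9, 16, 25] (the remaining values)
```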
Performance Comparison
For num = np.arange(1, 10000000):
- Elapsed time for list: 7.360722 seconds
- Elapsed time for generator: 5.999999999950489e-06 seconds
- Time difference: 7.360716 seconds
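The generator timing above is tiny because calling a generator function only creates the generator object; the squares are computed later, as values are requested. The following sketch separates the two costs. It is an illustration under stated assumptions: it uses range() with a smaller n than the original NumPy array so that it stays self-contained, and the timings it prints will differ from machine to machine.

```python
import time


def squares_gen(num):
    for i in num:
        yield i**2


n = 1_000_000  # illustrative size, smaller than in the measurements above
num = range(1, n)

start = time.process_time()
gen = squares_gen(num)          # creation only, no squares computed
created = time.process_time()
total = sum(gen)                # consumption: every square is computed here
consumed = time.process_time()

print(f"create generator:  {created - start:.6f} s")
print(f"consume generator: {consumed - created:.6f} s")
```

Timing only the first step measures how long it takes to set the generator up, not how long it takes to produce the values.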
Using Generator Expressions
Similar to list comprehensions, generator expressions are an efficient way to create generators:
resl = [i**2 for i in num] # List comprehension
resg = (i**2 for i in num) # Generator expression
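A generator expression can be passed directly to any function that consumes an iterable, which avoids building the intermediate list. A minimal sketch, using range(1, 6) as an illustrative stand-in for num:

```python
num = range(1, 6)

resl = [i**2 for i in num]  # list comprehension: builds the full list
resg = (i**2 for i in num)  # generator expression: computes lazily

print(resl)       # [1, 4, 9, 16, 25]
print(sum(resg))  # 55

# As the sole argument of a call, the extra parentheses can be dropped
print(sum(i**2 for i in num))  # 55
```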
Performance Comparison
For num = np.arange(1, 10000000):
- Elapsed time for list: 7.663468000000001 seconds
- Elapsed time for generator: 9.999999999621423e-06 seconds
- Time difference: 7.663458000000001 seconds
Retrieving Generator Results
You can retrieve results from a generator function using either the next() function or a loop:
- Using next():
resg = squares_gen(num)
print('res of generator: ', next(resg))
print('res of generator: ', next(resg))
print('res of generator: ', next(resg))
- Using a loop:
for n in resg:
    print(n)
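Note that next() and a loop share the same generator state: a loop started after a few next() calls continues from where those calls stopped. The sketch below shows this, along with itertools.islice() from the standard library for taking only the next few values. The input range(1, 8) is illustrative.

```python
from itertools import islice


def squares_gen(num):
    for i in num:
        yield i**2


resg = squares_gen(range(1, 8))

print(next(resg))  # 1
print(next(resg))  # 4

# islice() takes the next three values without consuming the rest
print(list(islice(resg, 3)))  # [9, 16, 25]

# The loop picks up with whatever is left
remaining = []
for n in resg:
    remaining.append(n)
print(remaining)  # [36, 49]
```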
Advantages of Using Generators
- Memory Efficiency: Generators do not store the entire result in memory, making them more memory-efficient than lists, especially for large datasets.
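- Performance: Creating a generator is nearly instantaneous because no values are computed up front. The work is spread across the iteration, which helps most when only part of a large dataset ends up being consumed.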
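The memory difference can be observed with sys.getsizeof(), which reports the size of the container object itself (not of the objects it refers to). This is a rough sketch with an illustrative size of one million elements; exact byte counts vary by Python version and platform.

```python
import sys

n = 1_000_000  # illustrative size

squares_list = [i**2 for i in range(n)]
squares_gen = (i**2 for i in range(n))

list_size = sys.getsizeof(squares_list)  # grows with the number of elements
gen_size = sys.getsizeof(squares_gen)    # small and independent of n

print(f"list:      {list_size} bytes")
print(f"generator: {gen_size} bytes")
```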
Summary of Results
- Function vs. Loop: Using functions to generate values is generally faster than using loops or list comprehensions, regardless of whether you’re working with lists or generators.
- Generators vs. Lists: The performance advantage of generators becomes more pronounced when using generator expressions compared to list comprehensions. Generators are consistently faster and use less memory.
- Complete Dataset Requirement: When the entire result is needed at once, the time and memory used to create a list or convert a generator to a list are almost the same.
Overall, generators provide significant benefits in both execution time and memory usage whenever results can be consumed one at a time instead of being materialized all at once.
Appendix
Measuring Execution Time
To measure the time taken by a process, I used Python’s time module:
- time.process_time() gives the sum of the system and user CPU time of the current process in seconds.
- time.process_time_ns() provides the result in nanoseconds.
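A minimal helper built on these two functions might look like the following. The name cpu_time is hypothetical, and this is not necessarily the code used for the measurements in this article.

```python
import time


def cpu_time(func, *args):
    """Run func(*args) and return (result, elapsed CPU seconds)."""
    start = time.process_time()
    result = func(*args)
    elapsed = time.process_time() - start
    return result, elapsed


result, seconds = cpu_time(sum, range(1_000_000))
print(f"sum = {result}, CPU time = {seconds:.6f} s")

# The _ns variant returns an integer number of nanoseconds
start_ns = time.process_time_ns()
sum(range(1_000_000))
elapsed_ns = time.process_time_ns() - start_ns
print(f"CPU time = {elapsed_ns} ns")
```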
Note: The “time taken” in this study may vary depending on the computer and CPU state. However, generators consistently demonstrate faster performance across different scenarios.