In this short guide, I'll show you how to iterate over chunks of data in Python. We will cover 3 approaches: itertools.islice() with a while loop, a grouper() helper built on itertools.zip_longest(), and a custom generator function.
itertools + iter()
To iterate over a list in chunks in Python, we can use itertools. We split the list into chunks of a specific size using itertools.islice() and a while loop:
import itertools

my_iterator = iter([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
chunk_size = 3

while True:
    chunk = list(itertools.islice(my_iterator, chunk_size))
    if not chunk:
        break
    print(chunk)
Result:
[1, 2, 3]
[4, 5, 6]
[7, 8, 9]
[10]
Where:
- iter() creates an iterator, my_iterator, from a list of integers
- chunk_size is set to 3
- the while loop pulls chunks of 3 elements from the iterator
- itertools.islice() returns an iterator (not a slice object) over the next chunk_size elements of my_iterator
- list() collects those elements into a list, which is the chunk
- the chunk is printed if it is not empty
- the loop continues until the iterator is exhausted: islice() then yields nothing, the chunk is an empty list, and we break
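The steps above can be wrapped in a reusable function. This is only a minimal sketch: the name iter_chunks is a hypothetical helper, not something provided by itertools, and it assumes chunk_size is a positive integer.

```python
import itertools


def iter_chunks(iterable, chunk_size):
    """Yield lists of up to chunk_size items (hypothetical helper name)."""
    # One shared iterator, so each islice() call resumes where the last stopped.
    iterator = iter(iterable)
    while True:
        chunk = list(itertools.islice(iterator, chunk_size))
        if not chunk:  # an empty list means the iterator is exhausted
            return
        yield chunk


chunks = list(iter_chunks(range(1, 11), 3))
print(chunks)  # [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10]]
```

Because the function calls iter() itself, it accepts any iterable (a list, a range, a generator), not only an iterator created beforehand.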
itertools + grouper()
Alternatively, we can use a grouper() helper to split an iterable into fixed-size chunks. Unlike the previous example, the last chunk is padded with a fill value:
import itertools

def grouper(iterable, n, fillvalue=None):
    # n references to the SAME iterator, so zip_longest consumes it n items at a time
    args = [iter(iterable)] * n
    return itertools.zip_longest(*args, fillvalue=fillvalue)

my_iterator = iter([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
chunk_size = 3

for chunk in grouper(my_iterator, chunk_size):
    print(chunk)
Result:
(1, 2, 3)
(4, 5, 6)
(7, 8, 9)
(10, None, None)
To split the iterable into groups of elements we do:
- grouper() is a function that groups an iterable into chunks of size n
- args is a list of n references to the same iterator object created from iterable
- itertools.zip_longest() aggregates the elements of each chunk into tuples, filling missing values with fillvalue
- the for loop iterates over my_iterator in chunks of chunk_size
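The key trick is that args holds the same iterator n times rather than n copies. The sketch below makes that visible and also shows one possible way to strip the padding from the last tuple. The variable names are illustrative, and the trimming step assumes None never appears as real data in the input.

```python
import itertools

shared = iter([1, 2, 3, 4, 5, 6, 7])
args = [shared] * 3  # three references to ONE iterator, not three copies

same_object = all(arg is shared for arg in args)
print(same_object)  # True

# zip_longest takes one item from each argument in turn; since every argument
# is the same iterator, consecutive items land in the same tuple.
groups = list(itertools.zip_longest(*args, fillvalue=None))
print(groups)  # [(1, 2, 3), (4, 5, 6), (7, None, None)]

# Drop the padding (assumption: None is never a legitimate value here).
trimmed = [tuple(x for x in group if x is not None) for group in groups]
print(trimmed)  # [(1, 2, 3), (4, 5, 6), (7,)]
```

If None can occur in your data, passing a dedicated sentinel object as fillvalue and filtering on that instead is a safer variation.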
generator function
Finally, we can write a generator function that yields chunks in Python:
def chunk_iterator(iterator, chunk_size):
    chunk = []
    for item in iterator:
        chunk.append(item)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    # yield the last, shorter chunk if any items are left over
    if chunk:
        yield chunk

my_iterator = iter([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
chunk_size = 3

for chunk in chunk_iterator(my_iterator, chunk_size):
    print(chunk)
Result:
[1, 2, 3]
[4, 5, 6]
[7, 8, 9]
[10]
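Since chunk_iterator() is a generator, chunks are produced one at a time and only when requested, so it also works with lazy sources. A small sketch, repeating the function so the snippet runs on its own; the squares generator is just an illustrative input:

```python
def chunk_iterator(iterator, chunk_size):
    chunk = []
    for item in iterator:
        chunk.append(item)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


# A lazy source: values are computed only as the chunker asks for them.
squares = (n * n for n in range(1, 8))

chunks = chunk_iterator(squares, 3)
first = next(chunks)  # consumes just the first three values
print(first)          # [1, 4, 9]

rest = list(chunks)   # consumes the remainder
print(rest)           # [[16, 25, 36], [49]]
```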