29  Iterators, Generators, and Classic Coroutines

Note: Core idea

Python’s for loop calls iter() then repeatedly calls next(). Generators implement the iterator protocol lazily. Classic coroutines extend generators to receive values via send(). Understanding the iterator protocol makes all of Python’s loop machinery transparent.

In this chapter you will learn to:

  1. Trace what for x in obj does at the protocol level.
  2. Distinguish iterables (build an iterator) from iterators (have state).
  3. Write a generator function with yield.
  4. Use itertools for filtering, mapping, merging, and combinatoric generation.
  5. Compose generators with yield from.
  6. Write a classic coroutine that receives values via send().

29.1 How for actually works

Every for loop is desugared into the same pattern:

s = "ABC"
it = iter(s)
next(it), next(it), next(it)
('A', 'B', 'C')
next(it)
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
Cell In[2], line 1
----> 1 next(it)

StopIteration: 

iter(s) calls s.__iter__() and returns an iterator. next(it) calls it.__next__(). When the iterator is exhausted, it raises StopIteration, which for catches silently.
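
In code, the desugared loop looks like this (a sketch of what the interpreter does, not literal CPython source):

s = "ABC"
it = iter(s)                       # s.__iter__()
while True:
    try:
        x = next(it)               # it.__next__()
    except StopIteration:
        break                      # the silent catch that ends the loop
    print(x)                       # the loop body
A
B
C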

flowchart TB
  A["for x in obj:"] --> B["it = iter(obj)"]
  B --> C{"next(it)"}
  C -- value --> D["x = value<br/>run loop body"]
  D --> C
  C -- StopIteration --> E["loop ends"]

Every for loop is this dance. Custom iterables only need to answer two questions: how do I produce an iterator (__iter__), and how does that iterator produce the next value (__next__)?

A useful side form: iter(callable, sentinel) creates an iterator that calls callable() until the result equals the sentinel. It’s the cleanest way to wrap a “call until done” function — file readers, queue pollers, network reads — into something a for loop can drive.

import io
fp = io.StringIO("line one\nline two\n\nignored\n")
for line in iter(fp.readline, "\n"):
    print(repr(line))
'line one\n'
'line two\n'

Walking through what happens:

  • io.StringIO(...) gives us an in-memory file-like object backed by a string — same .read(), .readline(), .write() API as a real file, but with no filesystem touch. The bytes counterpart is io.BytesIO. Handy for testing and for passing “a file” to code that expects one when you have only a string in hand.
  • iter(fp.readline, "\n") builds an iterator. Each next() calls fp.readline(); when that call returns the sentinel "\n" (an empty-content line, just a newline), the iterator stops.
  • The for loop drives that iterator and prints each line. The blank line and everything after it are skipped — the sentinel matched.

The general pattern: iter(callable, sentinel) turns any zero-argument call-until-done function into a one-line iterable.
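
Another sketch, this time polling a function with no natural "file" shape (random.randint is chosen here just for illustration):

import random

# roll a die until the first 6; the sentinel match is consumed, not yielded
rolls = list(iter(lambda: random.randint(1, 6), 6))
print(rolls)        # e.g. [3, 1, 5] -- varies per run, never contains a 6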

29.2 Iterables versus iterators

The two roles are distinct:

  • An iterable has __iter__ — calling it returns a fresh iterator each time.
  • An iterator has __next__ and __iter__ (which returns self).

The most common bug is making one class do both — then iterating consumes the only iterator and the next loop sees nothing.

class Sentence:
    def __init__(self, text):
        self._words = text.split()
    def __iter__(self):
        return SentenceIterator(self._words)

class SentenceIterator:
    def __init__(self, words):
        self._words = words
        self._index = 0
    def __next__(self):
        try:
            word = self._words[self._index]
        except IndexError:
            raise StopIteration
        self._index += 1
        return word
    def __iter__(self):
        return self

s = Sentence("Hello beautiful world")
list(s), list(s)
(['Hello', 'beautiful', 'world'], ['Hello', 'beautiful', 'world'])

Walking through the two classes:

  • Sentence is the iterable. Its __iter__ returns a fresh SentenceIterator every time it’s called — that’s why two passes both work.
  • SentenceIterator is the iterator. It carries the cursor (self._index) — the per-iteration state.
  • __next__ returns the current word and advances. When self._index walks off the end, indexing raises IndexError, which we translate to StopIteration — the protocol’s “I’m done” signal.
  • SentenceIterator.__iter__ returns self. Iterators are required to be iterable too, so that iter(it) is it and for x in it: works directly on an iterator.

Two iterations gave two complete results — Sentence.__iter__ produces a fresh SentenceIterator each time. Compare with a class whose __iter__ returns self: the second loop would be empty.
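
A minimal sketch of that broken variant (BadSentence is invented here to show the failure):

class BadSentence:
    def __init__(self, text):
        self._words = text.split()
        self._index = 0
    def __iter__(self):
        return self                  # the bug: the same cursor is handed out every time
    def __next__(self):
        try:
            word = self._words[self._index]
        except IndexError:
            raise StopIteration
        self._index += 1
        return word

b = BadSentence("Hello beautiful world")
list(b), list(b)
(['Hello', 'beautiful', 'world'], [])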

The general pattern: keep the iterable (the data) and the iterator (the cursor) as separate objects. The iterable is the public type; the iterator is the disposable per-loop helper.

29.3 Generator functions

A generator function — any function with yield — produces an iterator without the boilerplate. The two-class Sentence/SentenceIterator setup collapses to a few lines:

class SentenceGen:
    def __init__(self, text):
        self._text = text
    def __iter__(self):
        for word in self._text.split():
            yield word

list(SentenceGen("Hello beautiful world")), list(SentenceGen("Hello beautiful world"))
(['Hello', 'beautiful', 'world'], ['Hello', 'beautiful', 'world'])

Walking through what’s happening:

  • __iter__ contains yield, so Python treats it as a generator function — every call returns a fresh generator object instead of running the body.
  • That generator object is itself an iterator: it has __next__ and __iter__, both supplied automatically.
  • Each next() runs the body up to the next yield, hands back the value, and pauses. The function’s local state (for word in ... cursor included) is preserved between calls.
  • When the loop finishes, the generator function returns, which Python converts into a StopIteration.

Two passes both work: each call to __iter__ builds a new generator with its own for word in ... cursor.

The general pattern: any time you’d write a separate iterator class with a manual cursor, write a generator function instead. Same protocol, a fraction of the code.
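
To watch the pause/resume behaviour directly, drive a generator by hand (a throwaway sketch):

def demo():
    print("running to first yield")
    yield 1
    print("running to second yield")
    yield 2
    print("falling off the end")

g = demo()       # nothing printed yet: the body has not started
next(g)          # prints "running to first yield", returns 1
next(g)          # prints "running to second yield", returns 2
next(g)          # prints "falling off the end", then raises StopIteration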

A generator expression is the inline version:

gen_exp = (word for word in "Hello beautiful world".split())
next(gen_exp), next(gen_exp)
('Hello', 'beautiful')

For lazy iteration over a regex, the generator-expression form is even shorter:

import re
RE_WORD = re.compile(r"\w+")

class LazySentence:
    def __init__(self, text):
        self.text = text
    def __iter__(self):
        return (m.group() for m in RE_WORD.finditer(self.text))

list(LazySentence("Hello there, beautiful world!"))
['Hello', 'there', 'beautiful', 'world']

The text is never split into a list — finditer returns matches lazily, and each call to iter() creates a fresh generator.

29.4 The itertools cookbook

itertools is the standard library’s lazy-iterator toolkit. Worth knowing by category:

import itertools, operator, functools

list(itertools.compress("ABCDEF", [1, 0, 1, 0, 1, 1]))
['A', 'C', 'E', 'F']

compress(data, selectors) walks data and selectors in lockstep and yields each data element whose matching selector is truthy. Here the selectors are [1, 0, 1, 0, 1, 1] — keep, drop, keep, drop, keep, keep — so the output is ['A', 'C', 'E', 'F']. Useful when you’ve computed a boolean mask separately and want to apply it to a sequence.

list(itertools.dropwhile(lambda x: x < 5, [1, 5, 2, 6, 7]))
[5, 2, 6, 7]

dropwhile(pred, iterable) skips elements while pred(x) is true; once it’s false, the rest of the input is yielded — even elements that would have made pred true again. Here 1 < 5 is true (drop), 5 < 5 is false (stop dropping), and from there everything passes through: [5, 2, 6, 7]. Note 2 survives — dropwhile doesn’t re-check after the first false.

list(itertools.takewhile(lambda x: x < 5, [1, 4, 6, 4, 1]))
[1, 4]

takewhile is the mirror: yield elements while pred(x) is true; stop on the first false. 1 < 5, 4 < 5 (keep both), 6 < 5 is false (stop). The trailing 4, 1 are not yielded even though they’d individually satisfy the predicate. Output: [1, 4].

list(itertools.accumulate([1, 2, 3, 4, 5])), list(itertools.accumulate([1, 2, 3, 4, 5], operator.mul))
([1, 3, 6, 10, 15], [1, 2, 6, 24, 120])

accumulate folds left-to-right and yields each intermediate result (not just the final one). With the default +: 1, 1+2=3, 3+3=6, 6+4=10, 10+5=15 — running sum. With operator.mul: 1, 1*2=2, 2*3=6, 6*4=24, 24*5=120 — running product. Output: ([1, 3, 6, 10, 15], [1, 2, 6, 24, 120]).

list(itertools.chain("ABC", range(2)))
['A', 'B', 'C', 0, 1]

chain concatenates iterables lazily — same as in chapter 11. The string "ABC" yields three characters; range(2) yields 0, 1. Combined: ['A', 'B', 'C', 0, 1]. Mixed types in the output are fine — the result is just whatever each input yielded.

list(itertools.product("AB", range(2)))
[('A', 0), ('A', 1), ('B', 0), ('B', 1)]

product(a, b) is the cartesian product — every pair (x, y) for x in a and y in b. Output: [('A', 0), ('A', 1), ('B', 0), ('B', 1)]. Equivalent to a nested for-loop, written inline.

list(itertools.combinations("ABC", 2)), list(itertools.permutations("ABC", 2))
([('A', 'B'), ('A', 'C'), ('B', 'C')],
 [('A', 'B'), ('A', 'C'), ('B', 'A'), ('B', 'C'), ('C', 'A'), ('C', 'B')])

combinations("ABC", 2) yields unordered pairs without repetition: [('A', 'B'), ('A', 'C'), ('B', 'C')]. permutations("ABC", 2) yields ordered pairs: same three pairs plus their reversed forms — [('A', 'B'), ('A', 'C'), ('B', 'A'), ('B', 'C'), ('C', 'A'), ('C', 'B')]. The same distinction from the standard-library tour: combinations cares about which; permutations cares about which and in what order.

list(itertools.islice(itertools.cycle("ABC"), 7))
['A', 'B', 'C', 'A', 'B', 'C', 'A']

cycle("ABC") repeats A, B, C, A, B, C, A, B, C, ... forever. islice(it, 7) takes the first 7. Output: ['A', 'B', 'C', 'A', 'B', 'C', 'A'] — two full cycles plus one. The pattern is the canonical “round-robin pick the next colour/server/slot” loop, capped at the count you actually need.

Two more daily idioms — keep filterfalse for “the rejected items” and starmap for “the same function on tuples”:

list(itertools.filterfalse(lambda x: x % 2, range(10)))
[0, 2, 4, 6, 8]

filterfalse(pred, iterable) keeps elements where pred(x) is falsy — the inverse of filter. n % 2 is 0 for evens (falsy), so the output is [0, 2, 4, 6, 8]. Useful as the “rejected” half of a partition; pair filter(pred, ...) with filterfalse(pred, ...) to split an iterable.
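
That split is a one-liner with itertools.tee; this is essentially the partition recipe from the itertools documentation:

import itertools

def partition(pred, iterable):
    # duplicate the stream, route rejects one way and accepts the other
    rejects, accepts = itertools.tee(iterable)
    return itertools.filterfalse(pred, rejects), filter(pred, accepts)

evens, odds = partition(lambda x: x % 2, range(10))
list(evens), list(odds)
([0, 2, 4, 6, 8], [1, 3, 5, 7, 9])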

list(itertools.starmap(pow, [(2, 3), (3, 2), (10, 3)]))
[8, 9, 1000]

starmap(fn, iterable_of_tuples) calls fn(*tuple) for each tuple: the elements are unpacked (starred) as positional arguments. So pow(2, 3) = 8, pow(3, 2) = 9, pow(10, 3) = 1000. Output: [8, 9, 1000]. Compare with map(pow, bases, exps), which takes the arguments as parallel iterables; starmap is the form for when the argument tuples are already bundled.

The reduction functions — all, any, sum, max, min, functools.reduce — consume an iterable and return a single value:

all([1, 2, 3]), any([0, 0, 1]), sum([1, 2, 3]), functools.reduce(operator.add, [1, 2, 3, 4, 5])
(True, True, 6, 15)

Four reductions in one cell:

  • all([1, 2, 3]) is True — every element is truthy. all([]) is True (vacuously); all([1, 0, 2]) would be False because of the 0.
  • any([0, 0, 1]) is True — at least one truthy element. any([]) is False (vacuously); any([0, 0, 0]) would be False.
  • sum([1, 2, 3]) is 6 — adds the elements with an initial value of 0 by default. Pass sum(xs, start) to use a different initial value.
  • functools.reduce(operator.add, [1, 2, 3, 4, 5]) is 15 — the same fold from earlier in the chapter, applied with add.

Both all and any short-circuit: all stops at the first falsy element; any stops at the first truthy one. Each is O(n) in the worst case but often stops after a few elements in practice. Pair them with a generator expression so the elements are also produced on demand: any(x > 100 for x in big_list).
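
To see the short-circuit, feed any a generator with a visible side effect and watch where it stops (a quick sketch):

def noisy(xs):
    for x in xs:
        print("checking", x)
        yield x

any(x > 1 for x in noisy(range(5)))
checking 0
checking 1
checking 2
True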

29.5 Walrus in iteration

The walrus operator := (3.8+) is the cleanest way to write “read until empty”:

import io
buf = io.BytesIO(b"abcdefghij")
chunks = []
while (chunk := buf.read(3)):
    chunks.append(chunk)
chunks
[b'abc', b'def', b'ghi', b'j']

Walking through the loop:

  • buf.read(3) returns up to 3 bytes; at end-of-stream it returns b"" (empty, falsy).
  • chunk := buf.read(3) is assignment-as-expression: it stores the result in chunk and evaluates to that result, so the while can test it.
  • The loop body uses chunk — the same name the condition just bound. No re-read, no off-by-one.
  • When buf.read(3) returns b"", the condition is falsy and the loop exits naturally.

The condition reads “while the next chunk, bound to chunk, is truthy.” Replaces the while True / if not …: break pattern.

The general pattern: when a value needs to be both tested and used inside the loop, walrus binds it once at the condition.
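
The same loop can also be phrased with iter(callable, sentinel) from 29.1; a sketch using functools.partial to bind the chunk size:

import io
from functools import partial

buf = io.BytesIO(b"abcdefghij")
list(iter(partial(buf.read, 3), b""))
[b'abc', b'def', b'ghi', b'j']

Walrus keeps the logic inline; iter(callable, sentinel) hands it to the iterator protocol. Both express the same read-until-empty loop.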

29.6 yield from

yield from sub_iterable does more than run for x in sub: yield x. It also forwards send() and throw() to the inner generator, which is what makes it the foundation of asyncio.

def chain(*iterables):
    for it in iterables:
        yield from it

list(chain("ABC", range(3), [10, 20]))
['A', 'B', 'C', 0, 1, 2, 10, 20]

Walking through chain:

  • *iterables collects all positional arguments into a tuple — chain("ABC", range(3), [10, 20]) gives iterables = ("ABC", range(3), [10, 20]).
  • For each one, yield from it re-yields every value the sub-iterable produces. The outer caller sees one flat stream — 'A', 'B', 'C', 0, 1, 2, 10, 20.
  • The same loop without yield from would be for x in it: yield x. Equivalent for value flow, but yield from also forwards send and throw, the machinery asyncio relies on.

The same delegation makes recursive traversal natural. Here is a depth-first tree walk:

class Tree:
    def __init__(self, value, *children):
        self.value = value
        self.children = list(children)

def depth_first(node):
    yield node.value
    for child in node.children:
        yield from depth_first(child)

t = Tree("root",
         Tree("A", Tree("A1"), Tree("A2")),
         Tree("B", Tree("B1")))
list(depth_first(t))
['root', 'A', 'A1', 'A2', 'B', 'B1']

Walking through the tree walk:

  • Tree(value, *children) stores the node value and a list of child trees — a tiny n-ary tree.
  • depth_first yields the current node’s value, then recurses into each child. Each recursive call is itself a generator.
  • yield from depth_first(child) plugs that child generator into the outer one — every value the child yields flows through to the caller without manual relay.
  • The output ['root', 'A', 'A1', 'A2', 'B', 'B1'] is a depth-first preorder traversal.

The general pattern: when a generator wants to delegate to another generator (or any iterable), yield from is the one-line way. It’s also the only way to forward exceptions and send() correctly, which is why it underpins coroutine composition.
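
A tiny demonstration of the send forwarding (throw works the same way; exercise 4 asks you to show it):

def inner():
    while True:
        received = yield
        print("inner got", received)

def outer():
    yield from inner()     # send() on the outer generator reaches inner's yield

g = outer()
next(g)                    # prime: advance to the yield inside inner
g.send("hi")
inner got hi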

29.7 A note on classic coroutines

Generators can also receive values via gen.send(x), a generator-based form of coroutine that predates async/await. The mechanism is still worth understanding (send and throw are exactly what yield from forwards), but day-to-day asynchronous code uses native coroutines instead. The deep treatment of async def / await is in Chapter 33.
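
For reference, a minimal sketch of the classic style: a running-average coroutine. Note the priming next() needed before the first send():

def averager():
    total, count = 0.0, 0
    average = None
    while True:
        received = yield average   # pause here; send() resumes with a value
        total += received
        count += 1
        average = total / count

coro = averager()
next(coro)         # prime: run to the first yield (returns None)
coro.send(10), coro.send(30), coro.send(5)
(10.0, 20.0, 15.0)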

Tip: Why this matters

Every Python for loop is secretly: it = iter(obj); next(it) in a loop. Every generator function is a factory for iterator objects. yield from connects generators into pipelines, forwarding values and exceptions transparently. This is the foundation that async/await builds upon.

29.8 Build: a generator-based ETL pipeline

The most natural use of generators in real code: an ETL pipeline (extract, transform, load) where each stage is its own generator and the stages compose by chaining iterables. Memory stays constant — only one record exists in flight at any moment.

Step 1: an extractor that yields raw records. Skip blank lines and comments; yield the rest:

def extract(text):
    for line in text.strip().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        yield line

raw = """
# header — skipped
Alice,95
Bob,45

Carol,72
"""

list(extract(raw))
['Alice,95', 'Bob,45', 'Carol,72']

for line in ... .splitlines() walks the lines one at a time (splitlines itself builds the full list of lines up front; the laziness here is in the yielding). continue skips the unwanted lines; yield line produces the rest. The function is a generator because of yield: calling extract(raw) returns a generator object, not a list.

Step 2: parse and filter — generators chained as iterables. Each stage takes the previous generator and yields the next-stage values. parse produces dicts; passing filters by score:

def parse(lines):
    for line in lines:
        name, raw_score = line.split(",")
        yield {"name": name.strip(), "score": int(raw_score)}

def passing(records, threshold=60):
    for r in records:
        if r["score"] >= threshold:
            yield r

list(passing(parse(extract(raw))))
[{'name': 'Alice', 'score': 95}, {'name': 'Carol', 'score': 72}]

passing(parse(extract(raw))) is a pipeline: three generators stacked. The innermost runs lazily; each next() on the outer pulls one record through every stage. Memory stays at one record because the earlier stages never materialise anything; they just yield on demand. Read extract -> parse -> passing left to right for the dataflow direction; the nested call spells out the same order inside-out.
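
You can watch the laziness by pulling a single record through by hand:

pipeline = passing(parse(extract(raw)))
next(pipeline)
{'name': 'Alice', 'score': 95}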

Step 3: batch using yield from for the trailing remainder. A batch(records, n) generator yields lists of up to n records — useful for sending records in chunks to a database or HTTP API. The trailing partial chunk needs special handling:

def batch(records, n):
    chunk = []
    for r in records:
        chunk.append(r)
        if len(chunk) == n:
            yield chunk
            chunk = []
    if chunk:                           # final partial chunk, if any
        yield chunk

raw_many = "\n".join(f"User{i},{i*10}" for i in range(7))
pipeline = batch(passing(parse(extract(raw_many)), threshold=20), n=3)
list(pipeline)
[[{'name': 'User2', 'score': 20},
  {'name': 'User3', 'score': 30},
  {'name': 'User4', 'score': 40}],
 [{'name': 'User5', 'score': 50}, {'name': 'User6', 'score': 60}]]

The generator accumulates records into chunk. When chunk reaches size n, it yields the list and starts a fresh one. After the loop, the trailing if chunk: yield chunk handles the leftover: here five records clear the threshold, so n=3 produces one chunk of 3 and a final chunk of 2. The full pipeline batch(passing(parse(extract(...)))) is four lazy stages stacked; calling list(pipeline) is what actually pulls every record through.
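
On Python 3.12 and later the standard library ships this chunker as itertools.batched (it yields tuples rather than lists), so the hand-rolled version is mainly for older interpreters or for when you need lists:

import itertools    # Python 3.12+

list(itertools.batched(range(7), 3))
[(0, 1, 2), (3, 4, 5), (6,)]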

The build is the chapter’s lesson made concrete: each yield-using function is a generator that participates as an iterable in the next stage’s for loop. The whole pipeline computes lazily — memory is one record (or one chunk) at a time, no matter how big the source. This is why generator pipelines scale to millions of records on a laptop where a list-based version would crash.

29.9 Exercises

  1. Iterable vs. iterator. Write a class Counter that, on iter, counts up forever. Make __iter__ return self. Try iterating twice — what happens?

  2. Generator pipeline. Compose three generators: read lines, strip whitespace, drop empty lines. Combine with yield from.

  3. itertools.tee. Read its docs. Use it to iterate over the same generator twice without re-running it. What’s the memory cost?

  4. yield from is more than a loop. Demonstrate that yield from forwards throw() to the inner generator. (Hint: write a generator that catches a particular exception.)

29.10 Summary

The iterator protocol is the engine under every loop. Generator functions are the easy way to write iterators; yield from composes them; classic coroutines extend them with send(). Hold these in your head and the rest of Python’s lazy-evaluation story falls into place.

Next, Chapter 30 covers three control-flow blocks that don’t appear in many other languages: with for context managers, match for structural pattern matching, and the surprising else clause on for/while/try.