24  Special Methods for Sequences

NoteCore idea

Build a proper multi-dimensional Vector class by incrementally adding special methods. This chapter demonstrates how protocols (informal interfaces) work in practice.

In this chapter you will learn to:

  1. Implement the sequence protocol__len__ and __getitem__ — and get iteration, slicing, and in for free.
  2. Make slicing return the same class instead of a raw list.
  3. Use __getattr__ and __setattr__ to expose dynamic attributes (v.x, v.y, v.z).
  4. Hash a sequence by reducing its components with XOR.
  5. Use zip (and zip_longest) for parallel iteration and matrix transposition.

24.1 Protocols and duck typing

Python’s sequence protocol requires only two methods: __len__ and __getitem__. Anything that has them behaves as a sequence — slicing, iteration, in — without inheriting from anything. To see the protocol in action, we’ll generalize Vector2d to N dimensions and earn slicing, iteration, and len() from just those two dunders.

import operator
from array import array

class Vector:
    typecode = "d"

    def __init__(self, components):
        self._components = array(self.typecode, components)

    def __iter__(self):
        return iter(self._components)

    def __repr__(self):
        components = repr(self._components)
        components = components[components.find("["):-1]
        return f"Vector({components})"

    def __len__(self):
        return len(self._components)

    def __getitem__(self, key):
        if isinstance(key, slice):
            cls = type(self)
            return cls(self._components[key])
        index = operator.index(key)
        return self._components[index]

v = Vector(range(7))
v[1], v[-1], len(v)
(1.0, 6.0, 7)

Walking through the moving parts:

  • array(self.typecode, components) stores the components in a typed array — much more memory-efficient than a list. The typecode = "d" class attribute means every component is a 64-bit float; Python rejects anything that doesn’t fit at construction time.
  • __iter__ returns iter(self._components) — a cheap, lazy iterator over the array. Once __iter__ exists, for x in v, tuple(v), and unpacking all work.
  • __repr__ slices off the bracketed part of the array’s repr (array('d', [0.0, 1.0, ...]) becomes Vector([0.0, 1.0, ...])). It’s a small string surgery — there’s no friendlier API for “show me an array’s contents.”
  • __len__ makes len(v) work and lets the sequence protocol report a length.
  • __getitem__ is the interesting one. It receives either an integer (v[1]) or a slice object (v[1:4]); we have to handle both.
    • For a slice, we delegate to the underlying array’s slicing — which returns another array — and rewrap it in cls(...). Using cls = type(self) instead of hard-coding Vector means subclasses get their own type back.
    • For an integer, operator.index(key) extracts an integer index, raising TypeError for floats and other non-integer numeric types — matching what built-in lists do.

The general pattern: __len__ + __getitem__ is the minimum sequence interface. The clever part is the slice handling. Without it, v[1:4] would return an array.array — losing the Vector type. With it, slicing returns another Vector:

v[1:4]
Vector([1.0, 2.0, 3.0])

operator.index(key) raises TypeError for non-integers (like floats), matching how built-in sequences behave.

24.2 Dynamic attribute access

The first few components of a vector deserve familiar names: x, y, z, t. Writing four properties (one per name) is repetitive and doesn’t scale to higher dimensions. __getattr__ is the single hook: Python calls it whenever a normal attribute lookup fails, and it’s free to return whatever it wants.

class VectorXYZT(Vector):
    shortcut_names = "xyzt"

    def __getattr__(self, name):
        cls = type(self)
        if len(name) == 1:
            pos = cls.shortcut_names.find(name)
            if 0 <= pos < len(self._components):
                return self._components[pos]
        raise AttributeError(f"{cls.__name__!r} object has no attribute {name!r}")

v = VectorXYZT(range(5))
v.x, v.y, v.z, v.t
(0.0, 1.0, 2.0, 3.0)

Walking through __getattr__:

  • The method only runs when the normal lookup fails. v.x first checks the instance dict, then the class chain — neither has x — so Python falls back to __getattr__("x").
  • cls.shortcut_names.find(name) returns the position of the name in "xyzt"0 for x, 1 for y, etc., and -1 for anything else.
  • The bounds check 0 <= pos < len(self._components) guards against two failures: a name not in shortcut_names (pos == -1), and a name past the end of the vector (asking for .t on a 2-element vector).
  • If we don’t have a match, we must raise AttributeError — anything else (returning None, raising a different exception) breaks the protocol that callers like hasattr rely on.

The general pattern: __getattr__ is the catch-all for “compute an attribute on demand.” Use it when the attribute set is rule-based (like positional shortcuts) rather than fixed.

There’s a trap. If we don’t override __setattr__, the user can do this:

v.x = 99
v.x, v[0]
(99, 0.0)

v.x = 99 created an instance attribute that shadows our __getattr__. Now v.x and v[0] disagree. The fix is to override __setattr__ and forbid those names:

class VectorXYZT2(VectorXYZT):
    def __setattr__(self, name, value):
        cls = type(self)
        if len(name) == 1:
            if name in cls.shortcut_names:
                raise AttributeError(f"readonly attribute {name!r}")
            elif name.islower():
                raise AttributeError(f"can't set attributes 'a' to 'z' in {cls.__name__!r}")
        super().__setattr__(name, value)

w = VectorXYZT2(range(5))
try:
    w.x = 99
except AttributeError as e:
    print("blocked:", e)
blocked: readonly attribute 'x'

Walking through the fix:

  • __setattr__ runs on every assignment to an attribute — not just the named ones. So we have to be careful to forward the rest with super().__setattr__(name, value), otherwise the class becomes unusable.
  • The first check rejects the four shortcut names explicitly. The second blocks every other one-letter lowercase name as a precaution — it’s better to refuse v.q = 1 than to let users invent attributes that look component-like.
  • We use super().__setattr__ (not self.__dict__[name] = value) so that further subclasses can hook into assignment too. Direct dict writes bypass the chain.

The general pattern: when __getattr__ synthesizes an attribute, pair it with __setattr__ that refuses the same name. The two halves keep reads and writes consistent.

24.3 Hashing a sequence

Hashing two components used hash((self.x, self.y)) — building a tuple. For an N-component vector, the same trick works, but it’s instructive to do it manually with a fold — the pattern generalizes to any “combine the hashes of the parts” computation.

import functools

class HashableVector(Vector):
    def __eq__(self, other):
        return len(self) == len(other) and all(a == b for a, b in zip(self, other))

    def __hash__(self):
        hashes = (hash(x) for x in self)
        return functools.reduce(operator.xor, hashes, 0)

v1 = HashableVector([1, 2, 3])
v2 = HashableVector([1.0, 2.0, 3.0])
hash(v1) == hash(v2), v1 == v2, len({v1, v2})
(True, True, 1)

Walking through the implementation:

  • __eq__ first checks that the lengths match — without this, Vector([1, 2]) and Vector([1, 2, 3]) would compare equal up to the second component. Then zip(self, other) pairs components; all(a == b for ...) short-circuits on the first mismatch.
  • __hash__ builds a generator expression of per-component hashes — lazy, so we don’t materialize an intermediate list.
  • functools.reduce(operator.xor, hashes, 0) folds the generator with XOR, starting from the seed 0. Each step is acc = acc ^ hash(component).
  • hash(v1) == hash(v2) because 1 == 1.0 and hash(1) == hash(1.0) — Python’s numeric tower (the family of related types bool ⊂ int ⊂ float ⊂ complex, defined in the numbers module) keeps these aligned. The interpreter explicitly arranges that any two numerically-equal values across the tower hash the same. That’s the rule “equal objects must have equal hashes” in action.

The general pattern: reduce(op, iterable, seed) is the canonical fold. XOR is the right op for hash-combining because it’s commutative and associative — order doesn’t change the result — and bit-mixing keeps the output well-distributed. The trade-off is that XOR-folding produces the same hash for [1, 2, 3] and [3, 2, 1]; that’s a feature when permutations should compare equal, a bug otherwise.

Warning__eq__ is slow on long sequences

The all(a == b for ...) form short-circuits on the first mismatch — good. But for very long sequences, hashing both and comparing the hashes first is faster. Profile before optimizing; correctness first.

24.4 The awesome zip

zip is the standard parallel-iteration tool — it pairs elements from multiple iterables so you can walk them in lockstep. It has three variants worth knowing because each handles uneven lengths differently. The default stops at the shortest:

list(zip(range(3), "ABC")), list(zip(range(3), "ABCDE"))
([(0, 'A'), (1, 'B'), (2, 'C')], [(0, 'A'), (1, 'B'), (2, 'C')])

The D and E were silently dropped — that’s the gotcha. Two safer variants:

  • zip(strict=True) (Python 3.10+) raises ValueError if the inputs have different lengths.
  • itertools.zip_longest fills the gap with a sentinel rather than truncating:
from itertools import zip_longest
list(zip_longest(range(3), "ABCDE", fillvalue="?"))
[(0, 'A'), (1, 'B'), (2, 'C'), ('?', 'D'), ('?', 'E')]

A nice idiom: zip(*matrix) transposes a list of equal-length sequences:

matrix = [(1, 2), (3, 4), (5, 6)]
list(zip(*matrix))
[(1, 3, 5), (2, 4, 6)]

Walking through the transpose:

  • *matrix unpacks [(1, 2), (3, 4), (5, 6)] into three positional arguments — equivalent to writing zip((1, 2), (3, 4), (5, 6)).
  • zip then takes the first element of each — (1, 3, 5) — and the second — (2, 4, 6). Those tuples are the columns of the original matrix.

The general rule: pick zip for “pair these and stop at the shortest”, zip(strict=True) when same-length is a precondition, and zip_longest when you want to pad. The zip(*matrix) pattern works for any operation that needs to flip a “list of rows” into a “list of columns.”

TipWhy this matters

Protocols are informal interfaces defined by convention: the sequence protocol is __len__ + __getitem__. If your class has those, Python treats it as a sequence — no inheritance required. This is duck typing made formal and powerful.

24.5 Build: a sequence-like Matrix class

Vectors live in one dimension; matrices in two. We’ll build a Matrix that’s a sequence of rowslen(m) is the row count, m[0] is a row, m[1:3] is a sub-matrix, m == n compares element-by-element, and m can be used as a dict key. Every dunder from the chapter earns its place.

Step 1: __init__, __len__, __getitem__ with slice handling. Rows are stored as immutable tuples so the matrix can be hashable later. __getitem__ dispatches on int vs slice, and the slice case returns another Matrix:

class Matrix:
    def __init__(self, rows):
        self._rows = tuple(tuple(row) for row in rows)

    def __len__(self):
        return len(self._rows)

    def __getitem__(self, key):
        if isinstance(key, slice):
            return type(self)(self._rows[key])         # slice -> Matrix
        return self._rows[key]                          # int -> tuple (a row)

m = Matrix([(1, 2, 3), (4, 5, 6), (7, 8, 9)])
[len(m), m[0], type(m[0:2]).__name__]
[3, (1, 2, 3), 'Matrix']

tuple(tuple(row) for row in rows) does two things at once: copies the input so the matrix isn’t aliased to the caller’s data (the chapter-18 lesson), and freezes each row into a tuple so the whole structure is immutable and hashable. The slice branch returns a new Matrix (using type(self) so subclasses get their own type back), exactly the pattern from the Vector example earlier in the chapter.

Step 2: __iter__, __eq__, __repr__. Add iteration (so for row in m: works), structural equality, and a clean repr:

class Matrix:
    def __init__(self, rows):
        self._rows = tuple(tuple(row) for row in rows)

    def __len__(self):
        return len(self._rows)

    def __getitem__(self, key):
        if isinstance(key, slice):
            return type(self)(self._rows[key])
        return self._rows[key]

    def __iter__(self):
        return iter(self._rows)

    def __eq__(self, other):
        return isinstance(other, Matrix) and self._rows == other._rows

    def __repr__(self):
        return f"{type(self).__name__}({list(self._rows)!r})"

m = Matrix([(1, 2), (3, 4)])
n = Matrix([(1, 2), (3, 4)])
[m == n, list(m), repr(m)]
[True, [(1, 2), (3, 4)], 'Matrix([(1, 2), (3, 4)])']

__iter__ returning iter(self._rows) is the lazy-friendly form — no copy, just a fresh iterator over the row tuples. __eq__ compares the tuple-of-tuples directly, which delegates equality to the components recursively. __repr__ uses type(self).__name__ so subclasses report their own class.

Step 3: __hash__ via XOR-fold, plus a transpose using zip(*self). The chapter’s hash recipe (reduce with XOR over the components’ hashes) generalises one level higher — we hash each row tuple, then XOR-fold those:

import functools, operator

class Matrix:
    def __init__(self, rows):
        self._rows = tuple(tuple(row) for row in rows)

    def __len__(self):
        return len(self._rows)

    def __getitem__(self, key):
        if isinstance(key, slice):
            return type(self)(self._rows[key])
        return self._rows[key]

    def __iter__(self):
        return iter(self._rows)

    def __eq__(self, other):
        return isinstance(other, Matrix) and self._rows == other._rows

    def __hash__(self):
        return functools.reduce(operator.xor, (hash(row) for row in self), 0)

    def __repr__(self):
        return f"{type(self).__name__}({list(self._rows)!r})"

    def transpose(self):
        return type(self)(zip(*self))

m = Matrix([(1, 2, 3), (4, 5, 6)])
mt = m.transpose()
[mt, len({m, Matrix([(1, 2, 3), (4, 5, 6)])})]
[Matrix([(1, 4), (2, 5), (3, 6)]), 1]

functools.reduce(operator.xor, (hash(row) for row in self), 0) folds the row hashes with XOR — the chapter’s recipe applied at the row level rather than the scalar level. The matrix can now sit in a set or be a dict key. transpose is the zip(*self) trick from the chapter: *self unpacks the rows into positional args; zip then walks them column-by-column, yielding the transposed rows. Wrapping in type(self)(...) keeps the result a Matrix.

The build is the chapter applied at one level of recursion: the matrix is a sequence of rows just as the vector was a sequence of components. __len__, __getitem__ (with slice handling), __iter__, __eq__, __hash__, __repr__, and zip(*self) for transpose — every tool from the chapter, in one fifty-line class.

24.6 Exercises

  1. Slice returns Vector. Verify type(v[1:4]) is Vector, not array.array. Why does that matter for chained operations like v[1:4][0:2]?

  2. __getattr__ for indexed names. Extend shortcut_names so that v.r returns the magnitude (abs(v)) and v.theta returns the angle (for 2D vectors). Where should the special-case live?

  3. zip(strict=True). Replace the __eq__ body with len == len and all(zip(self, other, strict=True)). What happens if you compare two vectors of different length?

  4. Build a transposer. Write transpose(matrix) using only zip and *. Verify on a non-square matrix.

  5. Hash that ignores order. Why does XOR-folding produce the same hash for [1, 2, 3] and [3, 2, 1]? When is that a feature; when is it a bug?

24.7 Summary

The sequence protocol — __len__ and __getitem__ — is the most rewarding contract in Python. Implement it carefully (return the right type for slices, raise the right error for non-integers), pair it with __getattr__/__setattr__ for named components, and your class behaves like the built-ins.

Next, Chapter 25 steps back and surveys Python’s four approaches to typing: duck, goose, static, runtime — and shows when to reach for each.