13  The Python Data Model

“The Python interpreter invokes special methods to perform basic object operations, often triggered by special syntax.” — Luciano Ramalho, Fluent Python (2e, ch. 1)

Note: Core idea

Python exposes one consistent interface for its objects, the “data model”, implemented through special methods (dunder methods, named for their double underscores). Understand this and you understand why Python feels consistent across built-in types, the standard library, and your own code.

In this chapter you will learn to:

  1. Recognize the most common special methods (the dunders) by name and category.
  2. Implement two of them on a custom class and watch iteration, slicing, and random.choice start working “for free.”
  3. Read a class and predict which built-in functions and operators it supports.
  4. Explain why len(x) is a function rather than a method, and what that says about Python’s design.

13.1 A Pythonic card deck

The shortest path to seeing the data model in action is to implement two special methods on a single class and observe everything you get without writing another line.

import collections

Card = collections.namedtuple("Card", ["rank", "suit"])

class FrenchDeck:
    ranks = [str(n) for n in range(2, 11)] + list("JQKA")
    suits = "spades diamonds clubs hearts".split()

    def __init__(self):
        self._cards = [Card(rank, suit)
                       for suit in self.suits
                       for rank in self.ranks]

    def __len__(self):
        return len(self._cards)

    def __getitem__(self, position):
        return self._cards[position]

Walking through this piece by piece:

  • Card = collections.namedtuple(...) builds a tiny class with two fields, rank and suit. A namedtuple is the lightest way to make a record-like type: you write no methods of your own and get attribute access by name.
  • ranks and suits are class attributes — defined in the class body, shared by every FrenchDeck instance. They’re constants describing what a deck is.
  • __init__ builds self._cards as the full 52-card list using a nested comprehension. The leading underscore on _cards is a convention: “internal — don’t touch from outside.”
  • __len__(self) is the dunder Python calls when you write len(deck). We delegate to the underlying list’s length.
  • __getitem__(self, position) is what deck[i] calls. We forward position straight to self._cards, so whatever the list supports — integers, negatives, slices — we support too.

Two dunder methods: __len__ and __getitem__. That’s the entire interface. Now look at what the class can already do.

deck = FrenchDeck()
len(deck)
52

len(deck) works because the built-in len function calls deck.__len__(). Nothing special is going on; it’s a contract. If you implement the contract, the built-in works on your object.

deck[0], deck[-1]
(Card(rank='2', suit='spades'), Card(rank='A', suit='hearts'))

Indexing is the same story. deck[0] is sugar for deck.__getitem__(0). Negative indexing works because we delegated to the underlying list, which supports it.

from random import choice
choice(deck)
Card(rank='J', suit='hearts')

random.choice requires only that its argument support len and __getitem__ — exactly the two methods we wrote. No inheritance, no registration. The function works on our class because our class fulfills the protocol it expects.
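To make that protocol concrete, here is a minimal sketch of what a choice-like function asks of its argument. Both names are assumptions made up for illustration: pick is a simplified stand-in, not the standard library's implementation, and Countdown is a hypothetical class that has the two dunders and nothing else.

```python
import random

def pick(seq):
    """Hypothetical, simplified stand-in for random.choice."""
    # Only two things are asked of seq: a length and integer indexing.
    return seq[random.randrange(len(seq))]

class Countdown:
    """Illustrative class: not a list, just the two dunders."""
    def __init__(self, start):
        self._start = start

    def __len__(self):
        return self._start

    def __getitem__(self, position):
        if not 0 <= position < self._start:
            raise IndexError(position)
        return self._start - position   # position 0 -> start, 1 -> start - 1, ...

c = Countdown(5)
print(pick(c) in {1, 2, 3, 4, 5})            # True
print(random.choice(c) in {1, 2, 3, 4, 5})   # True
```

Countdown never stores its items, yet both functions accept it, which suggests that the protocol, not the storage, is what matters.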

for card in deck[:3]:
    print(card)
Card(rank='2', suit='spades')
Card(rank='3', suit='spades')
Card(rank='4', suit='spades')

Two more powers show up here. Slicing, deck[:3], works because slicing is just __getitem__(slice(None, 3, None)), and our __getitem__ forwards the slice to a list. Because of that forwarding, the slice comes back as a plain list, so this particular loop iterates over a list. Iterating over the deck itself (for card in deck:) works too: in the absence of __iter__, Python falls back to calling __getitem__(0), __getitem__(1), … until IndexError is raised.
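One way to watch both mechanisms is to record what __getitem__ actually receives. The Spy class below is a hypothetical helper invented for this sketch; it defines only __getitem__.

```python
class Spy:
    """Illustrative sequence that records every argument __getitem__ receives."""
    def __init__(self, items):
        self._items = list(items)
        self.seen = []

    def __getitem__(self, position):
        self.seen.append(position)
        return self._items[position]   # raises IndexError past the end

spy = Spy("abc")

spy[:2]                  # slicing passes a single slice object
print(spy.seen)          # [slice(None, 2, None)]

spy.seen.clear()
print(list(spy))         # ['a', 'b', 'c']
print(spy.seen)          # [0, 1, 2, 3]; the call with 3 raised IndexError and ended the loop
```

The last line shows the cost of the fallback: Python only learns the sequence is exhausted by asking for one index too many.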

Card("Q", "hearts") in deck
True

The in operator falls back to iteration when __contains__ isn’t defined, so it works too. Sorting works as well, but to sort cards we first need a way to compare them. Cards don’t have a built-in order — is the King of hearts greater than the Queen of spades? — so we define one. The convention used here is “spades-high”: within a rank, suits order spades > hearts > diamonds > clubs; across ranks, A > K > Q > … > 2.

The way to express that to sorted is a key function — a function that takes one card and returns something sorted already knows how to compare (a number). Before the code, the pieces it uses:

  • FrenchDeck.ranks reads the class attribute we defined in the class body — ["2", "3", …, "J", "Q", "K", "A"]. Reading ClassName.attr from outside the class is how you access shared constants like this.
  • list.index(value) is the list method that returns the position of value — so FrenchDeck.ranks.index("A") is 12. We use the position as the rank’s numeric weight.
  • card.rank and card.suit are the field names from the Card = namedtuple("Card", ["rank", "suit"]) we built earlier; card[0] and card[1] would do the same thing.
  • sorted(iterable, key=fn) calls fn on each element and sorts by the result (not the element itself).

suit_values = dict(spades=3, hearts=2, diamonds=1, clubs=0)

def spades_high(card):
    rank_value = FrenchDeck.ranks.index(card.rank)
    return rank_value * len(suit_values) + suit_values[card.suit]

for card in sorted(deck, key=spades_high)[-3:]:
    print(card)
Card(rank='A', suit='diamonds')
Card(rank='A', suit='hearts')
Card(rank='A', suit='spades')

spades_high collapses each card into a single integer: multiplying the rank by len(suit_values) (4) leaves room for the suit weight to break ties without ever crossing into the next rank. The slice [-3:] then takes the last three from the sorted list — the three highest cards.
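The arithmetic is easier to trust with a few values worked out. This sketch repeats the key function with a module-level ranks list (a copy of FrenchDeck.ranks, so the block runs on its own) and prints the key for the cards at the boundaries.

```python
import collections

Card = collections.namedtuple("Card", ["rank", "suit"])
ranks = [str(n) for n in range(2, 11)] + list("JQKA")   # same values as FrenchDeck.ranks
suit_values = dict(spades=3, hearts=2, diamonds=1, clubs=0)

def spades_high(card):
    return ranks.index(card.rank) * len(suit_values) + suit_values[card.suit]

print(spades_high(Card("2", "clubs")))    # 0 * 4 + 0 = 0, the lowest card
print(spades_high(Card("2", "spades")))   # 0 * 4 + 3 = 3, the highest 2
print(spades_high(Card("3", "clubs")))    # 1 * 4 + 0 = 4, just above every 2
print(spades_high(Card("A", "spades")))   # 12 * 4 + 3 = 51, the highest card

# Every card gets its own key in 0..51, so no two cards ever tie.
keys = sorted(spades_high(Card(r, s)) for r in ranks for s in suit_values)
print(keys == list(range(52)))            # True
```

Returning the tuple (rank_value, suit_value) as the key would produce the same order, since tuples compare element by element; the single integer is just a compact encoding of that pair.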

We wrote two methods; we got len, indexing, slicing, iteration, in, random.choice, and sorted. That is the data model.
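As a quick self-check, the sketch below restates the class and turns each item in that list into a boolean. The checks dictionary and its labels are made up for this example.

```python
import collections
import random

Card = collections.namedtuple("Card", ["rank", "suit"])

class FrenchDeck:
    ranks = [str(n) for n in range(2, 11)] + list("JQKA")
    suits = "spades diamonds clubs hearts".split()

    def __init__(self):
        self._cards = [Card(rank, suit)
                       for suit in self.suits
                       for rank in self.ranks]

    def __len__(self):
        return len(self._cards)

    def __getitem__(self, position):
        return self._cards[position]

deck = FrenchDeck()
checks = {
    "len": len(deck) == 52,
    "indexing": deck[-1] == Card("A", "hearts"),
    # Every 13th card starting at index 12 is an ace, one per suit.
    "slicing": deck[12::13] == [Card("A", s) for s in FrenchDeck.suits],
    "iteration": sum(1 for _ in deck) == 52,
    "in": Card("Q", "hearts") in deck,
    "random.choice": random.choice(deck) in deck,
    # Sorting by suit name alone puts "clubs" first alphabetically.
    "sorted": sorted(deck, key=lambda c: c.suit)[0].suit == "clubs",
}
print(all(checks.values()))   # True
```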

13.2 Emulating numeric types

The same protocol idea works for arithmetic. A two-dimensional vector is the canonical example: it lets you implement several of the numeric dunders in a class small enough to read on one screen.

import math

class Vector:
    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y

    def __repr__(self):
        return f"Vector({self.x!r}, {self.y!r})"

    def __abs__(self):
        return math.hypot(self.x, self.y)

    def __bool__(self):
        return bool(abs(self))

    def __add__(self, other):
        return Vector(self.x + other.x, self.y + other.y)

    def __mul__(self, scalar):
        return Vector(self.x * scalar, self.y * scalar)

Each dunder maps one piece of Python syntax onto Vector:

  • __repr__(self) is what repr(v) and the REPL call. Using !r inside the f-string applies repr to each component, so a Vector(1, 2) prints unambiguously.
  • __abs__(self) is abs(v). We define it as the Euclidean length via math.hypot, which is the right meaning for a 2-D vector.
  • __bool__(self) is bool(v) and if v:. We make zero-length vectors falsy by reusing abs(self).
  • __add__(self, other) is v1 + v2. Notice it returns a new Vector — never mutates either operand. That’s the convention for arithmetic.
  • __mul__(self, scalar) is v * n. Here scalar is a number, not a Vector — multiplying two vectors would need a different operation (dot product, cross product), and we don’t define one.

The general shape: each operator and built-in has a corresponding dunder, and implementing the dunder lights up the syntax.

v1 = Vector(2, 4)
v2 = Vector(2, 1)
v1 + v2
Vector(4, 5)

v1 + v2 calls v1.__add__(v2). The result is a new Vector (not a mutation), which matches how + works on numbers, strings, and tuples.

abs(Vector(3, 4))
5.0

abs() calls __abs__. We defined it as the Euclidean length, so (3, 4) returns 5.0.

v1 * 3
Vector(6, 12)

* with a scalar calls __mul__. Note what happens with the reflected operand:

3 * v1
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[12], line 1
----> 1 3 * v1

TypeError: unsupported operand type(s) for *: 'int' and 'Vector'

3 * v1 fails because int.__mul__ doesn’t know about Vector, and we haven’t defined __rmul__. We will fix this in Chapter 28 — the reflected operators are a topic of their own.
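As a preview only, here is a minimal sketch of how a reflected method might close the gap. It assumes scalar is always a plain number and leaves out the rest of Vector; the full treatment, including how to handle unsupported operands, belongs to the later chapter.

```python
class Vector:
    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y

    def __repr__(self):
        return f"Vector({self.x!r}, {self.y!r})"

    def __mul__(self, scalar):
        return Vector(self.x * scalar, self.y * scalar)

    def __rmul__(self, scalar):
        # Tried for `scalar * vector` once the left operand's __mul__
        # has returned NotImplemented. Scalar multiplication commutes,
        # so we simply delegate to __mul__.
        return self * scalar

v1 = Vector(2, 4)
print(v1 * 3)   # Vector(6, 12)
print(3 * v1)   # Vector(6, 12)
```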

print(repr(Vector(1, 2)))
print(bool(Vector(0, 0)))
print(bool(Vector(1, 0)))
Vector(1, 2)
False
True

__repr__ is what gives a useful display when you type v1 at a prompt or print it. __bool__ is what makes if vector: work.

Tip: __repr__ vs __str__

If you implement only one of the two, implement __repr__. Without __str__, Python uses __repr__ for both repr(x) and str(x). The reverse is not true: a class with only __str__ will fall back to the unhelpful <__main__.Vector object at 0x...> for repr.
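The asymmetry is easy to confirm with two throwaway classes (OnlyRepr and OnlyStr are names invented for this sketch):

```python
class OnlyRepr:
    def __repr__(self):
        return "OnlyRepr()"

class OnlyStr:
    def __str__(self):
        return "a friendly string"

print(str(OnlyRepr()))     # OnlyRepr()   (str falls back to __repr__)
print(repr(OnlyRepr()))    # OnlyRepr()
print(str(OnlyStr()))      # a friendly string
# repr has no such fallback, so we get the default "<... object at 0x...>" form.
print(repr(OnlyStr()).startswith("<"))   # True
```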

13.3 An overview of special methods

The full taxonomy of special methods is large. The table below is the working subset — the dunders that every working Python programmer should be able to recognize:

Category                 Special methods
Strings/bytes            __repr__, __str__, __format__, __bytes__
Numeric conversion       __bool__, __complex__, __int__, __float__
Collection emulation     __len__, __getitem__, __setitem__, __contains__
Iteration                __iter__, __next__, __reversed__
Callable                 __call__
Context manager          __enter__, __exit__
Attribute access         __getattr__, __setattr__, __delattr__, __dir__
Arithmetic (regular)     __add__, __sub__, __mul__, __truediv__, __mod__
Arithmetic (reflected)   __radd__, __rsub__, __rmul__, …
Augmented assignment     __iadd__, __isub__, __imul__, …
Rich comparison          __lt__, __le__, __eq__, __ne__, __gt__, __ge__
Hashing                  __hash__

The shape of the contract — operator or built-in on the left, the dunder Python actually calls on the right:

Call site        Dunder Python calls
len(x)           __len__
x[i]             __getitem__
x[i] = v         __setitem__
x + y            __add__
x * n            __mul__
abs(x)           __abs__
bool(x)          __bool__
repr(x)          __repr__
x == y           __eq__
x in y           __contains__
for v in x       __iter__
with x: ...      __enter__ / __exit__

You don’t memorize this table. You recognize the pattern — every operator and built-in delegates to a dunder — and look up the specific name when you need it.
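The pattern can be observed directly. In this sketch, Probe is a hypothetical class whose dunders do nothing useful except record their own names, so the log shows which method each piece of syntax reached.

```python
class Probe:
    """Illustrative class: each dunder logs its name and returns a dummy value."""
    def __init__(self):
        self.calls = []

    def __len__(self):
        self.calls.append("__len__")
        return 0

    def __getitem__(self, i):
        self.calls.append("__getitem__")
        return None

    def __setitem__(self, i, v):
        self.calls.append("__setitem__")

    def __add__(self, other):
        self.calls.append("__add__")
        return self

    def __abs__(self):
        self.calls.append("__abs__")
        return 0

    def __contains__(self, item):
        self.calls.append("__contains__")
        return False

    def __enter__(self):
        self.calls.append("__enter__")
        return self

    def __exit__(self, exc_type, exc, tb):
        self.calls.append("__exit__")
        return False

p = Probe()
len(p)
p[0]
p[0] = 1
p + p
abs(p)
1 in p
with p:
    pass

print(p.calls)
# ['__len__', '__getitem__', '__setitem__', '__add__', '__abs__',
#  '__contains__', '__enter__', '__exit__']
```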

13.4 Why len is not a method

len(x) is a function, not a method. Why?

When Python sees len(deck), the interpreter calls the __len__ method defined on the object’s class. For built-in types such as list, str, and bytes, CPython skips the method call entirely and reads the length directly from a field in the object’s C struct, which is about as cheap as an operation can be. If len were a regular method, that shortcut would be much harder to offer: every call would have to go through method lookup.

The function form preserves both possibilities. For built-ins, len is a near-instant struct read. For your class, len calls your __len__. The caller doesn’t know — and doesn’t need to.

This is the design principle of the data model in miniature. Built-ins and custom classes look identical at the call site; the implementation behind them is free to differ.
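A small sketch of the same point. Shelf and Broken are hypothetical classes written for this example; the second half relies on CPython's behavior of validating whatever __len__ returns, which is one more thing a function can do on every caller's behalf.

```python
class Shelf:
    """Illustrative container; the name is made up for this sketch."""
    def __init__(self, *books):
        self._books = list(books)

    def __len__(self):
        return len(self._books)

# One call site, three different implementations behind it.
for thing in ([1, 2, 3], "abc", Shelf("a", "b", "c")):
    print(type(thing).__name__, len(thing))   # each line ends in 3

class Broken:
    def __len__(self):
        return -1   # violates the contract: a length cannot be negative

try:
    len(Broken())
except ValueError as err:
    print("len() rejected it:", err)
```

Because every caller goes through len, the check lives in one place rather than in each class's method.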

Tip: Why this matters

The Python data model is a framework contract. Implement the right special methods and your objects behave like built-ins — they work with for loops, slicing, random.choice(), len(), bool(), abs(), and the rest of the language, without inheriting from anything.

13.5 Build: a Polynomial class that feels built-in

FrenchDeck showed the sequence side of the data model; Vector showed the numeric side. A polynomial sits in both — it’s a sequence of coefficients and a thing you can add. We’ll build it in three steps and watch the dunders compose.

Step 1: shape and printable form. Store coefficients as a list — Polynomial(1, 2, 3) is 1 + 2x + 3x². Implement __repr__ so the REPL prints something honest:

class Polynomial:
    def __init__(self, *coeffs):
        self.coeffs = list(coeffs)

    def __repr__(self):
        terms = ", ".join(repr(c) for c in self.coeffs)
        return f"Polynomial({terms})"

p = Polynomial(1, 2, 3)
p
Polynomial(1, 2, 3)

*coeffs (Chapter 6) collects the variable-length argument list into a tuple, which we copy into a list. The __repr__ reconstructs the call expression, the same convention Vector’s __repr__ followed earlier in the chapter.

Step 2: implement the sequence side. Three dunders make len(p), p[i], and equality work — and iteration falls out for free because Python’s iterator protocol falls back to repeated __getitem__ (the same trick FrenchDeck used):

class Polynomial:
    def __init__(self, *coeffs):
        self.coeffs = list(coeffs)

    def __repr__(self):
        return f"Polynomial({', '.join(repr(c) for c in self.coeffs)})"

    def __len__(self):
        return len(self.coeffs)

    def __getitem__(self, i):
        return self.coeffs[i]

    def __eq__(self, other):
        return isinstance(other, Polynomial) and self.coeffs == other.coeffs

p = Polynomial(1, 2, 3)
[len(p), p[0], list(p), p == Polynomial(1, 2, 3), p == Polynomial(1, 2)]
[3, 1, [1, 2, 3], True, False]

list(p) works without an explicit __iter__ — Python notices __getitem__ is defined and walks 0, 1, 2, … until IndexError, exactly as it did with FrenchDeck. p[0] is the constant term; p == Polynomial(1, 2) is False because the lists differ.

Step 3: implement the numeric side. __add__ defines p1 + p2 (returning a new polynomial). __call__ makes the instance itself callable, so p(2) evaluates the polynomial at x = 2:

class Polynomial:
    def __init__(self, *coeffs):
        self.coeffs = list(coeffs)

    def __repr__(self):
        return f"Polynomial({', '.join(repr(c) for c in self.coeffs)})"

    def __len__(self):
        return len(self.coeffs)

    def __getitem__(self, i):
        return self.coeffs[i]

    def __eq__(self, other):
        return isinstance(other, Polynomial) and self.coeffs == other.coeffs

    def __add__(self, other):
        size = max(len(self), len(other))
        a = self.coeffs + [0] * (size - len(self.coeffs))
        b = other.coeffs + [0] * (size - len(other.coeffs))
        return Polynomial(*(x + y for x, y in zip(a, b)))

    def __call__(self, x):
        return sum(c * x**i for i, c in enumerate(self.coeffs))

p = Polynomial(1, 2, 3)         # 1 + 2x + 3x^2
q = Polynomial(0, 1)            # x
r = p + q                       # 1 + 3x + 3x^2
[r, r(2), p(0), p(1)]
[Polynomial(1, 3, 3), 19, 1, 6]

__add__ pads the shorter coefficient list with zeros so the two have matching length, then adds element-by-element with zip. __call__ evaluates Σ cᵢ · xⁱ with enumerate for the powers — implementing it makes the instance itself look like a function, so p(2) works as if p were a regular function.
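Worked by hand on plain lists, the two steps look like this. The variable names are ad hoc, and the final horner function is an alternative evaluation scheme (Horner's rule) that the class above does not use; it is included only to show that the same value can be reached without computing powers.

```python
p_coeffs = [1, 2, 3]      # 1 + 2x + 3x^2
q_coeffs = [0, 1]         # x

# Step 1: pad both lists to the same length.
size = max(len(p_coeffs), len(q_coeffs))
a = p_coeffs + [0] * (size - len(p_coeffs))
b = q_coeffs + [0] * (size - len(q_coeffs))
print(a, b)                                  # [1, 2, 3] [0, 1, 0]

# Step 2: add position by position.
r_coeffs = [x + y for x, y in zip(a, b)]
print(r_coeffs)                              # [1, 3, 3]

# Evaluation at x = 2, one term per coefficient: c * x**i.
terms = [c * 2**i for i, c in enumerate(r_coeffs)]
print(terms, sum(terms))                     # [1, 6, 12] 19

def horner(coeffs, x):
    """Horner's rule: fold from the highest power down, no exponentiation."""
    result = 0
    for c in reversed(coeffs):
        result = result * x + c
    return result

print(horner(r_coeffs, 2))                   # 19
```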

The build is the chapter in motion: three dunders give us len, indexing, equality, and iteration; two more give us arithmetic and callability. The class stops being a wrapper and starts being a built-in-shaped thing.

13.6 Exercises

  1. A two-card hand. Write a class called Hand (subclass FrenchDeck or start from scratch) that holds exactly two cards and supports len, indexing, and iteration. What’s the smallest set of dunders you need?

  2. Vector subtraction and equality. Add __sub__ and __eq__ to the Vector class above. Verify that Vector(3, 4) - Vector(1, 2) == Vector(2, 2) is True.

  3. Reverse iteration. What does for card in reversed(deck) do? Walk the chain: reversed looks for __reversed__ first, then falls back to what?

  4. Default truthiness. Why is an instance of a class that defines neither __bool__ nor __len__ always truthy? Read the docs for bool() and explain in one sentence what Python does for such objects.

  5. Read the source. Open the collections.UserList source (python -c "import collections, inspect; print(inspect.getsource(collections.UserList))"). How many of the dunders from the tables in section 13.3 does it implement?

13.7 Summary

This chapter introduced the contract: a set of special methods that Python’s syntax and built-in functions delegate to. Implementing them on your own classes makes those classes feel like built-ins — usable with len, iteration, slicing, in, abs, +, comparison, and more.

The next chapter, Chapter 14, takes one corner of this contract, the sequence protocol, and shows how Python’s many sequence types (list, tuple, str, bytes, array, deque) are all built on the same handful of dunders.