Python offers three ways to build data classes, each with a different trade-off: @dataclass is the modern default, typing.NamedTuple is the immutable, tuple-based alternative, and collections.namedtuple is the original factory function. Hand-written classes remain the option for when you need full control.
In this chapter you will learn to:
- Compare the three builders (collections.namedtuple, typing.NamedTuple, @dataclasses.dataclass) and pick one for a given problem.
- Use field() to declare default factories, control repr, and control comparison.
- Extend the generated __init__ with __post_init__.
- Declare class-level attributes with ClassVar.
- Use frozen=True and order=True to opt into immutability and ordering.
- Match instances with class patterns: case ClassName(...).

| | namedtuple | typing.NamedTuple | @dataclass |
|---|---|---|---|
| Mutable fields | no | no | yes |
| Class statement | no | yes | yes |
| Default values | yes (3.7+, defaults=) | yes (3.6.1+) | yes |
| __repr__ | yes | yes | yes |
| __eq__ | yes | yes | yes |
| Ordering | yes (tuple-style) | yes (tuple-style) | opt-in (order=True) |
| Hashable | yes | yes | conditional |
| __slots__ | yes | yes | opt-in (slots=True, 3.10+) |
The choice in one line: use @dataclass unless you need a tuple, in which case use typing.NamedTuple. The classic collections.namedtuple is fine but offers strictly less than typing.NamedTuple once you want type hints.
namedtuple is a factory that returns a class:
from collections import namedtuple
Coordinate = namedtuple("Coordinate", ["lat", "lon"])
moscow = Coordinate(55.756, 37.617)
moscow.lat, moscow == Coordinate(55.756, 37.617)
# → (55.756, True)
namedtuple("Coordinate", ["lat", "lon"]) is a factory call: it builds and returns a new class object whose name is "Coordinate" and whose fields are lat and lon. We assign it to Coordinate so we can use it. moscow.lat reads the lat field by name (55.756); the == comparison succeeds because two Coordinate instances with the same field values are equal. namedtuple doesn't need to generate an __eq__ for this: the class inherits tuple's, which compares element by element. The cell's output is (55.756, True).
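A quick way to confirm that the generated class really is a tuple subclass is to poke at it directly. This is a small sketch reusing the two-field Coordinate; _fields is the namedtuple class attribute listing the field names, and the assignment attempt is expected to fail because the underlying tuple is immutable.

```python
from collections import namedtuple

Coordinate = namedtuple("Coordinate", ["lat", "lon"])
moscow = Coordinate(55.756, 37.617)

# The generated class subclasses tuple, so tuple behaviour comes for free.
print(isinstance(moscow, tuple))       # True
print(moscow[0], Coordinate._fields)   # 55.756 ('lat', 'lon')

# Equality is inherited from tuple: a plain tuple with the same values is equal.
print(moscow == (55.756, 37.617))      # True

# Fields are read-only because the underlying tuple is immutable.
try:
    moscow.lat = 0.0
    assigned = True
except AttributeError:
    assigned = False
print(assigned)                        # False
```

The comparison against a plain tuple is worth remembering: a Coordinate and an unnamed tuple with the same values are indistinguishable to ==.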
Defaults are right-aligned, in the same way default arguments are:
Coordinate = namedtuple("Coordinate", ["lat", "lon", "reference"], defaults=["WGS84"])
Coordinate(55.756, 37.617)
# → Coordinate(lat=55.756, lon=37.617, reference='WGS84')
defaults=["WGS84"] supplies one default — and Python attaches it to the rightmost field, reference. Constructing with two arguments fills lat and lon; reference falls back to "WGS84". The output is Coordinate(lat=55.756, lon=37.617, reference='WGS84'). To default two fields, you’d pass defaults=[default_for_second_to_last, default_for_last] — Python lines them up from the right.
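The right-alignment rule is easiest to see with two defaults. In this sketch the 0.0 default for lon is purely illustrative; _field_defaults is the class attribute where namedtuple records which field received which default.

```python
from collections import namedtuple

# Two defaults line up with the two rightmost fields: lon and reference.
Coordinate = namedtuple(
    "Coordinate", ["lat", "lon", "reference"], defaults=[0.0, "WGS84"]
)

print(Coordinate(55.756))
# Coordinate(lat=55.756, lon=0.0, reference='WGS84')
print(Coordinate._field_defaults)
# {'lon': 0.0, 'reference': 'WGS84'}
```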
Because a named tuple is a tuple, you can unpack it positionally:
lat, lon, ref = Coordinate(55.756, 37.617)
lat, ref
# → (55.756, 'WGS84')
Three names on the left, three slots in the tuple on the right — Python unpacks element-by-element. lat gets 55.756, lon gets 37.617, ref gets "WGS84" (from the default). The cell prints (55.756, 'WGS84'). This is exactly the unpacking from the tuples chapter — namedtuples didn’t introduce a new mechanism; they just added attribute access on top of the same tuple shape.
Two helpers worth remembering — both bridge between named-tuples and “regular” Python data shapes:
_asdict() turns a namedtuple instance into a plain dict keyed by field names. It's useful when you need to hand the data to something that expects a mapping: json.dumps, a templating engine, an HTTP request body. Note the leading underscore: the namedtuple machinery prefixes its own helpers with _, and namedtuple forbids field names that start with an underscore, so the two can never collide. (Imagine a namedtuple with a field literally called asdict: without the prefix, the field and the helper would fight over the same name.)
moscow = Coordinate(55.756, 37.617)
moscow._asdict()
# → {'lat': 55.756, 'lon': 37.617, 'reference': 'WGS84'}
The result is an ordinary dict. The default reference="WGS84" is included because every field of the tuple — defaulted or not — has a concrete value at this point.
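To see why _asdict() matters for json.dumps in particular, compare serializing the tuple directly with serializing its dict form. This is a minimal sketch using the three-field Coordinate with the "WGS84" default. The json module treats any tuple subclass as a JSON array, so the field names are lost unless you convert first.

```python
import json
from collections import namedtuple

Coordinate = namedtuple("Coordinate", ["lat", "lon", "reference"], defaults=["WGS84"])
moscow = Coordinate(55.756, 37.617)

# json sees a namedtuple as a plain tuple, so the field names are dropped...
as_array = json.dumps(moscow)
# ...while _asdict() keeps them as object keys.
as_object = json.dumps(moscow._asdict())

print(as_array)   # [55.756, 37.617, "WGS84"]
print(as_object)  # {"lat": 55.756, "lon": 37.617, "reference": "WGS84"}
```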
_make(iterable) is the inverse: it builds a namedtuple instance from any iterable, without writing out the field names. Coordinate(*row) (positional unpacking) and Coordinate._make(row) have the same effect when the row supplies every field. One difference is that _make does not fill in defaults, so a short row raises TypeError. _make also takes the iterable as a single argument, so it reads as "construct from this row" and plugs straight into calls like map(Coordinate._make, rows).
Coordinate._make([55.756, 37.617, "WGS84"])
# → Coordinate(lat=55.756, lon=37.617, reference='WGS84')
The classic use case: parsing a CSV into namedtuples — for row in csv.reader(f): Coordinate._make(row). Each row is already a list of strings; _make slots them into the right fields by position, not by name.
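Here is a sketch of that CSV pattern. It uses io.StringIO as a stand-in for an open file, and the two sample rows are invented for illustration. Because _make does no type conversion, the numeric fields arrive as strings and have to be converted explicitly.

```python
import csv
import io
from collections import namedtuple

Coordinate = namedtuple("Coordinate", ["lat", "lon", "reference"], defaults=["WGS84"])

# In-memory stand-in for an open CSV file (hypothetical sample data).
data = io.StringIO("55.756,37.617,WGS84\n-33.868,151.209,WGS84\n")

rows = [Coordinate._make(row) for row in csv.reader(data)]
print(rows[0])
# Coordinate(lat='55.756', lon='37.617', reference='WGS84')

# _make slots values in by position only; converting types is up to us.
typed = [Coordinate(float(c.lat), float(c.lon), c.reference) for c in rows]
print(typed[1])
# Coordinate(lat=-33.868, lon=151.209, reference='WGS84')
```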
typing.NamedTuple lets you write a tuple as a class — type hints, methods, and all:
from typing import NamedTuple
class Coordinate(NamedTuple):
    lat: float
    lon: float
    reference: str = "WGS84"

    def __str__(self):
        ns = "N" if self.lat >= 0 else "S"
        we = "E" if self.lon >= 0 else "W"
        return f"{abs(self.lat):.1f}°{ns}, {abs(self.lon):.1f}°{we}"

print(Coordinate(55.756, 37.617))
# → 55.8°N, 37.6°E
The instance is still a tuple — the methods are added on top.
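The next sketch checks that claim on a typing.NamedTuple instance. It leaves out __str__ and exercises unpacking, indexing, and the _fields and _field_defaults attributes. It also shows that the annotations are hints only: nothing checks the types at runtime.

```python
from typing import NamedTuple

class Coordinate(NamedTuple):
    lat: float
    lon: float
    reference: str = "WGS84"

c = Coordinate(-33.868, 151.209)

# Still a tuple: unpacking, indexing, and len() all work.
lat, lon, ref = c
print(isinstance(c, tuple), len(c), c[2])   # True 3 WGS84

# Field names and defaults come from the class body.
print(Coordinate._fields)           # ('lat', 'lon', 'reference')
print(Coordinate._field_defaults)   # {'reference': 'WGS84'}

# The annotations are not enforced at runtime.
odd = Coordinate("north", "east")
print(odd.lat)                      # north
```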
@dataclass

@dataclass is the closest Python has to a Kotlin data class or a Scala case class. It generates __init__, __repr__, and __eq__ from the class's annotated attributes:
from dataclasses import dataclass, field
from typing import ClassVar
@dataclass
class ClubMember:
    name: str
    guests: list[str] = field(default_factory=list)
    athlete: bool = field(default=False, repr=False)

ClubMember("Anna")
# → ClubMember(name='Anna', guests=[])
Walking through what each annotated line declares:
- name: str is a required field. The decorator generates __init__(self, name, ...), which assigns self.name = name.
- guests: list[str] = field(default_factory=list) is an optional field with a fresh empty list per instance. Writing = [] directly would fall into the mutable-default trap; default_factory=list calls list() for each new instance instead.
- athlete: bool = field(default=False, repr=False) defaults to False. The repr=False flag tells the generated __repr__ to omit this field, which is why the printed output shows only name and guests.

field() is the customization hook. Each option is a small but useful escape hatch:
| field() option | Purpose |
|---|---|
| default | static default value |
| default_factory | callable producing the default (use this for any mutable type) |
| repr | include in __repr__? |
| compare | include in __eq__ and ordering? |
| hash | include in __hash__? |
| init | accept as a parameter to __init__? |
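Rule: never use a mutable default (= [], = {}). In a hand-written class, every instance would share the same list; @dataclass refuses the most common cases (list, dict, set) outright by raising ValueError when the class is defined. The whole reason field(default_factory=list) exists is to dodge that trap.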
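Several of the field() options from the table can be shown on one class. Ticket and its fields are hypothetical, made up only for this sketch. tags gets a fresh list per instance, notes is excluded from comparison, and checked_in is set from its default but cannot be passed to __init__.

```python
from dataclasses import dataclass, field

@dataclass
class Ticket:
    # Hypothetical example class, used only to exercise field() options.
    code: str
    tags: list[str] = field(default_factory=list)        # fresh list per instance
    notes: str = field(default="", compare=False)        # ignored by ==
    checked_in: bool = field(default=False, init=False)  # not an __init__ parameter

a = Ticket("A1")
b = Ticket("A1", notes="aisle seat")
print(a == b)           # True: notes is excluded from comparison

a.tags.append("vip")
print(a.tags, b.tags)   # ['vip'] []: each instance got its own list
print(a == b)           # False: tags now differ
print(b.checked_in)     # False, assigned from the default by __init__
```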
__post_init__

The generated __init__ only assigns the fields you declared. What if you need to derive one, say, defaulting handle to the first word of name? You can't do it with field(default=...) because the default can't see the other fields. The hook is __post_init__: a method the generated __init__ calls right after the assignments are done.
@dataclass
class HackerClubMember:
    name: str
    guests: list = field(default_factory=list)
    handle: str = field(default="", init=True)  # init=True is the default; spelled out for emphasis

    def __post_init__(self):
        if self.handle == "":
            self.handle = self.name.split()[0]

HackerClubMember("Anna Ravenscroft", handle="AnnaRaven").handle, \
    HackerClubMember("Leo Rochael").handle
# → ('AnnaRaven', 'Leo')
Walking through what runs at construction time:
1. The @dataclass decorator generates an __init__ that assigns self.name, self.guests, and self.handle from the constructor arguments.
2. The generated __init__ automatically calls self.__post_init__() if the method exists.
3. Inside __post_init__, every field is already set, so we can read self.name and self.handle. If handle is still empty (the caller didn't pass one), we replace it with the first word of name.
4. For "Anna Ravenscroft" the caller passed handle="AnnaRaven", so the if is False and handle stays as given. For "Leo Rochael" the handle defaulted to "", so we derive "Leo".

The general rule: __post_init__ is the right place to validate, normalize, or compute any value that depends on the other fields, once the auto-generated __init__ has finished its assignments.
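As a sketch of the "validate, normalize, or compute" rule, the hypothetical Member class below does all three in one __post_init__. It collapses stray whitespace in name, rejects a blank name, and derives a lowercase handle. The validation runs before the derivation so that split()[0] never sees an empty list.

```python
from dataclasses import dataclass

@dataclass
class Member:
    # Hypothetical class: normalizes, validates, then derives a field.
    name: str
    handle: str = ""

    def __post_init__(self):
        self.name = " ".join(self.name.split())   # normalize: collapse whitespace
        if not self.name:                          # validate
            raise ValueError("name must not be empty")
        if not self.handle:                        # compute a dependent default
            self.handle = self.name.split()[0].lower()

m = Member("  Leo   Rochael ")
print(m.name, m.handle)   # Leo Rochael leo

try:
    Member("   ")
    blank_rejected = False
except ValueError as exc:
    blank_rejected = True
    print(exc)            # name must not be empty
```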
ClassVar for class-level attributes

Annotated attributes become __init__ parameters by default. To opt out, that is, to declare a class-level attribute that's shared across instances, wrap the type in ClassVar:
@dataclass
class HackerClub:
    name: str
    guests: list = field(default_factory=list)
    all_handles: ClassVar[set[str]] = set()

HackerClub.all_handles.add("anna")
HackerClub.all_handles
# → {'anna'}
Now all_handles is not a parameter to __init__ — it’s a single set shared by every instance.
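You can confirm that @dataclass skips the ClassVar with dataclasses.fields(), which lists only real fields. This sketch also shows that two instances see the same set and that passing all_handles to the constructor is rejected. The club names are placeholders.

```python
from dataclasses import dataclass, field, fields
from typing import ClassVar

@dataclass
class HackerClub:
    name: str
    guests: list = field(default_factory=list)
    all_handles: ClassVar[set[str]] = set()

# fields() lists only the per-instance fields; the ClassVar is skipped.
print([f.name for f in fields(HackerClub)])   # ['name', 'guests']

a = HackerClub("Club A")
b = HackerClub("Club B")
a.all_handles.add("anna")
print(b.all_handles)                          # {'anna'}: one set, shared by every instance

# The class variable is not an __init__ parameter.
try:
    HackerClub("Club C", all_handles=set())
    accepted = True
except TypeError:
    accepted = False
print(accepted)                               # False
```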
frozen and order

Two flags handle the most common configurations. frozen=True makes instances immutable:
@dataclass(frozen=True)
class FrozenCoordinate:
    lat: float
    lon: float

c = FrozenCoordinate(55.756, 37.617)
c.lat = 0
# raises FrozenInstanceError: cannot assign to field 'lat'
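Two practical consequences of frozen=True are sketched below. FrozenInstanceError subclasses AttributeError, so code that already catches AttributeError keeps working. Also, with the default eq=True, a frozen dataclass gets a generated __hash__, so equal instances can serve as dict keys.

```python
from dataclasses import FrozenInstanceError, dataclass

@dataclass(frozen=True)
class FrozenCoordinate:
    lat: float
    lon: float

c = FrozenCoordinate(55.756, 37.617)

try:
    c.lat = 0
    frozen_blocked = False
except FrozenInstanceError:
    frozen_blocked = True
print(frozen_blocked, issubclass(FrozenInstanceError, AttributeError))   # True True

# eq=True (the default) plus frozen=True makes @dataclass generate __hash__.
cities = {c: "Moscow"}
print(cities[FrozenCoordinate(55.756, 37.617)])   # Moscow
```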
order=True generates __lt__, __le__, __gt__, __ge__ based on the field order — the comparison is field-by-field, top to bottom:
@dataclass(order=True)
class Card:
    rank: int
    suit: str

Card(2, "hearts") < Card(3, "spades")
# → True
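Because ordering is field-by-field, sorted() works out of the box, and ties on rank are broken by suit. This sketch adds a hypothetical owner field with compare=False to show how a field can be kept out of both == and the ordering methods.

```python
from dataclasses import dataclass, field

@dataclass(order=True)
class Card:
    rank: int
    suit: str
    # Hypothetical extra field, excluded from ==, <, and friends.
    owner: str = field(default="", compare=False)

hand = [Card(10, "spades"), Card(2, "hearts"), Card(10, "clubs")]

# Comparison is tuple-like: rank first, then suit breaks ties.
print([(c.rank, c.suit) for c in sorted(hand)])
# [(2, 'hearts'), (10, 'clubs'), (10, 'spades')]

# owner is ignored, so these two cards compare equal.
print(Card(5, "hearts", owner="ana") == Card(5, "hearts", owner="leo"))   # True
```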
@dataclass features

Beyond frozen and order, a few more tools are worth knowing: two flags added in Python 3.10 (slots=True and kw_only=True), the dataclasses.replace helper, and class patterns in match/case.
slots=True generates __slots__ for the class — instances skip the per-object __dict__, save memory, and reject undeclared attributes:
@dataclass(slots=True)
class Point:
    x: float
    y: float

p = Point(1.0, 2.0)
p.__slots__, p.x
# → (('x', 'y'), 1.0)
p.z = 3.0
# raises AttributeError: 'Point' object has no attribute 'z' and no __dict__ for setting new attributes
kw_only=True forces every field to be passed by keyword. This pays off when the field order in __init__ is incidental — keyword-only calls survive field reorderings without breaking callers:
@dataclass(kw_only=True)
class Window:
    width: int
    height: int

Window(width=800, height=600)
# → Window(width=800, height=600)
Window(800, 600)
# raises TypeError: Window.__init__() takes 1 positional argument but 3 were given
dataclasses.replace is the canonical way to “modify” a frozen=True instance — it returns a new instance with the requested fields changed:
from dataclasses import replace
@dataclass(frozen=True)
class Config:
    host: str
    port: int
    tls: bool = False

prod = Config("example.com", 443, tls=True)
staging = replace(prod, host="staging.example.com")
prod, staging
# → (Config(host='example.com', port=443, tls=True),
#    Config(host='staging.example.com', port=443, tls=True))
Walking through the call:
1. replace(prod, host="staging.example.com") reads every field of prod, overrides the ones you name, and constructs a new Config from the merged values.
2. The call is equivalent to Config(host="staging.example.com", port=prod.port, tls=prod.tls), but you don't have to spell out the unchanged fields.
3. prod is unchanged; staging is a new immutable instance.

The general rule: for any frozen dataclass, replace(obj, field=new_value) is the substitute for assignment.
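Because replace builds the new instance by calling __init__ with the merged values, a misspelled field name fails loudly instead of being silently ignored. This sketch reuses the three-field frozen Config. The misspelling hots is deliberate.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Config:
    host: str
    port: int
    tls: bool = False

prod = Config("example.com", 443, tls=True)
staging = replace(prod, host="staging.example.com")

print(prod.host, staging.host, staging.port)   # example.com staging.example.com 443
print(staging is prod)                          # False: a new object

# The unknown keyword reaches __init__, which rejects it.
try:
    replace(prod, hots="typo.example.com")
    typo_accepted = True
except TypeError:
    typo_accepted = False
print(typo_accepted)                            # False
```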
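Class patterns in match/case work cleanly with dataclasses. Keyword sub-patterns such as name=city match attributes by name, and @dataclass also generates __match_args__ (Python 3.10+), so the class already exposes its field order for positional sub-patterns: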
@dataclass
class City:
    continent: str
    name: str
    country: str

def describe(record):
    match record:
        case City(continent="Asia", name=city):
            return f"Asian city: {city}"
        case City(continent="Europe", name=city):
            return f"European city: {city}"
        case City(name=city):
            return f"City: {city}"

describe(City("Asia", "Tokyo", "JP")), describe(City("Africa", "Lagos", "NG"))
# → ('Asian city: Tokyo', 'City: Lagos')
Walking through the cases:
1. case City(continent="Asia", name=city): matches when the value is a City instance and record.continent == "Asia". The name=city part captures record.name into the local name city.
2. case City(continent="Europe", name=city): is the same shape for Europe.
3. case City(name=city): is the catch-all for any City regardless of continent. It only constrains the type, not the field values, and still captures name.
4. As with any match statement, case clauses are tried top to bottom; the first one that fits wins.

The general rule: case ClassName(field=pattern, ...) matches an instance of that class whose named fields satisfy each sub-pattern. The same syntax works on typing.NamedTuple instances too.
A data class with zero methods is a code smell — it’s a dumb data container that forces callers to know its internals. Either add the methods that belong with the data, or use a tuple/dict and be explicit about its structure. A class earns its existence by encapsulating both data and behavior.
Config with environment dispatch

Configuration objects show up in every program, and they hit every dataclass feature we've covered: validation, immutability for safety, replace for variants, and pattern matching for environment-specific behaviour.
Step 1: a frozen dataclass with __post_init__ validation. Lock down the fields, validate them at construction, slot the instance for memory efficiency:
from dataclasses import dataclass, replace
@dataclass(frozen=True, slots=True)
class Config:
    env: str
    host: str
    port: int = 5432
    tls: bool = False

    def __post_init__(self):
        # Runs after the generated __init__ has assigned every field.
        if self.env not in {"dev", "staging", "prod"}:
            raise ValueError(f"unknown env: {self.env!r}")
        if not 1 <= self.port <= 65535:
            raise ValueError(f"port out of range: {self.port}")

dev = Config(env="dev", host="localhost")
dev
# → Config(env='dev', host='localhost', port=5432, tls=False)
frozen=True rejects any post-construction mutation, so validating once in __post_init__ is enough: once a Config exists, it stays known-good. slots=True skips the per-instance __dict__. __post_init__ runs after the generated __init__ finishes assigning fields, so we can read self.env and self.port to validate them; raising ValueError aborts construction.
Config(env="prod", host="db.example.com", port=70000)  # invalid port
# raises ValueError: port out of range: 70000
Step 2: derive variants with dataclasses.replace. A frozen instance can’t be mutated — the canonical “make a tweaked copy” function is replace:
prod = replace(dev, env="prod", host="db.prod.example.com", tls=True)
staging = replace(prod, env="staging", host="db.staging.example.com")
[dev, prod, staging]
# → [Config(env='dev', host='localhost', port=5432, tls=False),
#    Config(env='prod', host='db.prod.example.com', port=5432, tls=True),
#    Config(env='staging', host='db.staging.example.com', port=5432, tls=True)]
replace(dev, env="prod", ...) reads every field of dev, overrides the named ones, and runs the constructor again — so __post_init__ re-validates the new instance. staging derives from prod rather than from scratch, picking up tls=True for free; that’s the value-object pattern in motion.
Step 3: dispatch by environment with match/case. Class patterns let you branch on field values without touching the dataclass code:
def connect_url(cfg):
    match cfg:
        case Config(env="prod", host=h, port=p, tls=True):
            return f"https://{h}:{p}/?strict=true"
        case Config(env="staging", host=h, port=p):
            return f"https://{h}:{p}/"
        case Config(env="dev", host=h, port=p):
            return f"http://{h}:{p}/"
        case _:
            raise ValueError(f"no URL handler for {cfg!r}")

[connect_url(dev), connect_url(staging), connect_url(prod)]
# → ['http://localhost:5432/',
#    'https://db.staging.example.com:5432/',
#    'https://db.prod.example.com:5432/?strict=true']
case Config(env="prod", host=h, port=p, tls=True): matches when the value is-a Config and the named fields equal the literals (env="prod", tls=True). Bare names (h, p) capture; literal values ("prod", True) constrain. Top-to-bottom evaluation gives prod-specific handling first, with staging and dev fallbacks. Adding a fourth environment is one more case clause, no if/elif chain.
The build is the chapter in motion: frozen=True + slots=True for an immutable, lightweight value object, __post_init__ for invariant validation, replace for derived variants, and match/case class patterns for dispatch, all on one short dataclass.
Defaults trap. Write a Bag dataclass with a contents: list = [] default and predict what happens when the class is defined. Explain why. Then fix it with field(default_factory=list), create two Bag instances, add to one, and confirm the other is unaffected.
__post_init__ validation. Write a Temperature dataclass with value: float and unit: str. Reject any unit that isn’t "C", "F", or "K" by raising ValueError in __post_init__.
Hashable but mutable? Create a dataclass with the default frozen=False. Try inserting an instance into a set. What happens? Why does frozen=True fix it?
Match against NamedTuple. Rewrite the Coordinate NamedTuple example to dispatch on hemisphere — north of the equator vs. south — using match/case.
__slots__ opt-in. Read the docs for @dataclass(slots=True). Create a class with and without slots, and compare the size with sys.getsizeof.
Python’s three data-class builders cover three different needs: namedtuple for tuple-shaped immutable records, typing.NamedTuple for the same with type hints and methods, and @dataclass for everything else. They all generate the boring boilerplate (__init__, __repr__, __eq__); they all integrate with pattern matching; and they all reward the discipline of giving your data classes behavior alongside their data.
Next, Chapter 18 tackles three ideas every Python programmer eventually trips over: variables are labels, not boxes; == and is ask different questions; and copies are shallow by default.