Functions in Python are objects. They can be stored in variables, passed as arguments, returned from functions, and introspected. This is the foundation of decorators, higher-order functions, and functional programming in Python.
In this chapter you will learn to:

- Use the higher-order builtins (map, filter, sorted, reduce) and recognize when a comprehension is more readable.
- Make class instances callable with __call__.
- Use *, **, and / to define flexible parameter signatures.
- Replace trivial lambdas with operator and functools.partial.

In Java or C, a function is a special syntactic thing — you can call it but not store it in a variable or read its docstring at runtime. In Python a function is an object of type function, with attributes you can inspect (and sometimes write):
def factorial(n):
    """Return n!"""
    return 1 if n < 2 else n * factorial(n - 1)

factorial.__doc__, factorial.__name__, type(factorial)
('Return n!', 'factorial', function)
Walking through what’s on display:
- factorial.__doc__ is the docstring — the triple-quoted string at the top of the body. Python attaches it to the function object automatically.
- factorial.__name__ is the name the function was defined under. Because it’s stored on the object (not on the variable), it survives reassignment to other names.
- type(factorial) reports <class 'function'> — factorial is an instance of the function class, just as 42 is an instance of int. This is the whole point: functions are values.

Assigning a function to another name does not copy it — it adds a label:
fact = factorial
fact(5), fact is factorial
(120, True)
Passing it as an argument works the same way:
list(map(factorial, range(6)))
[1, 1, 2, 6, 24, 120]
map received the factorial object and called it on each element of range(6). Nothing magical — map calls its first argument like any other. The general rule: anywhere you can pass an int or a str, you can pass a function.
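To make that rule concrete, here is a small sketch (the names speak and handlers are illustrative, not from the chapter): function objects stored as dict values, fetched and called like any other object.

```python
def shout(s):
    return s.upper() + "!"

def whisper(s):
    return s.lower() + "..."

# Function objects are ordinary dict values.
handlers = {"shout": shout, "whisper": whisper}

def speak(style, text):
    fn = handlers[style]   # fetch the function object...
    return fn(text)        # ...then call it

print(speak("shout", "hello"))    # HELLO!
print(speak("whisper", "HELLO"))  # hello...
```

This dispatch-table pattern is just the "functions are values" idea applied to a dict instead of a variable.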
A higher-order function takes a function as a parameter or returns one. The standard library has many:
fruits = ["strawberry", "fig", "apple", "cherry", "raspberry", "banana"]
sorted(fruits, key=len)
['fig', 'apple', 'cherry', 'banana', 'raspberry', 'strawberry']
sorted calls len(item) for each item and sorts by the result. map and filter are similar:
list(map(factorial, range(6)))
[1, 1, 2, 6, 24, 120]
map(fn, iterable) returns a lazy iterator that calls fn on each element. list(...) materialises it: [factorial(0), factorial(1), ..., factorial(5)] == [1, 1, 2, 6, 24, 120]. The function and the iterable each get to be lazy — useful with very large inputs.
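A quick sketch of that laziness (noisy_double is a hypothetical helper that records each call): the iterator does no work until you pull elements through it.

```python
calls = []

def noisy_double(n):
    calls.append(n)      # record every invocation
    return n * 2

it = map(noisy_double, range(5))
assert calls == []       # nothing computed yet: map is lazy

first = next(it)         # pulls exactly one element through
assert first == 0 and calls == [0]

rest = list(it)          # materialise the remainder
assert rest == [2, 4, 6, 8] and calls == [0, 1, 2, 3, 4]
```

With a huge (or infinite) input, you pay only for the elements you consume.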
list(filter(lambda n: n % 2, range(10)))
[1, 3, 5, 7, 9]
filter(pred, iterable) keeps only the elements where pred(x) is truthy. n % 2 is 1 for odd numbers (truthy) and 0 for even (falsy), so the result is [1, 3, 5, 7, 9] — every odd number in range(10).
Comprehensions usually read better than map/filter:
[factorial(n) for n in range(6)]
[1, 1, 2, 6, 24, 120]
Same output as list(map(factorial, range(6))), with the function call written out explicitly. Reads as “the list of factorial(n) for each n in range(6).”
[n for n in range(10) if n % 2]
[1, 3, 5, 7, 9]
Same output as list(filter(lambda n: n % 2, range(10))). The if n % 2 is the comprehension filter; the expression before for is what’s kept (here, just n). Reads as “the list of n for each n in range(10) if n is odd.”
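The two forms also combine: one comprehension can replace a map over a filter. A small sketch, restating factorial so the snippet stands alone:

```python
def factorial(n):
    return 1 if n < 2 else n * factorial(n - 1)

# map-over-filter spelled with nested calls...
a = list(map(factorial, filter(lambda n: n % 2, range(6))))

# ...and as a single comprehension: filter on the right, map on the left
b = [factorial(n) for n in range(6) if n % 2]

print(a, b)  # [1, 6, 120] [1, 6, 120]
```

The comprehension needs no lambda and reads left to right, which is why it usually wins.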
functools.reduce folds a whole iterable down to one value:
from functools import reduce
from operator import add
reduce(add, range(100))
4950
reduce(op, iterable) folds the iterable left-to-right with the binary operator. Step by step: add(0, 1) is 1; add(1, 2) is 3; add(3, 3) is 6; …; the final accumulator after 100 elements is the sum 0 + 1 + ... + 99 = 4950. Optional third argument is the initial value (default: the first element). For sums specifically, prefer sum(range(100)) — reduce shines for non-trivial accumulators (reduce(operator.mul, ...) for products, reduce(max, ...) for the running maximum).
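Two variations worth seeing in code, a sketch using only functools and operator: an explicit initial value, and non-sum accumulators.

```python
from functools import reduce
from operator import add, mul

# Explicit initial value: the fold starts from 100, not from the first element.
assert reduce(add, range(5), 100) == 110   # 100 + 0 + 1 + 2 + 3 + 4

# Product of 1..5: reduce(mul, ...) computes 5!
assert reduce(mul, range(1, 6)) == 120

# Running maximum: max works as the binary operator
assert reduce(max, [3, 1, 4, 1, 5]) == 5

# Edge case: on an empty iterable, only an initializer saves you
# (without one, reduce raises TypeError).
assert reduce(add, [], 0) == 0
```

The initializer doubles as the identity element of the operation (0 for add, 1 for mul), which is what makes the empty case well defined.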
A lambda is a one-expression function literal. It’s useful when you’d otherwise have to give a name to something the reader will read once:
sorted(fruits, key=lambda word: word[::-1])
['banana', 'apple', 'fig', 'raspberry', 'strawberry', 'cherry']
Sorted by the reversed word, so words that share an ending (raspberry, strawberry) cluster together.
When a lambda is hard to read, refactor it into a def. The named version is almost always easier to read.
A worked example. This sort key — “profit per unit minus cost” — packs three operations into one line:
data = [("widget", 10, 5, 2), ("gizmo", 8, 4, 1), ("gadget", 12, 6, 5)]
sorted(data, key=lambda r: r[1] * r[2] - r[3])
[('gizmo', 8, 4, 1), ('widget', 10, 5, 2), ('gadget', 12, 6, 5)]
Apply the recipe — give the expression a name:
def profit(record):
    name, units, margin, cost = record
    return units * margin - cost

sorted(data, key=profit)
[('gizmo', 8, 4, 1), ('widget', 10, 5, 2), ('gadget', 12, 6, 5)]
The def form documents what the key means. Reach for lambda only when the expression is short and obvious.
callable() reports whether an object can be called:
callable(abs), callable([])
(True, False)
In practice you’ll meet six kinds. Python’s reference manual splits these further (built-in vs user, generator vs coroutine vs async generator) — the practical taxonomy is:
| Kind | Example |
|---|---|
| def / built-in / method | factorial, len, list.append |
| lambda | lambda x: x * 2 |
| Class | BingoCage(...) — calling a class instantiates it |
| __call__ instance | bingo() — classes whose instances are callable |
| Generator function | def gen(): yield ... |
| async def (coroutine / async generator) | async def fetch(): ... |
flowchart LR
C(("callable()"))
C --> A["def / lambda"]
C --> B["class"]
C --> D["__call__ instance"]
C --> E["generator (yield)"]
C --> F["async def"]
The fourth row — instances callable via __call__ — is the trick that makes a class as flexible as a function with state.
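The taxonomy can be probed directly with callable(). A sketch (Greeter, gen, and coro are hypothetical examples, not from the chapter):

```python
def regular():
    pass

def gen():
    yield 1

async def coro():
    pass

class Greeter:
    def __call__(self):
        return "hi"

# Functions, generator functions, coroutine functions, built-ins,
# classes, and lambdas are all callable.
assert all(callable(o) for o in (regular, gen, coro, len, Greeter, lambda: 0))

g = Greeter()
assert callable(g) and g() == "hi"   # instance made callable by __call__

# Calling a generator function is callable; the generator *object*
# it returns is not — you iterate it instead.
assert not callable(gen())
assert not callable(42)
```

Note the asymmetry in the last checks: gen is callable, gen() is an iterator.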
Sometimes you want a callable that also keeps state — a function-like thing that remembers what it has done. A plain def can’t easily hold private data; a class can. Define __call__ and an instance becomes callable while still being an object with attributes and methods:
import random
class BingoCage:
    def __init__(self, items):
        self._items = list(items)
        random.shuffle(self._items)

    def pick(self):
        try:
            return self._items.pop()
        except IndexError:
            raise LookupError("pick from empty BingoCage")

    def __call__(self):
        return self.pick()

random.seed(0)
bingo = BingoCage(range(3))
bingo(), bingo(), callable(bingo)
(1, 2, True)
Walking through this:
- __init__ copies items into self._items (so the caller’s list isn’t mutated) and shuffles it in place. The leading underscore is the convention for “internal — don’t poke at this from outside.”
- pick pops the last item; if the cage is empty, list.pop raises IndexError. We re-raise it as LookupError because that’s the abstraction this class exports — callers shouldn’t have to know it’s backed by a list.
- __call__(self) is the special method Python invokes when you write bingo(). The expression bingo() is sugar for bingo.__call__(). Without this method, bingo is just an object; with it, bingo is a callable.
- callable(bingo) returns True precisely because __call__ is defined on its class.

The general pattern: when you need a callable with memory, a class with __call__ gives you the function-call syntax of a def plus the state and methods of an object. Closures (next chapter) are the lighter alternative when the state is small.
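For contrast, here is a hedged sketch of that lighter alternative (make_bingo is a hypothetical name): the same cage as a closure, with the state living in the enclosing scope instead of in self.

```python
import random

def make_bingo(items):
    pool = list(items)        # private state captured by the closure
    random.shuffle(pool)

    def pick():
        try:
            return pool.pop()
        except IndexError:
            raise LookupError("pick from empty cage")

    return pick

pick = make_bingo(range(3))
results = {pick(), pick(), pick()}
assert results == {0, 1, 2}   # all three items come out, in some order
```

The closure version has no class, but also no extra methods and no introspectable attributes; the __call__ class wins as soon as you need more than one operation on the state.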
Real-world functions often need a mix: one or two required arguments, an open-ended set of extras, and named options that must be passed by keyword to avoid confusion. Python’s parameter syntax — the /, *, ** markers — lets you state exactly which parameters are positional, which are keyword-only, and where the variable-length collectors sit. The tag example exercises most of it:
def tag(name, /, *content, class_=None, **attrs):
    """Generate one or more HTML tags."""
    if class_ is not None:
        attrs["class"] = class_
    attr_pairs = (f' {a}="{v}"' for a, v in sorted(attrs.items()))
    attr_str = "".join(attr_pairs)
    if content:
        elements = (f"<{name}{attr_str}>{c}</{name}>" for c in content)
        return "\n".join(elements)
    return f"<{name}{attr_str} />"
tag("br")
'<br />'

tag("p", "hello")
'<p>hello</p>'

print(tag("p", "hello", "world"))
<p>hello</p>
<p>world</p>

tag("p", "hello", class_="sidebar")
'<p class="sidebar">hello</p>'

tag("img", src="sunset.jpg", class_="framed", alt="sunset")
'<img alt="sunset" class="framed" src="sunset.jpg" />'
Reading the signature def tag(name, /, *content, class_=None, **attrs):
- name is positional-only (the / after it forbids name=...).
- *content collects extra positional arguments.
- class_ is keyword-only (anything after *content must be passed by keyword).
- **attrs collects extra keyword arguments.

The trailing underscore on class_ avoids a clash with Python’s reserved word class.
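The markers are enforced at call time. A sketch with a signature-only stand-in for tag (returning its bindings rather than HTML, so we can check them):

```python
def tag(name, /, *content, class_=None, **attrs):
    """Signature-only restatement of tag, for the checks below."""
    return (name, content, class_, attrs)

# class_ is keyword-only: an extra positional argument is collected
# by *content instead of binding to class_.
assert tag("p", "hello", "sidebar") == ("p", ("hello", "sidebar"), None, {})
assert tag("p", "hello", class_="sidebar") == ("p", ("hello",), "sidebar", {})

# name is positional-only: name="br" is swallowed by **attrs, leaving the
# required positional parameter unbound -- a TypeError.
try:
    tag(name="br")
except TypeError as exc:
    print("rejected:", exc)
```

The second check is the subtle one: a positional "class" value does not error, it silently becomes content, which is exactly why the keyword-only marker matters.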
To see / in isolation — this is how built-ins like divmod are spelled in pure Python:
def divmod_(a, b, /):
    return a // b, a % b

divmod_(7, 3)       # ok
divmod_(a=7, b=3)   # TypeError — a, b are positional-only

TypeError: divmod_() got some positional-only arguments passed as keyword arguments: 'a, b'
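The mirror image is * in isolation: everything after it is keyword-only. A minimal sketch (move is a hypothetical example):

```python
def move(point, *, dx=0, dy=0):   # dx, dy must be named at the call site
    x, y = point
    return (x + dx, y + dy)

assert move((1, 1), dx=2) == (3, 1)
assert move((1, 1), dx=2, dy=-1) == (3, 0)

try:
    move((1, 1), 2, 3)            # positional dx, dy: rejected
except TypeError:
    print("dx, dy are keyword-only")
```

Keyword-only parameters keep calls self-documenting: move((1, 1), 2, 3) would leave the reader guessing which number is which.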
## The operator module

You’ll often want to pass an operator (*, <, [i], .attr) as a function — to reduce, sorted(key=...), map. The natural reflex is lambda a, b: a * b or lambda r: r[1], but those throwaway lambdas clutter the call site. operator exports the arithmetic, comparison, and access operators as named functions:
from operator import mul, itemgetter, attrgetter, methodcaller
from functools import reduce
reduce(mul, range(1, 6))
120
Walking through this small example:
- mul is operator.mul — the function form of *. mul(3, 4) returns 12.
- reduce(mul, range(1, 6)) walks [1, 2, 3, 4, 5] left-to-right, computing ((((1*2)*3)*4)*5) — i.e., 5!. Without operator.mul you’d write reduce(lambda a, b: a * b, ...), which says the same thing more noisily.

itemgetter(i) is equivalent to lambda r: r[i]:
metro_data = [
("Tokyo", "JP", 36.933),
("Delhi NCR", "IN", 21.935),
("São Paulo", "BR", 21.090),
]
for city in sorted(metro_data, key=itemgetter(2), reverse=True):
    print(city[0], city[2])

Tokyo 36.933
Delhi NCR 21.935
São Paulo 21.09
itemgetter accepts multiple indices and returns a tuple — useful for sort keys with tie-breakers, or to project a few fields out of a record:
cc_name = itemgetter(1, 0)
cc_name(metro_data[0])
('JP', 'Tokyo')
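A sketch of the tie-breaker use (rows is illustrative data, not from the chapter): sort by country first, then by city name within each country.

```python
from operator import itemgetter

rows = [("Tokyo", "JP", 36.9), ("Osaka", "JP", 19.3), ("Delhi", "IN", 21.9)]

# itemgetter(1, 0) returns ("JP", "Tokyo")-style tuples;
# tuples compare element by element, so index 0 breaks ties on index 1.
by_country_then_name = sorted(rows, key=itemgetter(1, 0))

assert [r[0] for r in by_country_then_name] == ["Delhi", "Osaka", "Tokyo"]
```

The same multi-index key also works for projecting records, as cc_name above shows.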
attrgetter is the same trick for attribute access. It understands dotted paths and accepts multiple fields:
from collections import namedtuple
Coord = namedtuple("Coord", "lat lon")
City = namedtuple("City", "name pop coord")
cities = [
City("Tokyo", 36.933, Coord(35.69, 139.69)),
City("Lagos", 13.46, Coord(6.45, 3.40)),
]
name_lat = attrgetter("name", "coord.lat")
name_lat(cities[0])
('Tokyo', 35.69)
Walking through this:
- namedtuple("Coord", "lat lon") builds a tiny class with two fields. We use it to give cities a nested structure (each city has a coord with its own lat and lon).
- attrgetter("name", "coord.lat") returns a callable that, given any object, fetches obj.name and obj.coord.lat. The dotted path "coord.lat" does the nested lookup for you — no need for lambda c: c.coord.lat.
- name_lat(cities[0]) returns a tuple ("Tokyo", 35.69) — one entry per attribute path, in the order you listed them.

methodcaller("replace", " ", "-") is the third sibling — it freezes a method name and arguments, returning a callable that invokes that method on whatever you pass it. You’ll reach for attrgetter and itemgetter daily; methodcaller only occasionally. The general pattern: any time you’d write a one-line lambda that does index, attribute, or method access, there’s an operator function that says it more clearly.
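A quick sketch of methodcaller in action (hyphenate and upcase are illustrative names):

```python
from operator import methodcaller

# Freezes the method name "replace" plus its arguments.
hyphenate = methodcaller("replace", " ", "-")
assert hyphenate("time flies") == "time-flies"   # i.e. "time flies".replace(" ", "-")

# With no extra arguments it just names the method to call.
upcase = methodcaller("upper")
assert list(map(upcase, ["a", "b"])) == ["A", "B"]
```

Unlike str.replace passed directly, methodcaller works on any object that happens to have the named method, not just strings.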
## functools.partial

A function that takes a function (map, sorted(key=...)) usually expects a callable of one specific arity — map wants f(x), not f(x, y). When the function you have is wider than the slot you need to fit it into, you’d write lambda x: mul(3, x) to fix one argument. functools.partial does the same job without the lambda:
from operator import mul
from functools import partial
triple = partial(mul, 3)
list(map(triple, range(1, 6)))
[3, 6, 9, 12, 15]
Walking through this:
- partial(mul, 3) returns a new callable that, when called, invokes mul(3, ...) — the 3 is frozen as the first argument.
- triple gives the resulting one-argument function a name. triple(7) is mul(3, 7), i.e., 21.
- map(triple, range(1, 6)) then needs only a one-argument callable, which triple is.

A useful real-world example — pre-binding the form of Unicode normalization:
import unicodedata, functools
nfc = functools.partial(unicodedata.normalize, "NFC")
nfc("café")
'café'
Now nfc(s) is a one-argument function with "NFC" baked in. Cleaner at the call site than passing the form repeatedly. The general pattern: when a callable has too many parameters for the role you need, partial is the surgical tool to specialize it.
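partial also freezes keyword arguments, and the resulting object is inspectable. A sketch using int's base parameter (parse_binary is an illustrative name):

```python
from functools import partial

parse_binary = partial(int, base=2)   # base=2 frozen as a keyword argument
assert parse_binary("1010") == 10
assert [parse_binary(s) for s in ["1", "10", "11"]] == [1, 2, 3]

# The partial object records what it froze:
assert parse_binary.func is int
assert parse_binary.args == ()
assert parse_binary.keywords == {"base": 2}
```

Those .func, .args, and .keywords attributes are another instance of the chapter's theme: the specialized callable is itself an ordinary, inspectable object.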
Functions are objects — they have attributes (__name__, __doc__, __annotations__, __defaults__), they can be passed around and stored, and they can be called with __call__. This is not a special trick — it’s Python’s fundamental design.
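Those attributes are plain data you can read. A sketch inspecting a function directly and through inspect.signature (clip is a hypothetical example):

```python
import inspect

def clip(text, max_len=80):
    """Trim text to at most max_len characters."""
    return text[:max_len]

# Raw attributes on the function object
assert clip.__name__ == "clip"
assert clip.__defaults__ == (80,)        # default values, rightmost-first
assert "Trim" in clip.__doc__

# inspect.signature gives the same information in structured form
sig = inspect.signature(clip)
assert list(sig.parameters) == ["text", "max_len"]
assert sig.parameters["max_len"].default == 80
```

Tools like IDEs, doc generators, and decorators are all built on exactly this kind of introspection.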
## Pipeline of transformations

Most data-cleaning code is a sequence of one-argument transformations: strip, lower-case, replace, parse. Three or four of them in a row is fine; ten of them is a Russian doll of nested calls. We’ll build a Pipeline class whose instances are callable and stages are first-class functions — the chapter’s tools applied to themselves.
Step 1: a callable class that chains stages. Store a list of one-argument functions; __call__ feeds the input through them left-to-right:
class Pipeline:
    def __init__(self, stages=None):
        self.stages = list(stages) if stages else []

    def __call__(self, x):
        for stage in self.stages:
            x = stage(x)
        return x

shout = Pipeline([str.strip, str.upper])
shout(" hello ")
'HELLO'
__call__ makes a Pipeline instance behave like a function: shout("...") is sugar for shout.__call__("..."). The stages are stored as function objects — in Python 3, str.strip and str.upper are plain function objects you can pass around exactly like any other callable.
Step 2: a fluent .then for extension. Return a new Pipeline rather than mutating self.stages — value semantics keep callers safe from aliasing surprises (the lesson from chapter 18):
class Pipeline:
    def __init__(self, stages=None):
        self.stages = list(stages) if stages else []

    def __call__(self, x):
        for stage in self.stages:
            x = stage(x)
        return x

    def then(self, fn):
        return Pipeline([*self.stages, fn])

trim_lower = Pipeline().then(str.strip).then(str.lower)
[trim_lower(" Hello "), trim_lower("WORLD")]
['hello', 'world']
then builds a new list ([*self.stages, fn]) and wraps it in a new Pipeline. The original is unchanged — you can fork off variants without disturbing what’s already in use. The empty-default Pipeline() plus chained .then reads almost like the textual description of the transformation.
Step 3: build pipelines from operator and partial. Real stages often need parameters (a separator, a regex, a replacement). partial and methodcaller express those without ever writing a lambda:
from operator import methodcaller, itemgetter
from functools import partial
slug = (
    Pipeline()
    .then(str.strip)
    .then(str.lower)
    .then(methodcaller("replace", " ", "-"))
)
[slug(" Hello World "), slug("PYTHON Rules")]
['hello-world', 'python-rules']

# A pipeline over records: extract field, then transform it
records = [{"name": " Alice ", "score": 95}, {"name": "BOB", "score": 87}]
clean_name = (
    Pipeline()
    .then(itemgetter("name"))  # dict -> str
    .then(str.strip)
    .then(str.title)
)
[clean_name(r) for r in records]
['Alice', 'Bob']
methodcaller("replace", " ", "-") returns a one-argument callable that invokes s.replace(" ", "-") on whatever it gets — exactly the right shape for a pipeline stage. itemgetter("name") projects the name field out of a dict; the rest of the pipeline operates on the extracted string. None of the stages is a lambda — the chapter’s operator and functools.partial tools say what’s happening more clearly.
The build threads the chapter’s ideas through one example: __call__ to make instances callable, value semantics in .then to avoid aliasing, and methodcaller/itemgetter/partial so each stage is named after what it does rather than written as a one-line lambda.
Sort by reversed word. Without lambda, sort fruits by reversed word using operator.itemgetter or a small def. Compare readability.
__call__ for caches. Write a Memoize class whose instances behave like a memoized version of a function passed in __init__. bingo shows the pattern.
Keyword-only parameters. Modify tag so that id_ is also keyword-only. Verify positionally passing id_ raises TypeError.
partial vs lambda. Express lambda x: x.replace(" ", "-") using methodcaller. Now express it using partial(str.replace, ...) — does that work, and why or why not?
Find the callable. Given a list of mixed values (functions, classes, instances, integers), filter only the callables using callable.
Functions are first-class objects. Every higher-order pattern in Python — comprehensions, sorted(key=...), decorators, callbacks, functools.partial, callable instances — depends on this. Once you internalize that a function is just another object, the rest of the language becomes consistent.
Next, Chapter 20 layers an optional, gradual type system on top of these functions: hints checked by tools, ignored by the interpreter, and powerful enough to catch real bugs.