6  Functions

Note: Core idea

A function is a named, reusable block of code. In Python, functions are objects — you can store them in variables, pass them as arguments, and return them from other functions. That single property is the foundation of decorators, callbacks, and most of Python’s higher-order patterns.

In this chapter you will learn to:

  1. Define functions with def and document them with docstrings.
  2. Use positional, default, keyword, *args, and **kwargs parameters.
  3. Reason about scope (LEGB) and avoid the mutable-default trap.
  4. Recognize closures, lambda expressions, and what a first-class function means.
  5. Read your first decorator and your first generator.

6.1 Defining a function

When you find yourself writing the same few lines twice, factor them into a function: a named, parameterised block of code you can call from anywhere. Even one-line bodies earn their keep when the name documents intent.

def greet(name):
    """Return a greeting string."""
    return f"Hello, {name}!"

greet("Alice")
'Hello, Alice!'
  • def greet(name): declares a function called greet taking one parameter, name.
  • The first string in the body — """Return a greeting string.""" — is the docstring. It’s not a comment; it’s accessible at runtime via greet.__doc__ and shown by help(greet) and IDEs.
  • return f"Hello, {name}!" produces the value the call evaluates to.
  • greet("Alice") invokes the function, binding "Alice" to the parameter name, and returns the string.

The general rule: def name(params): opens a function, the body is indented, the first string is the docstring, and return produces the result.

Without return, a function returns None:

def shout(message):
    print(message.upper())

result = shout("hi")   # prints HI
result is None
HI
True

6.2 Parameters

A serious function rarely takes just one positional argument. Real APIs need optional defaults, keyword-only flags that read clearly at the call site, and “any number of extras” for forwarding. Python packs all five mechanisms into one syntax — positional, default, keyword-only, *args, **kwargs:

def connect(host, port=5432, *, timeout=30, **options):
    return {
        "host": host,
        "port": port,
        "timeout": timeout,
        "options": options,
    }

connect("localhost")
{'host': 'localhost', 'port': 5432, 'timeout': 30, 'options': {}}
  • host is a required positional — must be supplied.
  • port=5432 is a default — optional, falls back to 5432 if omitted.
  • * is a separator (not a parameter): everything after it is keyword-only.
  • timeout=30 is a keyword-only default — callers must pass it by name (timeout=60), never positionally.
  • **options collects any other keyword arguments into a dict — a catch-all for forwarding.
connect("localhost", 6543, timeout=60, ssl=True, retries=3)
{'host': 'localhost',
 'port': 6543,
 'timeout': 60,
 'options': {'ssl': True, 'retries': 3}}
  • "localhost" and 6543 fill host and port positionally.
  • timeout=60 must be passed by name (because of the *).
  • ssl=True, retries=3 aren’t named parameters, so they land in **options as a dict.

The general rule: parameters before * accept positional or keyword; parameters after * are keyword-only; **name sweeps up any leftover keyword arguments. So connect("localhost", 5432, 60) would fail — timeout must be named.
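The failure mode is worth seeing once. A quick sketch, re-using the connect definition from above:

```python
def connect(host, port=5432, *, timeout=30, **options):
    return {"host": host, "port": port, "timeout": timeout,
            "options": options}

try:
    # the third positional value has no slot to land in: timeout is keyword-only
    connect("localhost", 5432, 60)
except TypeError as exc:
    print(type(exc).__name__, "-", exc)
```

Python rejects the call before the body runs; the error message names the function and the number of positional arguments it accepts.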

*args and **kwargs work as parameters too — they accept any number of extra arguments:

def total(*numbers):
    return sum(numbers)

def record(**fields):
    return dict(fields)

[total(1, 2, 3, 4, 5), record(name="Alice", age=30)]
[15, {'name': 'Alice', 'age': 30}]
  • *numbers collects all positional arguments into a tuple — total(1, 2, 3, 4, 5) makes numbers = (1, 2, 3, 4, 5).
  • **fields collects all keyword arguments into a dict — record(name="Alice", age=30) makes fields = {"name": "Alice", "age": 30}.

When calling, the same * and ** syntax unpacks an iterable or dict back into arguments:

args = [6543]
kwargs = {"timeout": 60, "ssl": True}
connect("localhost", *args, **kwargs)
{'host': 'localhost', 'port': 6543, 'timeout': 60, 'options': {'ssl': True}}
  • *args spreads [6543] as positional arguments — equivalent to writing 6543 directly.
  • **kwargs spreads {"timeout": 60, "ssl": True} as keyword arguments — equivalent to timeout=60, ssl=True.

The general rule: * / ** in a def collects, in a call spreads. Same syntax, complementary directions.
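Collect-then-spread is also the forwarding idiom: a wrapper accepts anything with *args, **kwargs and passes it through untouched. A minimal sketch (the connect_verbose name is ours, not a standard API):

```python
def connect(host, port=5432, *, timeout=30, **options):
    return {"host": host, "port": port, "timeout": timeout,
            "options": options}

def connect_verbose(*args, **kwargs):
    # * and ** in the def collect whatever the caller passed...
    print("calling connect with", args, kwargs)
    # ...and in the call spread it back out, unchanged
    return connect(*args, **kwargs)

connect_verbose("localhost", timeout=60)
```

This is exactly the shape the timer decorator in Section 6.6 relies on.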

The mutable-default trap. Default arguments are evaluated once, at definition time. So a default like target=[] creates a single list shared across every call.

def add_item_buggy(item, target=[]):
    target.append(item)
    return target

add_item_buggy("a"), add_item_buggy("b")
(['a', 'b'], ['a', 'b'])

The output (['a', 'b'], ['a', 'b']) is the bug made visible. The second call did not start with a fresh empty list: target=[] was evaluated once when the def was processed, and that one list is shared across every call that doesn't supply its own target. The first call appended 'a'; the second call saw a list that already contained 'a' and appended 'b'. Both return values are the same list, which is why the tuple shows the final state twice; calling add_item_buggy(...) a third time would show ['a', 'b', 'next_thing']. State leaks between unrelated calls.

The fix is the None sentinel:

def add_item(item, target=None):
    if target is None:
        target = []
    target.append(item)
    return target

add_item("a"), add_item("b")
(['a'], ['b'])

The output (['a'], ['b']) is what callers expected the first time. target=None is safe to share because None is immutable. Inside the body, if target is None: target = [] builds a fresh [] per call — so two callers get two different lists. The if target is None: (not if not target:) is intentional: a caller passing an intentionally empty list [] should be honoured, not replaced with a new one. The sentinel-plus-fresh-allocation is the universal idiom for any mutable default.
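A quick check confirms the sentinel honours a caller-supplied list, empty or not:

```python
def add_item(item, target=None):
    if target is None:
        target = []          # fresh list per call
    target.append(item)
    return target

mine = []                    # intentionally empty, but it is *my* list
result = add_item("a", mine)
print(result is mine)        # True: the caller's list was used, not replaced
print(mine)                  # ['a']
```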

6.3 Returning multiple values

Some functions naturally produce more than one value — min_max wants to return both the minimum and maximum in one pass. Languages that allow only a single return value force you to wrap it in a struct or use out-parameters; Python piggy-backs on tuples and unpacking instead.

def min_max(data):
    return min(data), max(data)

lo, hi = min_max([3, 1, 4, 1, 5, 9, 2, 6])
lo, hi
(1, 9)
  • return min(data), max(data) returns a single object — the 2-tuple (min, max) — the comma builds the tuple implicitly.
  • lo, hi = min_max(...) unpacks the returned tuple into two names.
  • The caller writes lo, hi = ... exactly as if the function had two return slots.

The general rule: “multiple returns” is just a tuple. The caller chooses whether to unpack (a, b = f()) or keep the pair (pair = f()).
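When only one of the slots matters, the conventional throwaway name _ keeps the unpacking syntax without inventing a name for a value you discard:

```python
def min_max(data):
    return min(data), max(data)

_, hi = min_max([3, 1, 4, 1, 5, 9, 2, 6])   # ignore the minimum
print(hi)   # 9
```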

6.4 Scope: LEGB

When you write print(x) inside a function, where does Python look for x? In a function with nested functions and a module-level variable, four places are possible — and Python checks them in a fixed order.

Python looks up names in this order:

  • Local — inside the current function
  • Enclosing — inside any wrapping function
  • Global — at the module/file level
  • Built-in — len, range, print, …
x = 10                # global

def outer():
    x = 20            # enclosing (visible to inner)
    def inner():
        x = 30        # local
        return x
    return inner(), x

outer(), x
((30, 20), 10)
  • The module-level x = 10 is global.
  • Inside outer, x = 20 creates a new local x for outer — it does not touch the global one.
  • Inside inner, x = 30 creates a new local x for inner — neither the enclosing nor the global one.
  • inner() returns its own 30. Then outer reads its own x (20) and returns the pair.
  • The outermost call shows outer() returning (30, 20), while the global x is still 10.

The general rule: each function’s assignments create local names by default, even if the same name exists in an enclosing scope. Lookup walks outward (Local → Enclosing → Global → Built-in) until something matches; assignment creates a new local unless declared otherwise.
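The Built-in step at the end of the chain is why len works inside any function without being defined anywhere in your file, and why assigning to a built-in's name shadows it. A small demonstration (shadowing built-ins is shown here only to make the lookup order visible; avoid it in real code):

```python
def measure(seq):
    return len(seq)              # Local, Enclosing, Global all miss; Built-in wins

def measure_shadowed(seq):
    len = lambda s: -1           # an assignment makes len a Local name here
    return len(seq)              # Local wins before Built-in is consulted

print(measure([1, 2, 3]))            # 3
print(measure_shadowed([1, 2, 3]))   # -1
```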

To modify an outer variable from inside a function, you need an explicit declaration: global or nonlocal. Without it, Python’s “assignments make locals” rule will silently shadow the outer name instead. We’ll see the trap first, then the fix, then the deeper lesson — closures.

Step 1: see the trap. Try writing a counter the obvious way, with no nonlocal:

def make_counter():
    count = 0
    def increment():
        count += 1   # assignment makes count a *local* of increment
        return count
    return increment

c = make_counter()
c()
---------------------------------------------------------------------------
UnboundLocalError                         Traceback (most recent call last)
Cell In[11], line 9
      5         return count
      6     return increment
      7 
      8 c = make_counter()
----> 9 c()

Cell In[11], line 4, in make_counter.<locals>.increment()
      3     def increment():
----> 4         count += 1   # assignment makes count a *local* of increment
      5         return count

UnboundLocalError: cannot access local variable 'count' where it is not associated with a value
  • Inside increment, the line count += 1 is shorthand for count = count + 1 — an assignment to count. That assignment makes count a local of increment.
  • But the right-hand side reads count before anything has been assigned to it locally — so Python raises UnboundLocalError.
  • The enclosing count = 0 is visible for reading in nested scopes, but as soon as you assign to a name, Python decides “that name is local now” and the enclosing version is shadowed.

Step 2: declare nonlocal. Tell Python “don’t make a new local — bind to the enclosing one”:

def make_counter():
    count = 0
    def increment():
        nonlocal count
        count += 1
        return count
    return increment

c = make_counter()
[c(), c(), c()]
[1, 2, 3]
  • nonlocal count says “this count is the one in the enclosing function, not a new local.” The += now reads and writes that enclosing slot.
  • make_counter() returns the increment function — without calling it. Calling c() repeatedly bumps the same count because increment keeps a reference to its enclosing scope after make_counter has already returned.

Step 3: the closure. That last fact is the key idea: the inner function carries its enclosing variables with it, alive even after the outer function returned. The function-plus-its-captured-scope is called a closure. A closure can hold more than a single number — here’s a running-average factory:

def make_averager():
    samples = []
    def averager(value):
        samples.append(value)
        return sum(samples) / len(samples)
    return averager

avg = make_averager()
[avg(10), avg(20), avg(30)]
[10.0, 15.0, 20.0]

Each call to make_averager() returns a fresh averager with its own samples list — samples lives as long as the returned function does. Note we did not need nonlocal here: samples.append(...) is a method call, not an assignment to samples, so the local-name rule never trips.

The general rule: nonlocal name says “modify the enclosing function’s name”; global name says “modify the module-level name”.

The picture for “where does Python find a name?”:

Local (current function) → Enclosing (any wrapping function) → Global (module level) → Built-in (len, range, print, …) → NameError

Python walks left to right and stops at the first scope that defines the name. global x skips straight to the global cell; nonlocal x binds to the nearest enclosing cell.

6.5 Functions are objects

In many languages, functions are second-class — you can call them but not pass them around. In Python, a function is just an object you happen to be able to invoke. That single property unlocks higher-order patterns: storing functions in lists, passing them as arguments, returning them from other functions.

def double(x): return x * 2
def square(x): return x ** 2

ops = [double, square, abs]
[op(5) for op in ops]
[10, 25, 5]
  • double, square, and the built-in abs are all values — naming them without () gets the function object itself, not its result.
  • [double, square, abs] is a regular list whose elements happen to be callable.
  • [op(5) for op in ops] iterates the list, calling each function on 5, and collects the results.

The same idea is what makes key= arguments work — pass a function in, the caller calls it for you:

sorted(["banana", "Apple", "cherry"], key=str.lower)
['Apple', 'banana', 'cherry']
  • str.lower, looked up on the class rather than on an instance, is a plain function that takes a string and returns the lowercased version.
  • sorted calls str.lower(item) once per element to derive a sort key, then sorts by those keys.
  • The original strings stay in the result; only the comparison uses the lowered form, giving a case-insensitive sort.

lambda is a one-line, anonymous function form — useful when you’d rather not invent a name for a throwaway helper:

sorted([("apple", 3), ("banana", 1), ("cherry", 2)], key=lambda pair: pair[1])
[('banana', 1), ('cherry', 2), ('apple', 3)]
  • lambda pair: pair[1] is a function of one argument that returns its second element.
  • sorted uses it as the key function, sorting the list of pairs by their numeric second element.

The general rule: lambda args: expression is shorthand for a single-expression def, useful when you’d prefer not to name the function.

Use lambda only for trivial expressions. If it needs a comment to explain it, use def and give it a name. Compare:

records = [{"name": "Alice", "score": 95}, {"name": "Bob", "score": 87}]

# Trivial — lambda is fine:
sorted(records, key=lambda r: r["score"])
[{'name': 'Bob', 'score': 87}, {'name': 'Alice', 'score': 95}]
# Needs a comment to explain — promote to a named def:
def by_score_then_name(record):
    """Sort key: descending score, then name ascending for ties."""
    return (-record["score"], record["name"])

sorted(records, key=by_score_then_name)
[{'name': 'Alice', 'score': 95}, {'name': 'Bob', 'score': 87}]

The named version reads itself; the equivalent lambda would force a comment above the sorted call. The full treatment of first-class functions, callables, and higher-order patterns is in Chapter 19.

6.6 A first decorator

Suppose you want to time how long a function takes — and you want it for several functions, without copying the timing code into each one. A decorator is a function that takes a function and returns a new function with extra behavior wrapped around it. We’ll build one in three small steps.

Step 1: a function that takes a function and returns a new one.

import time

def timer(func):
    def wrapper(n):
        start = time.perf_counter()
        result = func(n)
        print(f"{func.__name__}: {time.perf_counter() - start:.4f}s")
        return result
    return wrapper

def slow_sum(n):
    return sum(range(n))

slow_sum = timer(slow_sum)   # rebind the name to the wrapped version
slow_sum(1_000_000)
slow_sum: 0.0208s
499999500000
  • timer is an ordinary function whose argument is another function.
  • Inside, it defines a local function wrapper that does the timing around func(n) and returns the original result.
  • time.perf_counter() returns a float — the count of seconds from a fixed (but arbitrary) reference point. Two readings subtracted give an elapsed wall-clock duration with sub-microsecond resolution; it’s the right primitive for “how long did this take?” timing.
  • func.__name__ reads the wrapped function’s name attribute — every function has a __name__ (a string) and a __doc__ (the docstring), among other introspection attributes.
  • timer returns wrapper. So slow_sum = timer(slow_sum) rebinds the name slow_sum to the wrapped version — the original is now captured inside the closure.
  • Calling slow_sum(1_000_000) actually calls wrapper(1_000_000), which times the call and prints the elapsed seconds.

Step 2: the @ sugar. Writing slow_sum = timer(slow_sum) after every def gets repetitive. Python’s @decorator syntax does the same rebind:

@timer
def fast_sum(n):
    return sum(range(n))

fast_sum(1_000_000)
fast_sum: 0.0206s
499999500000
  • @timer directly above def fast_sum(n): is exactly equivalent to writing fast_sum = timer(fast_sum) after the def.
  • Reading rule: every @name above a def means “pass this function through name and rebind the name to whatever comes back.”

Step 3: make it work on any signature. Our wrapper(n) only accepts one argument — useless for a decorator that should work on any function. Use *args, **kwargs to accept and forward arbitrary arguments. Add @functools.wraps(func) so the wrapped function still looks like the original to help(), tracebacks, and IDEs:

import functools

def timer(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        print(f"{func.__name__}: {time.perf_counter() - start:.4f}s")
        return result
    return wrapper

@timer
def greet(name, greeting="hello"):
    return f"{greeting}, {name}"

greet("Alice")
greet: 0.0000s
'hello, Alice'
  • *args, **kwargs accept any positional and keyword arguments and forward them to the original — so timer now works on functions of any signature.
  • @functools.wraps(func) copies func’s name, docstring, and attributes onto wrapper. Without it, greet.__name__ would become "wrapper" and tracebacks would point at the wrong function.

The general rule: @decorator above a def rebinds the name to decorator(original). The deep treatment of decorators — including parameterized decorators and a tour of the standard library’s decorators — is Chapter 21.

6.7 A first generator

Sometimes you want to produce a sequence of values, but materialising the whole list up front is wasteful — or impossible (the sequence is infinite). A function that uses yield instead of return is a generator. It produces values one at a time, on demand, and pauses between them. Three small steps will show how.

Step 1: define a generator and see what calling it returns.

def countdown(n):
    while n > 0:
        yield n
        n -= 1

countdown(3)
<generator object countdown at 0x7f7744dd1c00>
  • The yield keyword turns countdown into a generator function.
  • Calling countdown(3) does not execute the body. It returns a generator object — a paused computation that will run the body when you ask for values.
  • The repr <generator object countdown at 0x…> is your cue that no work has happened yet.

Step 2: pull values one at a time with next().

gen = countdown(3)
[next(gen), next(gen), next(gen)]
[3, 2, 1]
  • Each next(gen) resumes the generator until it hits the next yield, then hands that value back. Between next() calls the function is paused; its local n and its position in the while loop are kept intact.
  • A fourth next(gen) would raise StopIteration — the generator’s signal that the body has run off the end.
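Both behaviours can be seen directly. next() also takes an optional second argument: a default returned instead of raising.

```python
def countdown(n):
    while n > 0:
        yield n
        n -= 1

gen = countdown(1)
print(next(gen))            # 1, the only value
print(next(gen, "done"))    # the default suppresses StopIteration
try:
    next(gen)               # no default: the signal escapes
except StopIteration:
    print("exhausted")
```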

Step 3: consume the whole thing. for loops and list() already know how to call next() until StopIteration, and stop quietly:

list(countdown(5))
[5, 4, 3, 2, 1]
  • list(countdown(5)) walks the generator to exhaustion, collecting all yielded values into a real list.
  • A for x in countdown(5): loop would do the same lazily, visiting one value at a time without materialising them all.

The general rule: yield makes a function pausable; the consumer pulls one value at a time with next() (or implicitly via for). Generators are how Python expresses lazy sequences — including infinite ones. The deep treatment, including yield from, generator pipelines, and the iterator protocol underneath, is Chapter 29.

Tip: Why this matters

Functions are first-class objects. That single property unlocks decorators, higher-order patterns, callbacks, and most of the standard library’s API. The mutable-default trap and the nonlocal rule are the two surprises that beginners hit first — knowing them now is half the battle.

6.8 Going deeper

This chapter scratches the surface of every concept it touches. The deep dives:

  • First-class functions, callables, and higher-order patterns: Chapter 19.
  • Decorators in depth, including parameterized decorators: Chapter 21.
  • Generators, the iterator protocol, and lazy pipelines: Chapter 29.

6.9 Build: memoizing recursion, then replacing it with a generator

Fibonacci is the canonical small problem for “first-class functions in motion”: one definition tries to compute it recursively (and dies for medium inputs), a decorator rescues it by caching, and a generator sidesteps the recursion entirely. Three steps, three concepts.

Step 1: the naive recursive version. Direct from the math definition fib(n) = fib(n-1) + fib(n-2):

def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

[fib(0), fib(1), fib(10), fib(20)]
[0, 1, 55, 6765]

fib(20) is fine. fib(35) is already slow — the recursion recomputes the same subproblems exponentially many times. We could rewrite the function. We will leave the function alone and decorate it.

Step 2: a memoizing decorator built from a closure. Every call with the same arguments should be looked up, not recomputed. The cache lives in the closure of the decorator:

import functools

def memo(func):
    cache = {}                                # closure cell
    @functools.wraps(func)
    def wrapper(*args):
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]
    return wrapper

@memo
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

fib(100)
354224848179261915075

cache is a dict captured by wrapper — the closure pattern from the chapter. The key is args (a tuple, which is hashable, so it works as a dict key — see Section 5.2). Re-binding fib via @memo means the recursive calls inside fib also go through the wrapped version, which is what makes memoisation effective: every subproblem is solved once. fib(100) returns instantly.
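This pattern is common enough that the standard library ships it: functools.lru_cache is an args-keyed memoizing decorator with eviction and hit/miss statistics built in (functools.cache, Python 3.9+, is the unbounded spelling):

```python
import functools

@functools.lru_cache(maxsize=None)   # unbounded, like our memo
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(100))          # 354224848179261915075, instantly
print(fib.cache_info())  # hit/miss counters come for free
```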

Step 3: a generator that doesn’t need the recursion at all. Both fibs above compute one number per call. If you actually want a stream of Fibonacci numbers (the first 10, the first that’s above a million, etc.), yield turns the iterative formulation into an infinite sequence:

from itertools import islice

def fib_stream():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

list(islice(fib_stream(), 10))
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

fib_stream is an infinite generator — the while True never exits — but next only computes one term at a time, so the loop is harmless. itertools.islice(gen, 10) takes the first 10 values, the way [:10] would on a list (we cover itertools in Section 11.2).
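The stream also answers a question no fixed list can: the "first Fibonacci number above a million" mentioned above. Pull values until the condition hits:

```python
def fib_stream():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

# a generator expression over an infinite generator is still lazy:
# next() stops pulling as soon as the condition is met
first_big = next(x for x in fib_stream() if x > 1_000_000)
print(first_big)   # 1346269
```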

Three answers to the same question. memo shows what a decorator buys you: the original recursive fib is unchanged, and the cache lives entirely in the decorator’s closure. fib_stream shows what a generator buys you: lazy production, infinite sequence, no recursion at all. Choosing between them is the kind of design judgment first-class functions enable.

6.10 Exercises

  1. The None sentinel. Write a function with_log(message, log=None) that defaults log to a fresh empty list. Verify two calls don’t share state.

  2. Closure counter. Re-implement make_counter so it accepts an initial argument: make_counter(10) should return a function whose first call returns 11.

  3. *args average. Write average(*nums) that returns the arithmetic mean. Handle average() with no arguments — what should it raise?

  4. Sort by length. Sort ["banana", "Apple", "cherry"] by length using sorted(..., key=...). Now sort case-insensitively by string value.

  5. Trace a decorator. Write a decorator @trace that prints the function name, args, kwargs, and return value before/after the call. Apply it to a small function. Read the output.

6.11 Summary

A function is a named block of reusable behavior — and, because it’s an object, it can be stored, passed, and returned. Decorators and generators are immediate practical consequences of that fact. The next chapter, Chapter 7, turns to what happens when things go wrong: try, except, and Python’s exception hierarchy.