Python has three concurrency models. The GIL means threads can’t run Python bytecodes in parallel — but they CAN overlap during I/O. Processes bypass the GIL entirely. asyncio handles I/O concurrency in a single thread via coroutines.
In this chapter you will learn to:
- Distinguish concurrency (structure) from parallelism (simultaneous execution).
- State what the GIL does and does not allow.
- Write the same “spinner while computing” program three ways: threads, processes, and asyncio.
- Pick the right model for I/O-bound vs CPU-bound work.
31.1 A bit of jargon
| Term | Meaning |
| --- | --- |
| Concurrency | dealing with multiple things at once (a structural property) |
| Parallelism | doing multiple things at once (an execution property) |
| Thread | OS-scheduled unit of execution; shares memory with siblings |
| Process | OS-scheduled unit of execution; separate memory |
| Coroutine | user-scheduled unit; runs in a single thread; cooperative `await` |
CPython has a Global Interpreter Lock (GIL): only one thread executes Python bytecode at a time. The GIL is released during I/O operations — file reads, network calls, time.sleep — which is why threads still help for I/O-bound work. CPU-bound code does not benefit from threads in CPython; for that, use processes.
31.2 Three spinners
The same program — animate a spinner while a slow computation runs — written with each model. In every version the spinner animates while slow() does its three seconds of “work”.
Warning: Spinner code is illustrative
These programs print to a real terminal. They will run in a notebook, but the spinner animation is invisible there because Quarto captures only the final output. The point of the chapter is the structure, not the visual effect.
31.2.1 Threads
The threading version uses an Event as the cross-thread signal that says “we’re done, stop spinning”:
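A minimal sketch of that version follows. The frame-printing details (the `\r` carriage returns and the final line clear) are illustrative choices; the Event coordination is what the notes below walk through:

```python
import itertools
import threading
import time

def spin(msg: str, done: threading.Event) -> None:
    for char in itertools.cycle(r"\|/-"):
        status = f"\r{char} {msg}"
        print(status, end="", flush=True)
        # Frame delay AND cancellation check in one call:
        # returns True as soon as done is set, False on timeout.
        if done.wait(0.1):
            break
    print("\r" + " " * len(status) + "\r", end="")  # clear the line

def slow() -> int:
    time.sleep(3)  # blocking wait; the GIL is released here
    return 42      # placeholder result

def main() -> int:
    done = threading.Event()
    spinner = threading.Thread(target=spin, args=("thinking", done))
    spinner.start()   # schedule the spinner thread with the OS
    result = slow()   # runs on the main thread meanwhile
    done.set()        # signal: stop spinning
    spinner.join()    # wait for the spinner's last frame
    return result

print(main())
```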
itertools.cycle(r"\|/-") is an infinite iterator over the spinner glyphs — \, |, /, -, \, … The loop never ends on its own; it relies on the done signal.
done.wait(0.1) blocks for up to 0.1 seconds. It returns True if done was set (the signal arrived), False on timeout. So this is both the frame delay and the cancellation check.
threading.Thread(target=..., args=...) builds a thread but does not start it. .start() schedules it with the OS.
slow() runs on the main thread while spin runs on the spinner thread — they share memory, including the done Event.
done.set() flips the event to “set”. On the next done.wait(0.1) call inside spin, that returns True and the loop breaks.
spinner.join() blocks until the spinner thread actually finishes — without it, the main thread could exit before the spinner’s last frame.
The general pattern: shared mutable state (the Event) is the coordination channel between threads. Threads share everything by default, which is what makes them cheap to start and dangerous to write.
31.2.2 Processes
```python
import multiprocessing

def main():
    done = multiprocessing.Event()
    spinner = multiprocessing.Process(target=spin, args=("thinking", done))
    spinner.start()
    result = slow()
    done.set()
    spinner.join()
    return result
```
The API is almost identical to threading. The key differences: separate memory (shared state must go through multiprocessing.Queue / Value / Array), higher startup cost, no GIL contention.
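As a minimal sketch of that constraint (the worker function and message here are hypothetical), a child process can hand data back to the parent through a multiprocessing.Queue:

```python
import multiprocessing

def worker(q: multiprocessing.Queue) -> None:
    # Runs in a child process with its own memory;
    # the Queue is the only channel back to the parent.
    q.put("hello from the child")

if __name__ == "__main__":  # guard required where processes are spawned
    q = multiprocessing.Queue()
    p = multiprocessing.Process(target=worker, args=(q,))
    p.start()
    print(q.get())  # blocks until the child puts a value
    p.join()
```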
31.2.3 asyncio
The asyncio version uses no threads and no processes — just one thread, one event loop, and two cooperating coroutines. Build it in two small steps.
Step 1: a single coroutine. async def declares a coroutine; await is the cooperative yield point; asyncio.run starts the event loop and waits for the result:
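A sketch, assuming slow() sleeps three seconds and returns a placeholder value as in the earlier versions:

```python
import asyncio

async def slow() -> int:
    await asyncio.sleep(3)  # yield to the event loop for ~3 seconds
    return 42               # placeholder result

result = asyncio.run(slow())  # in a Jupyter cell: result = await slow()
print(result)
```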
async def makes slow a coroutine function. Calling it returns a coroutine object — the body does not execute until the event loop awaits it.
await asyncio.sleep(3) yields control to the event loop for ~3 seconds, then resumes. Other coroutines could run during the wait.
asyncio.run(slow()) starts the loop, runs the coroutine to completion, returns the result. (In a Jupyter cell, which already has a loop running, write await slow() instead.)
Step 2: two coroutines concurrently, plus cancellation. A task is a coroutine scheduled to run on the loop. cancel() interrupts a running task by raising CancelledError inside it at the next await:
```python
import itertools

async def spin(msg):
    for char in itertools.cycle(r"\|/-"):
        try:
            await asyncio.sleep(0.1)
        except asyncio.CancelledError:
            break

async def main():
    spinner = asyncio.create_task(spin("thinking"))
    result = await slow()
    spinner.cancel()
    return result

asyncio.run(main())
```
asyncio.create_task(spin(...)) schedules spin to run on the loop concurrently with main — main does not wait for it. await slow() then blocks main until slow finishes; while it waits, the event loop runs spin.
spinner.cancel() requests cancellation. At the next await inside spin, the loop raises asyncio.CancelledError instead of resuming. The try/except catches it to exit cleanly — forgetting the except would let the error propagate up.
The shape mirrors the threading version — start a worker, wait, stop it — but with create_task/cancel instead of Thread/Event, and await as the cooperative yield point instead of GIL-managed preemption.
31.3 The real impact of the GIL
Talk about the GIL is cheap; let’s measure it. We run the same CPU-bound work twice — once sequentially, once on two threads — and compare wall-clock time. If threads gave real parallelism, the threaded version would take half as long. They don’t, so it won’t.
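A sketch of that measurement, with an assumed loop size of n = 10_000_000 per call (scale it to your machine); the notes below walk through each piece:

```python
import threading
import time

def cpu_bound(n):
    count = 0
    for _ in range(n):  # pure-Python arithmetic: no I/O, no sleep
        count += 1
    return count

N = 10_000_000  # assumed size; adjust for your machine

# Sequential: two calls back to back on the main thread.
t0 = time.perf_counter()
cpu_bound(N)
cpu_bound(N)
seq = time.perf_counter() - t0

# Threaded: the same two calls handed to two threads.
t0 = time.perf_counter()
t1 = threading.Thread(target=cpu_bound, args=(N,))
t2 = threading.Thread(target=cpu_bound, args=(N,))
t1.start(); t2.start()
t1.join(); t2.join()
threaded = time.perf_counter() - t0

print(f"sequential: {seq:.2f}s  threaded: {threaded:.2f}s")
```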
cpu_bound(n) is pure-Python arithmetic — no I/O, no sleep, just a loop. That makes it 100% CPU work, exactly the workload the GIL serializes.
time.perf_counter() is the right clock for benchmarking — high resolution, monotonic.
The sequential block runs cpu_bound twice in a row on the main thread.
The threaded block hands the same two calls to two threads. They share memory and the GIL — only one thread executes Python bytecode at a time, so the work serializes.
t1.start(); t2.start(); t1.join(); t2.join() is the spawn-and-wait pattern: start both, then wait for both.
Two threads doing CPU work take about the same time as the sequential version — the GIL serializes them. For real parallelism on CPU work you need processes.
The general pattern: threads overlap I/O waits (the GIL is released during blocking I/O), not CPU work. If your workload is CPU-bound, threads buy you nothing — reach for multiprocessing instead.
(Python 3.12 introduced per-interpreter GILs (PEP 684), and Python 3.13 shipped an experimental free-threaded build (PEP 703) — true Python parallelism in one process is on the horizon, but the tooling is still maturing.)
```mermaid
flowchart LR
  subgraph T["Threading"]
    T1["shared memory"] --> T2["OS-scheduled"]
    T2 --> T3["GIL: serial bytecode"]
    T3 --> T4["best for I/O-bound"]
  end
  subgraph P["Multiprocessing"]
    P1["separate memory"] --> P2["OS-scheduled"]
    P2 --> P3["no GIL contention"]
    P3 --> P4["best for CPU-bound"]
  end
  subgraph A["Asyncio"]
    A1["one thread"] --> A2["cooperative await"]
    A2 --> A3["GIL irrelevant"]
    A3 --> A4["best for I/O at scale"]
  end
```
The picture flips for I/O-bound work. A thread blocked on a network read releases the GIL. So two threads each waiting on a download do progress in parallel — not because Python runs in parallel, but because the OS does the waiting and only one thread is ever running Python.
Tip: Why this matters
Choose the right concurrency model:
- I/O-bound + simple: threading (threads release the GIL during I/O).
- I/O-bound + scale: asyncio (one thread, thousands of concurrent operations).
- CPU-bound: multiprocessing (separate processes, real parallelism).

asyncio is not for CPU work. Threads are not for CPU work. multiprocessing has overhead — worth it only for truly heavy computation.
31.4 Build: I/O-bound vs CPU-bound, measured
Section 31.3 showed two threads on a CPU-bound loop failing to speed up. Now we’ll see the flip side: two threads on simulated I/O do speed up, because the GIL is released during the wait. Same shape — sequential baseline, threaded version, wall-clock comparison — applied to both kinds of workload.
Step 1: a sequential baseline for I/O-bound work. time.sleep is the canonical “fake I/O” — it releases the GIL while waiting, exactly like a network read or a disk read would:
```python
import time

def slow_io(item):
    time.sleep(0.1)  # simulates an I/O wait
    return item.upper()

items = ["alpha", "beta", "gamma", "delta", "epsilon"]

t0 = time.perf_counter()
sequential = [slow_io(item) for item in items]
seq_time = time.perf_counter() - t0

[sequential, round(seq_time, 2)]
```
Five items at 0.1s each = ~0.5s. Each call blocks the main thread until the sleep completes; the program is mostly waiting, not computing. That’s the situation threads can help with.
Step 2: fan out via threads. Spawn one thread per item; each thread blocks on its own time.sleep. While one thread waits, the OS schedules another — and because the GIL releases during time.sleep, the threads make progress concurrently:
```python
import threading

results: list = [None] * len(items)

def worker(i, item):
    results[i] = slow_io(item)

t0 = time.perf_counter()
threads = [
    threading.Thread(target=worker, args=(i, item))
    for i, item in enumerate(items)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded_time = time.perf_counter() - t0

[results, round(threaded_time, 2), round(seq_time / threaded_time, 1)]
```
results = [None] * len(items) pre-allocates the result list so each worker can write to its own slot without locking — distinct indices don’t race. Five threads each sleep 0.1s in parallel (the OS handles the wait; there is no Python bytecode to serialize during the sleep), so the total drops to roughly 0.1s. The measured speedup, seq_time / threaded_time, comes out close to N, the number of items.
Step 3: the same shape on CPU-bound work — and watch the speedup vanish. Replace the sleep with arithmetic. Now every thread holds the GIL while it runs, and the OS can’t overlap them:
```python
def cpu_bound(n):
    count = 0
    for _ in range(n):
        count += 1
    return count

# Sequential
t0 = time.perf_counter()
cpu_bound(5_000_000)
cpu_bound(5_000_000)
cpu_seq = time.perf_counter() - t0

# Two threads
t0 = time.perf_counter()
threads = [threading.Thread(target=cpu_bound, args=(5_000_000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
cpu_threaded = time.perf_counter() - t0

[round(cpu_seq, 2), round(cpu_threaded, 2),
 f"speedup: {cpu_seq / cpu_threaded:.2f}x (expect ~1x)"]
```
```
[0.43, 0.45, 'speedup: 0.95x (expect ~1x)']
```
Same code shape, opposite result. cpu_bound is pure Python arithmetic — no sleep, no I/O, the GIL is always held by whichever thread is currently executing. So the two threads serialize on the GIL and the wall-clock time barely changes. The “speedup” hovers at 1x; sometimes it’s worse than 1x because of context-switching overhead.
The build makes the chapter’s central rule visible in numbers: threads buy you concurrency on I/O-bound work (because the GIL releases during the wait) and buy you nothing on CPU-bound work (because the GIL serializes the bytecode). For real CPU parallelism you’d reach for multiprocessing (or concurrent.futures.ProcessPoolExecutor, the next chapter).
31.5 Exercises
1. Two CPU-bound threads. Reproduce the GIL benchmark above with n = 50_000_000. Time the sequential and threaded versions. Now do the same with multiprocessing.Process — does the time halve?
2. Two I/O-bound threads. Replace cpu_bound with time.sleep(1) and run two threads. The total wall time should be about 1 second, not 2. Why?
3. asyncio shape. In the spinner example, what would happen if slow did not await anything (e.g., called time.sleep(3) instead of asyncio.sleep(3))? Predict and explain.
4. Pick the model. For each of the following workloads, name the right concurrency model: (a) crawl 10,000 URLs; (b) compute SHA-256 of 100 million keys; (c) animate a UI while saving a file; (d) run 5 ML models on the same input.
5. Cancellation. In the asyncio spinner, spinner.cancel() is the only way to stop the spinner. What happens if you forget to call it?
Note: Further reading
Beazley, Python Distilled §9.14 (“Blocking Operations and Concurrency”) gives concrete operational caveats for threads, polling, and asyncio — the pitfalls that show up in real systems. Use it as the practitioner’s reference once the conceptual model from this chapter has clicked.
31.6 Summary
Concurrency is about structure; parallelism is about execution. Python’s three models map cleanly to three workload shapes: threads for simple I/O, asyncio for I/O at scale, processes for CPU. The GIL is the constraint that determines the boundaries.
Next, Chapter 32 covers concurrent.futures — the high-level pool API that hides almost all of the locking and process management for both threads and processes.