14  An Array of Sequences

NoteCore idea

Python sequences are deeply unified. Once you understand the one taxonomy (container vs flat, mutable vs immutable), the whole chapter of sequence features clicks into place.

In this chapter you will learn to:

  1. Place every Python sequence type into the two-by-two taxonomy of container/flat and mutable/immutable.
  2. Write list comprehensions and generator expressions, and pick between them based on memory.
  3. Use tuples as records — and unpack them, including with *rest patterns.
  4. Use structural pattern matching to dispatch on the shape of a sequence.
  5. Slice, including with named slice objects, and assign through slices.
  6. Avoid the classic [[0]*3]*3 aliasing bug.
  7. Choose between list, tuple, array.array, deque, and memoryview for a given task.

14.1 The sequence taxonomy

Python’s sequence types fall into a small grid. Memorize this grid and the rest of the chapter is consequences.

Container (any type, holds references) Flat (one type, holds values)
Mutable list, collections.deque bytearray, array.array
Immutable tuple str, bytes

A container sequence holds references to objects of any type — the list [1, "two", 3.0] is a list of three pointers. A flat sequence packs values directly into contiguous memory, so it can only hold one type, but it’s smaller and faster.

The mutable/immutable split decides which methods exist. list.append, bytearray.extend, deque.appendleft exist on the mutable types and not the immutable ones — that’s the only difference at the API level.

memoryview is the odd one out: a view (zero-copy) into the bytes of any object that exposes the buffer protocol. We’ll see it at the end of this chapter.

14.2 List comprehensions and generator expressions

Comprehensions are the most-recommended way to build a list, set, or dict from another iterable. They are expressions, so they return a value; they are deterministic, so they don’t produce side effects you have to reason about.

symbols = "$¢£¥€¤"
codes = [ord(symbol) for symbol in symbols]
codes
[36, 162, 163, 165, 8364, 164]

ord(c) is a Python built-in that takes a one-character string and returns its Unicode code point — the integer that uniquely identifies that character (ord("$") is 36, ord("€") is 8364). Its inverse is chr(n). Code points are the topic of Chapter 16; here we just use ord to get a number from each symbol.

The list comprehension reads almost as English: “for every symbol in symbols, give me its ord.” The output is a list. Filter with if:

beyond_ascii = [ord(s) for s in symbols if ord(s) > 127]
beyond_ascii
[162, 163, 165, 8364, 164]

Cartesian product comes for free with two for clauses:

colors = ["black", "white"]
sizes  = ["S", "M", "L"]
tshirts = [(color, size) for color in colors for size in sizes]
tshirts
[('black', 'S'),
 ('black', 'M'),
 ('black', 'L'),
 ('white', 'S'),
 ('white', 'M'),
 ('white', 'L')]

Replace the [ ] with ( ) and you get a generator expression — the same syntax, but the result is lazy. Nothing is computed until you iterate.

for tshirt in (f"{c} {s}" for c in colors for s in sizes):
    print(tshirt)
black S
black M
black L
white S
white M
white L

The genexp form is what you reach for when the intermediate list is large or when only one pass is needed. Almost every Python constructor accepts an iterable, so a genexp can flow straight into one. As an example: array.array is a flat sequence type from the standard library that packs a single numeric type into contiguous memory — leaner and faster than a list for large numeric data (we cover it properly in Section 14.8). The first argument "I" is a type code"I" means unsigned 32-bit int:

import array
arr = array.array("I", (ord(s) for s in symbols))
arr
array('I', [36, 162, 163, 165, 8364, 164])

The genexp is consumed directly by the array.array constructor — no intermediate list is built. That’s the saving: with a list comprehension, you’d materialize all six integers in a Python list, then copy them into the array. The genexp streams them in one at a time.

TipWhen to use a comprehension

If the result is the value you want, use a comprehension or genexp. If the loop is for side effects (writing to a database, printing), use a regular for. The two read differently for a reason.

14.3 Tuples are not just immutable lists

tuple plays two roles in Python: an immutable list, and a record in which position carries meaning.

lax_coordinates = (33.9425, -118.408056)
city, year, pop, chg, area = ("Tokyo", 2003, 32_450, 0.66, 8014)
year, pop
(2003, 32450)

Notice the second line: a tuple on the right, a comma-separated list of names on the left, and Python unpacks element-wise. This is the same operation as the famous swap idiom:

a, b = 1, 2
a, b = b, a
a, b
(2, 1)

Unpacking can nest:

metro_areas = [
    ("Tokyo", "JP", 36.933, (35.689722, 139.691667)),
    ("Delhi NCR", "IN", 21.935, (28.613889, 77.208889)),
]
for name, _, _, (lat, lon) in metro_areas:
    print(f"{name:15} | {lat:9.4f} | {lon:9.4f}")
Tokyo           |   35.6897 |  139.6917
Delhi NCR       |   28.6139 |   77.2089

The _ is a conventional throwaway. The (lat, lon) on the left destructures the inner tuple in the same step.

* collects excess items, on either side:

a, b, *rest = range(5)
a, b, rest
(0, 1, [2, 3, 4])
first, *body, last = range(5)
first, body, last
(0, [1, 2, 3], 4)

14.4 Pattern matching with sequences

Since Python 3.10, match/case lets you dispatch on the shape of a value rather than checking it with a chain of if isinstance. For sequences:

def handle_command(message):
    match message:
        case ["QUIT"]:
            return "Quitting"
        case ["GO", direction]:
            return f"Going {direction}"
        case ["GET", obj, location]:
            return f"Getting {obj} from {location}"
        case ["DROP", *objects]:
            return f"Dropping: {objects}"
        case _:
            return f"Unknown: {message!r}"

print(handle_command(["GO", "north"]))
print(handle_command(["DROP", "ax", "bow"]))
print(handle_command(["WHAT"]))
Going north
Dropping: ['ax', 'bow']
Unknown: ['WHAT']

Walking through the cases:

  • match message: evaluates message once, then tries each case top-to-bottom until one matches.
  • case ["QUIT"]: matches a one-element sequence whose single element equals the literal string "QUIT". Literals in patterns are matched by equality.
  • case ["GO", direction]: matches any two-element sequence whose first element equals "GO". The name direction is a binding — it captures the second element so the body can use it.
  • case ["GET", obj, location]: does the same for a three-element sequence — two captures.
  • case ["DROP", *objects]: is the variable-length case. *objects collects every element after "DROP" into a list. So ["DROP", "ax", "bow"] gives objects == ["ax", "bow"].
  • case _: is the catch-all wildcard. _ matches anything and doesn’t bind.

The general rule: each case is a pattern, not an expression — literals match by ==, bare names bind, *name collects the rest, and _ is the wildcard. We will see pattern matching applied to mappings in Chapter 15 and to classes in Chapter 30.

14.5 Slicing

Why is the stop index of a slice exclusive? Because then stop - start is the length of the slice. That single rule explains every slicing convention:

l = [10, 20, 30, 40, 50]
l[:2], l[2:], l[1:4]
([10, 20], [30, 40, 50], [20, 30, 40])

Three different elisions of the [start:stop] form:

  • l[:2] is [10, 20] — start defaults to 0, so this is “the first two elements.” The exclusive stop means stop - start = 2 - 0 = 2, exactly the length of the result.
  • l[2:] is [30, 40, 50] — stop defaults to len(l), so this is “everything from index 2 onward.” Three elements, because len(l) - 2 = 3.
  • l[1:4] is [20, 30, 40] — explicit start and stop. Length is 4 - 1 = 3. The exclusive-stop rule keeps the math trivial.

That length-equals-stop-start invariant is why slices feel right: l[:k] + l[k:] always gives back the original list, with no off-by-one to track.

slice objects let you give a slice a name:

invoice = """\
0.....6.................................40........52...55........
1909  Pimoroni PiBrella                     $17.50    3    $52.50
1489  6MM Tactile Switch x20                 $4.95    2     $9.90
"""
SKU   = slice(0, 6)
DESC  = slice(6, 40)
UNIT  = slice(40, 52)
QTY   = slice(52, 55)
PRICE = slice(55, None)

for item in invoice.split("\n")[1:-1]:
    print(item[DESC].rstrip(), "|", item[UNIT].strip(), "|", item[QTY].strip())
Pimoroni PiBrella | $17.50 | 3
6MM Tactile Switch x20 | $4.95 | 2

Mutable sequences accept assignment through a slice — the lengths don’t have to match:

l = list(range(10))
l[2:5] = [20, 30]
l
[0, 1, 20, 30, 5, 6, 7, 8, 9]

l[2:5] selects three elements ([2, 3, 4]), and we assigned a two-element list to them. The slice is replaced by the new contents — three slots become two — and the rest of the list shifts left to fill the gap. Output: [0, 1, 20, 30, 5, 6, 7, 8, 9]. Same shape works the other way: assigning a longer list grows the original. This is how list.extend(other) is really implemented under the hood — l[len(l):] = other does the same thing.

del l[5:7]
l
[0, 1, 20, 30, 5, 8, 9]

del l[5:7] removes elements at indices 5 and 6 (the slice is two-wide); everything past index 6 shifts left. Equivalent to l[5:7] = [] — assigning an empty iterable to the slice deletes it.

14.6 + and * on sequences

+ concatenates and returns a new sequence. * repeats.

[1, 2] + [3, 4], [0] * 5, "abc" * 3
([1, 2, 3, 4], [0, 0, 0, 0, 0], 'abcabcabc')

Three operations on three sequence types:

  • [1, 2] + [3, 4] — list concatenation. Returns a new list [1, 2, 3, 4]; both inputs are unchanged. (For lists, += mutates in place via __iadd__; plain + always returns a new list.)
  • [0] * 5 — list repetition. Returns [0, 0, 0, 0, 0]. Useful for pre-allocating a fixed-size list, e.g. results = [None] * len(items) so each worker can write to its own slot.
  • "abc" * 3 — string repetition. Returns "abcabcabc". Common idiom: "=" * 80 for a separator line in console output.

The classic gotcha is using * on a list of mutables:

board = [["_"] * 3] * 3
board[0][1] = "X"
board
[['_', 'X', '_'], ['_', 'X', '_'], ['_', 'X', '_']]

Walking through what just happened:

  • ["_"] * 3 builds one list: ["_", "_", "_"]. Strings are immutable, so the three slots point to the same "_" object — harmless.
  • The outer * 3 then takes that one list and makes three references to it. The result is [L, L, L] where every L is the same object.
  • board[0][1] = "X" mutates that single shared list. Every row is just a label on it, so all three rows show the change.

The fix is a comprehension, which builds a fresh inner list per iteration:

board = [["_"] * 3 for _ in range(3)]
board[0][1] = "X"
board
[['_', 'X', '_'], ['_', '_', '_'], ['_', '_', '_']]

The general rule: [expr] * n repeats the reference, not the object — safe for immutables, dangerous for anything you might mutate. Use a comprehension when the inner element is mutable. This is the first hint of the references-vs-values story we tell in full in Chapter 18.

14.7 list.sort versus sorted

There are two sort APIs in Python and they answer different questions.

fruits = ["grape", "raspberry", "apple", "banana"]
sorted(fruits, key=len)
['grape', 'apple', 'banana', 'raspberry']

sorted takes any iterable and returns a new list. list.sort sorts in place and returns None:

fruits.sort()
fruits
['apple', 'banana', 'grape', 'raspberry']

The None return value of list.sort is intentional — it’s how Python signals “this method mutates.” Compare with dict.update and list.append, which also return None.

For sorted lists, bisect is the binary-search tool:

import bisect
haystack = [1, 4, 5, 6, 8, 12, 15, 20, 21, 23, 23, 26, 29, 30]
bisect.bisect(haystack, 14)
6

bisect.insort inserts and keeps the list sorted in O(n) (the search is O(log n); the insert is the slow part):

bisect.insort(haystack, 14)
haystack
[1, 4, 5, 6, 8, 12, 14, 15, 20, 21, 23, 23, 26, 29, 30]

14.8 When a list is not the answer

list is the default sequence, and it’s almost always fine. Three exceptions are worth knowing.

array.array for large flat numeric data. Stores raw C doubles or ints in contiguous memory:

import array, random
floats = array.array("d", (random.random() for _ in range(10)))
floats
array('d', [0.17165551172323512, 0.9427877682762817, 0.959683597270724, 0.07764839278078661, 0.832783262526776, 0.6679315105008564, 0.7202049110504996, 0.5410483641363905, 0.46179809628269064, 0.42339563921856516])

The first argument ("d") is the type code — a single character that picks the C type each slot holds. The common ones: "b"/"B" (signed/unsigned byte), "h"/"H" (short, 2 bytes), "i"/"I" (int, 4 bytes), "l"/"L" (long, 4 or 8), "f"/"d" (float/double, 4/8 bytes). All elements of one array share that type — that’s the price of contiguous storage. For ten million floats, this uses about a quarter of the memory a list of float would.

collections.deque for double-ended queues. Constant-time append and pop on both ends, and an optional maxlen:

from collections import deque
dq = deque(range(10), maxlen=10)
dq.appendleft(-1)
dq
deque([-1, 0, 1, 2, 3, 4, 5, 6, 7, 8], maxlen=10)

Because maxlen=10 is set, appending at the front evicted the rightmost element.

memoryview for zero-copy byte slicing. A view into the buffer of another object:

import array
numbers = array.array("h", [-2, -1, 0, 1, 2])
memv = memoryview(numbers)
memv_oct = memv.cast("B")
memv_oct.tolist()
[254, 255, 255, 255, 0, 0, 1, 0, 2, 0]

Walking through what each line does:

  • array.array("h", [...]) allocates a flat block of signed short integers ("h" = 2-byte signed). Five h values take 10 bytes of contiguous memory.
  • memoryview(numbers) wraps that buffer in a view — no copy, just a handle to the same memory.
  • memv.cast("B") reinterprets the same bytes as unsigned 8-bit ints ("B"). The 10 bytes that were “5 shorts” are now “10 bytes”, with no allocation.
  • tolist() materializes them so we can inspect — [254, 255, 255, 255, 0, 0, 1, 0, 2, 0] shows the little-endian layout of the underlying shorts.

The general rule: a memoryview is the cheap way to look at the same bytes through a different lens — useful for parsing binary formats and slicing without copying.

TipWhy this matters

Python’s sequence uniformity is not an accident — it’s the data model at work. Implement __len__ and __getitem__ and your class gains slicing, iteration, reversed(), and the in operator for free.

14.9 Build: a flexible scoreboard parser

A small problem that uses every tool from the chapter: read a list of lines like "Alice,95" or "Bob,87,senior,promoted", normalise each to {name, score, tags}, and rank the top two. Comprehensions to parse, *rest unpacking and match/case to classify shape, sorted + slice to rank.

Step 1: turn each line into a list of fields. Split on the comma, lowercase nothing, leave types as strings for now:

raw = """\
Alice,95
Bob,87,senior
Carol,92,junior
Dave,80,junior,intern
"""

rows = [line.split(",") for line in raw.strip().splitlines()]
rows
[['Alice', '95'],
 ['Bob', '87', 'senior'],
 ['Carol', '92', 'junior'],
 ['Dave', '80', 'junior', 'intern']]

A list comprehension reads “for every line in the splitlines, give me the comma-split list.” The result is a list of variable-length lists of strings — exactly the messy data pattern match/case was designed for.

Step 2: classify each row’s shape with match/case. The valid shapes are 2, 3, or 4 fields. Use a sequence pattern that captures name, score, and the variable-length tail with *tags:

def normalise(row):
    match row:
        case [name, score]:
            return {"name": name, "score": int(score), "tags": []}
        case [name, score, *tags]:
            return {"name": name, "score": int(score), "tags": list(tags)}
        case _:
            return None

records = [r for r in (normalise(row) for row in rows) if r is not None]
records
[{'name': 'Alice', 'score': 95, 'tags': []},
 {'name': 'Bob', 'score': 87, 'tags': ['senior']},
 {'name': 'Carol', 'score': 92, 'tags': ['junior']},
 {'name': 'Dave', 'score': 80, 'tags': ['junior', 'intern']}]

Two patterns cover every well-formed row. [name, score] matches the two-field case exactly; [name, score, *tags] matches three-or-more, collecting the tail into a list (this is where *rest from the unpacking section earns its place inside match). The wildcard case _: returns None for malformed rows; the comprehension filter if r is not None drops them. Notice we also pulled the int(score) conversion into the patterns so callers see typed records.

Step 3: rank the top two. sorted plus a key plus a slice — all three from the chapter:

def top(records, n):
    by_score = sorted(records, key=lambda r: r["score"], reverse=True)
    return by_score[:n]                             # slice the head off

top(records, 2)
[{'name': 'Alice', 'score': 95, 'tags': []},
 {'name': 'Carol', 'score': 92, 'tags': ['junior']}]

sorted(..., key=lambda r: r["score"], reverse=True) returns a new list (it doesn’t mutate records — the sort vs sorted distinction). The slice [:n] takes the first n — the highest scores, since we sorted descending. With n larger than len(records) the slice quietly returns the whole list, which is the right behaviour: no off-by-one to worry about.

The build threads the chapter together: a list comprehension splits the input (sequences as values), match/case patterns destructure each row by shape with *rest for the tail, and sorted + a slice rank the result. Same recipe scales unchanged to a 100,000-line file — the comprehension and slice are eager, but you’d swap them for a generator + heapq.nlargest and the rest of the code is unchanged.

14.10 Exercises

  1. Cartesian product, two ways. Rewrite the t-shirt example using itertools.product. Which version is more readable for two iterables? For five?

  2. Spot the bug. What does [[0] * 3] * 3 print after you do m[1][1] = 1? Why? How does the list comprehension version differ?

  3. A tiny match. Extend handle_command to accept ["LOOK"] (returns "Looking") and ["LOOK", "AT", target] (returns f"Looking at {target}"). Does the order of case clauses matter?

  4. sort vs sorted. Given nums = [3, 1, 4, 1, 5, 9, 2, 6], write two one-liners: one that prints the sorted result without mutating nums, and one that mutates nums to be sorted. Which returns None?

  5. memoryview semantics. With arr = array.array("h", [1, 2, 3]) and mv = memoryview(arr), predict what happens to arr if you do mv[0] = 100. Verify.

14.11 Summary

Sequences are unified through a small set of dunder methods, but they cleave along two practical axes: container vs. flat, and mutable vs. immutable. Comprehensions and generator expressions are the canonical builders; pattern matching is the canonical destructurer. The next chapter, Chapter 15, turns to unordered collections — and explains why every Python program is, deep down, full of dictionaries.