Python has four built-in collection types. Each solves a different problem: list for ordered mutable sequences, tuple for fixed records, dict for key→value lookup, set for unique unordered values. Knowing which to reach for is the mark of a Python programmer.
In this chapter you will learn to pick the right collection for a problem, use the core operations of lists, tuples, dicts, and sets, and build new collections concisely with comprehensions.
A list is the default “ordered collection of things” in Python — when you have a sequence whose contents you’ll modify (add to, remove from, reorder), reach for a list. Lists are ordered, mutable, and accept any types.
numbers = [1, 2, 3, 4, 5]
mixed = [1, "two", 3.0, None]
[numbers[0], numbers[-1], numbers[1:3], numbers[::-1]]
[1, 5, [2, 3], [5, 4, 3, 2, 1]]
- [1, 2, 3, 4, 5] is a list literal — square brackets, comma-separated.
- numbers[0] is the first element; numbers[-1] is the last (negative indices count from the end).
- numbers[1:3] is a slice — elements at positions 1 and 2 (stop is exclusive). Returns a new list.
- numbers[::-1] reverses with a step of -1, returning a new list.

The mutation methods modify in place and return None:
xs = [1, 2, 3]
xs.append(4)
xs.insert(0, 0)
xs.extend([5, 6])
xs
[0, 1, 2, 3, 4, 5, 6]
- xs.append(4) adds 4 to the end. Mutates xs, returns None.
- xs.insert(0, 0) inserts 0 at position 0 — pushes existing items right.
- xs.extend([5, 6]) adds each element of the argument to the end — different from append, which would add the whole list as a single element.

The general rule: list mutation methods return None. Never write xs = xs.append(4) — you’ll lose your list and bind xs to None.
Removal has three primitives, each with a different semantic:
xs.pop(), xs.pop(0)
(6, 0)
- xs.pop() (no argument) removes and returns the last element — the right tool for stack-style usage.
- xs.pop(0) removes and returns the first element — useful but slow for large lists, because every other element has to shift down. For frequent front-end pops, use collections.deque (Chapter 11), which makes that O(1).

xs.remove(3)
xs
[1, 2, 4, 5]
- xs.remove(value) removes the first occurrence of value by equality (==), not by index. Raises ValueError if value is absent.
- If you already know the index, del xs[i] is faster (no scan).

Sorting comes in two flavors — in-place (modifies the original) vs. returning a new list (leaves the original alone). Pick based on whether you still need the unsorted version:
xs = [3, 1, 4, 1, 5, 9, 2, 6]
sorted_new = sorted(xs) # new list; xs unchanged
xs.sort(reverse=True) # in-place; xs is now sorted
sorted_new, xs
([1, 1, 2, 3, 4, 5, 6, 9], [9, 6, 5, 4, 3, 2, 1, 1])
- sorted(xs) is the built-in function — works on any iterable, returns a fresh sorted list, leaves xs untouched.
- xs.sort(reverse=True) is the list method — mutates xs in place, returns None. reverse=True sorts descending.

Sort with a key function — useful when the natural ordering isn’t what you want:
words = ["banana", "Apple", "cherry"]
sorted(words, key=str.lower)
['Apple', 'banana', 'cherry']
"Apple" first because uppercase 'A' has a lower code point than lowercase 'b'.key=str.lower tells sorted to call str.lower(word) on each element and sort by that — a case-insensitive sort. The original strings are still in the result; only the comparison key is lowered.The general rule: sorted(...) returns, .sort() mutates; pass key= to sort by a derived value.
Lists are mutable — sharing is dangerous. b = a does not copy. Both labels point to the same list. Use a.copy() or list(a) for a shallow copy, or copy.deepcopy(a) for a deep one. The full treatment is in Chapter 18.
a = [1, 2, 3]
b = a
b.append(4)
a
[1, 2, 3, 4]
The output [1, 2, 3, 4] — for both a and b — is the aliasing demo: b = a doesn’t copy the list, it makes b a second name for the same object. b.append(4) mutates that one list; reading a shows the mutation because there’s only one list to read. The fix when you want a separate copy is b = a.copy() (or the equivalent b = list(a)); for nested data, use copy.deepcopy.
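A minimal sketch of the fix, contrasting a shallow copy (independent at the top level only) with copy.deepcopy (independent all the way down):

```python
import copy

a = [1, 2, [3, 4]]

shallow = a.copy()       # new outer list; the nested [3, 4] is still shared
shallow.append(5)        # top-level change: a is unaffected
shallow[2].append(99)    # nested change: visible through a too

deep = copy.deepcopy(a)  # copies nested structures recursively
deep[2].append(100)      # a is unaffected this time
```

After running this, a is [1, 2, [3, 4, 99]] — the shallow copy's top-level append didn't leak back, but the nested mutation did.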
When the items you’re grouping form a fixed record — an (x, y) coordinate, an RGB triple, a (name, age) pair — a list is the wrong choice: you don’t want anyone modifying the slots, and you’d like the structure to be hashable so it can serve as a dict key. Tuples are immutable lists for exactly this case.
point = (3, 4)
single = (42,) # trailing comma! (42) is just 42
empty = ()
point[0], single, empty
(3, (42,), ())
- (3, 4) is a 2-tuple. Indexing works exactly like a list (point[0] is 3).
- (42,) — the trailing comma is required. Without it, (42) is just the integer 42 in parentheses.
- () is the empty tuple.
- Tuples have no mutation methods — no .append, .pop, or .sort — they’re frozen at construction.

The most powerful feature of tuples is unpacking — taking a tuple apart into named variables in one statement:
x, y = point
first, *rest = (1, 2, 3, 4, 5)
[x, y, first, rest]
[3, 4, 1, [2, 3, 4, 5]]
- x, y = point matches the 2-tuple’s two slots to two names. After this, x is 3, y is 4.
- first, *rest = (1, 2, 3, 4, 5) uses the starred target — first binds to the first element, *rest collects everything else into a list [2, 3, 4, 5].
- The starred name can sit in any position: head, *middle, tail = ....

The general rule: any tuple (or list) can be unpacked into matching names; *name greedily collects the leftovers as a list. The full unpacking treatment is in Chapter 14.
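Two everyday uses of the same mechanism — swapping without a temporary variable, and functions that "return two values" (really one tuple). A small sketch with invented values:

```python
a, b = 1, 2
a, b = b, a   # the right side builds (2, 1) before any name is rebound

def min_max(values):
    # "returning two values" is really returning one 2-tuple
    return min(values), max(values)

lo, hi = min_max([3, 1, 4, 1, 5])   # unpacked straight into two names
```

After the swap a is 2 and b is 1; lo and hi are 1 and 5.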
Tuples can be dict keys (lists cannot, because they’re mutable and unhashable). A hashable object is one whose value can be reduced to a fixed integer summary (the hash) — that integer is what lets a dict or set find the slot for the key in O(1). Mutable objects like lists are deliberately unhashable: if their hash could change after insertion, lookups would silently fail. Tuples (with hashable contents) are hashable, hence valid keys:
locations = {(40.7, -74.0): "New York", (51.5, -0.1): "London"}
locations[(40.7, -74.0)]
'New York'
The dict has two entries; each key is a 2-tuple of latitude and longitude. Looking up (40.7, -74.0) returns "New York" — and the lookup works because Python first hashes the tuple (which it can, because tuples-of-floats are hashable), uses that hash to find the bucket, and then does a == check against the candidate keys. Trying [40.7, -74.0] (a list) as a key would raise TypeError: unhashable type: 'list' — that’s the rule the previous paragraph just stated, made concrete.
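A sketch making that failure concrete — the list key raises TypeError, which we catch here just to inspect the message:

```python
locations = {(40.7, -74.0): "New York", (51.5, -0.1): "London"}

try:
    locations[[40.7, -74.0]]   # a list key is unhashable
except TypeError as exc:
    error = str(exc)           # mentions "unhashable type: 'list'"

city = locations[(40.7, -74.0)]  # tuple key: hashes fine, lookup succeeds
```

In real code you wouldn't catch this — an unhashable key is a bug to fix, not a condition to handle.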
When you need to look something up by name — a user record by username, a count by word, a config value by key — a list is wrong (linear scan) and a class is overkill. A dict maps keys to values with O(1) lookup, insertion, and deletion on average.
person = {"name": "Alice", "age": 30}
person["name"], person.get("missing"), person.get("missing", "N/A")('Alice', None, 'N/A')
{"name": "Alice", "age": 30} is a dict literal — curly braces, key: value pairs.person["name"] looks up "name" and returns "Alice" — the bracket syntax raises KeyError if the key is absent.person.get("missing") returns None instead of raising — useful when absence is expected.person.get("missing", "N/A") returns "N/A" as the fallback — a default of your choosing.The general rule: d[key] is loud (raises on miss); d.get(key, default) is quiet (returns the default). Modify with assignment (sets/updates), the del statement (removes a key — del d[k] is built-in syntax, not a method call), or .pop(key) (removes and returns the value):
person["city"] = "London"
del person["age"]
person.pop("city")
person
{'name': 'Alice'}
Iterate keys, values, or pairs:
person = {"name": "Alice", "age": 30, "city": "London"}
list(person.keys()), list(person.values()), list(person.items())
(['name', 'age', 'city'],
 ['Alice', 30, 'London'],
 [('name', 'Alice'), ('age', 30), ('city', 'London')])
for key, value in person.items():
print(f"{key}: {value}")name: Alice
age: 30
city: London
Two helpers worth memorizing — both spare you the “check if the key exists, create the value if it doesn’t, then update it” boilerplate that comes up constantly:
from collections import defaultdict, Counter
groups = defaultdict(list)
for name, team in [("Alice", "eng"), ("Bob", "design"), ("Carol", "eng")]:
groups[team].append(name)
dict(groups)
{'eng': ['Alice', 'Carol'], 'design': ['Bob']}
- defaultdict(list) is a dict that, on a missing-key lookup, auto-creates the value by calling list() (i.e. []).
- The first time groups["eng"] is accessed, defaultdict creates an empty list and stores it; .append("Alice") then adds to that list.
- Without defaultdict you’d write if team not in groups: groups[team] = []; groups[team].append(name) every iteration.
- dict(groups) converts it back to a plain dict for display — purely cosmetic.

text = "the cat sat on the mat".split()
Counter(text).most_common(2)
[('the', 2), ('cat', 1)]
- Counter(text) walks the iterable and counts how many times each element appears — {"the": 2, "cat": 1, "sat": 1, "on": 1, "mat": 1}.
- .most_common(2) returns the top 2 most-frequent items as (item, count) tuples, sorted by count.

The general rule: defaultdict(factory) for “missing keys auto-fill via factory()”; Counter for frequency tables. Both live in collections.
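For intuition, Counter is roughly what you'd build by hand with defaultdict(int) — a sketch of the manual version:

```python
from collections import defaultdict

words = "the cat sat on the mat".split()

counts = defaultdict(int)   # missing keys start at int(), i.e. 0
for w in words:
    counts[w] += 1          # no "is the key there yet?" check needed

top_word = max(counts, key=counts.get)   # the key with the largest count
```

Counter adds the conveniences (.most_common, counter arithmetic) on top of exactly this counting loop.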
Two dicts can be merged with the union operator | (Python 3.9+). Right-hand keys win on conflict — the idiom for “defaults plus overrides”:
defaults = {"host": "localhost", "port": 5432, "ssl": False}
overrides = {"port": 6543, "ssl": True}
defaults | overrides
{'host': 'localhost', 'port': 6543, 'ssl': True}
The deep dive — including hashing, dict ordering, and dict views — is in Chapter 15.
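For completeness, a sketch of the related spellings — |= updates a dict in place (also 3.9+), and {**a, **b} is the pre-3.9 equivalent of |:

```python
defaults = {"host": "localhost", "port": 5432, "ssl": False}
overrides = {"port": 6543, "ssl": True}

merged = {**defaults, **overrides}  # pre-3.9 spelling; later keys win

config = dict(defaults)             # copy first so defaults stays intact
config |= overrides                 # in-place union (Python 3.9+)
```

Both produce the same "defaults plus overrides" result; the copy before |= matters because |= mutates its left operand.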
When you need to ask “have I seen this?” repeatedly, or take the intersection/union/difference of two collections, a list is the wrong tool — x in big_list is O(n). A set holds unique values with O(1) membership testing. Order is not preserved.
fruits = {"apple", "banana", "cherry"}
empty = set() # NOT {} — that's a dict
"apple" in fruits, "grape" in fruits(True, False)
{"apple", "banana", "cherry"} is a set literal — curly braces, no key:value pairs.set() (not {}) is the empty set — {} is the empty dict, which is the one common gotcha."apple" in fruits is the membership test — True, and constant-time regardless of set size.The set-algebra operators are pleasant — and exactly what you’d write on paper:
a = {1, 2, 3, 4}
b = {3, 4, 5, 6}
[a | b, a & b, a - b, a ^ b]  # union, intersection, difference, symmetric difference
[{1, 2, 3, 4, 5, 6}, {3, 4}, {1, 2}, {1, 2, 5, 6}]
- a | b — union: everything in either set.
- a & b — intersection: only the items in both.
- a - b — difference: in a but not b.
- a ^ b — symmetric difference: in exactly one of the two, not both.

The general rule: reach for a set when you need fast membership, deduplication, or the algebra above.
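Two more comparisons in the same family — subset/superset tests and disjointness — sketched with the same sets:

```python
a = {1, 2, 3, 4}
b = {3, 4, 5, 6}

is_subset = {1, 2} <= a       # True: every element of {1, 2} is in a
is_superset = a >= {1, 2}     # True: the same test from the other side
shares = not a.isdisjoint(b)  # True: a and b share 3 and 4
```

Like the algebra operators, these read almost exactly as the math notation does.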
A common one-liner: deduplicate a list (note that order is not preserved):
duplicates = [1, 2, 2, 3, 3, 3, 4]
list(set(duplicates))
[1, 2, 3, 4]
If you need order-preserving deduplication, use list(dict.fromkeys(duplicates)) — since Python 3.7, dict iteration order matches insertion order.
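A sketch of that idiom — dict.fromkeys builds a dict whose keys are the items in first-seen order, and listing the keys yields the deduplicated sequence:

```python
duplicates = [3, 1, 2, 1, 3, 2, 4]

# dict keys are unique and keep insertion order (Python 3.7+),
# so the first occurrence of each value wins
ordered_unique = list(dict.fromkeys(duplicates))
```

Here ordered_unique is [3, 1, 2, 4] — contrast with list(set(...)), which makes no ordering promise.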
The “build an empty collection, loop, append” pattern is so common that Python has a one-line syntax for it. A comprehension is a compact, readable way to build a collection from another collection. There are four kinds: list, dict, set, and generator.
squares = [x ** 2 for x in range(10)]
evens = [x for x in range(20) if x % 2 == 0]
[squares, evens]
[[0, 1, 4, 9, 16, 25, 36, 49, 64, 81], [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]]
[x ** 2 for x in range(10)] reads “the list of x ** 2 for each x in range(10)” — equivalent to the explicit loop:
squares = []
for x in range(10):
    squares.append(x ** 2)

The general shape is [expression for variable in iterable] — the expression in front, the source loop behind.
[x for x in range(20) if x % 2 == 0] adds an if filter — only items satisfying the condition reach the output.
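One related spelling worth knowing: an if/else placed in the expression position maps every item, unlike the trailing if, which filters. A small sketch contrasting the two:

```python
# trailing if filters: items failing the test are dropped
evens = [x for x in range(6) if x % 2 == 0]

# if/else in the expression position transforms: every item survives
labels = ["even" if x % 2 == 0 else "odd" for x in range(4)]
```

Here evens is [0, 2, 4] (three items from six), while labels is ['even', 'odd', 'even', 'odd'] (four items from four).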
The same pattern with curly braces builds dicts and sets:
square_map = {x: x ** 2 for x in range(5)}
unique_lengths = {len(w) for w in "the cat sat on the mat".split()}
[square_map, unique_lengths]
[{0: 0, 1: 1, 2: 4, 3: 9, 4: 16}, {2, 3}]
- {x: x ** 2 for x in range(5)} is a dict comprehension — key: value instead of just an expression.
- {len(w) for w in ...} is a set comprehension — a single expression in braces, no colon. Duplicates collapse: several source words share length 3, but 3 appears only once in the result.

The general rule: [...] builds a list, {k: v for ...} a dict, {...} a set; add if cond for filtering, change the expression for mapping.
A list comprehension materialises the whole list at once — fine for thousands of items, wasteful for millions. A generator expression is a comprehension in (...) instead of [...]. It does not build a list — it computes values on demand:
total = sum(x ** 2 for x in range(1_000_000))
total
333332833333500000
sum(...) walks the generator, adding each squared value to a running total.

The general rule: when the result is fed straight into a consumer like sum, max, any, or "".join, drop the brackets to a generator and save the intermediate list. Generator expressions are how you scale comprehensions to large or infinite data; the deep dive is in Chapter 29.
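The same pattern works with the other consumers. A sketch (the words list is invented) feeding generator expressions to any and str.join, never materialising an intermediate list:

```python
words = ["alpha", "beta", "gamma", "delta"]

# any short-circuits: it stops consuming the generator at the first True
has_long = any(len(w) > 4 for w in words)

# join consumes the generator one string at a time
joined = ", ".join(w.upper() for w in words)
```

Here has_long is True and joined is "ALPHA, BETA, GAMMA, DELTA".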
The walrus operator := (Python 3.8+) lets a comprehension capture a per-item value once and reuse it — useful when a filter and a map both need the same expensive computation:
import re
lines = ["score: 95", "noise", "score: 87", "noise", "score: 72"]
nums = [int(m.group(1)) for line in lines if (m := re.match(r"score: (\d+)", line))]
nums
[95, 87, 72]
m is bound only when the regex matches; the body then uses m.group(1) without re-running the match. Without the walrus you’d either match twice or fall back to a regular for loop.
| Need | Use |
|---|---|
| Ordered, modifiable items | list |
| Ordered, fixed record | tuple |
| Lookup by key | dict |
| Fast in test, dedupe, set algebra | set |
| Counting occurrences | collections.Counter |
| Auto-create missing keys | collections.defaultdict |
| Fast append/pop both ends | collections.deque |
The choice of collection drives the complexity of your code. x in some_list is O(n); x in some_set is O(1). Looking up by key in a dict is O(1); searching a list of pairs is O(n). The right collection often turns “make this faster” into “rewrite the data structure” — and the rewrite is one line.
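A sketch of that gap using timeit (the exact numbers vary by machine; the ratio is what matters):

```python
import timeit

data_list = list(range(100_000))
data_set = set(data_list)
needle = -1  # absent, so the list membership test scans everything

t_list = timeit.timeit(lambda: needle in data_list, number=200)
t_set = timeit.timeit(lambda: needle in data_set, number=200)
# t_list is typically orders of magnitude larger than t_set
```

The one-line rewrite — set(data_list) — is the whole optimisation.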
The full treatment of sequences (lists, tuples, slicing, comprehensions, the sequence protocol) is Chapter 14. The full treatment of dicts and sets — hashing, ordering, performance — is Chapter 15. Generator expressions and the iterator protocol they ride on are in Chapter 29.
A common starter problem in data work: given some text, report how many words there are, how many of those are unique, what the most common are, and which appeared exactly once. Every collection in this chapter — list, dict, set, Counter — earns a slot.
Step 1: tokenize and count. Lower-case the text, strip simple punctuation, split on whitespace, and feed the resulting list into Counter:
from collections import Counter
text = """The quick brown fox jumps over the lazy dog.
The dog sleeps. The fox watches."""
words = text.lower().replace(".", "").split()
freq = Counter(words)
freq.most_common(3)
[('the', 4), ('fox', 2), ('dog', 2)]
text.lower() returns a new string with every letter lower-cased (so “The” and “the” merge); .replace(".", "") strips periods so “dog.” and “dog” merge too; .split() (no argument) splits on any whitespace including the embedded newline. Counter(words) walks the list once and returns a dict-subclass mapping each word to its count. .most_common(3) returns the three highest counts as (word, count) tuples — sorted descending.
Step 2: pull out the once-only words. A hapax legomenon (a word that appears exactly once) is interesting in linguistics and search-quality work. We can build it as a set comprehension over the Counter, since Counter is also a dict so .items() yields pairs:
hapax = {word for word, count in freq.items() if count == 1}
unique_words = set(words)
[len(unique_words), len(hapax)]
[10, 7]
set(words) is the simplest deduplication — turning the word list into a set drops duplicates in one step. The set comprehension {word for word, count in freq.items() if count == 1} filters Counter’s pairs and keeps only the words whose count is 1.
Step 3: assemble a report. Wrap everything in a function that returns a single dict — the kind of structure you’d render as JSON or pass to a template:
def analyze(text):
words = text.lower().replace(".", "").split()
freq = Counter(words)
return {
"total_words": len(words),
"unique_words": len(set(words)),
"top3": freq.most_common(3),
"hapax": sorted({w for w, c in freq.items() if c == 1}),
}
analyze(text)
{'total_words': 15,
 'unique_words': 10,
 'top3': [('the', 4), ('fox', 2), ('dog', 2)],
 'hapax': ['brown', 'jumps', 'lazy', 'over', 'quick', 'sleeps', 'watches']}
Calling sorted on the set returns a list — sets aren’t ordered, but the report’s reader probably expects deterministic output. Wrapping the set in sorted is the idiomatic conversion.
The build exercises every collection from the chapter: list (tokens), dict (the report), set (deduplication and one-shot words), tuple (Counter’s (word, count) pairs), and a comprehension in each variety we covered.
Dedupe preserving order. Write a one-liner that removes duplicates from a list while preserving the original order. (Hint: dict.fromkeys.)
Word frequency. Given a string of text, return the three most common words using Counter.
Group by first letter. Given a list of names, return a defaultdict(list) mapping each first letter to the names starting with it.
Comprehension trade-offs. Time sum(x*x for x in range(1_000_000)) against sum([x*x for x in range(1_000_000)]). Which is faster? Which uses less memory?
The {} trap. Predict the results of type({}) and type(set()), then verify in the REPL.
Lists, tuples, dicts, sets — and the comprehensions that build them — cover almost every collection task. The next chapter, Chapter 6, makes these collections flow through code by introducing functions: parameters, return values, scope, and a first taste of decorators and generators.