3  Strings and Text

Note: Core idea

Strings in Python are immutable Unicode sequences. f-strings are the modern way to format. Slicing, joining, and the standard methods cover almost every text task you’ll meet.

In this chapter you will learn to:

  1. Create strings with single, double, triple, and raw quoting — and choose the right one.
  2. Format values with f-strings and the {value:spec} mini-language.
  3. Index, slice, search, and split strings.
  4. Use .join() to combine many strings efficiently.

3.1 Creating strings

Python gives you four quoting forms because text comes in four shapes: short and clean, full of one quote character, multi-line, or full of backslashes. Picking the right form keeps the literal readable.

a = "hello"
b = 'hello'
c = """multi-
line string"""
d = r"C:\Users\Alice"   # raw string — backslashes are literal
[a, b, c, d]
['hello', 'hello', 'multi-\nline string', 'C:\\Users\\Alice']
  • "hello" and 'hello' are equivalent — Python doesn’t distinguish single and double quotes. Use whichever lets you avoid escaping ("don't" is cleaner than 'don\'t').
  • """...""" (triple-quoted) preserves embedded newlines, no \n needed. Used for docstrings and any multi-line literal.
  • r"C:\Users\Alice" is a raw string — the leading r switches off backslash escaping, so \U doesn’t try to be a Unicode escape. Essential for Windows paths and regex patterns.

The general rule: pick the form that minimises escapes. Single/double for one-liners, triple for multi-line, raw whenever you have many backslashes.

# Without raw, you'd write \\d+ to mean \d+
import re
pattern = r"\d+"
re.findall(pattern, "abc 12 def 345")
['12', '345']

re is Python’s regular-expression module (treated properly in Section 11.5); \d+ is the regex pattern “one or more digits”, and re.findall returns every non-overlapping match. The point here isn’t the regex itself. Without the leading r, the string "\d+" would still work by accident, because \d happens not to be a known string escape (recent Python versions warn about such invalid escapes). A non-raw "\n" meant as a literal backslash followed by n, however, would silently be interpreted as a newline. Raw strings remove that hazard.
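To make the hazard concrete, the short check below compares a normal literal with its raw counterpart. It is only an illustrative sketch; the variable names and the Windows-style path are made up for the example.

```python
# A non-raw "\n" is one character (a newline); the raw form keeps both characters.
plain = "\n"
raw = r"\n"

print(len(plain))      # 1
print(len(raw))        # 2
print(raw == "\\n")    # True: raw backslash-n equals an escaped backslash plus n

# Same idea with a hypothetical Windows-style path
path_raw = r"C:\temp\new"
path_escaped = "C:\\temp\\new"
print(path_raw == path_escaped)   # True
```

Without the r prefix (and without doubling the backslashes), the \t and \n in that path would become a tab and a newline.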

3.2 f-strings

Building output from variables used to mean either string concatenation ("Hello, " + name + "!") or %-style formatting ("Hello, %s!" % name). Both get unwieldy fast. f-strings (“formatted string literals”, introduced in Python 3.6) put the expression inside the string with {...}, and they are the form to prefer in new code.

name = "Alice"
score = 95.6789
[
    f"Hello, {name}!",
    f"Score: {score:.2f}",        # 2 decimal places
    f"Padded: {42:5d}",            # width 5
    f"Thousands: {1_000_000:,}",
    f"Debug: {name=}",             # shows name AND value
    f"Repr: {name!r}",             # shows quotes
    f"Expr: {2 + 2}",              # any expression
]
['Hello, Alice!',
 'Score: 95.68',
 'Padded:    42',
 'Thousands: 1,000,000',
 "Debug: name='Alice'",
 "Repr: 'Alice'",
 'Expr: 4']
  • The leading f turns the literal into an f-string; everything in {...} is evaluated as a Python expression.
  • {score:.2f} uses the format spec after the colon — .2f means “fixed-point, 2 decimals”.
  • {42:5d} pads to width 5 with leading spaces; {1_000_000:,} adds thousands separators.
  • {name=} (Python 3.8+) is the debug form — it prints both the expression text and its value, which is gold for logging.
  • {name!r} applies repr() to the value before formatting — adds the quotes you’d see in a debugger.
  • {2 + 2} shows the slot accepts any expression, not just a name.

The general rule: the slot is {expression[!conversion][:format_spec]}. A simplified view of the format spec is [[fill]align][width][,][.precision][type], where the optional comma is the thousands separator used above:

[
    f"{score:>10.2f}",   # right-align, width 10, 2 decimals
    f"{score:<10.2f}",   # left-align
    f"{score:^10.2f}",   # center
    f"{255:x}",          # hex
    f"{255:b}",          # binary
]
['     95.68', '95.68     ', '  95.68   ', 'ff', '11111111']
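One place the alignment and width fields earn their keep is lining up columns of text. The sketch below is an illustration only: the rows are made-up data and the column widths (12, 5, 9) are arbitrary choices.

```python
# Hypothetical data for illustration: (item, quantity, unit price)
rows = [("apple", 3, 0.5), ("watermelon", 12, 3.25), ("kiwi", 150, 0.2)]

# Header: text left-aligned, numeric columns right-aligned
lines = [f"{'item':<12}{'qty':>5}{'price':>9}"]
for item, qty, price in rows:
    # Same widths as the header so every column lines up
    lines.append(f"{item:<12}{qty:>5d}{price:>9.2f}")

print("\n".join(lines))
```

Every line comes out 26 characters wide (12 + 5 + 9), so the columns stay aligned as long as no value is longer than its field.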

Since Python 3.12 (PEP 701) the parser handles f-strings with the same quote character nested inside — and arbitrary expressions of any depth. You no longer need to swap quote styles to nest f-strings:

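vals = [1.234, 5.678, 9.012]
f"{",".join([f"{x:.1f}" for x in vals])}"
'1.2,5.7,9.0'

The inner f-string, the "," separator, and the outer f-string all use double quotes, and the comprehension sits inside the expression slot. Before Python 3.12, reusing the outer quote character inside the slot was a syntax error, so you had to switch the inner literals to single quotes.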

3.3 Indexing and slicing

You constantly need to grab one character or a substring — the first character, the last, the file extension, the prefix. Python gives you indexing for one element and slicing for a contiguous range, both with one syntax.

s = "Hello, World!"
[s[0], s[-1], s[7:12], s[:5], s[::-1]]
['H', '!', 'World', 'Hello', '!dlroW ,olleH']
  • s[0] is the first character — strings are zero-indexed.
  • s[-1] is the last character — negative indices count back from the end, so you don’t need s[len(s) - 1].
  • s[7:12] is a slice: characters at positions 7 through 11 (the stop index is exclusive). Result: "World".
  • s[:5] omits the start, defaulting to 0 — the first 5 characters.
  • s[::-1] uses the optional step of -1 — walk the string backwards, which reverses it.

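The general rule: s[start:stop:step], where start defaults to 0, stop to len(s), and step to 1 (with a negative step the defaults flip, so the slice runs from the end back to the beginning). The deep treatment of slicing, and how it generalizes to all sequences, is in Chapter 14.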
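The defaults can be checked directly. The sketch below applies them to the prefix and file-extension tasks mentioned at the start of this section; the file name is a made-up example.

```python
filename = "report_2024.csv"   # hypothetical file name

# Omitted bounds fall back to the defaults: 0, len(s), and 1
assert filename[:] == filename[0:len(filename):1]

prefix = filename[:6]          # 'report'
extension = filename[-4:]      # '.csv'
stem = filename[:-4]           # 'report_2024'
every_other = filename[::2]    # characters at positions 0, 2, 4, ...

# Unlike indexing, slicing never raises IndexError: out-of-range bounds are clamped
clamped = filename[10:100]     # '4.csv'

print(prefix, extension, stem, every_other, clamped)
```

Fixed offsets like these are fine for illustration; for real file names, a tool such as pathlib is usually a safer choice than counting characters.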

len(s) is the character count; in is a membership test:

[len(s), "World" in s, "world" in s]
[13, True, False]

3.4 Joining and splitting

You constantly need to glue a list of strings together with a separator — comma, space, newline, nothing — or do the reverse and chop one string into pieces. Concatenation works with +, but for many strings, .join() is dramatically faster — and more idiomatic:

words = ["Hello", "World", "Python"]
[" ".join(words), ", ".join(words), "".join(words)]
['Hello World Python', 'Hello, World, Python', 'HelloWorldPython']
  • " ".join(words) glues the list together with a single space between elements — the separator is the string .join is called on.
  • ", ".join(words) uses comma-space, the usual separator for human-readable lists (real CSV uses a bare comma plus quoting rules).
  • "".join(words) uses no separator at all — pure concatenation.

The general rule: the separator string is the receiver, the iterable of strings is the argument. split() reverses it:

[
    "a,b,c".split(","),
    "  spaces   between   ".split(),   # default: split on any whitespace
    "line1\nline2\nline3".splitlines(),
]
[['a', 'b', 'c'], ['spaces', 'between'], ['line1', 'line2', 'line3']]

Three subtly different split shapes worth knowing:

  • "a,b,c".split(",") returns ['a', 'b', 'c'] — splits on the exact separator, even if it appears multiple times in a row ("a,,b".split(",") is ['a', '', 'b'], with an empty string in the middle).
  • "  spaces   between   ".split() (no argument) is the whitespace-aware form: it collapses consecutive runs of any whitespace (spaces, tabs, newlines) and ignores leading/trailing whitespace entirely, so the result is ['spaces', 'between'], with no empty strings. This is the form you want for parsing words out of text.
  • "line1\nline2\nline3".splitlines() is the line-aware split — splits on \n, \r\n, \r, and other line terminators, and omits the trailing empty string that .split("\n") would leave on a string ending with a newline. The right choice for reading file contents into a list of lines.
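The differences between the three shapes are easiest to see side by side. The following is a small sketch with made-up inputs; the maxsplit argument at the end is not covered above and is included only as a commonly useful extra.

```python
# Exact separator: empty fields are kept
fields = "a,,b".split(",")
print(fields)                      # ['a', '', 'b']

# No argument: runs of any whitespace collapse, ends are ignored
messy = "  one \t two\nthree  "
print(messy.split())               # ['one', 'two', 'three']

# Line-aware split versus splitting on "\n" by hand
text = "line1\nline2\n"
print(text.split("\n"))            # ['line1', 'line2', '']
print(text.splitlines())           # ['line1', 'line2']

# maxsplit limits the number of cuts (here: split on the first "=" only)
key, value = "key=value=more".split("=", 1)
print(key, value)                  # key value=more
```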

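The rule: never use += in a loop to build a string. Strings are immutable, so each += may allocate a new string and copy everything built so far, and the total work can grow quadratically. Collect the pieces in a list and call .join(list_of_pieces) instead.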
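In practice the rule usually takes the shape "append to a list, join once at the end". The function below is a minimal sketch of that pattern; render_lines and its numbering format are hypothetical, chosen only for illustration.

```python
def render_lines(values):
    """Build one string from many pieces without += on the string itself."""
    pieces = []
    for i, value in enumerate(values, start=1):
        pieces.append(f"{i}. {value}")   # list append is cheap
    return "\n".join(pieces)             # one final pass builds the string

print(render_lines(["alpha", "beta", "gamma"]))
```

When the pieces come from a simple transformation, a generator expression passed straight to .join() does the same job in one line.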

3.5 String methods

Almost every string task is built from a small set of methods: trim whitespace, change case, replace a substring, ask whether a string starts/ends/contains something. Strings are immutable, so every method returns a new string — none of these mutate in place.

s = "  Hello, World!  "
[
    s.strip(),
    s.lower(),
    s.upper(),
    s.replace("World", "Python"),
    s.startswith("  Hello"),
    s.find("World"),    # -1 if not found
    "abc123".isalnum(),
]
['Hello, World!',
 '  hello, world!  ',
 '  HELLO, WORLD!  ',
 '  Hello, Python!  ',
 True,
 9,
 True]
  • s.strip() returns a new string with leading/trailing whitespace removed; s itself is unchanged.
  • s.lower() and s.upper() return case-converted copies.
  • s.replace(old, new) swaps every occurrence — useful but be aware it’s substring-based, not word-based.
  • s.startswith(prefix) returns True/False.
  • s.find(sub) returns the first index of sub, or -1 if not found. (Use s.index(sub) if you’d rather it raise ValueError on a miss.)
  • "abc123".isalnum() returns True only if the string is non-empty and every character is alphanumeric.

The general rule: every method that “modifies” a string actually returns a new one — assign the result if you want to keep it. Useful predicates: .isdigit(), .isalpha(), .isalnum(), .isspace(), .startswith(prefix), .endswith(suffix).
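A quick way to convince yourself of the "returns a new string" rule is to call a method and ignore the result. The sketch below does that, and also contrasts find with index on a miss and tries a few predicates on edge cases.

```python
s = "  Hello, World!  "

s.strip()                  # result discarded: s is unchanged
print(repr(s))             # '  Hello, World!  '

trimmed = s.strip()        # assign the result to keep it
print(repr(trimmed))       # 'Hello, World!'

# find signals a miss with -1; index raises instead
print(trimmed.find("Python"))   # -1
try:
    trimmed.index("Python")
except ValueError as exc:
    print("index raised ValueError:", exc)

# Predicates look at every character, and are False for the empty string
print("123".isdigit(), "12.5".isdigit(), "".isalnum())   # True False False
```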

[
    "  hello  ".strip(),
    "xxhelloxx".strip("x"),    # strip a custom set of chars
    "abc".count("a"),
    "abc".replace("a", "A").replace("b", "B"),
]
['hello', 'hello', 1, 'ABc']

Four useful variations on the basic methods:

  • "  hello  ".strip() takes no argument and strips whitespace, producing 'hello'.
  • "xxhelloxx".strip("x") treats its argument as a set of characters to strip from both ends, not as a substring. "abxabc".strip("ab") is "xabc", because strip keeps removing any of a/b from each end until it hits a character not in the set (x on the left, c on the right). To strip a specific substring, use removeprefix/removesuffix (Python 3.9+) instead.
  • "abc".count("a") — counts non-overlapping occurrences. Returns 1 here (one a in "abc").
  • "abc".replace("a", "A").replace("b", "B") — chained calls each return a new string; the second .replace operates on the result of the first. Output: "ABc". Chaining is fine for two or three; for many substitutions, build a translation table with str.maketrans plus s.translate.
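The two alternatives mentioned in this list, removeprefix/removesuffix and str.maketrans with translate, can be sketched as follows. The file name is a made-up example, and the first part needs Python 3.9 or later.

```python
name = "test_report.txt"   # hypothetical file name

# Substring-based removal (Python 3.9+)
print(name.removeprefix("test_"))   # 'report.txt'
print(name.removesuffix(".txt"))    # 'test_report'

# strip treats ".txt" as the character set {'.', 't', 'x'}, so it removes too much
print(name.strip(".txt"))           # 'est_repor'

# Many single-character substitutions in one pass; None deletes the character
table = str.maketrans({"a": "A", "b": "B", "c": None})
print("abcabc".translate(table))    # 'ABAB'
```

translate works one character at a time, so it suits character-level substitutions; multi-character replacements still need replace or a regular expression.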

3.6 Escape sequences

Inside a non-raw string, \ introduces an escape:

[
    "line1\nline2",
    "tab\there",
    "quote: \"hi\"",
    "backslash: \\",
]
['line1\nline2', 'tab\there', 'quote: "hi"', 'backslash: \\']

The four items show the escapes you’ll meet daily:

  • In "line1\nline2", \n is a single character, the newline (ASCII code point 10). The literal in source is two characters, \ and n; Python turns that into one character at parse time.
  • In "tab\there", \t is the tab character (code point 9). Same idea: two source characters become one runtime character.
  • In "quote: \"hi\"", \" lets you put a double quote inside a double-quoted string. The alternative is to use single quotes around the whole string ('quote: "hi"') and avoid the escape entirely.
  • "backslash: \\" — to put a literal backslash in the string, you write \\. One backslash to escape, one as the actual character. So the runtime string ends with one backslash, not two.

Other escapes worth knowing: \xHH for a byte (or code point in a str) by hex, \uXXXX for a Unicode code point, \0 for the null character. A raw string (r"...") disables all of this — \n stays two characters — useful for paths, regex, and any string with many backslashes.
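The less common escapes can be checked the same way as the everyday ones. The snippet below is a small sketch that uses len and ord to show what each literal actually contains at runtime.

```python
hex_escape = "\x41"        # code point 0x41
unicode_escape = "\u00e9"  # code point U+00E9
with_null = "a\0b"         # the null character sits between a and b

print(hex_escape)                  # A
print(unicode_escape)              # é
print(len(with_null))              # 3

# Escaped backslash, raw string, and real newline compared by length
print(len("\\n"), len(r"\n"), len("\n"))   # 2 2 1

# The code points quoted above for newline and tab
print(ord("\n"), ord("\t"))        # 10 9
```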

Tip: Why this matters

String handling in Python is uniform: every string is Unicode, every method returns a new string (because strings are immutable), and f-strings give you a single, consistent way to format any value. The mini-language inside {value:spec} covers width, precision, alignment, and number bases — once learned, it replaces almost every other formatting tool.

3.7 Going deeper

This chapter treats str as a workable text type. The deeper questions — what is a Unicode code point, how do str and bytes differ, and what does normalization mean — are covered in Chapter 16. Slicing is a small example of the sequence protocol that runs through Chapter 14.

3.8 Build: a banner formatter

A common need in CLI tools and reports: print a title surrounded by a horizontal rule, like a section header. We’ll build banner(title) that exercises the chapter’s f-string format spec, string multiplication, and .join().

Step 1: a fixed-width banner. Take a title and frame it with = rules above and below, centered to a fixed width:

def banner(title, width=40, char="="):
    rule = char * width
    centered = f"{title:^{width}}"
    return f"{rule}\n{centered}\n{rule}"

print(banner("Sales Report Q1"))
========================================
            Sales Report Q1             
========================================

Two new tricks here. char * width multiplies a string — Python repeats it that many times, so "=" * 40 is a 40-character rule. And f"{title:^{width}}" is a nested format spec: the inner {width} is evaluated first (to 40), then the outer slot becomes f"{title:^40}" — center-align in 40 columns. ^ is centre, < is left, > is right.
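Nesting is not limited to the width. As a sketch going slightly beyond what banner needs, the fill character and the precision can also come from variables; the names fill, width, and digits below are illustrative only.

```python
title = "Hi"
fill = "*"
width = 10
digits = 2

# Fill character and width both supplied by nested slots
framed = f"{title:{fill}^{width}}"
print(framed)                      # ****Hi****

# Precision supplied by a nested slot
rounded = f"{3.14159:.{digits}f}"
print(rounded)                     # 3.14
```

Each inner slot is evaluated first and its text is spliced into the format spec, so the first f-string behaves like f"{title:*^10}".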

Step 2: grow the width if the title is too long. A title longer than 40 characters would overflow the rule. Compute the actual width as the larger of the requested width and the title length plus a 4-character margin:

def banner(title, width=40, char="="):
    width = max(width, len(title) + 4)
    rule = char * width
    centered = f"{title:^{width}}"
    return f"{rule}\n{centered}\n{rule}"

print(banner("A very long sales report title that exceeds forty characters"))
================================================================
  A very long sales report title that exceeds forty characters  
================================================================

max(...) is a built-in that returns the largest of its arguments. + 4 leaves two spaces of margin on each side.

Step 3: optional subtitle. Allow a second line under the title, only when supplied. This is where .join() and the empty-string-is-falsy rule combine cleanly:

def banner(title, subtitle="", width=40, char="="):
    width = max(width, len(title) + 4, len(subtitle) + 4)
    rule = char * width
    rows = [rule, f"{title:^{width}}"]
    if subtitle:
        rows.append(f"{subtitle:^{width}}")
    rows.append(rule)
    return "\n".join(rows)

print(banner("Sales Report", subtitle="Q1 2024"))
print()
print(banner("No Subtitle"))
========================================
              Sales Report              
                Q1 2024                 
========================================

========================================
              No Subtitle               
========================================

if subtitle: is the truthy check — the empty string "" (the default) is falsy, so the row is skipped. "\n".join(rows) glues the lines with newlines: cheap, idiomatic, and avoids the += trap from earlier in the chapter.

The build exercises several of the chapter’s tools: the f-string format spec with a dynamic width, string multiplication, .join(), and truthiness on strings. A triple-quoted """...""" docstring would be the natural fourth step.
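That fourth step might look like the sketch below: the Step 3 function unchanged, plus a triple-quoted docstring. The docstring wording is an assumption, one reasonable way to describe the behavior rather than a required form.

```python
def banner(title, subtitle="", width=40, char="="):
    """Return a banner: title (and optional subtitle) centered between two rules.

    The width grows automatically when the title or subtitle needs more
    than `width` columns (text length plus a 4-character margin).
    """
    width = max(width, len(title) + 4, len(subtitle) + 4)
    rule = char * width
    rows = [rule, f"{title:^{width}}"]
    if subtitle:                      # empty string is falsy, so the row is skipped
        rows.append(f"{subtitle:^{width}}")
    rows.append(rule)
    return "\n".join(rows)

print(banner.__doc__.splitlines()[0])   # first line of the docstring
print(banner("Sales Report", subtitle="Q1 2024"))
```

The docstring is stored on the function as banner.__doc__, which is what help(banner) displays.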

3.9 Exercises

  1. Format spec. Write an f-string that prints a float with three decimals, padded to 12 characters, right-aligned.

  2. Reverse with slicing. Write reverse(s) using only slicing.

  3. Title case the hard way. Implement your own title() that capitalizes the first letter of each word using .split(), capitalization, and .join().

  4. Count vowels. Count how many vowels appear in a string using a generator expression and sum().

  5. Why never +=? Time the difference between building a 100,000-piece string with += and with "".join(pieces). Use time.perf_counter().

3.10 Summary

Strings are immutable Unicode sequences with a deep but consistent method set. f-strings are the format mechanism; slicing, .join(), and .split() cover most text manipulation. The next chapter, Chapter 4, turns to the structures that make a program do things: if, for, while, and the loop-control keywords.