10  Modules and Packages

Note: Core idea

A module is a .py file. A package is a directory with an __init__.py. The import statement finds, runs (once), and exposes a module as a namespace. Use import module for clarity, from module import name only when the prefix clutters the code, and never from module import * in real code.

In this chapter you will learn to:

  1. Distinguish modules from packages and explain how import finds them.
  2. Use the import, from, and as forms — and when each is appropriate.
  3. Recognize and use the if __name__ == "__main__": idiom.
  4. Lay out a real Python project’s directory structure.
  5. Use a virtual environment to isolate dependencies.

This chapter has more operational detail than most — it’s the boundary between writing scripts and writing programs. Beazley’s Python Distilled (Ch. 8) is the companion reference; we cite specific sections where its treatment is canonical.

10.1 What import actually does

import math looks like one operation but is actually three: find the file, run it once, bind the result to a name. Knowing this is the difference between guessing why an import fails and reading the error.

When Python executes import module:

  1. It searches a list of directories — sys.path — for module.py (or a module/ package directory).
  2. It runs the file top to bottom, exactly once per process.
  3. It binds the resulting module object to the name module in the current namespace.
import sys
sys.path[:3]   # first three search directories
['/opt/hostedtoolcache/Python/3.13.13/x64/lib/python313.zip',
 '/opt/hostedtoolcache/Python/3.13.13/x64/lib/python3.13',
 '/opt/hostedtoolcache/Python/3.13.13/x64/lib/python3.13/lib-dynload']
  • sys.path is a regular list of strings — the search directories Python checks for imports, in order.
  • We slice the first three to keep the output short. When you run a script, the first entry is the script’s directory (or "" in an interactive session); in the environment that produced this output, the standard-library entries happen to come first.

Subsequent import module calls find the already-loaded module in sys.modules and return it without re-running:

import sys
import math
"math" in sys.modules
True
  • sys.modules is a dict cache keyed by module name. Once a module is loaded, it lives here for the rest of the process.
  • The check shows math is already cached — earlier code (or Python startup) imported it. A second import math would consult the cache and return immediately, not re-execute math.py.

The general rule: a module runs exactly once per process; subsequent imports just rebind a name to the cached object. That “run once” property is why a top-level print("loading") in a module fires only the first time. It also means modules can hold singleton state (like a configured logger) safely.
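The cache is easy to observe directly: a second import of the same module hands back the very same object, not a copy. A minimal sketch:

```python
import sys

import math
first = sys.modules["math"]   # the cached module object

import math as m2             # second import: served from the cache
print(first is m2)            # True — identical object, not re-executed
print(first is math)          # True
```

Because every importer shares one object, module-level state (a configured logger, a connection pool) is naturally a process-wide singleton.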

10.2 Import styles

Python gives you four import forms because there are four reasonable ways to bring a name into scope: keep the namespace, alias the namespace, pull a specific name out, or alias that specific name. Each has a place — and a default to prefer.

import math                    # whole module as a namespace
math.sqrt(16)
4.0
  • import math binds the module to the name math. You access its contents via the math. prefix — the source of every symbol stays visible at the call site.
import math as m               # with an alias
m.pi
3.141592653589793
  • import math as m is the same, with a shorter local name. Used widely for libraries with long names — numpy as np, pandas as pd. Stick to the established convention so other readers recognize it.
from math import sqrt, pi      # specific names — no namespace prefix
sqrt(16), pi
(4.0, 3.141592653589793)
  • from math import sqrt, pi reaches into the module and binds sqrt and pi directly into the current namespace — no prefix needed.
  • Trade-off: cleaner call sites, but the source of sqrt is less obvious to a reader scanning the file.
from math import sqrt as sq    # specific name with alias
sq(16)
4.0
  • Combines the two: pull out a specific name and rename it. Useful when the original name clashes with something in your file.

The conventions:

  • Default to import module. It keeps the source of every name visible at the call site.
  • from module import name is fine when the prefix would clutter the code and the names are unambiguous (from pathlib import Path, from collections import Counter).
  • from module import * loads “everything public” — but in real code it makes name origins invisible. Avoid it.

Universal aliases everyone uses: import numpy as np, import pandas as pd, import matplotlib.pyplot as plt. Stick to the convention.

10.3 The if __name__ == "__main__": idiom

Every module gets a __name__ variable, set by the interpreter when the file runs. When the file is run directly, __name__ is "__main__". When it’s imported, __name__ is the module’s name.

# in mymodule.py:
def compute(x):
    return x * 2

def main():
    print(f"Result: {compute(21)}")

if __name__ == "__main__":
    main()
  • python3 mymodule.py → runs main().
  • import mymodule → loads the functions but does not run main().

Always structure scripts this way. It makes a file simultaneously importable as a library and runnable as a script. It’s also what testing tools rely on — they import your module without triggering its CLI.
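Both modes can be exercised from one session. The sketch below writes the chapter’s mymodule.py to a throwaway temp directory, runs it as a script, then imports it — the guard fires only in the first case:

```python
import subprocess, sys, tempfile, textwrap
from pathlib import Path

# Write the chapter's mymodule.py to a temp directory (throwaway demo).
d = Path(tempfile.mkdtemp())
(d / "mymodule.py").write_text(textwrap.dedent("""
    def compute(x):
        return x * 2

    def main():
        print(f"Result: {compute(21)}")

    if __name__ == "__main__":
        main()
"""))

# Run directly: __name__ is "__main__", so main() fires.
run = subprocess.run([sys.executable, str(d / "mymodule.py")],
                     capture_output=True, text=True)
print(run.stdout.strip())       # Result: 42

# Import: __name__ is "mymodule", so main() stays quiet.
sys.path.insert(0, str(d))
import mymodule
print(mymodule.__name__)        # mymodule
print(mymodule.compute(21))     # 42
```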

10.4 Packages

A package is a directory tree with __init__.py files marking each subdirectory.

myapp/
    __init__.py
    core.py
    utils.py
    data/
        __init__.py
        reader.py
        writer.py

Importing from this layout:

import myapp.core
from myapp import utils
from myapp.data import reader
from myapp.data.reader import read_csv

Inside the package, relative imports use leading dots:

# inside myapp/core.py:
from . import utils                 # same package
from .data import reader            # sub-package
from ..other_pkg import something   # sibling package (rarely used)

Relative imports work only inside packages — never inside scripts. If you see from .foo import bar in a top-level script, it’ll fail.
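You can watch that failure mode directly. The sketch below writes a standalone script (the name standalone.py is made up) whose first line is a relative import, then runs it as a top-level script:

```python
import subprocess, sys, tempfile
from pathlib import Path

# A top-level script that tries a relative import — no parent package exists.
script = Path(tempfile.mkdtemp()) / "standalone.py"
script.write_text("from . import utils\n")

result = subprocess.run([sys.executable, str(script)],
                        capture_output=True, text=True)
print(result.returncode != 0)            # True — the script fails
print("ImportError" in result.stderr)    # True — "attempted relative import
                                         # with no known parent package"
```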

__init__.py is run when the package is first imported. Use it to expose a clean API:

# myapp/__init__.py:
from .core import process
from .data.reader import read_csv

__all__ = ["process", "read_csv"]

Now consumers can write from myapp import process, read_csv without knowing the internal layout. The __all__ list does two things: it documents the public names of the package, and it controls what from myapp import * would actually import — only the names listed in __all__ come along. (Real code shouldn’t use import *, but __all__ is still the conventional declaration of what a package considers public.)
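The effect of __all__ on a star import is easy to demonstrate with a throwaway package (demo_pkg is a made-up name for this sketch):

```python
import sys, tempfile
from pathlib import Path

# Build a one-file package whose __all__ exposes only one of three names.
root = Path(tempfile.mkdtemp())
pkg = root / "demo_pkg"
pkg.mkdir()
(pkg / "__init__.py").write_text(
    "public = 1\n"
    "_private = 2\n"
    "also_public = 3\n"
    '__all__ = ["public"]\n'
)

sys.path.insert(0, str(root))
ns = {}
exec("from demo_pkg import *", ns)   # simulate a star import
print(sorted(k for k in ns if not k.startswith("__")))   # ['public']
```

Only the name listed in __all__ comes across; also_public is importable explicitly but excluded from the star import, and _private is hidden either way.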

10.5 Project layout

For any project beyond a single file, start with this structure:

myproject/
    src/
        myapp/
            __init__.py
            main.py
            utils.py
            models.py
    tests/
        test_main.py
        test_utils.py
    pyproject.toml      # project config (PEP 518)
    README.md

The src/ layout prevents accidental imports of half-broken in-development code (it forces you to install the package, which catches missing files). pyproject.toml is the modern (PEP 518) standard for declaring project metadata, dependencies, and build configuration.

A minimal pyproject.toml:

[project]
name = "myapp"
version = "0.1.0"
dependencies = ["httpx", "pydantic"]

[build-system]
requires = ["setuptools>=61.0"]
build-backend = "setuptools.build_meta"

Since Python 3.11 the standard library includes tomllib for reading TOML. That means a tool — your own script, a release helper, a CI step — can introspect the project’s own metadata without any third-party dependency:

import tomllib

pyproject = """
[project]
name = "myapp"
version = "0.1.0"
dependencies = ["httpx", "pydantic"]
"""

config = tomllib.loads(pyproject)
config["project"]["name"], config["project"]["version"], config["project"]["dependencies"]
('myapp', '0.1.0', ['httpx', 'pydantic'])
  • tomllib.loads(...) parses a TOML string and returns a nested dict (the trailing s mirrors json.loads).
  • The [project] table becomes a sub-dict; name/version/dependencies become its keys.
  • The result is plain Python data — strings, numbers, lists, dicts — so you index it like any dict.

The general rule: tomllib.loads(text) for strings, tomllib.load(f) for files (open in binary mode, open(path, "rb")). tomllib is read-only by design; for writing TOML you reach for tomli-w or tomlkit.

10.6 Virtual environments

A virtual environment is a per-project isolated Python installation. Without one, every pip install pollutes the system Python — and projects can’t have conflicting dependencies. Always use one.

# Create:
python3 -m venv .venv

# Activate (bash/zsh):
source .venv/bin/activate

# Activate (Windows):
.venv\Scripts\activate

# Install dependencies:
pip install httpx pandas

# Freeze:
pip freeze > requirements.txt

Modern alternatives — uv, poetry, pipx, hatch — handle this for you with one command. They all create venvs under the hood.
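One practical check from inside Python: in an activated venv, sys.prefix points at the environment while sys.base_prefix still points at the base installation; outside any venv the two are equal. A quick probe:

```python
import sys

# True when this interpreter is running inside a virtual environment.
in_venv = sys.prefix != sys.base_prefix
print(sys.prefix)
print(in_venv)
```

Useful in setup scripts that should refuse to pip-install into the system Python.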

Tip: Why this matters

The import system is what turns a folder of .py files into a library. Get the structure right at the start (src layout, __init__.py files, if __name__ == "__main__":, virtual environment) and the project scales without churn. Get it wrong and every refactor becomes an import-path treasure hunt.

Note: Further reading

Beazley, Python Distilled §8 is the canonical operational reference for the import system: module caching, the import path, package initialization, namespace control, and circular-import edge cases. Its 18 sections cover deployment topics this chapter only sketches.

10.7 Build: a tiny package, on disk and live

Reading about __init__.py and sys.path is one thing; seeing them turn three files into an importable namespace is another. We’ll lay out a tiny geom package in a temp directory, import from it, and then refactor the package’s public surface with __init__.py.

Step 1: create the package layout on disk. Use pathlib.Path (from chapter 8) to write three files: a marker __init__.py plus two submodules:

import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
pkg = root / "geom"
pkg.mkdir()

(pkg / "__init__.py").write_text("")          # marker — empty for now
(pkg / "circle.py").write_text(
    "import math\n"
    "def area(r): return math.pi * r ** 2\n"
)
(pkg / "rectangle.py").write_text(
    "def area(w, h): return w * h\n"
)

sorted(p.name for p in pkg.iterdir())
['__init__.py', 'circle.py', 'rectangle.py']

A package is just a directory with __init__.py plus normal .py files. The __init__.py can be empty — its presence alone is the marker that says “this directory is a package.”

Step 2: put the package on sys.path and import. Python only finds modules that live on its search path. Inserting root (the temp directory containing geom) makes geom importable:

import sys
sys.path.insert(0, str(root))

import geom.circle
import geom.rectangle

[geom.circle.area(5), geom.rectangle.area(3, 4)]
[78.53981633974483, 12]

sys.path.insert(0, str(root)) puts our temp directory at the front of the search list — Python finds geom there before checking elsewhere. import geom.circle does the three-step dance from earlier in the chapter: find geom/circle.py on sys.path, run it once (caching it in sys.modules), and bind geom.circle as a name. The two submodules expose unrelated area functions; the package prefix keeps them apart.

Step 3: flatten the public API with __init__.py. Asking callers to type geom.rectangle.area(3, 4) is fine until you have a dozen submodules. Rewrite __init__.py to re-export the names you want to be the public surface:

import importlib

(pkg / "__init__.py").write_text(
    "from .circle import area as circle_area\n"
    "from .rectangle import area as rectangle_area\n"
    '__all__ = ["circle_area", "rectangle_area"]\n'
)

import geom
importlib.reload(geom)
[geom.circle_area(5), geom.rectangle_area(3, 4), geom.__all__]
[78.53981633974483, 12, ['circle_area', 'rectangle_area']]

from .circle import area as circle_area is the relative import we covered earlier — .circle means “the circle module in this same package.” We rename area to circle_area and rectangle_area so both can live at the top level without colliding. __all__ declares them as the public names.

importlib.reload(geom) is the cell-friendly cousin of “restart the interpreter” — Python had already cached the empty __init__.py from step 2, so without reload the new contents wouldn’t take effect. In a fresh process you’d never need this; here it lets us iterate without restarting.

The build makes every operational concept from the chapter visible at once: the package marker, sys.path, the run-once cache, relative imports, __init__.py as the API surface, and __all__ as the public declaration. The same recipe scales unchanged from this three-file demo to a 100-file project.

10.8 Exercises

  1. Top-level print. Add print("loading mymodule") at the top of a module. Import it twice in a single script. How many lines print? Why?

  2. Dual-mode file. Write square.py with a square(x) function and a main() that prints square(int(sys.argv[1])). Use if __name__ == "__main__": so python3 square.py 7 works and from square import square works.

  3. Package with __init__.py. Create a directory mathx/ with __init__.py, linear.py (functions dot, norm), and stats.py (functions mean, stdev). In __init__.py, expose all four at the top level so callers can write from mathx import dot, mean.

  4. Relative vs. absolute. Inside a package, what changes if you replace from .utils import helper with from myapp.utils import helper? Which is preferred and why?

  5. sys.path mystery. Write a script that prints sys.path and then runs import some_local_module. Now move the script and re-run. Why did the import behavior change?

10.9 Summary

Modules are files; packages are directories. The import system caches modules, the __name__ idiom enables dual-mode files, and pyproject.toml plus a virtual environment make a project portable. The next chapter, Chapter 11, surveys the most useful corners of Python’s “batteries included” standard library.