Pandas for Python Developers
Overview
A 45-minute talk delivered at the Portland Python Users Group on May 28, 2026. The talk was a practical introduction to pandas aimed at experienced Python developers who had not previously worked with the library.
Background
The Portland Python Users Group needed speakers and was specifically looking for non-AI content. I proposed pandas because it addresses a gap I have observed consistently in professional settings: software engineers with solid Python experience routinely arrive at data-adjacent work without any exposure to pandas. The library's data model is different enough from standard Python idioms that experienced developers still need a structured introduction to use it effectively.
The talk was designed to give that introduction in a single session, covering the concepts and patterns that account for most practical pandas work, presented in terms that would make sense to someone who already knows Python well.
Format
The talk uses a single Jupyter notebook that serves as both the presentation and the reference material. Markdown cells are written as prose and read directly during the talk. Code cells are executed live. The same notebook functions as a take-home reference after the session.
The dataset is a synthetic web server access log generated with a fixed random seed (random.seed(42)), giving the audience a domain they are likely to recognize while keeping the focus on pandas operations rather than on the data itself. It contains 5,000 rows with columns for timestamp, IP address, HTTP method, endpoint, status code, response time in milliseconds, bytes sent, and log level.
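The talk's own generator is not reproduced here. As a rough illustration of how a fixed seed makes the dataset reproducible, the following sketch builds 5,000 records with the listed columns using only the standard library. The function name generate_access_log, the column names, the value pools, the date range, and the distributions are all hypothetical assumptions, not the talk's actual choices.

```python
import random
from datetime import datetime, timedelta

# Hypothetical value pools; the talk's generator may use different ones.
METHODS = ["GET", "GET", "GET", "POST", "PUT", "DELETE"]
ENDPOINTS = ["/", "/login", "/api/users", "/api/orders", "/static/app.js"]
STATUS_CODES = [200, 200, 200, 201, 301, 404, 500]


def generate_access_log(n_rows=5000, seed=42):
    """Return n_rows synthetic access-log records as a list of dicts (hypothetical helper)."""
    random.seed(seed)  # fixed seed: every call produces identical rows
    start = datetime(2026, 1, 1)
    rows = []
    for _ in range(n_rows):
        status = random.choice(STATUS_CODES)
        rows.append({
            "timestamp": start + timedelta(seconds=random.randint(0, 30 * 86_400)),
            "ip": ".".join(str(random.randint(1, 254)) for _ in range(4)),
            "method": random.choice(METHODS),
            "endpoint": random.choice(ENDPOINTS),
            "status": status,
            "response_time_ms": round(random.uniform(5, 2000), 1),
            "bytes_sent": random.randint(200, 50_000),
            # Derive the log level from the status code so the columns stay consistent.
            "level": "ERROR" if status >= 500 else "WARNING" if status >= 400 else "INFO",
        })
    # Real access logs are chronological, so sort by timestamp.
    rows.sort(key=lambda row: row["timestamp"])
    return rows
```

Because random.seed(42) resets the module-level generator on each call, generate_access_log() returns the same rows every time, and pd.DataFrame(generate_access_log()) turns them into a 5,000-row DataFrame.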
Content
The talk starts with the four concepts a Python developer needs before individual pandas methods make sense, then moves through the operations that come up in real analysis work:
- Mental model: Series, DataFrame, Index, and dtypes. These come first because pandas methods do not make sense without a clear picture of what these objects are and how they relate to each other.
- Loading data: reading CSV and JSON with pd.read_csv and pd.read_json, with an explicit comparison to how the same task is done in plain Python.
- Inspection: shape, dtypes, head, info, and describe as the standard first pass on a new dataset.
- Selection: label-based indexing with .loc, boolean masks, and why chained indexing is unreliable for assignment.
- Missing data: detecting, dropping, and filling nulls with isna, dropna, and fillna.
- Grouping and aggregation: groupby with agg for summarizing data across categories.
- String operations: vectorized string methods via the .str accessor.
- Adding data: why DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, and the recommended pattern of collecting rows in a list and combining them once with pd.concat.
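A condensed sketch of several of the operations in this list, run on a hypothetical five-row extract of an access log rather than on the talk's 5,000-row dataset. The column names, the values, and the choice to fill the missing response time with the column median are illustrative assumptions; the calls themselves (read_csv, .loc, isna, fillna, groupby with named aggregation, .str.startswith, pd.concat) are standard pandas API.

```python
import csv
import io

import pandas as pd

# A five-row stand-in for the talk's access log; columns are illustrative assumptions.
CSV_TEXT = """method,endpoint,status,response_time_ms
GET,/api/users,200,120.0
POST,/login,201,340.5
GET,/api/orders,404,
GET,/,200,15.2
DELETE,/api/users,500,980.0
"""

# Loading, plain Python: csv.DictReader yields dicts whose values are all strings.
plain_rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))

# Loading, pandas: read_csv infers a dtype per column (integers, floats, strings).
df = pd.read_csv(io.StringIO(CSV_TEXT))

# Mental model: each column is a Series that shares the DataFrame's Index.
times = df["response_time_ms"]

# Selection: a boolean mask and a column list in a single .loc call.
errors = df.loc[df["status"] >= 400, ["endpoint", "status"]]

# Assignment also uses a single .loc call; chained df[mask]["level"] = ... is avoided.
df["level"] = "INFO"
df.loc[df["status"] >= 400, "level"] = "WARNING"
df.loc[df["status"] >= 500, "level"] = "ERROR"

# Missing data: the empty CSV field was read as NaN; fill it with the column median.
n_missing = int(df["response_time_ms"].isna().sum())
df["response_time_ms"] = df["response_time_ms"].fillna(df["response_time_ms"].median())

# Grouping and aggregation with named aggregations.
summary = df.groupby("endpoint").agg(
    requests=("status", "count"),
    mean_ms=("response_time_ms", "mean"),
)

# String operations: a vectorized .str method instead of a Python loop.
api_hits = int(df["endpoint"].str.startswith("/api").sum())

# Adding data: accumulate new rows in a list, then concatenate once.
new_rows = []
for status in (200, 503):
    new_rows.append({
        "method": "GET",
        "endpoint": "/health",
        "status": status,
        "response_time_ms": 3.0,
        "level": "ERROR" if status >= 500 else "INFO",
    })
df = pd.concat([df, pd.DataFrame(new_rows)], ignore_index=True)
```

Under Copy-on-Write, the default in pandas 3.x, a chained assignment such as df[df["status"] >= 500]["level"] = "ERROR" never updates df, which is why every assignment in the sketch goes through a single .loc call. Collecting new_rows in a plain list and calling pd.concat once also avoids copying the whole frame on every loop iteration.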
Development Environment
- Python 3.11+
- pandas 3.x
- Jupyter