Pandas for Python Developers

Overview

A 45-minute talk delivered at the Portland Python Users Group on May 28, 2026. It was a practical introduction to pandas aimed at experienced Python developers who had not previously worked with the library.

Background

The Portland Python Users Group needed speakers and was specifically looking for non-AI content. I proposed pandas because it addresses a gap I have observed consistently in professional settings: software engineers with solid Python experience routinely arrive at data-adjacent work without any exposure to pandas. The library's data model is different enough from standard Python idioms that experienced developers still need a structured introduction to use it effectively.

The talk was designed to give that introduction in a single session, covering the concepts and patterns that account for most practical pandas work, presented in terms that would make sense to someone who already knows Python well.

Format

The talk uses a single Jupyter notebook that serves as both the presentation and the reference material. Markdown cells are written as prose and read directly during the talk. Code cells are executed live. The same notebook functions as a take-home reference after the session.

The dataset is a synthetic web server access log generated with a fixed random seed (random.seed(42)), so every run of the notebook produces the same data. It gives the audience a domain they are likely to recognize while keeping the focus on pandas operations rather than the data itself. It contains 5,000 rows with columns for timestamp, IP address, HTTP method, endpoint, status code, response time in milliseconds, bytes sent, and log level.
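A minimal sketch of how a log like this could be generated is shown below. It is not the notebook's actual generator: the function name make_access_log, the column names, the value ranges, and the start date are all assumptions chosen for illustration. Only the seed, the row count, and the list of fields come from the description above.

```python
import random
from datetime import datetime, timedelta

import pandas as pd


def make_access_log(n_rows=5000, seed=42):
    """Build a synthetic access log as a DataFrame (illustrative values only)."""
    random.seed(seed)  # fixed seed, so repeated calls yield identical data
    start = datetime(2026, 1, 1)  # assumed start of the one-week log window
    rows = []
    for _ in range(n_rows):
        status = random.choice([200, 200, 200, 301, 404, 500])
        if status >= 500:
            level = "ERROR"
        elif status >= 400:
            level = "WARNING"
        else:
            level = "INFO"
        rows.append({
            "timestamp": start + timedelta(seconds=random.randint(0, 7 * 24 * 3600)),
            "ip": ".".join(str(random.randint(1, 254)) for _ in range(4)),
            "method": random.choice(["GET", "POST", "PUT", "DELETE"]),
            "endpoint": random.choice(["/", "/login", "/api/items", "/api/users"]),
            "status": status,
            "response_time_ms": round(random.uniform(5, 1500), 1),
            "bytes_sent": random.randint(200, 50_000),
            "log_level": level,
        })
    # Rows are accumulated in a plain list and converted once at the end.
    return pd.DataFrame(rows)


log = make_access_log()
```

Building the rows as a list of dicts and constructing the DataFrame once mirrors the accumulator pattern the talk recommends for adding data.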

Content

The talk starts with the four concepts a Python developer needs before individual pandas methods make sense, then moves through the operations that come up in real analysis work:

  • Mental model — Series, DataFrame, Index, and dtypes. The talk starts here because pandas methods do not make sense without a clear picture of what these objects are and how they relate to each other.
  • Loading data — Reading CSV and JSON with pd.read_csv and pd.read_json, with explicit comparison to how the same task is done in plain Python.
  • Inspection — shape, dtypes, head, info, and describe as the standard first pass on a new dataset.
  • Selection — Label-based indexing with .loc, boolean masks, and why chained indexing produces unreliable results.
  • Missing data — Detecting, dropping, and filling nulls with isna, dropna, and fillna.
  • Grouping and aggregation — groupby with agg for summarising data across categories.
  • String operations — Vectorised string methods via the .str accessor.
  • Adding data — Why DataFrame.append was removed in pandas 2.0 (after being deprecated in 1.4), and the replacement pattern: accumulate rows in a list, then call pd.concat once.
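
The topics above can be sketched in one short pass. This is an illustrative walk-through rather than the notebook's own code: the five-row CSV text, the column names, and the variable names are assumptions, and an in-memory string stands in for a file on disk.

```python
import csv
import io

import pandas as pd

# Hypothetical five-row sample; the third row has no response time.
CSV_TEXT = "\n".join([
    "endpoint,method,status,response_time_ms",
    "/,get,200,12.0",
    "/login,post,401,85.5",
    "/api/items,get,200,",
    "/login,post,500,430.0",
    "/api/items,get,200,40.0",
])

# Loading, plain Python: every field arrives as a string.
plain_rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))

# Loading, pandas: dtypes are inferred and the empty field becomes NaN.
log = pd.read_csv(io.StringIO(CSV_TEXT))

# Mental model: a column is a Series that shares the DataFrame's Index.
status = log["status"]

# Inspection: the usual first pass on a new dataset.
print(log.shape)
print(log.dtypes)

# Selection: a single .loc call with a boolean mask and a column label.
# Chained forms such as log[mask]["endpoint"] = ... assign to a temporary
# object, so they are not a reliable way to modify the original.
errors = log.loc[log["status"] >= 400, "endpoint"]

# Missing data: detect nulls, then fill them with the column median.
n_missing = int(log["response_time_ms"].isna().sum())
filled = log["response_time_ms"].fillna(log["response_time_ms"].median())

# Grouping and aggregation: named aggregations per endpoint.
summary = log.groupby("endpoint").agg(
    requests=("status", "count"),
    mean_ms=("response_time_ms", "mean"),
)

# String operations: vectorised through the .str accessor.
methods = log["method"].str.upper()

# Adding data: collect new rows in a list, then concatenate once.
new_rows = [
    {"endpoint": "/", "method": "get", "status": 200, "response_time_ms": 9.0},
    {"endpoint": "/logout", "method": "post", "status": 204, "response_time_ms": 15.0},
]
combined = pd.concat([log, pd.DataFrame(new_rows)], ignore_index=True)
```

Concatenating once at the end avoids copying the whole DataFrame for every added row, which is the cost that repeated append calls used to hide.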

Development Environment

  • Python 3.11+
  • pandas 3.x
  • Jupyter