Practice exploratory data analysis (EDA) on a list of dicts, the same row shape pandas uses for DataFrame records. We compute row counts, missing revenue, and session statistics with the Python standard library.
Dataset
Each dict is one user with a session count and an optional revenue value. One revenue value is missing, and one session count is an extreme outlier worth noting.
What the code does
- Print row count and column keys
- Count missing revenue values
- Compute mean and median sessions
- Summarize revenue for non-missing rows
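The four steps above can be sketched in plain Python. This is not the lesson's preview script. The field names (user, sessions, revenue) and the sample values are hypothetical stand-ins, with None marking a missing revenue value and 120 playing the extreme session count.

```python
import statistics

# Hypothetical sample rows; field names and values are illustrative only.
rows = [
    {"user": "u1", "sessions": 4, "revenue": 12.5},
    {"user": "u2", "sessions": 6, "revenue": None},    # missing revenue
    {"user": "u3", "sessions": 5, "revenue": 8.0},
    {"user": "u4", "sessions": 3, "revenue": 20.0},
    {"user": "u5", "sessions": 120, "revenue": 15.0},  # extreme session count
]

def summarize(rows):
    """Row count, column keys, missing-revenue count, session and revenue stats."""
    keys = sorted(rows[0].keys()) if rows else []
    missing_revenue = sum(1 for r in rows if r.get("revenue") is None)
    sessions = [r["sessions"] for r in rows]
    # Revenue summary uses only the rows where revenue is present.
    revenue = [r["revenue"] for r in rows if r.get("revenue") is not None]
    return {
        "row_count": len(rows),
        "keys": keys,
        "missing_revenue": missing_revenue,
        "mean_sessions": statistics.mean(sessions),
        "median_sessions": statistics.median(sessions),
        "revenue_total": sum(revenue),
        "revenue_mean": statistics.mean(revenue),
    }

summary = summarize(rows)
for name, value in summary.items():
    print(f"{name}: {value}")
```

With these sample rows, the outlier pulls mean_sessions to 27.6 while median_sessions stays at 5.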
Extend locally: load a CSV with pandas read_csv, then call describe() and plot histograms.
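Before installing pandas, the load-and-histogram step can be approximated with the standard library. This sketch swaps in csv.DictReader for read_csv and a text histogram for a plotted one. The CSV text, its column names, and the helper names load_rows and text_histogram are hypothetical.

```python
import csv
import io
from collections import Counter

# Hypothetical CSV contents standing in for a local file.
CSV_TEXT = """user,sessions,revenue
u1,4,12.5
u2,6,
u3,5,8.0
u4,3,20.0
u5,120,15.0
"""

def load_rows(text):
    """Parse CSV text into a list of dicts; an empty revenue cell becomes None."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        rows.append({
            "user": rec["user"],
            "sessions": int(rec["sessions"]),
            "revenue": float(rec["revenue"]) if rec["revenue"] else None,
        })
    return rows

def text_histogram(values, bin_width):
    """Count integer values per fixed-width bin; return lines like '0-9 | ###'."""
    counts = Counter(v // bin_width for v in values)
    lines = []
    for b in sorted(counts):
        lo, hi = b * bin_width, (b + 1) * bin_width - 1  # label assumes integer data
        lines.append(f"{lo}-{hi} | {'#' * counts[b]}")
    return lines

rows = load_rows(CSV_TEXT)
for line in text_histogram([r["sessions"] for r in rows], bin_width=10):
    print(line)
```

Only non-empty bins are printed. A long gap between bins, as between 0-9 and 120-129 here, is itself a quick visual hint of an outlier.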
Next steps
# With pandas locally:
# import pandas as pd
# df = pd.DataFrame(rows)
# df.info()  # info() prints its summary itself and returns None
# print(df.describe())
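For a single numeric column, the statistics that describe() reports can be approximated without pandas. This sketch is a stdlib approximation, not pandas itself; the describe helper name and the sample values are hypothetical. It relies on two assumptions about pandas' defaults. First, statistics.stdev is the sample standard deviation (n - 1), matching pandas' std. Second, statistics.quantiles with method="inclusive" uses the same linear interpolation as pandas' default percentiles.

```python
import statistics

def describe(values):
    """Approximate pandas describe() for one numeric column (Python 3.8+)."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample standard deviation (n - 1)
        "min": min(values),
        "25%": q1,
        "50%": q2,
        "75%": q3,
        "max": max(values),
    }

sessions = [4, 6, 5, 3, 120]  # hypothetical session counts
for stat, value in describe(sessions).items():
    print(f"{stat:>5}: {value}")
```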
Important interview questions and answers
- Q: Why a list of dicts?
A: It matches JSON/API rows and converts cleanly to a DataFrame, so it is a good mental model before pandas.
- Q: Median vs mean here?
A: If one user has an extremely large session count, the median is usually more representative of a typical user than the mean, because the mean is pulled toward the outlier.
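A quick numeric check of the median-versus-mean answer, using hypothetical session counts with one outlier:

```python
import statistics

# Hypothetical session counts: four typical users and one extreme outlier.
sessions = [3, 4, 5, 6, 200]

mean_sessions = statistics.mean(sessions)      # pulled upward by the outlier
median_sessions = statistics.median(sessions)  # middle value of the sorted data
print(f"mean={mean_sessions}, median={median_sessions}")

# Dropping the outlier collapses the mean but barely moves the median.
print(f"without outlier: mean={statistics.mean(sessions[:-1])}, "
      f"median={statistics.median(sessions[:-1])}")
```

The mean jumps from 4.5 to 43.6 when the outlier is included, while the median only moves from 4.5 to 5.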
Self-check
- What does the script count for missing revenue?
- Which statistic is robust to one very large session count?
- What pandas function loads CSV rows into a table?
Tip: Run the preview script and compare group medians by hand.
Interview prep
- Q: How do you compute a group median?
A: Without pandas, loop over the rows to collect values per category, then apply statistics.median to each group.
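A minimal sketch of the loop-then-median approach. The "plan" category, the sample rows, and the group_median helper are all hypothetical.

```python
import statistics
from collections import defaultdict

# Hypothetical rows with a category field; names and values are illustrative.
rows = [
    {"plan": "free", "sessions": 3},
    {"plan": "free", "sessions": 5},
    {"plan": "free", "sessions": 90},
    {"plan": "pro", "sessions": 8},
    {"plan": "pro", "sessions": 12},
]

def group_median(rows, key, value):
    """Collect values per category with a loop, then take each group's median."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    return {name: statistics.median(vals) for name, vals in sorted(groups.items())}

print(group_median(rows, "plan", "sessions"))
```

In pandas, the equivalent is df.groupby("plan")["sessions"].median(). Note that the "free" group's median of 5 ignores its 90-session outlier.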