Many Pandas operations mirror SQL: SELECT ≈ column selection, WHERE ≈ boolean filtering, GROUP BY ≈ groupby, JOIN ≈ merge. Pandas shines in-memory on moderate datasets; SQL scales on servers.
Side-by-side mapping
| SQL | Pandas |
|---|---|
SELECT col | df['col'] or df[['col']] |
WHERE price > 10 | df[df['price'] > 10] |
ORDER BY price DESC | df.sort_values('price', ascending=False) |
GROUP BY dept, SUM(sales) | df.groupby('dept')['sales'].sum() |
JOIN | pd.merge(left, right, on='key') |
When to use each
- SQL — large tables in databases, transactional queries, shared team data warehouse
- Pandas — notebook EDA, one-off transforms, ML pipelines, files on disk
- Both — pull with SQL (
read_sql), wrangle in Pandas, push results back
Same filter, two styles
# SQL mental model:
# SELECT name, price FROM products WHERE price > 10 ORDER BY price
import pandas as pd
df = pd.DataFrame({'name': ['A','B','C'], 'price': [5, 15, 20]})
result = df.loc[df['price'] > 10, ['name', 'price']].sort_values('price')
print(result)
Important interview questions and answers
- Q: Can Pandas replace SQL?
A: No—they complement each other. SQL aggregates at scale; Pandas flexes in Python workflows. - Q: read_sql?
A: Loads query results directly into a DataFrame—bridges database and notebook.
Self-check
- Map SQL WHERE to a Pandas expression.
- When would you prefer SQL over in-memory Pandas?
Tip: Sketch SQL first, then translate to Pandas—helps in interviews and at SQL handoff.
Interview prep
- When SQL?
Large data in databases, shared warehouse, transactional queries at scale.
- When Pandas?
Notebook EDA, file wrangling, ML feature pipelines in Python.