Normalization reduces redundancy by splitting data into related tables. First normal form (1NF) through third normal form (3NF) are interview staples and guide practical schema design.
Problem: duplication
Storing customer name on every order row duplicates data—update anomalies if Ada renames and only some rows change.
Split tables
-- Normalized: customer data once
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (
id INTEGER PRIMARY KEY,
customer_id INTEGER REFERENCES customers(id),
total REAL
);Practice: Run DDL in a fresh SQLite file or DB Fiddle schema pane. Drop test tables when experimenting.
Denormalize deliberately
Reporting caches, read replicas, and materialized views sometimes denormalize for speed—start normalized, denormalize with measured need. Data Science pipelines may flatten tables for ML features.
Normal forms sketch
- 1NF — atomic columns, no repeating groups
- 2NF — no partial dependency on composite keys
- 3NF — no transitive dependency on non-key attributes
Important interview questions and answers
- Q: Always 3NF?
A: OLTP apps usually yes; analytics may denormalize star schemas in warehouses. - Q: JOIN cost vs duplication?
A: JOINs cost CPU; duplication costs consistency—normalize first, optimize with indexes and caches.
Self-check
- What update anomaly does duplication cause?
- Name the first three normal forms in order.
Tip: Normalize OLTP schemas first; denormalize reporting only with measured need.
Interview prep
- Why normalize?
Reduce duplication and update anomalies.
- 3NF sketch?
No non-key attribute depends on another non-key attribute.