You rarely see entire populations—you analyze samples and infer with uncertainty. Bias enters when samples are not representative.
Definitions
- Population — all units you care about (all customers ever)
- Sample — observed subset (last month's signups)
- Statistic — number from sample (sample mean)
- Parameter — unknown population truth (true mean)
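A minimal sketch of the statistic vs parameter distinction, using synthetic data. The population here (monthly spend for 100,000 customers, drawn from a normal distribution) and the sample size of 500 are assumptions made up for illustration. In practice the parameter is unknown, which is the whole reason for sampling.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: monthly spend for 100,000 customers (synthetic).
population = [random.gauss(50, 15) for _ in range(100_000)]

# Parameter: the population mean. Computable here only because the
# population is simulated; with real data it stays unknown.
parameter = statistics.fmean(population)

# Sample: a simple random subset of 500 customers.
sample = random.sample(population, 500)

# Statistic: the sample mean, which is what you can actually compute.
statistic = statistics.fmean(sample)

# The gap between the two is the sampling error for this particular draw.
error = statistic - parameter
print(f"parameter={parameter:.2f} statistic={statistic:.2f} error={error:.2f}")
```

Changing the seed gives a different sample and a different statistic, while the parameter of a fixed population does not move.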
Sampling bias examples
- Survey only active users — overestimates engagement
- Train on one region — model fails elsewhere
- Survivorship bias — only seeing successes
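The first example above (surveying only active users) can be sketched with synthetic data. The 30% active share and the session counts are hypothetical numbers chosen for illustration, not measurements. The sketch compares a sample drawn only from active users with a simple random sample from everyone.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical user base: roughly 30% active, 70% inactive (synthetic).
users = []
for _ in range(10_000):
    active = random.random() < 0.3
    # Assumed engagement levels: active users average about 20
    # sessions per month, inactive users about 2.
    sessions = random.gauss(20, 5) if active else random.gauss(2, 1)
    users.append((active, max(sessions, 0.0)))

# True mean engagement across the whole population.
true_mean = statistics.fmean(s for _, s in users)

# Biased sample: the survey only reaches active users.
active_users = [u for u in users if u[0]]
biased_sample = random.sample(active_users, 300)
biased_mean = statistics.fmean(s for _, s in biased_sample)

# Representative sample: simple random sample from all users.
random_sample = random.sample(users, 300)
random_mean = statistics.fmean(s for _, s in random_sample)

print(f"true={true_mean:.1f} biased={biased_mean:.1f} random={random_mean:.1f}")
```

Both samples have the same size, so the gap in the biased estimate comes from who was sampled, not from how many.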
Important interview questions and answers
- Q: Sample mean vs population mean?
A: Sample mean estimates population mean with error.
- Q: Survivorship bias?
A: Analyzing only entities that lasted—ignoring failures.
Self-check
- Define population vs sample.
- Name one source of sampling bias.
Tip: State explicitly which population your sample represents.
Interview prep
- Q: Statistic vs parameter?
A: A statistic is computed from the sample; a parameter is the unknown population truth.
- Q: Sampling bias?
A: The sample is not representative of the population.