Statistical bias systematically skews estimates; social bias unfairly disadvantages groups. Training data reflects history, including discriminatory policies, so models can reproduce or amplify that harm unless you measure and mitigate it.
Sources of bias
- Underrepresentation of demographics in data
- Historical decisions encoded in labels (hiring, lending)
- Measurement bias (features or labels measured with different accuracy for different groups)
- Feedback loops (model affects future training data)
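To make the feedback-loop mechanism concrete, here is a deliberately tiny, hypothetical simulation (the function `simulate_feedback_loop` and all numbers are illustrative assumptions, not real data). Two areas have identical true incident rates. Each round the "model" sends attention to whichever area has the most recorded incidents, and only the attended area produces new records. A one-record head start is enough to make the data look as if one area is far worse.

```python
def simulate_feedback_loop(true_rates, initial_counts, rounds):
    """Hypothetical toy: attention goes to the area with the most *recorded*
    incidents, and only that area generates new records (training data)."""
    recorded = list(initial_counts)
    for _ in range(rounds):
        # The "model": pick the area with the highest recorded count.
        # max() returns the first maximal index on ties.
        target = max(range(len(recorded)), key=lambda i: recorded[i])
        # Only the observed area adds to tomorrow's training data.
        recorded[target] += true_rates[target]
    return recorded


final = simulate_feedback_loop(true_rates=[5, 5], initial_counts=[11, 10], rounds=10)
print(final)  # [61, 10]
```

Both areas behave identically, yet after ten rounds the recorded data attributes about 86% of incidents to the first one, because the model's own decisions decided what got measured.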
Detection mindset
Slice metrics by group (region, language, age band where ethical). Compare false positive and false negative rates across groups, not only overall accuracy.
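A minimal sketch of slicing, assuming binary labels where 1 is the positive class. The helper name `error_rates_by_group` and the toy data are hypothetical. The data is constructed so both groups have the same accuracy while their errors point in opposite directions.

```python
from collections import defaultdict


def error_rates_by_group(y_true, y_pred, groups):
    """Return {group: {"accuracy", "fpr", "fnr"}} for binary labels (1 = positive)."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for t, p, g in zip(y_true, y_pred, groups):
        c = counts[g]
        if t == 1 and p == 1:
            c["tp"] += 1
        elif t == 0 and p == 1:
            c["fp"] += 1
        elif t == 0 and p == 0:
            c["tn"] += 1
        else:  # t == 1 and p == 0
            c["fn"] += 1

    report = {}
    for g, c in counts.items():
        n = sum(c.values())
        negatives = c["fp"] + c["tn"]
        positives = c["tp"] + c["fn"]
        report[g] = {
            "accuracy": (c["tp"] + c["tn"]) / n,
            # Undefined (nan) when a group has no actual negatives/positives.
            "fpr": c["fp"] / negatives if negatives else float("nan"),
            "fnr": c["fn"] / positives if positives else float("nan"),
        }
    return report


# Hypothetical toy data: 10 examples per group, same true labels in each.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0] * 2
y_pred = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0] + [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
groups = ["A"] * 10 + ["B"] * 10

report = error_rates_by_group(y_true, y_pred, groups)
for g, m in report.items():
    print(g, m)
# A {'accuracy': 0.8, 'fpr': 0.4, 'fnr': 0.0}
# B {'accuracy': 0.8, 'fpr': 0.0, 'fnr': 0.4}
```

Both groups score 0.8 accuracy, but group A absorbs all the false positives and group B all the false negatives, which an overall accuracy number would never show.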
Mitigation preview
- Better data collection and labeling guidelines
- Reweighting or resampling (weigh the trade-offs carefully)
- Human review for high-impact decisions
- Policy limits on automated use cases
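One common reweighting scheme, often called reweighing, gives each example the weight P(group) × P(label) / P(group, label), so that in the weighted training set group and label are independent. The sketch below is one possible implementation under that assumption; the function names and toy data are hypothetical.

```python
from collections import Counter


def reweighing_weights(groups, labels):
    """Weight each example by P(group) * P(label) / P(group, label).

    With counts this is n_g * n_y / (n * n_gy). After weighting, every group
    has the same weighted label distribution as the dataset overall.
    """
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    return [
        group_counts[g] * label_counts[y] / (n * joint_counts[(g, y)])
        for g, y in zip(groups, labels)
    ]


def weighted_positive_rate(weights, labels, groups, group):
    """Share of weight carried by positive labels (1) within one group."""
    total = sum(w for w, g in zip(weights, groups) if g == group)
    positive = sum(w for w, y, g in zip(weights, labels, groups) if g == group and y == 1)
    return positive / total


groups = ["A"] * 4 + ["B"] * 4
labels = [1, 1, 1, 0] + [1, 0, 0, 0]  # raw positive rate: A 0.75, B 0.25
weights = reweighing_weights(groups, labels)
# Both rates come out at 0.5 (the overall positive rate), up to float rounding.
print(weighted_positive_rate(weights, labels, groups, "A"))
print(weighted_positive_rate(weights, labels, groups, "B"))
```

The trade-off the list warns about shows up in the weights: rare (group, label) combinations get large weights (2.0 here versus about 0.67), which increases variance and can amplify label noise in those cells. Many scikit-learn estimators accept such weights through the `sample_weight` argument of `fit`.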
The ethics module goes deeper on fairness and accountability.
Important interview questions and answers
- Q: Is accuracy parity enough?
A: No. Equal accuracy can hide disparate false positive or false negative rates for minority groups.
- Q: What is a feedback loop?
A: A deployed model changes user behavior, and that behavior becomes tomorrow's training data.
Self-check
- Name two bias sources in historical labels.
- Why slice metrics by group?
Tip: Slice metrics by group; overall accuracy can hide disparate harm.