Supervised learning uses labeled examples (input → known output). Unsupervised learning finds structure in unlabeled data—clusters, embeddings, anomalies—without explicit targets.
Supervised tasks
- Classification — spam vs not spam, disease yes/no
- Regression — predict numeric value (price, demand)
- Ranking — order items by relevance (often supervised from clicks)
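What separates these three tasks is mostly the shape of the target. The sketch below illustrates that with made-up rows; the helper name target_kind and all of the data are assumptions for illustration, not part of any library.

```python
# Hypothetical toy rows: every supervised example pairs an input with a known target.
classification_rows = [("win a free prize now", "spam"), ("meeting moved to 3pm", "ham")]
regression_rows = [({"sqft": 850}, 210000.0), ({"sqft": 1200}, 305000.0)]
ranking_rows = [("running shoes", ["doc_b", "doc_a", "doc_c"])]


def target_kind(target):
    """Name the kind of target (illustrative helper, not a library function)."""
    if isinstance(target, str):
        return "category"   # classification: pick one label from a fixed set
    if isinstance(target, (int, float)):
        return "number"     # regression: predict a numeric value
    if isinstance(target, list):
        return "ordering"   # ranking: put items in order of relevance
    raise TypeError(f"unexpected target type: {type(target).__name__}")


for name, rows in [
    ("classification", classification_rows),
    ("regression", regression_rows),
    ("ranking", ranking_rows),
]:
    _, first_target = rows[0]
    print(name, "->", target_kind(first_target))
```

In every case the target comes with the data; that known answer is what makes the task supervised.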
Unsupervised tasks
- Clustering — group similar customers
- Dimensionality reduction — compress features for visualization
- Anomaly detection — flag unusual transactions
- Representation learning — embeddings for search (sometimes self-supervised)
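As a rough illustration of working without targets, the sketch below flags unusual transaction amounts using only the inputs. It assumes a simple z-score rule with a threshold of 2 standard deviations; the amounts, the threshold, and the function name flag_anomalies are made up for this example, and real systems often use more robust methods.

```python
from statistics import mean, pstdev

# Made-up transaction amounts; note that there is no label column.
amounts = [20, 22, 19, 21, 23, 20, 18, 24, 21, 500]


def flag_anomalies(values, threshold=2.0):
    """Return values lying more than `threshold` standard deviations from the mean."""
    center = mean(values)
    spread = pstdev(values)
    if spread == 0:
        return []  # all values are identical, so nothing stands out
    return [v for v in values if abs(v - center) / spread > threshold]


print(flag_anomalies(amounts))
```

Nobody told the code which amount was suspicious; it only measured how far each value sits from the rest.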
Label cost matters
# Supervised needs labels
labeled = [("email text", "spam"), ("email text", "ham")]
# Unsupervised: only inputs
unlabeled = ["email text", "email text", "email text"]
print(len(labeled), "labeled vs", len(unlabeled), "unlabeled rows")
Practice: This code is optional. Run it locally in Jupyter if helpful. No model training is required for this literacy track.
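When labels are scarce, semi-supervised and self-supervised methods blend both approaches: semi-supervised learning combines a small labeled set with a larger unlabeled one, while self-supervised learning derives its targets from the data itself.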
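One simple semi-supervised idea is pseudo-labeling: use a few labeled examples to assign provisional labels to unlabeled ones. The sketch below is a deliberately minimal version that copies the label of the nearest labeled value; the data, the names seed_examples and pseudo_label, and the nearest-neighbor rule are all assumptions chosen for illustration.

```python
# A small labeled seed set (value, label) and a pool of unlabeled values. All made up.
seed_examples = [(1.0, "low"), (1.2, "low"), (9.0, "high")]
unlabeled_values = [0.8, 8.5, 9.4]


def pseudo_label(seeds, values):
    """Give each unlabeled value the label of its nearest labeled seed."""
    result = []
    for x in values:
        _, nearest_label = min(seeds, key=lambda pair: abs(pair[0] - x))
        result.append((x, nearest_label))
    return result


for value, label in pseudo_label(seed_examples, unlabeled_values):
    print(value, "->", label)
```

Pseudo-labels are guesses, so in practice they are usually kept only when the model is confident and are checked against a held-out labeled set.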
Important interview questions and answers
- Q: Is clustering supervised?
A: No. There is no fixed label column; you interpret the groups afterward.
- Q: What is self-supervised learning?
A: It creates labels from the data itself (e.g., predicting masked words), which bridges supervised and unsupervised learning.
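The masked-word idea can be sketched in a few lines: hide one word at a time and treat the hidden word as the label. The function name masked_pairs and the "[MASK]" token are assumptions for this illustration; real systems mask tokens rather than whitespace-split words and usually mask only a random subset.

```python
def masked_pairs(sentence, mask_token="[MASK]"):
    """Build (masked_input, hidden_word) training pairs from one unlabeled sentence."""
    words = sentence.split()
    pairs = []
    for i, word in enumerate(words):
        # Replace the i-th word with the mask; the original word becomes the target.
        masked = words[:i] + [mask_token] + words[i + 1:]
        pairs.append((" ".join(masked), word))
    return pairs


for masked_input, target in masked_pairs("the cat sat"):
    print(repr(masked_input), "->", repr(target))
```

No human labeled anything here; the targets came from the text itself, which is why the approach scales to large unlabeled corpora.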
Self-check
- Give one supervised and one unsupervised task.
- Why are labels expensive?
Tip: The cost of labeling often decides whether you take a supervised path or a clustering/self-supervised one.
Interview prep
- Q: What is an example of a supervised task?
A: Spam classification with labeled emails.
- Q: What is an example of an unsupervised task?
A: Customer clustering without predefined segments.