Optimize retrieval before blaming the LLM—if the right chunk never arrives, generation cannot fix it.
Metrics
- Recall@k — is the gold passage in top k results?
- MRR — how high is the first relevant hit?
- Context precision — % of retrieved tokens actually useful
Eval dataset
Curate question → relevant doc IDs pairs from support tickets or docs team. Run nightly when the corpus changes.
Reranking
A cross-encoder reranker scores query–passage pairs more accurately than bi-encoder retrieval alone—worth the latency for top 20 → top 5.
Important interview questions and answers
- Q: Recall@k meaning?
A: Fraction of queries where at least one gold doc appears in top k retrieved.
Self-check
- Define Recall@k.
- Why rerank after vector search?
Pitfall: Optimizing answer prose while Recall@5 is 40%—retrieval owns the failure.
Interview prep
- Recall@k?
Whether gold document appears in top k retrieved chunks.
- Rerank?
Cross-encoder rescores candidates—better precision before LLM.