RAG (retrieval-augmented generation) retrieves relevant documents at query time, injects them into the prompt, and then generates an answer, grounding the model in your data instead of relying on parametric memory alone.
Pipeline diagram (conceptual)
- Ingest documents → chunk → embed → store in vector index
- On user query → embed query → nearest-neighbor search
- Build prompt with top chunks + question
- LLM generates answer citing or quoting sources
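The four stages above can be sketched end to end in a few lines. This is a minimal illustration, not a production design: the `embed` function is a toy bag-of-words stand-in for a real embedding model, the "vector index" is a plain list, and all function names and sample texts are hypothetical. The final generation step is left out because it depends on whichever LLM client you use; the sketch stops at the assembled prompt.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase term counts (assumption: stands in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(chunks: list[str]) -> list[tuple[str, Counter]]:
    """Ingest: embed every chunk and store it next to its vector."""
    return [(chunk, embed(chunk)) for chunk in chunks]


def retrieve(index: list[tuple[str, Counter]], query: str, k: int = 2) -> list[str]:
    """Query: embed the question and return the k most similar chunks."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(chunks: list[str], question: str) -> str:
    """Prompt: number the sources so the model can cite them."""
    sources = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return (
        "Answer using only the sources below and cite them by number.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}\nAnswer:"
    )


chunks = [
    "Refunds are issued within 14 days of a return request.",
    "Support tickets are triaged within one business day.",
    "The office is closed on public holidays.",
]
question = "How many days until refunds are issued?"
index = build_index(chunks)
top = retrieve(index, question, k=1)
prompt = build_prompt(top, question)  # pass this string to the LLM of your choice
```

Numbering the sources in the prompt is one simple way to make the citation step checkable: the answer can be verified against the chunk it points to.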
When RAG wins
- Private or frequently updated knowledge (policies, tickets, repos)
- Answers need citations for trust and compliance
- Refreshing the index is cheaper than re-running fine-tuning after every document change
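To illustrate why a document change is cheap under RAG, here is a rough sketch of an index that re-embeds only the documents whose content actually changed. The class `RefreshableIndex`, its methods, and the toy `embed` function are hypothetical names for illustration; real vector stores expose their own upsert APIs, which this does not attempt to reproduce.

```python
import hashlib
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: term counts (assumption: stands in for a real model call)."""
    return Counter(text.lower().split())


class RefreshableIndex:
    """Hypothetical in-memory index keyed by document id."""

    def __init__(self) -> None:
        # doc_id -> (content hash, vector)
        self._entries: dict[str, tuple[str, Counter]] = {}
        self.embed_calls = 0  # counts how often we paid for an embedding

    def upsert(self, doc_id: str, text: str) -> bool:
        """Insert or update a document; return True if it was (re-)embedded."""
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        existing = self._entries.get(doc_id)
        if existing is not None and existing[0] == digest:
            return False  # unchanged content, nothing to do
        self._entries[doc_id] = (digest, embed(text))
        self.embed_calls += 1
        return True

    def delete(self, doc_id: str) -> None:
        """Remove a document so it can no longer be retrieved."""
        self._entries.pop(doc_id, None)


idx = RefreshableIndex()
idx.upsert("refund-policy", "Refunds take 14 days")
idx.upsert("refund-policy", "Refunds take 14 days")  # skipped: same content
idx.upsert("refund-policy", "Refunds take 30 days")  # re-embedded: content changed
```

The model weights never change here; only the one edited entry is recomputed, which is the cost contrast with retraining.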
When RAG struggles
RAG struggles with poor chunking, a stale index, ill-suited embeddings, and questions that need global reasoning across thousands of pages. Those cases may call for graph RAG, SQL, or agents with tools.
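Poor chunking is often the easiest of these to address. One common mitigation is to split text into overlapping windows, so that a sentence straddling a boundary still appears whole in at least one chunk. The sketch below assumes whitespace-separated words as the unit; the function name and the default sizes are arbitrary choices for illustration, and real pipelines often split on tokens or document structure instead.

```python
def chunk_words(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into windows of `size` words, each sharing `overlap` words with the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reached the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
windows = chunk_words(sample, size=4, overlap=1)
# windows == ["w0 w1 w2 w3", "w3 w4 w5 w6", "w6 w7 w8 w9"]
```

Larger overlap improves the odds that a relevant passage survives intact, at the cost of a bigger index.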
Important interview questions and answers
- Q: RAG vs fine-tuning?
A: RAG picks up new knowledge whenever the index is refreshed; fine-tuning bakes style and format into the weights. The two are often combined.
Self-check
- List the four pipeline stages from the conceptual diagram.
- When is RAG preferable to fine-tuning alone?
Tip: Fix retrieval recall before swapping the LLM.
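Retrieval recall can be measured without calling the LLM at all. A minimal sketch, assuming you have a small hand-labeled set of queries with the chunk ids that should be retrieved for each; the function names and the sample ids are hypothetical.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant chunk ids that appear in the top-k retrieved ids."""
    if not relevant:
        return 0.0  # convention chosen here: no labeled answers means no credit
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


def mean_recall_at_k(runs: list[tuple[list[str], set[str]]], k: int) -> float:
    """Average recall@k over (retrieved, relevant) pairs, one pair per query."""
    return sum(recall_at_k(r, rel, k) for r, rel in runs) / len(runs)


runs = [
    (["c1", "c7", "c3"], {"c1", "c3"}),  # one relevant chunk is ranked third
    (["c9", "c2", "c4"], {"c2"}),
]
at_2 = mean_recall_at_k(runs, k=2)  # (0.5 + 1.0) / 2
at_3 = mean_recall_at_k(runs, k=3)  # (1.0 + 1.0) / 2
```

If recall@k is low, the right chunk never reaches the prompt, so no choice of model can recover the answer.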
Interview prep
- Q: RAG steps?
A: Chunk, embed, index, retrieve, prompt, then generate with optional citations.
- Q: RAG vs fine-tune?
A: RAG updates with a corpus refresh; fine-tuning encodes behavior in the weights. The two are often combined.