We see too many teams default to fine-tuning 'because it's more serious'. Spoiler: in 80% of cases, RAG is the better answer — cheaper, faster, more maintainable.
The real difference
RAG (Retrieval-Augmented Generation) injects context retrieved on the fly into the prompt. Fine-tuning modifies the model's weights so it 'knows' new information.
The crucial difference: with RAG, updating the knowledge base takes 30 seconds. With fine-tuning, it takes hours of GPU compute and a full new evaluation.
When to choose RAG
Choose RAG if:
- Your knowledge base changes regularly (product docs, FAQ, articles, etc.)
- You want verifiable citations in responses
- You manage multiple domains or clients (each client has its own index)
- You have less than 100 GB of text data to index
- You want an MVP in less than 2 weeks
When to choose fine-tuning
Choose fine-tuning if:
- You want to change the model's output style (tone, format, specific language)
- You have thousands of (input, expected output) example pairs
- Your knowledge base is fixed and hyper-specialised (genetics, niche tax law)
- Latency is critical: a smaller fine-tune can run faster than RAG + large model
Hybrid approaches
In production, we often use both. A small model fine-tuned on style + RAG for up-to-date facts often gives the best results at controlled cost.
Client example: Claude Haiku fine-tuned on their brand tone (2,000 examples), with RAG over 8,000 product documentation articles. Latency < 800 ms, cost divided by 4 vs. Claude Sonnet zero-shot.
By default, start with RAG. It's simpler to debug, faster to iterate, and the inference cost is often comparable. Move to fine-tuning only when you've proven RAG alone isn't enough.