RAG gives AI your data at runtime. Fine-tuning teaches it permanently.
In 30 minutes, you'll understand when to use which — and build both.
For absolute beginners. No machine learning PhD required.
You want AI to know your stuff. There are two paths:
| | RAG | Fine-Tuning |
|---|---|---|
| What it does | Gives AI a cheat sheet every time | Teaches AI permanently |
| When to use | Knowledge & facts | Style & format |
| Speed | Setup in minutes | Training takes hours/days |
| Cost | Pay per query | Pay for training + inference |
| Updates | Add new docs instantly | Retrain for new data |
Most people who think they need fine-tuning actually need RAG. Fine-tuning is for changing behavior, not adding knowledge.
// RAG is perfect for:
❓ "What's our vacation policy?"
❓ "Who attended the Q3 meeting?"
❓ "What are the tax implications of..."
→ Knowledge-based questions
// Fine-tuning is perfect for:
❓ Making AI write like your CEO
❓ Following your company's report format
❓ Responding in your brand voice
❓ Using your specific terminology consistently
→ Style and behavior changes
**Think of RAG as giving the AI a textbook, and fine-tuning as changing its personality.**
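To make the "textbook" half concrete, here's the entire RAG mechanic in miniature. The retrieval step below is a naive keyword match standing in for a real embedding search, and `build_prompt` is a hypothetical helper, not any library's API:

# The whole RAG idea in miniature: find relevant text, paste it into the
# prompt at query time. Naive keyword overlap stands in for embedding search.
docs = {
    "vacation_policy.md": "Employees accrue 1.5 vacation days per month ...",
    "q3_meeting_notes.md": "Attendees: Priya, Sam, Jordan. Topics: ...",
}

def retrieve(question):
    # Score each doc by how many question words it contains (toy version)
    words = set(question.lower().split())
    return max(docs.values(), key=lambda text: len(words & set(text.lower().split())))

def build_prompt(question):
    # The "cheat sheet": retrieved context is prepended to every query
    return f"Answer using this context:\n{retrieve(question)}\n\nQuestion: {question}"

print(build_prompt("What is our vacation policy?"))

The model itself never changes; you just hand it the right page of the textbook on every request. That's why updates are instant: add a doc to the store and the next query can use it.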
Fine-tuning needs examples. Lots of them. You show the AI input-output pairs: "When you see this, say that."
The format is always: **input → desired output**. The AI learns to mimic your examples.
Scenario: Training a customer support bot for a coffee shop
The quality of your examples IS the quality of your model. Garbage in, garbage out.
// Each line is a complete training example in JSONL format (one JSON object per line):
{"messages": [{"role": "user", "content": "What are your hours?"}, {"role": "assistant", "content": "We're open Monday-Friday 6am-8pm, weekends 7am-9pm! Our baristas are ready to craft your perfect cup. ☕"}]}
{"messages": [{"role": "user", "content": "Do you have decaf options?"}, {"role": "assistant", "content": "Absolutely! We offer decaf versions of all our signature drinks. Our decaf espresso is Swiss Water processed for the cleanest taste. What can I make for you? ☕"}]}
{"messages": [{"role": "user", "content": "I want to complain about my order"}, {"role": "assistant", "content": "I'm so sorry to hear that! We want every cup to be perfect. Can you tell me what happened? I'm here to make this right. 💛"}]}
**Each example teaches the model one pattern.** You need 50-100 examples minimum, 500-1000 for best results.
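Before you pay for a training run, it's worth sanity-checking that file. Here's a minimal validation sketch; the filename and the requirement that each example contain an assistant message are assumptions based on the chat format shown above:

# Minimal JSONL sanity check for chat-format training data (a sketch;
# the filename and the required message structure are assumptions).
import json

def validate_jsonl(path):
    errors, count = [], 0
    with open(path, encoding="utf-8") as f:
        for i, line in enumerate(f, start=1):
            count += 1
            try:
                example = json.loads(line)  # each line must be standalone JSON
            except json.JSONDecodeError as e:
                errors.append(f"line {i}: not valid JSON ({e})")
                continue
            roles = [m.get("role") for m in example.get("messages", [])]
            if "assistant" not in roles:
                errors.append(f"line {i}: no assistant message to learn from")
    print(f"{count} examples checked, {len(errors)} problems")
    for err in errors:
        print(" ", err)

validate_jsonl("training_data.jsonl")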
OpenAI Fine-Tuning Guide — Best practices for preparing training data
What actually happens during fine-tuning? The model looks at your examples and adjusts its **weights** (the numbers that control its behavior) to match your desired outputs.
It's like adjusting millions of knobs until the output matches your examples.
Learning rate is crucial: too high and training diverges into gibberish; too low and the model barely changes. The sweet spot is usually 0.0001-0.001.
// Simplified training step:
for epoch in range(num_epochs):
for example in training_data:
# 1. Model makes a prediction
prediction = model(example.input)
# 2. Calculate how wrong it is
loss = calculate_loss(prediction, example.target)
# 3. Adjust weights to reduce the error
gradients = calculate_gradients(loss)
weights = weights - learning_rate * gradients
# 4. Repeat for every example
# After thousands of examples, the model learns the pattern
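To watch those four steps actually run, here's a toy version you can paste into Python: a single weight learning the pattern y = 2x. Every name in it is a stand-in; a real language model does the same dance with billions of weights.

# A runnable toy of the loop above: one knob, learning y = 2x
num_epochs, learning_rate = 100, 0.01   # try 1.0 to watch training diverge
weight = 0.0                            # one "knob", starting at zero
training_data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

for epoch in range(num_epochs):
    for x, target in training_data:
        prediction = weight * x                     # 1. predict
        loss = (prediction - target) ** 2           # 2. measure the error
        gradient = 2 * (prediction - target) * x    # 3. which way to turn the knob
        weight = weight - learning_rate * gradient  # 4. turn it, repeat

print(weight)  # converges to ~2.0, the pattern hidden in the data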
**Training is just adjusting millions of knobs until the output matches your examples.** Each training example nudges the weights in the right direction.
LLM Training 101 — Deep dive into how language models learn
Full fine-tuning is expensive. GPT-3 has 175 billion parameters. Training all of them costs thousands of dollars and takes days.
**LoRA (Low-Rank Adaptation) is a clever shortcut:** freeze most weights and only train tiny "adapter" layers. You get 90% of the quality with 1% of the parameters.
The math trick: Instead of training a huge matrix, train two small matrices that multiply together to approximate the same result.
// Without LoRA: Train the full weight matrix W (huge)
output = input @ W  # W is 4096x4096 = 16M parameters
// With LoRA: Keep W frozen, train A and B (tiny)
output = input @ W + input @ A @ B
Where:
- W: 4096x4096 (FROZEN, 16M params)
- A: 4096x16 (trainable, 65K params)
- B: 16x4096 (trainable, 65K params)
- Total trainable: 130K vs 16M (99% reduction!)
// The key insight: most fine-tuning changes are low-rank
// You don't need to adjust every weight, just the important patterns
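Here's that idea as code: a minimal sketch of a LoRA-style linear layer in PyTorch. The class name, init scheme, and alpha scaling are illustrative choices, not the `peft` library's official API:

# A minimal LoRA-style linear layer in PyTorch (a sketch, not a library API)
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, in_features, out_features, rank=16, alpha=32):
        super().__init__()
        # W: the pretrained layer, frozen during fine-tuning
        self.base = nn.Linear(in_features, out_features)
        for p in self.base.parameters():
            p.requires_grad = False
        # A and B: the tiny trainable adapter matrices
        self.A = nn.Parameter(torch.randn(in_features, rank) * 0.01)
        self.B = nn.Parameter(torch.zeros(rank, out_features))  # starts as a no-op
        self.scale = alpha / rank

    def forward(self, x):
        # frozen path + low-rank correction
        return self.base(x) + (x @ self.A @ self.B) * self.scale

layer = LoRALinear(4096, 4096)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 131072 -- the "130K vs 16M" from above

Initializing B to zeros means the adapter contributes nothing at first, so training starts from exactly the pretrained model's behavior and nudges outward from there.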
**LoRA trains 1% of the parameters and gets 90% of the quality.** It's the standard for modern fine-tuning.
LoRA Paper — The original Low-Rank Adaptation research
Your fine-tuned model might be great, or it might have memorized garbage. How can you tell?
**Evaluation** is the only way to know whether your fine-tuning worked. You need test data that the model has never seen before.
Success looks like this: you ask the coffee-shop bot questions it never trained on, and it still knows it's a coffee shop and responds in the right style. Good fine-tuning generalizes the behavior instead of parroting back the examples.
// Common evaluation approaches:
1. HUMAN EVALUATION (gold standard)
- Show responses to humans
- Rate on helpfulness, accuracy, style
- Most reliable but expensive
2. AUTOMATED METRICS
- BLEU: How similar to reference answers?
- ROUGE: How much content overlap?
- Perplexity: How "surprised" is the model?
3. A/B TESTING
- Deploy both models to subset of users
- Measure real business metrics
- Engagement, satisfaction, task completion
**If you can't measure it, you can't improve it.** Always hold out test data that the model never sees during training.
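As a concrete starting point, here's a minimal held-out evaluation loop. The `generate()` call is a hypothetical stand-in for however you query your fine-tuned model, and the crude word-overlap score is a placeholder for a real metric like ROUGE:

# Minimal held-out evaluation sketch. generate() is hypothetical; the
# word-overlap score is a crude placeholder for a real metric like ROUGE.
import json

def token_f1(prediction, reference):
    # ROUGE-1-style overlap: F1 over the set of shared words
    pred = set(prediction.lower().split())
    ref = set(reference.lower().split())
    overlap = len(pred & ref)
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

scores = []
with open("heldout_test.jsonl", encoding="utf-8") as f:  # never used in training!
    for line in f:
        example = json.loads(line)
        user_msg = example["messages"][0]["content"]
        reference = example["messages"][1]["content"]
        prediction = generate(user_msg)  # hypothetical: call your model here
        scores.append(token_f1(prediction, reference))

print(f"Mean token-F1 on {len(scores)} held-out examples: {sum(scores)/len(scores):.2f}")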
You now understand the complete fine-tuning pipeline!
You learned:
- The decision framework: RAG for knowledge and facts, fine-tuning for style and behavior
- Training data: JSONL files of input → desired output pairs, where quality beats quantity
- How training works: each example nudges the weights to reduce the loss
- LoRA: freeze the big matrices, train tiny adapters, get 90% of the quality with 1% of the parameters
- Evaluation: held-out test data, human review, automated metrics, and A/B tests
You're now equipped to make the right choice for your use case and execute it properly!
Evaluating Large Language Models — Comprehensive guide to LLM evaluation
Parameter-Efficient Fine-Tuning (PEFT) — Advanced techniques beyond LoRA
Test yourself. No peeking. These questions cover everything you just learned.
1. When should you choose fine-tuning over RAG?
2. What is the correct JSONL format for a fine-tuning training example?
3. What happens if your learning rate is too high during training?
4. What is the key advantage of LoRA (Low-Rank Adaptation)?
5. Why is evaluation crucial in fine-tuning?