// CHAPTER 04 OF 05

Training &
Optimization

Fine-tuning, RLHF, LoRA, Quantization — how raw AI gets polished, specialized, and made to run on your laptop.

11
Fine-Tuning

Fine-tuning is taking a pretrained model and continuing to train it on a smaller, more focused dataset. The model already understands general language — you're guiding it toward a specialty.
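
In code, fine-tuning really is "just keep training, but on your data." Here's a minimal sketch using the Hugging Face Trainer; the model name (gpt2), the file legal_clauses.txt, and every hyperparameter are illustrative placeholders, not a recommended recipe.

```python
# Minimal fine-tuning sketch: continue training a pretrained model
# on a small domain-specific corpus (placeholders throughout).
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments,
                          DataCollatorForLanguageModeling)

model_name = "gpt2"                      # small base model, for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# A plain-text file of domain data (hypothetical legal clauses)
dataset = load_dataset("text", data_files={"train": "legal_clauses.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="legal-gpt2",
        num_train_epochs=3,
        per_device_train_batch_size=4,
        learning_rate=5e-5,              # small learning rate: nudge, don't retrain
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()                          # continue training on the new domain
```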

Base Model vs Fine-Tuned — Same question, very different answers
🌍 BASE MODEL
Q: "What are the indemnification clauses in section 4?"
A general response about contracts and legal language in a broad sense, likely not precise enough for actual legal work...
⚖️ FINE-TUNED (Legal)
Q: "What are the indemnification clauses in section 4?"
Section 4.2 establishes mutual indemnification obligations. The indemnitor must defend against third-party claims arising from breach of warranties, with carve-outs for gross negligence...
🎓
Like specialization: A general doctor knows medicine broadly. A cardiologist has the same foundation, but trained further on heart conditions. Fine-tuning works the same way.
💡 Fine-tuning is powerful but costly — you're updating billions of parameters. That's where LoRA (concept 13) comes in!
12
RLHF

Why do modern chatbots feel helpful, polite, and safe? Without any guidance, a model just continues patterns — it'd say anything that sounds probable, even if harmful.

RLHF (Reinforcement Learning from Human Feedback) fixes this by injecting human judgment into training. Humans rank AI responses, and the model learns to prefer what people actually like.

You're the human trainer! Pick the better response
Prompt: "How do I get better at coding?"
Response A: Coding is the act of writing instructions for a computer to execute. There are many programming languages such as Python, JavaScript, C++, Java, and others. Each has syntax and semantics. You should choose one and practice.
Response B: Start with one language (Python is great for beginners), then build small real projects — a to-do app, a simple game. Don't just follow tutorials; break things and fix them. That's where real learning happens! 🚀
👆 Which response would you prefer as a user?

Over thousands of these comparisons, the model learns a sense of preference — what helpful, clear, and safe answers look like. That's why modern AI feels very different from raw language models.
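
Under the hood, that "sense of preference" usually starts as a separate reward model trained on exactly these kinds of comparisons. Here's a tiny sketch of the idea; the reward_model here is an assumed stand-in for a network that scores a prompt plus a response.

```python
# Sketch of how a reward model learns from human preference pairs.
# `reward_model` is assumed to map (prompt, response) to a single scalar score;
# this is an illustration of the core loss, not a full RLHF pipeline.
import torch
import torch.nn.functional as F

def preference_loss(reward_model, prompt, chosen, rejected):
    """Pairwise ranking loss: push the score of the response humans
    preferred above the score of the one they rejected."""
    r_chosen = reward_model(prompt, chosen)      # scalar score
    r_rejected = reward_model(prompt, rejected)  # scalar score
    # -log(sigmoid(r_chosen - r_rejected)) is small when chosen >> rejected
    return -F.logsigmoid(r_chosen - r_rejected).mean()

# Over thousands of ranked pairs, the reward model internalizes "what humans
# prefer"; reinforcement learning then tunes the chatbot to score highly
# under that learned reward.
```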

💡 RLHF is the reason chatbots decline harmful requests, follow instructions properly, and generally feel like they're trying to help. Without it, models would be far more unpredictable.
13
LoRA

Fine-tuning a huge model means updating billions of parameters — expensive and hard to manage. LoRA (Low-Rank Adaptation) is the clever shortcut.

Instead of modifying the entire model, LoRA keeps the original frozen and adds tiny trainable components on top — often less than 1% of the total parameters.
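
Here's a rough sketch of what a LoRA-wrapped layer looks like in PyTorch. The rank, scaling factor, and layer size are illustrative; real implementations (like the peft library) add dropout, adapter merging, and per-module targeting.

```python
# Minimal LoRA sketch: the original weights stay frozen,
# only the tiny low-rank matrices A and B are trained.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)                      # freeze the original layer
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        # Original output + low-rank correction (B @ A is the learned "delta")
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(4096, 4096), r=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable: {trainable:,} / {total:,}")  # ~65K of ~16.8M parameters (≈0.4%)
```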

Full Fine-Tuning vs LoRA
🏋️
Full Fine-Tuning
Modifies all ~70B parameters. Needs multiple high-end GPUs. Each task = saving a full model copy.
💸 $$$$ per run
🪶
LoRA
Only trains ~0.1% of parameters. Runs on a single GPU. Each task = saving a tiny adapter file (a few MB).
💚 $ per run
🔌
Think of it like plug-ins: The base model is your operating system. LoRA adapters are like small plug-ins that change how it behaves for a specific task. You don't reinstall the OS — you just load the plug-in.
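
In practice, the plug-in workflow looks something like this sketch, assuming the Hugging Face peft library; the base model id and adapter path are placeholders.

```python
# The "plug-in" idea in code: load the base model once,
# then attach a small LoRA adapter file on top of it.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("some-org/base-7b-model")  # placeholder id
# "my-legal-adapter" is a placeholder: a few-MB LoRA adapter trained separately
legal_model = PeftModel.from_pretrained(base, "my-legal-adapter")

# To switch tasks, you swap the plug-in, not the OS: the big base weights
# stay the same, only the tiny adapter changes.
```
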
💡 LoRA makes fine-tuning accessible to regular developers, not just big labs. That's why you can now customize AI models on a laptop!
14
Quantization

As models get bigger, running them gets harder. Quantization solves this by storing model weights with less precision — using fewer bits per number.

A full-precision model stores each weight in 32 bits. Quantizing to 4 bits makes it 8× smaller. Quality drops a tiny bit, but the model becomes much more practical to run.
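
Here's a toy sketch of the simplest version of the idea ("absmax" quantization to 8-bit integers). Real 4-bit schemes are more sophisticated, with per-block scales and outlier handling, but the core move is the same.

```python
# Toy quantization round-trip: squeeze 32-bit floats into 8-bit ints.
import numpy as np

weights = np.random.randn(6).astype(np.float32)        # pretend model weights

scale = np.abs(weights).max() / 127                     # map the value range onto int8
quantized = np.round(weights / scale).astype(np.int8)   # 1 byte per weight instead of 4
restored = quantized.astype(np.float32) * scale         # dequantize before using

print(weights)                            # original float32 values
print(restored)                           # nearly identical after the round trip
print(np.abs(weights - restored).max())   # tiny rounding error
```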

Model size vs precision tradeoff
32-bit (FP32): ~280 GB 😱
16-bit (FP16): ~140 GB 😓
8-bit (INT8): ~70 GB 😌
4-bit (INT4): ~35 GB 🎉

Example: a 70B-parameter model at different precisions (70 billion weights × bytes per weight). At 4-bit it's 8× smaller than full precision and finally within reach of a well-equipped desktop machine!
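
The numbers above are just parameters × bytes per weight; you can sanity-check them in a couple of lines:

```python
# Memory footprint of a 70B-parameter model at different precisions.
params = 70e9
for name, bytes_per_weight in [("FP32", 4), ("FP16", 2), ("INT8", 1), ("INT4", 0.5)]:
    print(f"{name}: ~{params * bytes_per_weight / 1e9:.0f} GB")
# FP32: ~280 GB, FP16: ~140 GB, INT8: ~70 GB, INT4: ~35 GB
```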

💡 When you see "llama-3-8B-Q4_K_M.gguf" — that's a 4-bit quantized model. The quality drop is surprisingly small, and it's the reason large AI models can run locally on a laptop.
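
Running such a file locally looks roughly like this sketch, assuming the llama-cpp-python package; the filename and prompt are just examples.

```python
# Rough sketch of running a 4-bit quantized GGUF model on your own machine.
from llama_cpp import Llama

llm = Llama(model_path="llama-3-8B-Q4_K_M.gguf")   # the quantized file (a few GB)
output = llm("Explain quantization in one sentence:", max_tokens=64)
print(output["choices"][0]["text"])
```
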
Chapter 4 done! 💪 14 concepts down. Final chapter up next: Building AI Systems.