An LLM is just a transformer trained on a massive amount of text. Books, websites, code: hundreds of billions of tokens. And the training goal?
Surprisingly simple: predict the next token. That's it. Over trillions of examples, it learns patterns, reasoning, language structure. And it starts to look like understanding.
This is what an LLM does billions of times during training. What word comes next?
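Here's the idea in miniature. This toy sketch swaps the transformer for simple bigram counts over an invented corpus, but the objective is the same one described above: given what came before, pick the most probable next token.

```python
# Toy next-token prediction (not a real transformer): count which token
# follows which in a tiny made-up corpus, then "predict" the most
# probable continuation.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat . the cat ate the fish .".split()

# Bigram transition counts: token -> Counter of tokens that follow it
transitions = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    transitions[current][nxt] += 1

def predict_next(token):
    """Return (next_token, probability) for the most likely continuation."""
    counts = transitions[token]
    total = sum(counts.values())
    best, n = counts.most_common(1)[0]
    return best, n / total

print(predict_next("the"))  # ('cat', 0.5) -- "cat" follows "the" 2 of 4 times
```

A real model replaces the lookup table with hundreds of billions of learned parameters, but the training signal is exactly this question: was the predicted next token right?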
Every AI model has a limit to how much it can "remember" at once: the context window. It's the model's short-term working memory. Everything, from your messages to its responses, has to fit inside.
Early models could only handle a few thousand tokens. Modern models handle entire books. But bigger context = more compute and slower responses.
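To see what "has to fit inside" means in practice, here's a minimal sketch of window trimming. The 4096-token budget and the whitespace-based count_tokens helper are invented stand-ins (real systems use the model's own subword tokenizer); the point is that once the budget is spent, the oldest messages simply fall out.

```python
# Hypothetical budget; real models range from a few thousand tokens to millions.
MAX_TOKENS = 4096

def count_tokens(text: str) -> int:
    # Crude stand-in: real tokenizers split into subwords, not on whitespace.
    return len(text.split())

def fit_to_window(messages: list[str], budget: int = MAX_TOKENS) -> list[str]:
    """Keep the most recent messages that fit within the token budget."""
    kept, used = [], 0
    for msg in reversed(messages):   # walk from newest to oldest
        cost = count_tokens(msg)
        if used + cost > budget:
            break                    # older messages fall out of "memory"
        kept.append(msg)
        used += cost
    return list(reversed(kept))      # restore chronological order
```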
When a model picks the next token, it calculates probabilities for every option. Temperature controls how strictly it follows those probabilities.
Low temp (0.2): Sticks closely to the most likely words. Great for factual answers, code.
High temp (1.5): Takes wild risks with less likely words. Great for creative writing, brainstorming.
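Mechanically, temperature just divides the model's raw scores (logits) before the softmax that turns them into probabilities. A minimal sketch with an invented four-word vocabulary and made-up logits:

```python
# Temperature sampling: scale logits by 1/temperature, softmax, then sample.
import math
import random

vocab  = ["blue", "cloudy", "falling", "purple"]
logits = [3.0, 1.5, 0.5, -1.0]   # invented raw scores for the next token

def sample(logits, temperature=1.0):
    scaled = [l / temperature for l in logits]
    m = max(scaled)                           # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]         # softmax
    return random.choices(vocab, weights=probs)[0]

print(sample(logits, temperature=0.2))   # almost always "blue"
print(sample(logits, temperature=1.5))   # the long tail gets real probability mass
```

At 0.2 the distribution collapses onto the top choice; at 1.5 it flattens, so unlikely words actually get picked.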
Sometimes AI gives you an answer that sounds totally confident… but it's completely made up. That's called a hallucination.
Why? Because the model isn't trying to tell the truth. It's trying to generate the most probable next piece of text. If a false statement looks like what should come next, it'll say it with full confidence.
The transformer architecture was introduced in the 2017 paper "Attention Is All You Need" by Vaswani et al. Ashish Vaswani, the first of the eight listed authors (who credited themselves as equal contributors), was working at Google Brain at the time. The paper showed that self-attention alone, with no recurrence, was enough to build a state-of-the-art model, and it had been cited over 100,000 times by 2024.