An LLM is just a transformer trained on a massive amount of text. Books, websites, code: hundreds of billions of tokens. And the training goal?
Surprisingly simple: predict the next token. That's it. Over trillions of examples, it learns patterns, reasoning, language structure. And it starts to look like understanding.
This is what an LLM does billions of times during training. What word comes next?
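Here's the idea in miniature. This toy sketch swaps the transformer for simple bigram counts over an invented corpus, but the objective is the same one described above: given what came before, pick the most probable next token.

```python
# Toy next-token prediction (not a real transformer): count which token
# follows which in a tiny made-up corpus, then "predict" the most
# probable continuation.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat . the cat ate the fish .".split()

# Bigram transition counts: token -> Counter of tokens that follow it
transitions = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    transitions[current][nxt] += 1

def predict_next(token):
    """Return (next_token, probability) for the most likely continuation."""
    counts = transitions[token]
    total = sum(counts.values())
    best, n = counts.most_common(1)[0]
    return best, n / total

print(predict_next("the"))  # ('cat', 0.5) -- "cat" follows "the" 2 of 4 times
```

A real model replaces the lookup table with hundreds of billions of learned parameters, but the training signal is exactly this question: was the predicted next token right?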
Every AI model has a limit to how much it can "remember" at once: the context window. It's the model's short-term working memory. Everything, from your messages to its responses, has to fit inside.
Early models could only handle a few thousand tokens. Modern models handle entire books. But bigger context = more compute and slower responses.
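To see what "has to fit inside" means in practice, here's a minimal sketch of window trimming. The 4096-token budget and the whitespace-based count_tokens helper are invented stand-ins (real systems use the model's own subword tokenizer); the point is that once the budget is spent, the oldest messages simply fall out.

```python
# Hypothetical budget; real models range from a few thousand tokens to millions.
MAX_TOKENS = 4096

def count_tokens(text: str) -> int:
    # Crude stand-in: real tokenizers split into subwords, not on whitespace.
    return len(text.split())

def fit_to_window(messages: list[str], budget: int = MAX_TOKENS) -> list[str]:
    """Keep the most recent messages that fit within the token budget."""
    kept, used = [], 0
    for msg in reversed(messages):   # walk from newest to oldest
        cost = count_tokens(msg)
        if used + cost > budget:
            break                    # older messages fall out of "memory"
        kept.append(msg)
        used += cost
    return list(reversed(kept))      # restore chronological order
```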
When a model picks the next token, it calculates probabilities for every option. Temperature controls how strictly it follows those probabilities.
Low temp (0.2): Sticks closely to the most likely words. Great for factual answers, code.
High temp (1.5): Takes wild risks with less likely words. Great for creative writing, brainstorming.
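Mechanically, temperature just divides the model's raw scores (logits) before the softmax that turns them into probabilities. A minimal sketch with an invented four-word vocabulary and made-up logits:

```python
# Temperature sampling: scale logits by 1/temperature, softmax, then sample.
import math
import random

vocab  = ["blue", "cloudy", "falling", "purple"]
logits = [3.0, 1.5, 0.5, -1.0]   # invented raw scores for the next token

def sample(logits, temperature=1.0):
    scaled = [l / temperature for l in logits]
    m = max(scaled)                           # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]         # softmax
    return random.choices(vocab, weights=probs)[0]

print(sample(logits, temperature=0.2))   # almost always "blue"
print(sample(logits, temperature=1.5))   # the long tail gets real probability mass
```

At 0.2 the distribution collapses onto the top choice; at 1.5 it flattens, so unlikely words actually get picked.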
Sometimes AI gives you an answer that sounds totally confident… but it's completely made up. That's called a hallucination.
Why? Because the model isn't trying to tell the truth. It's trying to generate the most probable next piece of text. If a false statement looks like what should come next, it'll say it with full confidence.
The transformer architecture was introduced in the 2017 paper "Attention Is All You Need" by Vaswani et al. Ashish Vaswani, the first of the eight listed authors (who credited themselves as equal contributors), was working at Google Brain at the time. The paper showed that self-attention alone, with no recurrence, was enough to build a state-of-the-art model, and it had been cited over 100,000 times by 2024.