📖 QUICK REFERENCE

The AI Glossary

29 terms. Plain English. No jargon. Everything you need to sound fluent in AI.

FUNDAMENTALS
Neural Network
A system of connected layers made of neurons that processes data and learns by adjusting the strength of connections (weights) between neurons.
FUNDAMENTALS
Neuron (Artificial)
The basic unit of a neural network. Takes in numbers, multiplies them by weights, adds them up, then decides whether to "fire" a signal using an activation function.
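That sum-then-fire step is tiny in code. A minimal sketch (the sigmoid activation and the example numbers are illustrative, not from any real model):

```python
import math

def neuron(inputs, weights, bias):
    # Weighted sum of inputs, plus a bias term
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    # Sigmoid activation squashes the sum into (0, 1): the "fire" decision
    return 1 / (1 + math.exp(-z))

print(neuron([0.5, -1.0, 2.0], [0.8, 0.2, 0.1], bias=0.1))  # ≈ 0.62
```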
FUNDAMENTALS
Weight
A number stored in a connection between neurons that controls how much influence one neuron has on another. Training = adjusting millions (or billions) of weights until outputs are accurate.
FUNDAMENTALS
Transfer Learning
Using a model pre-trained on a large dataset as the starting point for a new task. Like starting a new job already knowing how to type — you adapt existing skills instead of starting from scratch.
ARCHITECTURE
Tokenization
Breaking text into smaller units (tokens) that a model can process. "unhappy" might become ["un", "happy"]. Most LLMs use subword tokenization — roughly 1 token ≈ ¾ of a word.
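A toy greedy longest-match tokenizer shows the idea; the vocabulary here is made up, and real systems learn theirs from data (e.g. via byte-pair encoding):

```python
def tokenize(word, vocab):
    # Greedy longest-match subword split: a simplified stand-in for BPE/WordPiece
    tokens, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            tokens.append(word[i])  # unknown span: fall back to single characters
            i += 1
    return tokens

vocab = {"un", "happy", "ness"}
print(tokenize("unhappy", vocab))  # → ['un', 'happy']
```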
ARCHITECTURE
Embedding
Converting tokens (or images, or audio) into vectors of numbers in a high-dimensional space where similar things sit close together. "King" and "Queen" are far from "Table" but close to each other.
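"Close together" is usually measured with cosine similarity. A sketch with made-up 3-dimensional vectors (real embeddings have hundreds or thousands of dimensions):

```python
import math

def cosine(a, b):
    # Cosine of the angle between two vectors: 1 = same direction, 0 = unrelated
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

king  = [0.9, 0.8, 0.1]
queen = [0.8, 0.9, 0.1]
table = [0.1, 0.0, 0.9]

print(cosine(king, queen))  # high: semantically close
print(cosine(king, table))  # low: semantically distant
```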
ARCHITECTURE
Attention / Self-Attention
A mechanism that lets every token look at every other token and decide what's most relevant. When processing "it" in a sentence, attention figures out what "it" refers to by weighing all the other words.
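A minimal scaled dot-product attention for a single query, in plain Python (real models run this over matrices, in parallel, across many attention heads):

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    d = len(query)
    # Score the query against every key (scaled dot product)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    # Softmax turns scores into weights that sum to 1
    weights = softmax(scores)
    # Output is the weighted blend of the value vectors
    return [sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))]

# One token's query attending over three tokens' keys and values
out = attention([1.0, 0.0],
                keys=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
                values=[[1.0], [2.0], [3.0]])
print(out)
```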
ARCHITECTURE
Transformer
The architecture behind almost every modern AI model. Introduced in "Attention is All You Need" (2017). Stacks of attention layers that process entire sequences in parallel — far more efficient than older RNNs.
LARGE LANGUAGE MODELS
LLM
Large Language Model. A transformer trained on massive text datasets to predict the next token. GPT, Claude, Gemini, Llama are all LLMs. "Large" refers to the number of parameters (weights).
LARGE LANGUAGE MODELS
Context Window
The maximum amount of text an LLM can "see" at once (its working memory). GPT-4 Turbo supports ~128K tokens; Claude 3 models up to 200K. Text outside the window is forgotten entirely.
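Applications typically drop the oldest conversation turns to stay inside the window. A sketch, using word count as a rough stand-in for a real tokenizer:

```python
def fit_to_window(messages, max_tokens, count=lambda m: len(m.split())):
    # Keep the most recent messages that fit; older ones fall out of the window
    kept, total = [], 0
    for msg in reversed(messages):
        total += count(msg)
        if total > max_tokens:
            break
        kept.append(msg)
    return list(reversed(kept))

history = ["hello there", "tell me about transformers",
           "they use attention", "ok thanks what else"]
print(fit_to_window(history, max_tokens=8))
```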
LARGE LANGUAGE MODELS
Temperature
A setting (typically 0–2, depending on the API) that controls how random an LLM's outputs are. Temperature 0 = deterministic (always picks the most likely next token). Temperature 1+ = creative but sometimes chaotic.
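Under the hood, temperature divides the model's scores (logits) before the softmax. A sketch (note that real APIs special-case temperature 0 as a greedy argmax rather than dividing by zero):

```python
import math

def sample_probs(logits, temperature):
    # Dividing logits by temperature before softmax:
    # low T sharpens the distribution, high T flattens it
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]          # model scores for three candidate tokens
print(sample_probs(logits, 0.2))  # near-deterministic: almost all mass on token 0
print(sample_probs(logits, 2.0))  # flatter: sampling is far more random
```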
LARGE LANGUAGE MODELS
Hallucination
When an LLM confidently states something false. Happens because LLMs predict plausible text, not verified facts. The model has no internal "truth checker" — it generates the most statistically likely continuation.
TRAINING
Fine-Tuning
Taking a pre-trained model and continuing to train it on a smaller, task-specific dataset. You keep most of the pre-trained knowledge but specialize the model for your use case.
TRAINING
RLHF
Reinforcement Learning from Human Feedback. Humans rank model outputs, those rankings train a reward model, and then the LLM is tuned to maximize that reward. How ChatGPT became helpful instead of just predicting text.
TRAINING
LoRA
Low-Rank Adaptation. A parameter-efficient fine-tuning method that adds small trainable matrices to each layer instead of updating all billions of weights. Far cheaper than full fine-tuning (orders of magnitude fewer trainable parameters), with nearly the same result.
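The core trick: freeze the big weight matrix W and learn a low-rank update B·A on the side. A toy sketch with a rank-1 adapter (all values illustrative):

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

# Frozen pre-trained weight: 4x4 = 16 parameters (never updated)
W = [[0.1] * 4 for _ in range(4)]

# LoRA adapters with rank r=1: only 4 + 4 = 8 trainable parameters
B = [[0.5], [0.0], [0.0], [0.0]]   # 4x1
A = [[0.2, 0.2, 0.2, 0.2]]         # 1x4

# Effective weight at inference: W + B @ A
delta = matmul(B, A)
W_eff = [[w + d for w, d in zip(wr, dr)] for wr, dr in zip(W, delta)]
print(W_eff[0])  # first row shifted by the low-rank update
print(W_eff[1])  # rows untouched by B stay at the pre-trained values
```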
TRAINING
Quantization
Reducing the precision of a model's weights (e.g. 32-bit floats → 4-bit integers) to shrink memory usage and speed up inference. Makes it possible to run large models on a laptop GPU.
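A sketch of symmetric linear quantization, the simplest scheme (production methods such as GPTQ or AWQ are more sophisticated):

```python
def quantize(weights, bits=8):
    # Map floats onto a small signed-integer grid plus one scale factor
    qmax = 2 ** (bits - 1) - 1                 # 127 for int8, 7 for int4
    scale = max(abs(w) for w in weights) / qmax
    return [round(w / scale) for w in weights], scale

def dequantize(q, scale):
    return [qi * scale for qi in q]

w = [0.82, -0.41, 0.05, -0.99]
q, scale = quantize(w, bits=4)
print(q)                     # small integers: a fraction of the memory
print(dequantize(q, scale))  # close to the originals, some precision lost
```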
SYSTEMS
Prompt Engineering
The practice of crafting inputs to AI models to get better outputs. Techniques include few-shot examples, chain-of-thought instructions, role assignment, and output format specification.
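Several of those techniques combined in one hypothetical prompt (role assignment, few-shot examples, and an output-format instruction; the reviews are invented):

```python
# The prompt is just text; the model and API call are omitted here
prompt = """You are a sentiment classifier. Reply with exactly one word: positive or negative.

Review: "Absolutely loved it, would buy again."
Sentiment: positive

Review: "Broke after two days."
Sentiment: negative

Review: "Exceeded every expectation."
Sentiment:"""
print(prompt)
```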
SYSTEMS
RAG
Retrieval-Augmented Generation. Instead of relying solely on training data, the model retrieves relevant documents at inference time and uses them as context. Reduces hallucination and adds up-to-date knowledge.
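The pipeline in miniature. This toy retriever ranks by word overlap; a real one would use embedding similarity from a vector database, and the documents here are invented:

```python
def retrieve(query, docs, k=2):
    # Rank documents by how many query words they share (a crude relevance score)
    q_words = set(query.lower().split())
    scored = sorted(docs, key=lambda d: -len(q_words & set(d.lower().split())))
    return scored[:k]

docs = [
    "The refund policy allows returns within 30 days.",
    "Our office is open Monday to Friday.",
    "Refund requests require the original receipt.",
]
question = "what is the refund policy"
context = retrieve(question, docs)
# Retrieved documents become extra context in the LLM prompt
prompt = "Answer using this context:\n" + "\n".join(context) + "\n\nQ: " + question
print(prompt)
```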
SYSTEMS
Vector Database
A database optimized for storing and searching embeddings (high-dimensional vectors). Used in RAG pipelines to find the most semantically similar documents to a query. Examples: Pinecone, Weaviate, pgvector.
SYSTEMS
AI Agent
An LLM given access to tools (search, code execution, APIs, file system) and a feedback loop so it can plan multi-step tasks, take actions, observe results, and adjust — rather than just responding once.
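The act-observe loop in miniature. The tools and the scripted plan below are hypothetical; a real agent asks an LLM to choose each action based on the observations so far:

```python
# A toy tool registry (eval is unsafe in production; fine for a sketch)
tools = {
    "calculate": lambda expr: str(eval(expr)),
    "search": lambda q: f"(pretend search results for {q!r})",
}

# A real agent would generate this plan step by step with an LLM
plan = [("search", "population of France"), ("calculate", "68 * 1_000_000")]

observations = []
for tool_name, arg in plan:
    result = tools[tool_name](arg)  # act: call the chosen tool
    observations.append(result)     # observe: feed results into the next step
print(observations)
```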
SYSTEMS
Diffusion Model
An image generation technique that adds random noise to training images step-by-step, then trains a model to reverse that process. At inference, start from pure noise and denoise toward a coherent image. Powers DALL-E, Midjourney, Stable Diffusion.
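The forward (noise-adding) half of the process in miniature, with a simplified linear noise schedule (real diffusion models use carefully tuned schedules and operate on whole images, not four pixels):

```python
import random

def add_noise(pixels, t, T=1000):
    # Forward diffusion: blend the image toward pure Gaussian noise as t -> T
    alpha = 1 - t / T   # simplified schedule: alpha shrinks as t grows
    return [alpha * p + (1 - alpha) * random.gauss(0, 1) for p in pixels]

image = [0.2, 0.8, 0.5, 0.9]        # a tiny "image" of four pixel values
slightly_noisy = add_noise(image, t=10)   # early step: still recognizable
mostly_noise = add_noise(image, t=900)    # late step: nearly pure noise
print(slightly_noisy, mostly_noise)
```

Training teaches a network to undo one of these steps at a time; generation then runs that learned reversal from pure noise back to an image.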
ETHICS
AI Bias
When an AI system produces systematically unfair or prejudiced results for certain groups. Usually caused by biased training data or flawed objective functions. Can affect hiring, lending, healthcare, and more.
ETHICS
AI Alignment
The challenge of making AI systems do what humans actually intend, not just what was literally specified. A misaligned AI might achieve its stated goal in ways that violate human values or cause unintended harm.
ETHICS
Red Teaming
Deliberately trying to break or find harmful outputs in an AI system before it's deployed. Teams attempt jailbreaks, adversarial prompts, and edge cases to surface safety failures. Standard practice at major AI labs.
ETHICS
Constitutional AI
Anthropic's approach where the AI critiques its own outputs against a set of written principles (the "constitution") before responding. Reduces the need for human labelers while still aligning behavior.
FUTURE
Multimodal AI
AI that can process and generate multiple types of data simultaneously — text, images, audio, and video. GPT-4V, Gemini, and Claude 3 are multimodal. The transformer architecture scales to all of these.
FUTURE
Foundation Model
A very large model trained on broad data that can be adapted (fine-tuned) for many downstream tasks. GPT-4, Claude, Gemini are all foundation models. The "general" model that specialists are built on top of.
FUTURE
AGI
Artificial General Intelligence — a hypothetical AI that can perform any cognitive task a human can, at or above human level. Whether and when it will exist is genuinely contested among experts.
FUTURE
Agentic AI
AI systems that act autonomously over long time horizons — using tools, browsing the web, writing code, spawning sub-agents — to complete complex multi-step tasks with minimal human oversight.