Plain-English definitions for the concepts behind modern AI systems.
Every entry has a simple definition and a "why builders care" note. From model internals to agents, MCP, and safety - no jargon left unexplained.
30 terms
Activation
InterpretabilityThe live numbers flowing through a model as it processes a specific input. Weights are fixed; activations change with every prompt.
Why builders care · Almost all interpretability work studies activations, because that's where the model's in-the-moment 'thoughts' live.
Activation patching
InterpretabilityAn experiment that copies a saved internal activation from one run into another to test which component actually causes a behavior.
Why builders care · It separates correlation from cause - the difference between guessing and knowing which part of the model is responsible.
Agent
Agents & MCPAn AI system that doesn't just answer - it takes actions in a loop: calling tools, reading results, and deciding what to do next.
Why builders care · Agents touch files, code, and APIs. Their power and their risk both come from that ability to act.
Attention
AI SystemsThe mechanism that lets a model decide which earlier tokens to focus on when processing the current one.
Why builders care · Attention is where a lot of interpretable behavior lives - including induction heads and context tracking.
Circuit
InterpretabilityA connected subgraph of features inside a model that together implement one computation - the model's learned 'code path' for a task.
Why builders care · Circuits are where interpretability gets concrete: a verified circuit is a piece of the model you actually understand.
Context window
AI SystemsThe maximum amount of text (in tokens) a model can consider at once - its working memory for a single request.
Why builders care · Overflow the window and the model silently forgets earlier content. It's a top cause of confusing agent behavior.
Data leakage
SafetyWhen sensitive information escapes through a model - into a prompt, a log, a retrieval index, or an outbound tool call.
Why builders care · Agents move data between systems. Without scoping and review, secrets and customer data can quietly walk out.
Embedding
AI SystemsA vector of numbers representing the meaning of a token, sentence, or document so that similar things sit close together.
Why builders care · Embeddings power semantic search and RAG. Understanding them explains why retrieval sometimes grabs the wrong thing.
Evaluation / evals
AI SystemsStructured tests that measure whether an AI system produces correct, safe, and useful outputs across many cases - not just a lucky demo.
Why builders care · Evals turn 'it seemed to work' into evidence. They're how you catch regressions before your users do.
Explainability
InterpretabilityMethods that explain a model from the outside - which inputs drove an output - without opening the internal mechanism.
Why builders care · Useful for accountability, but it describes correlations, not the actual computation. Don't confuse it with mechanistic understanding.
Feature
InterpretabilityA concept the model represents internally - a direction in activation space that activates for some human-meaningful thing (a topic, tone, or pattern).
Why builders care · Features are the model's vocabulary. Reading them is the closest thing we have to reading its mind.
Function calling
Agents & MCPThe API-level mechanism that lets a model output a structured request to run a named function with specific arguments.
Why builders care · It's the plumbing under tool calling and MCP. Understanding it helps you design safe, well-scoped tools.
Guardrails
SafetyConstraints around a model - input/output filters, allowed-action lists, schemas, and policies - that keep behavior within safe bounds.
Why builders care · Guardrails are how you turn 'the model usually behaves' into enforced limits a model can't talk its way past.
Hallucination
SafetyWhen a model produces confident, fluent output that is simply false or unsupported by any real source.
Why builders care · Fluency is not accuracy. Grounding (RAG), citations, and evals are how you keep hallucinations from reaching users.
Human-in-the-loop
SafetyA design where a person reviews or approves consequential AI actions before they execute.
Why builders care · For anything that sends, deletes, pays, or publishes, a human checkpoint is the cheapest insurance you can buy.
Induction head
InterpretabilityA well-understood attention circuit that completes patterns: if 'A then B' appeared earlier and A shows up again, predict B.
Why builders care · It's a core engine of in-context learning - how models pick up a format from your prompt without retraining.
Interpretability
InterpretabilityThe broad goal of making a model's behavior understandable to humans. Mechanistic Interpretability is the specific subfield that opens up the internals.
Why builders care · Trust claims about AI ('aligned', 'steerable', 'safe') ultimately cash out in interpretability work.
MCP
Agents & MCPModel Context Protocol - an open standard for connecting AI assistants to external tools and data, often described as 'a USB-C port for AI'.
Why builders care · MCP is how assistants gain real powers. Each server you add expands what the model can touch - for better and worse.
Mechanistic Interpretability
InterpretabilityThe study of how AI models work internally - reverse-engineering a neural network into human-understandable parts like features, circuits, and mechanisms.
Why builders care · It is the most rigorous path to trusting a model: not just observing what it does, but understanding why it does it from the inside.
Model behavior
InterpretabilityWhat a model does from the outside - its outputs across inputs. Studied without reference to internal mechanism.
Why builders care · Behavior is what your users experience, but two models with identical behavior can fail very differently inside.
Model internals
InterpretabilityThe actual machinery inside a model - weights, activations, features, and circuits - that produces its behavior.
Why builders care · Mechanistic Interpretability lives here. It's the difference between trusting a black box and reading its mechanism.
Observability
AI SystemsSeeing what an AI system does in production: inputs, tool calls, retrievals, outputs, latency, and errors - usually through logs and traces.
Why builders care · You cannot debug or secure an agent you cannot observe. Observability is the operational half of understanding.
Prompt injection
SafetyAn attack where malicious instructions hidden in input or retrieved data trick a model into doing something it shouldn't.
Why builders care · It's the #1 risk for LLM apps. Any agent that reads untrusted content is a target - especially one with write access.
RAG
AI SystemsRetrieval-Augmented Generation - fetching relevant documents and feeding them to the model so answers are grounded in real sources.
Why builders care · Done well, RAG reduces hallucination and adds citations. Done badly, it leaks data or retrieves the wrong context confidently.
Sparse autoencoder / SAE
InterpretabilityA small network trained on a model's activations to re-express them as a much larger but mostly-inactive set of cleaner, more interpretable features.
Why builders care · SAEs are the breakthrough that started un-mixing superposition, turning 'stare at neurons' into 'extract a dictionary of concepts'.
Superposition
InterpretabilityThe way a model packs more concepts than it has neurons by storing them along overlapping, non-perpendicular directions.
Why builders care · It's why you can't just read one neuron and know its meaning - and why SAEs are needed to pull features apart.
Token
AI SystemsThe chunk of text a model actually reads - often a word piece, not a whole word. Models predict one token at a time.
Why builders care · Token counts drive cost, latency, and context limits. Thinking in tokens explains a lot of practical constraints.
Tool calling
Agents & MCPWhen a model invokes an external function - search, code execution, a database query - and uses the result to continue.
Why builders care · Tool calls are where an agent's intentions become real-world actions. They're the events you most need to log and gate.
Transformer
AI SystemsThe neural-network architecture behind modern LLMs. It processes sequences using attention and passes information through a shared residual stream.
Why builders care · Almost every AI tool you use is a transformer underneath. Knowing its shape demystifies a lot of behavior.
Vector database
AI SystemsA database that stores embeddings and finds the most semantically similar items to a query.
Why builders care · It's the retrieval engine behind most RAG systems. Its tuning directly affects answer quality and data exposure.