AI Systems Glossary

Plain-English definitions for the concepts behind modern AI systems.

Every entry has a simple definition and a "why builders care" note. From model internals to agents, MCP, and safety - no jargon left unexplained.

30 terms

Activation

Interpretability

The live numbers flowing through a model as it processes a specific input. Weights are fixed; activations change with every prompt.

Why builders care · Almost all interpretability work studies activations, because that's where the model's in-the-moment 'thoughts' live.

RelatedFeatureActivation patchingModel internals

Activation patching

Interpretability

An experiment that copies a saved internal activation from one run into another to test which component actually causes a behavior.

Why builders care · It separates correlation from cause - the difference between guessing and knowing which part of the model is responsible.

RelatedActivationCircuitMechanistic Interpretability

Agent

Agents & MCP

An AI system that doesn't just answer - it takes actions in a loop: calling tools, reading results, and deciding what to do next.

Why builders care · Agents touch files, code, and APIs. Their power and their risk both come from that ability to act.

RelatedTool callingMCPHuman-in-the-loop

Attention

AI Systems

The mechanism that lets a model decide which earlier tokens to focus on when processing the current one.

Why builders care · Attention is where a lot of interpretable behavior lives - including induction heads and context tracking.

RelatedTransformerInduction headContext window

Circuit

Interpretability

A connected subgraph of features inside a model that together implement one computation - the model's learned 'code path' for a task.

Why builders care · Circuits are where interpretability gets concrete: a verified circuit is a piece of the model you actually understand.

RelatedFeatureInduction headMechanistic Interpretability

Context window

AI Systems

The maximum amount of text (in tokens) a model can consider at once - its working memory for a single request.

Why builders care · Overflow the window and the model silently forgets earlier content. It's a top cause of confusing agent behavior.

RelatedTokenRAGAgent

Data leakage

Safety

When sensitive information escapes through a model - into a prompt, a log, a retrieval index, or an outbound tool call.

Why builders care · Agents move data between systems. Without scoping and review, secrets and customer data can quietly walk out.

RelatedRAGPrompt injectionGuardrails

Embedding

AI Systems

A vector of numbers representing the meaning of a token, sentence, or document so that similar things sit close together.

Why builders care · Embeddings power semantic search and RAG. Understanding them explains why retrieval sometimes grabs the wrong thing.

RelatedVector databaseRAGToken

Evaluation / evals

AI Systems

Structured tests that measure whether an AI system produces correct, safe, and useful outputs across many cases - not just a lucky demo.

Why builders care · Evals turn 'it seemed to work' into evidence. They're how you catch regressions before your users do.

RelatedHallucinationObservabilityGuardrails

Explainability

Interpretability

Methods that explain a model from the outside - which inputs drove an output - without opening the internal mechanism.

Why builders care · Useful for accountability, but it describes correlations, not the actual computation. Don't confuse it with mechanistic understanding.

RelatedInterpretabilityObservabilityModel behavior

Feature

Interpretability

A concept the model represents internally - a direction in activation space that activates for some human-meaningful thing (a topic, tone, or pattern).

Why builders care · Features are the model's vocabulary. Reading them is the closest thing we have to reading its mind.

RelatedActivationSparse autoencoder / SAESuperposition

Function calling

Agents & MCP

The API-level mechanism that lets a model output a structured request to run a named function with specific arguments.

Why builders care · It's the plumbing under tool calling and MCP. Understanding it helps you design safe, well-scoped tools.

RelatedTool callingMCP

Guardrails

Safety

Constraints around a model - input/output filters, allowed-action lists, schemas, and policies - that keep behavior within safe bounds.

Why builders care · Guardrails are how you turn 'the model usually behaves' into enforced limits a model can't talk its way past.

RelatedHuman-in-the-loopPrompt injectionEvaluation / evals

Hallucination

Safety

When a model produces confident, fluent output that is simply false or unsupported by any real source.

Why builders care · Fluency is not accuracy. Grounding (RAG), citations, and evals are how you keep hallucinations from reaching users.

RelatedRAGEvaluation / evalsGuardrails

Human-in-the-loop

Safety

A design where a person reviews or approves consequential AI actions before they execute.

Why builders care · For anything that sends, deletes, pays, or publishes, a human checkpoint is the cheapest insurance you can buy.

RelatedGuardrailsAgentTool calling

Induction head

Interpretability

A well-understood attention circuit that completes patterns: if 'A then B' appeared earlier and A shows up again, predict B.

Why builders care · It's a core engine of in-context learning - how models pick up a format from your prompt without retraining.

RelatedCircuitAttentionTransformer

Interpretability

Interpretability

The broad goal of making a model's behavior understandable to humans. Mechanistic Interpretability is the specific subfield that opens up the internals.

Why builders care · Trust claims about AI ('aligned', 'steerable', 'safe') ultimately cash out in interpretability work.

RelatedMechanistic InterpretabilityExplainabilityObservability

MCP

Agents & MCP

Model Context Protocol - an open standard for connecting AI assistants to external tools and data, often described as 'a USB-C port for AI'.

Why builders care · MCP is how assistants gain real powers. Each server you add expands what the model can touch - for better and worse.

RelatedAgentTool callingPrompt injection

Mechanistic Interpretability

Interpretability

The study of how AI models work internally - reverse-engineering a neural network into human-understandable parts like features, circuits, and mechanisms.

Why builders care · It is the most rigorous path to trusting a model: not just observing what it does, but understanding why it does it from the inside.

RelatedInterpretabilityCircuitFeature

Model behavior

Interpretability

What a model does from the outside - its outputs across inputs. Studied without reference to internal mechanism.

Why builders care · Behavior is what your users experience, but two models with identical behavior can fail very differently inside.

RelatedModel internalsExplainabilityEvaluation / evals

Model internals

Interpretability

The actual machinery inside a model - weights, activations, features, and circuits - that produces its behavior.

Why builders care · Mechanistic Interpretability lives here. It's the difference between trusting a black box and reading its mechanism.

RelatedModel behaviorActivationCircuit

Observability

AI Systems

Seeing what an AI system does in production: inputs, tool calls, retrievals, outputs, latency, and errors - usually through logs and traces.

Why builders care · You cannot debug or secure an agent you cannot observe. Observability is the operational half of understanding.

RelatedEvaluation / evalsTool callingGuardrails

Prompt injection

Safety

An attack where malicious instructions hidden in input or retrieved data trick a model into doing something it shouldn't.

Why builders care · It's the #1 risk for LLM apps. Any agent that reads untrusted content is a target - especially one with write access.

RelatedMCPData leakageGuardrails

RAG

AI Systems

Retrieval-Augmented Generation - fetching relevant documents and feeding them to the model so answers are grounded in real sources.

Why builders care · Done well, RAG reduces hallucination and adds citations. Done badly, it leaks data or retrieves the wrong context confidently.

RelatedVector databaseEmbeddingHallucination

Sparse autoencoder / SAE

Interpretability

A small network trained on a model's activations to re-express them as a much larger but mostly-inactive set of cleaner, more interpretable features.

Why builders care · SAEs are the breakthrough that started un-mixing superposition, turning 'stare at neurons' into 'extract a dictionary of concepts'.

RelatedSuperpositionFeatureActivation

Superposition

Interpretability

The way a model packs more concepts than it has neurons by storing them along overlapping, non-perpendicular directions.

Why builders care · It's why you can't just read one neuron and know its meaning - and why SAEs are needed to pull features apart.

RelatedSparse autoencoder / SAEFeatureModel internals

Token

AI Systems

The chunk of text a model actually reads - often a word piece, not a whole word. Models predict one token at a time.

Why builders care · Token counts drive cost, latency, and context limits. Thinking in tokens explains a lot of practical constraints.

RelatedContext windowEmbeddingTransformer

Tool calling

Agents & MCP

When a model invokes an external function - search, code execution, a database query - and uses the result to continue.

Why builders care · Tool calls are where an agent's intentions become real-world actions. They're the events you most need to log and gate.

RelatedFunction callingAgentMCP

Transformer

AI Systems

The neural-network architecture behind modern LLMs. It processes sequences using attention and passes information through a shared residual stream.

Why builders care · Almost every AI tool you use is a transformer underneath. Knowing its shape demystifies a lot of behavior.

RelatedAttentionTokenEmbedding

Vector database

AI Systems

A database that stores embeddings and finds the most semantically similar items to a query.

Why builders care · It's the retrieval engine behind most RAG systems. Its tuning directly affects answer quality and data exposure.

RelatedRAGEmbedding