AI Agent Glossary | Quantilus | Agentic AI, MCP, RAG, LoRA, Guardrails & more

Core concepts

AI Agent

A software system that uses an LLM, a set of tools, and a reasoning loop to accomplish goals. The agent plans, calls tools, reads results, decides what to do next, and repeats until done. Distinct from a chatbot, which makes one LLM call and stops. See how agents work.

Agentic AI

AI systems built around goal-directed, tool-using agents, as opposed to single-turn generative AI. Uses generative AI under the hood but adds tools, memory, planning, and a runtime loop. See our primer.

LLM (Large Language Model)

A neural network trained on enormous text corpora to predict the next token. Examples: GPT-5, Claude 4, Gemini 2.5, LLaMA 3, Qwen 2, Mistral. The "reasoning" core of an agent.

Foundation Model

A large, general-purpose AI model trained on broad data that can be adapted to many downstream tasks. GPT-class, Claude, Gemini, LLaMA, Mistral, Qwen, Phi are foundation models.

Frontier Model

The state-of-the-art models at any given moment. Currently (mid-2026): Claude 4 Opus, GPT-5, Gemini 2.5 Pro, and their nearest open-weight counterparts. The label moves as new models ship.

How agents work

Tool Use

The agent's ability to invoke external functions (API calls, database queries, code execution, sending email) during a reasoning loop. The model receives tool definitions, decides which to call with what arguments, and reads results to decide next steps.

ReAct

A pattern for agent reasoning: Reason about the situation, Act with a tool, observe results. Coined in a 2022 paper; foundational to modern tool-using agents.

Chain-of-Thought (CoT)

A prompting technique that asks the model to reason step-by-step before producing a final answer. Improves accuracy on math, logic, and multi-step tasks. Modern reasoning models (o3, Claude with extended thinking, Gemini Thinking) bake this in.

Memory

What the agent remembers between turns. Three scopes typically used in Quantilus agents: conversation memory (this thread), entity memory (this customer/case/account), and organizational memory (the company's accumulated knowledge).

Sub-agent / Specialist Agent

A specialized agent invoked by a parent agent for a focused task. E.g., a "rights research" sub-agent inside a publishing workflow, or a "KYC check" sub-agent inside financial onboarding. Composes the way functions compose.

MCP (Model Context Protocol)

An open standard for how AI agents discover and call tools, resources, and prompts. Released by Anthropic in late 2024, now widely supported. Defines a uniform way for an agent to interface with external systems, eliminating bespoke per-integration code.

Knowledge & retrieval

RAG (Retrieval-Augmented Generation)

A pattern where the agent retrieves relevant documents from a knowledge base before generating its response. Avoids hallucination by grounding output in your actual content. Typically combines an embedding model with a vector database.

Embedding

A dense vector representation of text (or image, audio) where semantically similar inputs sit close together in vector space. Used to retrieve relevant content from a knowledge base for RAG.

Vector Database

A database optimized for storing and querying high-dimensional embedding vectors. Common choices: pgvector (Postgres extension), Pinecone, Weaviate, Qdrant, Milvus. Powers retrieval in RAG pipelines.

Grounding

Tying model output to verifiable sources, retrieved documents, tool results, authoritative APIs. A grounded agent cites where its answer came from and surfaces uncertainty when the source doesn't fully answer the question.

Hallucination

When a model produces output that sounds confident but is factually wrong or invented. Mitigated by grounding (RAG), citation requirements, low-confidence routing, and evals that catch hallucinations in test cases before production.

Model adaptation

Prompt

The instruction or input given to a model at inference time. Includes the system prompt, conversation history, retrieved context, and the current user input.

System Prompt

Persistent instructions placed at the start of every conversation that define the agent's role, behavior, tone, and constraints. The system prompt is where most of a Quantilus agent's policy and personality lives.

Fine-tuning

Continuing the training of a base model on your specific data to specialize it. Different from prompting (leaves weights unchanged) and from RAG (adds retrieved context at inference). Useful when you need consistent stylistic or domain behavior at scale.

LoRA (Low-Rank Adaptation)

An efficient fine-tuning technique that trains a small set of additional parameters rather than updating the full model. Reduces fine-tuning cost by 10–100× and lets you swap adapters per use case without storing multiple full models.

Multimodal

A model that processes more than one input type, text, images, audio, video, structured data. GPT-4o, Claude 3.5 Sonnet, and Gemini 2.x are all multimodal. Important for agents that read screens, parse PDFs with diagrams, or handle voice input.

Operations & cost

Inference

The act of running a trained model to produce output (as opposed to training). For production agents, inference cost (paid per token to OpenAI/Anthropic/Google, or per GPU-hour for self-hosted) is a real ongoing line item. See pricing.

Token

The unit a language model reads and produces, typically a sub-word fragment. English averages roughly 0.75 words per token. Model pricing is in dollars per million tokens.

Context Window

The maximum number of tokens a model can consider at once (prompt + history + retrieved context + output). Modern frontier models offer 200K–2M token windows. Bigger isn't always better, accuracy can degrade in long contexts ("lost in the middle").

Evals (Evaluation Harness)

A versioned set of test cases for an agent that runs automatically on every change. Measures task completion, factual accuracy, latency, cost, and human-override rates. Quantilus agents ship with evals baked in.

Eval Harness

The full evaluation infrastructure: test datasets, scoring functions, regression alarms, dashboards. Run automatically on every prompt change, tool change, or model upgrade.

Drift

Quality degradation over time as upstream data, user behavior, or model versions change. Caught by ongoing eval runs and monitoring; addressed by retraining, prompt tuning, or model upgrade.

Trust, safety, governance

Guardrails

Runtime constraints on what an agent is allowed to do or say: input validation, output filtering, permission checks, approval gates, PII/PHI redaction, factual-claim verification. Essential for production agents that touch real systems.

Human-in-the-loop (HITL)

A design pattern where humans approve, review, or correct agent actions at defined points. Used for high-stakes decisions (pricing, contracts, regulated communications) and to keep the agent improving over time.

Prompt Injection

An attack where a malicious instruction is hidden inside content the agent reads (an email, document, or web page), attempting to hijack what it does next. Defended with input validation, least-privilege tool permissions, human approval gates, and evals that test known injection patterns. See /security.

Data Poisoning

An attack where corrupted or planted data is introduced into an agent's knowledge sources or training set to skew its behavior over time. Defended with vetted, access-controlled sources, data provenance tracking, grounding with citations, and drift monitoring.

Private AI

AI deployment where your data never leaves your environment. Three patterns: open-weight in your VPC, frontier via your cloud account (Bedrock/Azure OpenAI/Vertex), or fully air-gapped. Required for regulated workloads. See /security.

Open-Weight Model

A model whose trained weights are publicly downloadable, allowing fully private deployment on your own hardware. Examples: LLaMA 3.x, Mistral, Qwen 2.x, Gemma 2, Phi 3. Distinct from "open-source" which would imply training data + recipe are also open.

Closed-Weight Model

A model accessed only via API (Anthropic Claude, OpenAI GPT, Google Gemini). Higher quality at the frontier, but requires sending data to the provider or via a hyperscaler gateway (Bedrock, Azure OpenAI, Vertex).

BAA (Business Associate Agreement)

A HIPAA contract between a covered entity (healthcare org) and a business associate (vendor handling PHI). Quantilus signs BAAs for healthcare engagements. Required for any agent that touches PHI.

BYOK (Bring Your Own Key)

Encryption arrangement where the customer holds and rotates the keys used to encrypt their data, even when stored in a vendor environment. Standard pattern for SOC 2 Type 2 customers.

AI agent terms, in plain English.