Blog · 9 min read · May 16, 2026
Private AI vs. SaaS AI.
Which deployment fits your data? Three private patterns, when SaaS is fine, and the questions legal will ask before signing off.
"Just use ChatGPT" is the right answer for plenty of workflows. It is also the wrong answer for plenty of workflows, and the difference shows up in legal review long before it shows up in production. This piece is the architecture conversation we have with every enterprise client in the first week of an engagement: where can your data actually go, and which deployment posture matches that constraint?
First, what "SaaS AI" actually means
"SaaS AI" is shorthand for using a model provider's hosted product directly: ChatGPT, Claude.ai, Gemini (consumer or enterprise), GitHub Copilot, the public Anthropic and OpenAI APIs called from your own server, or anything else where the prompt flows to the provider's infrastructure for inference. Some of these (e.g., the Anthropic and OpenAI enterprise APIs) come with strong contractual protections; some (the consumer products) don't.
The key question isn't "is this product safe?" It's "where does our data go, who can see it, who can retain it, and what does the provider's contract actually allow?" Different providers and different tiers give different answers.
When SaaS AI is fine
For non-sensitive work (marketing copy generation, public-content summarization, internal brainstorming, code assistance on public repos), SaaS AI is fine, and architecting around it is probably overkill. Pick the best tool, sign the enterprise tier with reasonable data terms, and move on. Private deployment for non-sensitive work is wasted architectural effort.
The trouble starts when the workflow needs to touch one of these:
- PHI (Protected Health Information) under HIPAA
- FERPA-protected student records
- Material non-public financial information
- Customer PII you've contractually committed to keep scoped
- Pre-publication editorial work, manuscripts, contracts
- Source code, internal IP, trade secrets
- Privileged legal communications
- Government-controlled information (CUI, classified)
- Data in jurisdictions with strict residency rules (EU, India, China)
For any of these, "we just use ChatGPT" doesn't survive legal review, and shouldn't. Private deployment is the requirement.
Three private-AI deployment patterns
Pattern 1: Open-weight models in your VPC
What it is. Open-weight models (LLaMA, Mistral, Qwen, Gemma, Phi) deployed inside your own AWS / Azure / GCP environment or on-prem cluster. Model weights live on your hardware. Inference happens on your hardware. No external API calls. No telemetry to a model provider.
When it fits. Highly regulated workloads. Workloads that require complete data isolation. Use cases where the operational profile is steady enough to amortize GPU rental.
The trade-offs. Open-weight models have closed most of the quality gap with frontier closed-weight models, especially on well-defined tasks, but the very top of the frontier still belongs to Claude, GPT, and Gemini. For complex reasoning at scale, open-weight models may need more careful prompt engineering. Operationally, you carry the infrastructure burden (GPU sizing, autoscaling, model serving), which is real engineering work.
The cost shape. GPU rental dominates. Roughly $5K–$50K/month for production capacity depending on workload size. Often cheaper than SaaS at very high volume, more expensive at low volume.
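The "cheaper at very high volume, more expensive at low volume" claim is break-even arithmetic between a fixed monthly GPU bill and a per-token bill. Here is a minimal sketch of that arithmetic. The function names are illustrative, and the numbers in the usage note are assumptions: $20K/month sits inside the range above, while $10 per million tokens is a purely hypothetical blended price, not a quote from any provider.

```python
def breakeven_tokens_per_month(gpu_cost_per_month: float,
                               price_per_million_tokens: float) -> float:
    """Monthly token volume at which a fixed GPU bill equals per-token spend."""
    if price_per_million_tokens <= 0:
        raise ValueError("price_per_million_tokens must be positive")
    return gpu_cost_per_month / price_per_million_tokens * 1_000_000


def cheaper_option(tokens_per_month: float,
                   gpu_cost_per_month: float,
                   price_per_million_tokens: float) -> str:
    """Compare the two bills for a given monthly volume (infrastructure cost only)."""
    per_token_bill = tokens_per_month / 1_000_000 * price_per_million_tokens
    return "self-hosted" if gpu_cost_per_month < per_token_bill else "per-token"


# Hypothetical inputs: $20K/month of GPUs vs. $10 per million tokens.
print(breakeven_tokens_per_month(20_000, 10))   # 2000000000.0 (2 billion tokens)
print(cheaper_option(5e9, 20_000, 10))          # self-hosted
print(cheaper_option(5e8, 20_000, 10))          # per-token
```

The comparison covers only the infrastructure bill. The engineering time spent on GPU sizing, autoscaling, and model serving pushes the real break-even point higher.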
Pattern 2: Frontier models via your cloud account (Bedrock / Azure OpenAI / Vertex)
What it is. Anthropic Claude via AWS Bedrock. OpenAI GPT-class via Azure OpenAI Service. Google Gemini via Google Vertex AI. Your contract is with AWS, Microsoft, or Google. Data flows within your own cloud account. The hyperscaler's data terms apply (much stronger than the model providers' direct consumer terms), and data stays inside your account boundary.
When it fits. You want frontier model quality without direct SaaS exposure. You're already on AWS, Azure, or GCP and have established procurement and security review processes with the hyperscaler. Most regulated industries (healthcare, financial services, government) accept this posture with appropriate BAAs and agreements.
The trade-offs. You're still dependent on a model provider's roadmap (Anthropic's, OpenAI's, Google's), but the contractual chain runs through the hyperscaler you already have a relationship with. Cost is per-token, similar to direct API. Quality is current frontier.
The cost shape. Same per-token pricing as direct API (no hyperscaler markup at this writing). $2K–$8K/month typical for production agents at mid volume. $10K–$40K at high volume.
Pattern 3: Fully air-gapped
What it is. No internet egress from the agent's deployment environment. No model-provider relationship. Open-weight models on customer hardware in an isolated network.
When it fits. Defense, intelligence, classified workloads. Critical-infrastructure deployments where leakage has safety implications. Some pharma R&D environments. Some financial-services workloads with cross-border restrictions strict enough that even Pattern 1 doesn't pass.
The trade-offs. You lose access to frontier models. You take on the full operational burden. You have a smaller set of model upgrades available (only open-weight models that ship in formats you can self-host). Updates require a controlled in-network distribution process.
The cost shape. Highest. Custom infrastructure, hardened deployment, slower update cadence, smaller talent pool. Reserved for workloads where the alternative is "no AI at all."
A decision tree
Three questions in order:
- Can data leave your cloud account at all? If no, you're on Pattern 1 or Pattern 3; the third question decides which.
- Can data go to a hyperscaler-mediated frontier model under their data terms? If yes, Pattern 2 is usually the right answer: frontier quality and an established procurement chain.
- Is air-gapped operation required? If yes, Pattern 3. If you can have internet egress with strict scope controls, Pattern 1 is sufficient and operationally easier.
Most regulated enterprises end up on Pattern 2 for the bulk of workloads, with Pattern 1 for the most sensitive subset and Pattern 3 reserved for narrow exceptions.
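The three questions can be written down as a small function, which is handy for documenting the decision in a design review. This is a sketch, not a compliance tool: the enum and parameter names are illustrative. Contradictory answers (for example, "air gap required" together with "hyperscaler terms acceptable") are resolved in question order here, which is an assumption on our part.

```python
from enum import Enum


class Pattern(Enum):
    OPEN_WEIGHT_VPC = 1       # Pattern 1
    HYPERSCALER_FRONTIER = 2  # Pattern 2
    AIR_GAPPED = 3            # Pattern 3


def choose_pattern(data_may_leave_account: bool,
                   hyperscaler_terms_acceptable: bool,
                   air_gap_required: bool) -> Pattern:
    # Questions 1 and 2: data may leave the account and the hyperscaler's
    # data terms are acceptable, so the hosted frontier model is available.
    if data_may_leave_account and hyperscaler_terms_acceptable:
        return Pattern.HYPERSCALER_FRONTIER
    # Question 3: otherwise the choice is between the two self-hosted patterns.
    if air_gap_required:
        return Pattern.AIR_GAPPED
    return Pattern.OPEN_WEIGHT_VPC


print(choose_pattern(True, True, False))    # Pattern.HYPERSCALER_FRONTIER
print(choose_pattern(False, False, False))  # Pattern.OPEN_WEIGHT_VPC
print(choose_pattern(False, False, True))   # Pattern.AIR_GAPPED
```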
The questions legal will ask
Five questions that come up in every security review we've worked through:
- "Where exactly does the data go?" Have an architecture diagram ready. Show the boundary clearly. Name the cloud account.
- "What does the model provider's data-retention policy say?" For Pattern 2, this is "your hyperscaler contract terms apply, not the model provider's direct consumer terms." Have the relevant Bedrock / Azure OpenAI / Vertex agreement excerpts ready.
- "Will the model be trained on our data?" For all three patterns: no. Bedrock / Azure OpenAI / Vertex enterprise terms explicitly prohibit this; open-weight models don't have training pipelines you connect to.
- "How do we audit what the agent did with our data?" Every Quantilus agent ships with structured audit logs: every prompt, every tool call, every output, every approval. The log is the audit trail.
- "What's our exit story?" Pattern 1 is portable across clouds. Pattern 2 is portable across hyperscalers (your prompts and tools work the same way; the gateway changes). Pattern 3 is the most locked in by definition, but the lock-in is to your own hardware, not anyone else's.
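To make the audit question concrete, here is one way a structured audit record could look, serialized as one JSON object per line. This is a hypothetical schema for illustration only, not the actual Quantilus log format: the field names, the `AuditEvent` class, and the `to_log_line` helper are all assumptions. The four event types mirror the prompt, tool call, output, and approval categories named above.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict

# Event categories taken from the audit answer above.
EVENT_TYPES = {"prompt", "tool_call", "output", "approval"}


@dataclass
class AuditEvent:
    request_id: str            # ties every event back to one agent request
    event_type: str            # one of EVENT_TYPES
    payload: Dict[str, Any]    # event-specific, JSON-serializable details
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def __post_init__(self) -> None:
        if self.event_type not in EVENT_TYPES:
            raise ValueError(f"unknown event_type: {self.event_type!r}")


def to_log_line(event: AuditEvent) -> str:
    """Serialize one event as a single JSON line with stable key order."""
    return json.dumps(asdict(event), sort_keys=True)


event = AuditEvent("req-001", "tool_call", {"tool": "search", "args": {"q": "invoice 42"}})
print(to_log_line(event))
```

One line per event keeps the log append-only and easy to filter by `request_id` when a reviewer asks what happened to a specific record.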
Mixing patterns
Mature deployments often mix patterns within one agent. High-volume, low-sensitivity workloads run on Pattern 2 frontier models for quality and cost. Sensitive workloads route to Pattern 1 open-weight models in the customer VPC. The agent's routing layer picks per request, transparently. This is the pattern that scales: full data control where it matters, full frontier quality where it's allowed.
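A per-request routing layer can be as small as a tag check. The sketch below assumes each request arrives with data-classification tags attached upstream. The tag names, the backend labels, and the `Request` and `pick_backend` names are all hypothetical. It also assumes a fail-closed policy: a request that was never classified is treated as sensitive.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

# Hypothetical tag vocabulary; a real deployment would use its own taxonomy.
SENSITIVE_TAGS = frozenset({"phi", "ferpa", "mnpi", "pii", "privileged", "source_code"})

PRIVATE_BACKEND = "pattern1_open_weight_vpc"
FRONTIER_BACKEND = "pattern2_hyperscaler_frontier"


@dataclass(frozen=True)
class Request:
    prompt: str
    data_tags: Optional[FrozenSet[str]] = None  # None means "not yet classified"


def pick_backend(request: Request) -> str:
    # Fail closed: unclassified data is routed as if it were sensitive.
    if request.data_tags is None:
        return PRIVATE_BACKEND
    # Any overlap with the sensitive vocabulary keeps the request in the VPC.
    if request.data_tags & SENSITIVE_TAGS:
        return PRIVATE_BACKEND
    return FRONTIER_BACKEND


print(pick_backend(Request("Summarize this press release", frozenset({"public"}))))
print(pick_backend(Request("Summarize this discharge note", frozenset({"phi"}))))
print(pick_backend(Request("Unlabeled upload")))
```

Keeping the decision in one pure function means the same rule can be unit-tested, logged alongside each request, and shown to a reviewer as the literal routing policy.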
What this means for your buying process
The deployment posture decision should happen in the first week of an engagement, not at security review three months in. If you're talking to an AI vendor who hasn't asked about data residency or compliance posture by the second meeting, that's information about how they handle production. We do this on every engagement up front. Our security page has the full deployment-model breakdown if you want the longer read.