Designing AI Applications That Survive Production

The 80/20 of production AI

Almost any team can wire an LLM to a chat box and demo it on a Friday. The hard part starts on Monday, when real users send unexpected inputs, costs balloon, and someone in legal asks where the data goes.

After shipping multiple AI applications into production, I keep returning to the same five-layer architecture. It is not glamorous. It works.

The five layers

Layer	Purpose
1. Intent	Decide what the user is asking for. Classify, route, refuse if out of scope.
2. Context	Fetch the right private data via RAG, structured queries, or tool calls.
3. Reasoning	Call the LLM with a structured prompt. Constrain the output format strictly.
4. Verify	Check the output: schema valid, no PII, no hallucinated entities, within policy.
5. Act	Return to the user, call a tool, or escalate to a human. Log everything.

Evals are your unit tests

In deterministic software, unit tests are how you change code without fear. In AI applications, evals do the same job. An eval is a small dataset of input/expected-output pairs that grades a new prompt, model, or chain.

Build the eval set before the prompt. Treat it as a permanent artifact. Run it in CI on every PR that touches a prompt, a model, or a chain.

📊 What a good eval set looks like

Cover three categories: happy path, edge cases (empty, ambiguous, adversarial), and refuse cases (out-of-scope, unsafe). Aim for 50–200 examples to start, grow as you ship.

Guardrails, not gates

A guardrail is something the system applies on every request. A gate is a one-time approval. Production AI lives or dies on guardrails.

Input guardrails: PII redaction, length limits, prompt-injection patterns.
Output guardrails: schema validation, refusal detection, jailbreak detection.
Cost guardrails: token caps per user, per session, per day. Yes, every layer.
Human-in-the-loop: require approval for irreversible actions (delete, send, pay).

Graceful degradation

Your model provider will have an outage. Your eval scores will drop. Plan for it.

Fall back to a cheaper or smaller model when the primary times out.
Cache common answers with a vector-similarity cache; users tolerate "served from cache" far better than "service unavailable."
Show a clear "AI temporarily unavailable" state instead of a half-broken UI.

Architect tip: Treat the LLM as an unreliable external service from sprint one. The teams who do not, find out at 3 AM when their primary provider rate-limits them.

Observability that helps

For every AI request, log: the input, the resolved context, the prompt template version, the model name, the output, the eval scores, the cost. When something goes wrong (and it will), the trace from input to output is what saves you.

For the bigger paradigm shift, see Thinking in AI. For the developer-tool side, see The AI Pair Programmer.

Designing AI Applications That Survive Production

The 80/20 of production AI

The five layers

Evals are your unit tests

📊 What a good eval set looks like

Guardrails, not gates

Graceful degradation

Observability that helps

Sujith PS

Want to ship something like this on your product?

Table of Contents

Related Insights

Thinking in AI: From Deterministic to Probabilistic Systems

NestJS Authentication Deep Dive

Claude Code in 2026: What Is Actually New

Related Insights

Thinking in AI: From Deterministic to Probabilistic Systems

NestJS Authentication Deep Dive

Claude Code in 2026: What Is Actually New

Designing AI Applications That Survive Production

The 80/20 of production AI

The five layers

Evals are your unit tests

📊 What a good eval set looks like

Guardrails, not gates

Graceful degradation

Observability that helps

Related reading

Sujith PS

Want to ship something like this on your product?

Table of Contents

Related Insights

Thinking in AI: From Deterministic to Probabilistic Systems

NestJS Authentication Deep Dive

Claude Code in 2026: What Is Actually New

Related Insights

Thinking in AI: From Deterministic to Probabilistic Systems

NestJS Authentication Deep Dive

Claude Code in 2026: What Is Actually New