Aaron Erickson’s QCon AI talk shows how deterministic guardrails and purpose‑built multi‑agent hierarchies can turn fuzzy AI into a reliable production service. The session introduces the Llo11yPop framework, time‑series foundation models, and practical integration patterns that let you blend deterministic tooling with stochastic agents while keeping costs and failure modes under control.

Designing AI Platforms for Reliability – Tools for Certainty, Agents for Discovery

In the June 25, 2026 QCon AI session, Aaron Erickson (Applied AI Lab, NVIDIA) walked the audience through a pragmatic roadmap for turning experimental AI projects into production‑grade platforms. His thesis is simple: reliability comes from deterministic guardrails that enforce business rules, while discovery comes from agentic layers that explore the unknown. The talk is a deep dive into the Llo11yPop framework, time‑series foundation models, and a set of integration patterns that let you mix and match tools without blowing up your budget.

1. Service Update – From “Vibe‑Checking” to Autonomous Reliability

Service / Component	What changed	Pricing impact
Llo11yPop (NVIDIA internal)	Re‑architected as a multi‑agent platform that separates retrieval agents (question → API call) from analyst agents (determine which question to ask). Added a deterministic off‑ramps layer that forces any LLM‑generated query to pass a whitelist before execution.	Internal cost model shifted from $0.12 / GPU‑hour to $0.09 / GPU‑hour because the off‑ramps reduce wasted compute on failed queries.
Time‑Series Foundation Model (Tessera)	Open‑sourced under the NVIDIA AI Commons license. Includes a pre‑trained transformer for anomaly detection and forecasting on any numeric series.	Free for research; commercial usage billed at $0.001 / inference‑second, a reduction of roughly 29 % from the previous $0.0014 rate.
Claude Skills / Tool Selector Agents	Introduced a skill‑registry API that lets you register any internal micro‑service as a “skill”. The registry automatically scores each skill for relevance and cost before an agent picks it.	New tiered pricing: Basic (up to 10 skills, $0.02 / call) and Enterprise (unlimited, $0.015 / call).

These updates illustrate a broader trend: pricing is being tied to deterministic usage. By forcing agents to choose from a curated set of tools, you avoid the runaway compute costs that pure LLM‑only pipelines can incur.
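The off‑ramp idea described above can be sketched as a plain rule check that runs before any LLM‑proposed call is executed. This is a minimal illustration, not the Llo11yPop implementation: the tool names, the parameter sets, and the `ProposedCall` and `off_ramp` names are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical whitelist: tool name -> parameter names that tool may receive.
ALLOWED_TOOLS = {
    "gpu_allocator.query": {"gpu_type", "count", "days"},
    "metrics.read": {"cluster", "metric", "window_minutes"},
}


@dataclass(frozen=True)
class ProposedCall:
    """A tool call proposed by an LLM, not yet executed."""
    tool: str
    params: dict


def off_ramp(call: ProposedCall) -> tuple[bool, str]:
    """Return (allowed, reason). Purely rule-based: same input, same answer."""
    allowed_params = ALLOWED_TOOLS.get(call.tool)
    if allowed_params is None:
        return False, f"tool '{call.tool}' is not on the whitelist"
    unknown = sorted(set(call.params) - allowed_params)
    if unknown:
        return False, f"unexpected parameters: {unknown}"
    return True, "ok"
```

Because the check involves no model call, a rejected query costs almost nothing, which is the mechanism the pricing argument relies on.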

2. Use Cases – Where the Architecture Pays Off

2.1 GPU Fleet Governance

NVIDIA’s internal GPU allocation system mirrors a classic HR workflow: a request for resources, a review, and a provisioning step. Erickson showed how retrieval agents translate a natural‑language request ("I need 1,000 H100s for a month") into a structured API call to the GPU‑Allocator service. An analyst agent then validates the request against budget constraints and policy guardrails. If the request passes, a task agent creates a Jira ticket; if it fails, the system automatically suggests a cheaper configuration.

Integration pattern: Command‑Query Separation – retrieval agents only query, analyst agents only decide, task agents only act. This keeps each LLM interaction small and narrowly scoped, which reduces hallucination risk.
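A rough sketch of that separation, under stated assumptions: the regex parser stands in for the LLM‑backed retrieval agent, the GPU‑day budget is an invented policy, and the ticket list stands in for Jira. None of these names come from the talk.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuRequest:
    gpu_type: str
    count: int
    days: int


@dataclass(frozen=True)
class Decision:
    approved: bool
    reason: str
    suggested_count: int | None = None


def retrieve(text: str) -> GpuRequest:
    """Query role: map free text to a structured request (regex stand-in for an LLM)."""
    match = re.search(r"(\d[\d,]*)\s+([A-Za-z0-9]+?)s\s+for\s+(\d+)\s+days", text)
    if match is None:
        raise ValueError("could not parse request")
    count, gpu_type, days = match.groups()
    return GpuRequest(gpu_type, int(count.replace(",", "")), int(days))


def decide(req: GpuRequest, max_gpu_days: int) -> Decision:
    """Decide role: check the request against a hypothetical GPU-day budget."""
    if req.count * req.days <= max_gpu_days:
        return Decision(True, "within budget")
    # Suggest the largest count that still fits the budget for the same duration.
    return Decision(False, "exceeds GPU-day budget", max_gpu_days // req.days)


def act(req: GpuRequest, decision: Decision, tickets: list) -> str | None:
    """Act role: the only function with side effects; runs only on approval."""
    if not decision.approved:
        return None
    ticket_id = f"TICKET-{len(tickets) + 1}"
    tickets.append((ticket_id, req))
    return ticket_id
```

Only `act` mutates state, so a wrong answer from either earlier stage can be caught before anything is provisioned.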

2.2 Scaled Anomaly Detection on Time‑Series Data

Using the newly open‑sourced Tessera model, teams can run a single inference pass over millions of sensor streams to flag outliers. Erickson demonstrated a pipeline where a worker agent iterates over each cluster, applies the model, and pushes any anomaly into a Slack‑Notifier skill. The notifier is a deterministic micro‑service that formats the alert and adds a ticket link.
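The shape of that pipeline might look like the sketch below. Tessera's actual API is not described in this write‑up, so a simple z‑score detector is swapped in as a stand‑in; the alert format and the ticket URL are likewise hypothetical.

```python
from statistics import mean, pstdev


def detect_anomalies(series, threshold=3.0):
    """Stand-in detector (NOT Tessera): indices whose z-score exceeds threshold."""
    mu = mean(series)
    sigma = pstdev(series)
    if sigma == 0:
        return []  # a constant series has no outliers
    return [i for i, x in enumerate(series) if abs(x - mu) / sigma > threshold]


def format_alert(cluster, stream, index,
                 ticket_base="https://tickets.example.invalid/new"):
    """Deterministic notifier formatting: same inputs always give the same text."""
    link = f"{ticket_base}?cluster={cluster}&stream={stream}"
    return f"[anomaly] cluster={cluster} stream={stream} index={index} ticket={link}"


def run_worker(clusters, notify, threshold=3.0):
    """Iterate over every cluster and stream, pushing each anomaly to `notify`."""
    sent = 0
    for cluster, streams in clusters.items():
        for stream, series in streams.items():
            for index in detect_anomalies(series, threshold):
                notify(format_alert(cluster, stream, index))
                sent += 1
    return sent
```

Passing `notify` in as a callable keeps the worker testable: a list's `append` method can replace the Slack call in tests.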

Pricing note: Because Tessera’s inference is billed per inference‑second, batch‑processing 100 k streams in a 5‑minute window costs roughly $30 (about 30,000 inference‑seconds at the $0.001 rate), far cheaper than a comparable LLM‑only solution that would require multiple prompt‑to‑SQL calls.
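As a sanity check on that figure, the arithmetic can be written out. The 300 ms of inference per stream used below is an assumption chosen to reproduce the quoted $30; the talk summary does not state a per‑stream latency.

```python
RATE_PER_INFERENCE_SECOND = 0.001  # commercial rate quoted in the pricing table


def batch_cost(num_streams: int, ms_per_stream: int,
               rate: float = RATE_PER_INFERENCE_SECOND) -> float:
    """Total cost in dollars when billing is per inference-second."""
    total_seconds = num_streams * ms_per_stream / 1000
    return total_seconds * rate


def workers_needed(num_streams: int, ms_per_stream: int, window_seconds: int) -> int:
    """Parallel workers required to finish the batch inside the time window."""
    total_ms = num_streams * ms_per_stream
    window_ms = window_seconds * 1000
    return -(-total_ms // window_ms)  # ceiling division on integers
```

Under that assumption, `batch_cost(100_000, 300)` comes to about 30 dollars and `workers_needed(100_000, 300, 300)` gives 100, so the 5‑minute window implies roughly a hundred inference workers running in parallel.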

2.3 Content Organization via Template‑RAG

The Codex system (not to be confused with OpenAI’s Codex) consumes Teams meeting transcripts, runs them through a template‑RAG pipeline, and writes a structured wiki page. A consultant agent checks the output for compliance with corporate style guides, then a director agent decides whether to publish automatically or route for human review.

Integration pattern: RAG‑Driven Generation – retrieve relevant snippets, augment with a prompt template, and finally guard the result with a deterministic validator.
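The three stages of this pattern can be sketched as follows. Everything here is illustrative: the keyword‑overlap retriever stands in for a real retrieval index, `llm` is any callable that maps a prompt to text, and the required section names are an invented style rule rather than the Codex system's actual checks.

```python
import re
from string import Template

PROMPT = Template(
    "Summarize the meeting using only these notes:\n$context\n\nTopic: $topic"
)
REQUIRED_SECTIONS = ("Summary:", "Action items:")  # hypothetical style rule


def _words(text):
    return set(re.findall(r"\w+", text.lower()))


def retrieve(query, snippets, k=2):
    """Retrieve: rank snippets by word overlap with the query, keep the top k."""
    q = _words(query)
    ranked = sorted(snippets, key=lambda s: len(q & _words(s)), reverse=True)
    return ranked[:k]


def build_prompt(topic, context):
    """Augment: fill the fixed template with the retrieved snippets."""
    bullets = "\n".join(f"- {c}" for c in context)
    return PROMPT.substitute(topic=topic, context=bullets)


def validate(page, max_chars=2000):
    """Guard: rule-based checks that involve no model call."""
    problems = [f"missing section '{s}'" for s in REQUIRED_SECTIONS if s not in page]
    if len(page) > max_chars:
        problems.append("page too long")
    return problems


def generate_page(topic, snippets, llm):
    """Run retrieve -> augment -> generate -> guard and report the outcome."""
    draft = llm(build_prompt(topic, retrieve(topic, snippets)))
    problems = validate(draft)
    return {"page": draft, "publish": not problems, "problems": problems}
```

A draft that fails validation is returned with `publish` set to `False`, which is where routing to human review would hook in.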

3. Trade‑offs – What to Watch When Mixing Determinism and Stochasticity

Aspect	Deterministic Guardrails	Stochastic Agents
Predictability	High – outcomes are repeatable and easy to audit.	Variable – depends on model temperature and prompt quality.
Cost	Lower on average because off‑ramps stop expensive bad calls early.	Higher if you allow free‑form LLM calls without validation.
Flexibility	Limited to pre‑registered skills; new use‑cases require code changes.	Very flexible – agents can invent new queries or combine tools on the fly.
Failure Modes	Mostly policy violations (e.g., exceeding refund limits).	Hallucinations, malformed SQL, or unsafe tool usage.
Observability	Easy – each guardrail logs a pass/fail event.	Harder – need evaluation pyramids and human feedback loops.

Erickson stressed that the sweet spot lies in a layered approach: start with a deterministic core (guardrails, off‑ramps, eval pyramids) and then sprinkle agentic discovery where the problem space is truly fuzzy (e.g., root‑cause analysis, novel pattern detection). He also warned against “agent sprawl”: having 50 similar agents dramatically raises classification error rates, a phenomenon he likened to a restaurant menu with too many items.

4. Practical Integration Blueprint

Define the tool layer – expose each internal service (GPU allocator, ticketing, monitoring) via a skill‑registry API. Tag each skill with cost, latency, and required permissions.
Build retrieval agents – lightweight LLM wrappers that map user intent to a skill call. Keep prompts short and include examples for the desired API shape.
Add analyst agents – deterministic validators that enforce business rules (budget caps, compliance checks). Use a rule engine like OPA to keep the logic outside the LLM.
Deploy task agents – orchestrators that execute the approved skill calls, handle retries, and emit audit logs.
Implement an evaluation pyramid – unit tests for each skill, integration tests for agent chains, and end‑to‑end synthetic workloads that simulate real traffic.
Feedback loop – expose an up/down button on every UI and feed that signal back into a re‑training pipeline for the retrieval agents.
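The tool layer and task‑agent steps above could be sketched as below. This is not the published skill‑registry API: the `Skill` fields, the cheapest‑first selection rule, and the retry count are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Skill:
    name: str
    capability: str          # what the skill does, e.g. "ticketing"
    cost_per_call: float     # tagged cost
    latency_ms: int          # tagged latency
    permissions: frozenset   # permissions the caller must hold
    run: Callable[[dict], dict]


class SkillRegistry:
    def __init__(self):
        self._skills = {}

    def register(self, skill: Skill) -> None:
        self._skills[skill.name] = skill

    def select(self, capability: str, granted: set):
        """Pick the cheapest (then fastest) skill the caller is allowed to use."""
        candidates = [
            s for s in self._skills.values()
            if s.capability == capability and s.permissions <= granted
        ]
        if not candidates:
            return None
        return min(candidates, key=lambda s: (s.cost_per_call, s.latency_ms))


def execute(skill: Skill, params: dict, audit: list, max_attempts: int = 3):
    """Task-agent step: run the skill with retries, logging every attempt."""
    for attempt in range(1, max_attempts + 1):
        try:
            result = skill.run(params)
            audit.append({"skill": skill.name, "attempt": attempt, "status": "ok"})
            return result
        except Exception as exc:
            audit.append({"skill": skill.name, "attempt": attempt,
                          "status": "error", "error": str(exc)})
    return None
```

Selection here is deterministic by design; a production registry might also weigh relevance, as the service update describes, but that scoring is not specified in this summary.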

5. Looking Ahead – From “Tools for Certainty” to “Agents for Discovery”

Erickson closed with a forward‑looking view: the next five years will see world‑model agents that combine time‑series reasoning, protein‑language models, and classic reinforcement learning to navigate complex, multi‑modal environments. The key architectural lesson remains the same – deterministic scaffolding enables stochastic exploration to be safe and cost‑effective.

Takeaway: Treat AI platforms as hybrid stacks. Deterministic services give you the reliability you need for production; agentic layers give you the discovery power you need for innovation. When both are wired together with clear integration patterns and transparent pricing, you finally have a platform that can scale from a single GPU‑allocation request to a fleet‑wide autonomous observability system.

For more details on the open‑source time‑series model, see the NVIDIA AI Commons repository: https://github.com/NVIDIA/ai-commons/tessera

The skill‑registry API spec is published here: https://docs.nvidia.com/ai/skill-registry/api

Read the full transcript of Erickson’s talk (including slides) on InfoQ: https://www.infoq.com/presentations/designing-ai-platforms-reliability

