Build agents, not pipelines

A deep dive into when to structure LLM‑driven applications as deterministic pipelines versus flexible agents, covering predictability, context handling, cost, and future‑proofing.

What happened

Developers have been building two kinds of LLM‑backed programs:

Pipelines – the control flow is hard‑coded in your own code. You gather data, feed it to a model, parse the response, and act on it.
Agents – you expose a set of tools (read‑file, send‑email, web‑search, etc.) and let the LLM decide when and how to call them.
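
The two shapes can be sketched in a few lines of Python. This is a minimal illustration, not a real framework: the Model type is a hypothetical stand-in for any LLM client, and the "FINAL:" prefix and "tool_name argument" reply format are assumptions made up for this example.

```python
from typing import Callable, Dict, List

# Hypothetical model interface: takes a prompt, returns text.
Model = Callable[[str], str]
Tool = Callable[[str], str]


def run_pipeline(model: Model, documents: List[str]) -> str:
    """Pipeline: our code fixes every step; the model is called exactly once."""
    context = "\n".join(documents)             # 1. gather data
    reply = model(f"Summarize:\n{context}")    # 2. single model call
    return reply.strip()                       # 3. parse the response


def run_agent(model: Model, tools: Dict[str, Tool], task: str,
              max_turns: int = 10) -> str:
    """Agent: the model chooses the next tool until it replies with FINAL."""
    transcript = task
    for _ in range(max_turns):
        reply = model(transcript)
        if reply.startswith("FINAL:"):
            return reply[len("FINAL:"):].strip()
        # Assumed reply format: "<tool_name> <argument>"
        name, _, arg = reply.partition(" ")
        tool = tools.get(name)
        result = tool(arg) if tool else f"unknown tool {name}"
        # The tool result is appended so the model sees it on the next turn.
        transcript += f"\n{reply}\n-> {result}"
    return "gave up: turn limit reached"
```

Note that the pipeline's call count is visible in the code, while the agent's is only capped by max_turns.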

Both approaches can solve a simple "gather context → summarize → email" task, but the choice becomes critical once the problem grows beyond a few paragraphs of input or requires iterative reasoning.

Why developers care

Predictability vs. flexibility

Pipelines have a fixed control flow. One prompt → one model call → one output per step, and the number of steps is known in advance. That makes latency and cost easy to bound, which is essential for high‑throughput services.
Agents keep the LLM in the loop until it decides it’s done. The number of turns can vary from a handful to hundreds, so latency and price can swing dramatically.

If you’re building a SaaS that must handle thousands of requests per second, the unpredictable tail of an agentic loop can become a nightmare. On the other hand, when the task is hard – think code generation, complex data analysis, or multi‑step troubleshooting – the extra reasoning steps an agent can take often mean the difference between a useful answer and a dead end.
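
A rough back-of-the-envelope model shows why the tail matters. The sketch below assumes the agent re-sends its whole transcript on every turn and that each turn adds a fixed number of tokens; both are simplifications, and the numbers are illustrative rather than measured.

```python
def pipeline_input_tokens(prompt_tokens: int) -> int:
    """A single-call pipeline pays for its prompt once."""
    return prompt_tokens


def agent_input_tokens(prompt_tokens: int, tokens_added_per_turn: int,
                       turns: int) -> int:
    """Total input tokens when the full transcript is re-sent every turn.

    Turn i (0-based) sends the original prompt plus everything the
    previous i turns appended, so the total grows quadratically in `turns`.
    """
    return sum(prompt_tokens + i * tokens_added_per_turn
               for i in range(turns))


print(pipeline_input_tokens(1000))         # 1000
print(agent_input_tokens(1000, 500, 10))   # 32500
print(agent_input_tokens(1000, 500, 100))  # 2575000
```

Under these assumptions, ten times as many turns costs roughly eighty times as many input tokens, which is why a hard turn limit is a common safeguard.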

Context gathering

A pipeline must gather all the context up front because the model sees only the prompt you assemble for it. That forces you to solve a hard problem: what data is actually relevant? In practice teams resort to:

AST walks to locate code fragments.
Retrieval‑augmented generation (RAG) with semantic embeddings.
Hand‑crafted heuristics.

These tricks are brittle and frequently miss the mark. An agent can simply call a read_file or search tool when it realizes it needs more information, mirroring how a human would work.
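
As an illustration, the two tools named above could look like the following. This is a sketch under assumptions: the signatures, the root-directory restriction, and the output format are choices made for this example, not the API of any particular agent framework.

```python
from pathlib import Path
from typing import List


def read_file(root: Path, relative_path: str, max_chars: int = 4000) -> str:
    """Return the start of a file, refusing paths outside `root`."""
    base = root.resolve()
    target = (root / relative_path).resolve()
    if not target.is_relative_to(base):  # requires Python 3.9+
        raise ValueError("path escapes the allowed root")
    # Truncate so one large file cannot flood the model's context.
    return target.read_text(encoding="utf-8")[:max_chars]


def search(root: Path, needle: str) -> List[str]:
    """Plain substring search, returning 'file:line: text' hits."""
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append(f"{path.relative_to(root)}:{lineno}: {line.strip()}")
    return hits
```

The agent decides at run time which of these to call and with what argument, so no relevance heuristic has to be designed in advance.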

Multi‑model pipelines

Pipelines let you cherry‑pick models per step – a cheap model for summarisation, a larger one for reasoning. Agents, at least today, stay tied to a single model for the whole loop. The trade‑off is subtle: many teams offload cheap work such as summarisation to a smaller model, but the real signal often lives in the raw data. When it does, the summary step can drop the very detail the larger model needs, and the extra indirection buys nothing.
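
A per-step model choice might be wired up like this. The sketch assumes two hypothetical callables, cheap_model and large_model, each taking a prompt and returning text; the prompt wording is also made up for illustration.

```python
from typing import Callable, List

# Hypothetical model interface: takes a prompt, returns text.
Model = Callable[[str], str]


def two_stage_pipeline(cheap_model: Model, large_model: Model,
                       documents: List[str]) -> str:
    """Summarise each document with the cheap model, then reason once
    over the summaries with the large model."""
    summaries = [cheap_model(f"Summarize in one line:\n{doc}")
                 for doc in documents]
    notes = "\n".join(summaries)
    return large_model(f"Answer using these notes:\n{notes}")
```

The large model here never sees the raw documents, which is exactly the indirection that can lose signal.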

Local‑model constraints

Frontier APIs now support 200k‑token windows, so an agent has room to fetch more data as it works. Local deployments, however, are usually limited to 6k‑32k tokens. In that regime an agent’s ever‑growing context quickly fills the window and exhausts VRAM, pushing you back toward a pipeline.
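
The window sizes above translate directly into how many turns an agent can take. The helper below is a rough estimate, assuming a fixed starting prompt and a fixed number of tokens appended per turn; the 2,000 and 1,500 figures are illustrative, not measurements.

```python
def turns_before_full(window_tokens: int, prompt_tokens: int,
                      tokens_added_per_turn: int) -> int:
    """How many turns fit before the transcript exceeds the window."""
    free = window_tokens - prompt_tokens
    return max(0, free // tokens_added_per_turn)


# Assumed workload: 2,000-token prompt, 1,500 tokens added per turn.
for window in (6_000, 32_000, 200_000):
    print(window, turns_before_full(window, 2_000, 1_500))
# 6000 2
# 32000 20
# 200000 132
```

With only a couple of turns available at the small end, an agent loop has little room to explore, which is the practical reason a fixed pipeline fits small local models better.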

Future‑proofing

Agents are positioned to benefit the most from model improvements. A new model that’s better at tool use or planning can instantly boost an existing agent without any architectural changes. Pipelines get incremental gains – the same prompt runs on a newer model, but the surrounding orchestration stays the same.

Community response

Hacker News threads are split. Some users praise agents for their "human‑like" problem solving, citing tools like Claude Code, GitHub Copilot, and Cursor that already ship as agents. Others warn about runaway costs and suggest hybrid designs: a cheap pipeline for bulk triage plus a fleet of agents for deep‑dive cases.
r/programming discussions often bring up safety. The consensus is that both designs need input sanitisation and explicit guardrails; agents are not uniquely vulnerable to prompt injection – any data fed to the model can be poisoned.
Open‑source projects such as LangChain and AutoGPT are adding more declarative pipeline‑style primitives, reflecting a desire for the best of both worlds.

Practical guidelines

When to pick a pipeline	When to pick an agent
Strict latency or cost budgets (e.g., processing millions of emails per day)	Problems where the required context cannot be known ahead of time
Need to run on local models with limited context windows	Tasks that benefit from iterative reasoning (coding assistants, complex troubleshooting)
You already have reliable data‑assembly pipelines (e.g., mature RAG index)	You want the system to adapt to new data sources without rewiring the code
You want maximum traceability of each step	You prefer a simpler implementation that lets the LLM orchestrate itself

In practice many teams start with an agent for rapid prototyping, then extract stable sub‑flows into pipelines once the pattern solidifies.

A concrete example

Imagine a security team that must flag suspicious emails. A pipeline could:

Pull the last 10 kB of each email.
Feed it to a classification model.
If the score exceeds a threshold, write a row to a database.

An agent‑augmented system would run the same cheap pipeline for the bulk of traffic, but for any flagged message it would spin up an agent that:

Calls a search_web tool for recent phishing trends.
Retrieves the sender’s recent correspondence via read_email_thread.
Summarises findings and drafts a response for a human analyst to approve.

The pipeline guarantees throughput; the agents provide depth where it matters.
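
The hybrid design can be sketched as follows. Everything here is hypothetical scaffolding: classify stands in for the classification model, investigate stands in for the whole agent run, the 0.8 threshold is arbitrary, and the database write is replaced by a returned list to keep the example self-contained.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Verdict:
    email_id: str
    score: float
    report: Optional[str] = None  # filled in only for escalated emails


def triage(emails: Dict[str, str],
           classify: Callable[[str], float],
           investigate: Callable[[str], str],
           threshold: float = 0.8,
           tail_bytes: int = 10_000) -> List[Verdict]:
    """Cheap pipeline for every email; agent only for flagged ones."""
    verdicts = []
    for email_id, body in emails.items():
        # Keep the last 10 kB; ignore a multi-byte character cut at the edge.
        raw = body.encode("utf-8")[-tail_bytes:]
        tail = raw.decode("utf-8", errors="ignore")
        score = classify(tail)
        # Escalate to the (expensive, variable-cost) agent above threshold.
        report = investigate(email_id) if score > threshold else None
        verdicts.append(Verdict(email_id, score, report))
    return verdicts
```

Because investigate runs only above the threshold, the variable cost of the agent scales with the number of flagged emails rather than with total traffic.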

Bottom line

Use pipelines when you need hard guarantees on latency, cost, or local‑model resource usage.
Use agents when the problem space is too fuzzy for a static prompt and you can tolerate some variability in execution time.
When in doubt, start with an agent – it’s often easier to carve out deterministic sub‑steps later than to retrofit an agentic loop into a rigid pipeline.

The distinction between pipelines and agents was popularised by Anthropic’s “Building effective agents” (Dec 2024), which calls the pipeline style “workflows”. Since then, advances in tool‑calling have made agents more practical, but the core trade‑offs remain the same.
