The Agent Becomes a Primitive: Why Startups Are Deleting Their Custom Harnesses

Trends Reporter

A growing chorus of builders argues that hand-rolling an agent harness is wasted effort, and that hosted runtimes like Hermes, LangChain's Managed Deep Agents, and Claude Managed Agents now ship the features teams used to spend weeks reinventing. The pitch is seductive. The counter-arguments are worth hearing too.

A pattern is forming in how developers talk about building AI agents, and it cuts against a lot of the engineering pride that defined the last two years. The argument, made bluntly in a recent essay from the team behind prismvideos.com, goes like this: stop building your own agent harness. Host an existing one, hand it tools, skills, and a system prompt, and spend your time on the parts of the product only you can build.

It is a striking reversal from the period when every serious AI team wrote its own orchestration loop, its own memory layer, its own tool-calling glue. The essay's author describes shipping a media generation agent on the Vercel AI SDK, feeling good about it, and then watching a competitor, Higgsfield, launch an agent called Supercomputer with cross-session memory, skills, automations, a sandboxed computer, and a persistent filesystem. The kicker: Supercomputer was not built on any of the popular SDKs. It ran on Hermes, described in the post as an open-source personal agent with more than 185,000 GitHub stars. The realization that followed is the whole thesis. If a single open runtime hands you session management, built-in tools, skills, self-learning, and automations, then re-implementing those things is not differentiation. It is overhead.

The trend: agents as infrastructure, not application code

The specific mechanics in the prismvideos account are worth reading because they show how thin the new wrapper can be. The team deleted their existing agent, stood up an EC2 instance running a Hono server, and had that server spin up a Hermes agent inside a Docker container per customer. The server doubles as a reverse proxy, shuttling messages between the app and the agent over a WebSocket. What used to be the hard part, the agent itself, became a managed dependency. What remained was the work that actually belongs to prismvideos: the system prompt, the MCP tools for choosing and invoking media models, the skills files describing how to make UGC videos and storyboards, and the connectors to Meta Ads Manager, Google Drive, and Resend.
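The container-per-customer layout described above can be sketched roughly as follows. This is an illustration of the shape of the idea, not the prismvideos implementation: the image name, the container port 8080, the `/workspace` mount point, and the port-assignment scheme are all assumptions made for the example. It only builds the `docker run` arguments and the upstream address a proxy would forward to; it does not start anything.

```python
from dataclasses import dataclass, field

# Hypothetical image name; the article does not say how the agent image is built or tagged.
AGENT_IMAGE = "example/hermes-agent:latest"


def container_name(customer_id: str) -> str:
    """Derive a stable, Docker-safe container name from a customer ID."""
    safe = "".join(c if c.isalnum() else "-" for c in customer_id.lower())
    return f"agent-{safe}"


def docker_run_args(customer_id: str, host_port: int) -> list[str]:
    """Build the argv for starting one customer's agent container.

    Uses only standard `docker run` flags: detached mode, a name, a named
    volume for the persistent filesystem, and a port published on localhost.
    The container port (8080) and mount point (/workspace) are assumptions.
    """
    name = container_name(customer_id)
    return [
        "docker", "run", "--detach",
        "--name", name,
        "--volume", f"{name}-data:/workspace",
        "--publish", f"127.0.0.1:{host_port}:8080",
        AGENT_IMAGE,
    ]


@dataclass
class AgentRegistry:
    """Tracks which local port the proxy should forward each customer to."""

    base_port: int = 9000
    ports: dict[str, int] = field(default_factory=dict)

    def port_for(self, customer_id: str) -> int:
        # Assign the next port in sequence the first time a customer is seen.
        if customer_id not in self.ports:
            self.ports[customer_id] = self.base_port + len(self.ports)
        return self.ports[customer_id]

    def upstream_url(self, customer_id: str) -> str:
        """WebSocket address the reverse proxy would connect to for this customer."""
        return f"ws://127.0.0.1:{self.port_for(customer_id)}/"
```

Giving each customer a named volume is what makes the filesystem outlive a container restart. A real deployment would also need health checks, idle shutdown, and isolation between tenants, none of which this sketch attempts.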

The author frames the developer experience as a single API call. You POST a deployment describing the customer, the runtime, the model, the system prompt, a sandbox configuration, your MCP servers, your skills, your secrets, and a set of feature flags, and you get back a deployment ID plus an SSE endpoint to chat with. Skills can be supplied inline, by file path, or by URL. The response hands you a workspace path and an event stream. Bring a prompt, some tools, and connectors, get a running agent. The line that captures the mood: harness-engineering should not be one of the chores of building an agent people use.
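To make the shape of that single call concrete, here is a rough sketch of what the request body might look like. The article names the ingredients (customer, runtime, model, system prompt, sandbox, MCP servers, skills, secrets, feature flags) but not the wire format, so every field name, the `skill` helper, and the example values below are hypothetical. The code only assembles and serializes the payload; it does not call any real endpoint.

```python
import json
from typing import Any, Optional


def skill(inline: Optional[str] = None, path: Optional[str] = None,
          url: Optional[str] = None) -> dict[str, str]:
    """Describe one skill by exactly one source: inline text, file path, or URL."""
    sources = {"inline": inline, "path": path, "url": url}
    given = {kind: value for kind, value in sources.items() if value is not None}
    if len(given) != 1:
        raise ValueError("a skill needs exactly one of inline, path, or url")
    return given


def build_deployment_request(
    customer_id: str,
    runtime: str,
    model: str,
    system_prompt: str,
    skills: list[dict[str, str]],
    mcp_servers: list[str],
    secret_names: list[str],
    features: dict[str, bool],
) -> dict[str, Any]:
    """Assemble a deployment description. All field names are illustrative."""
    return {
        "customer": customer_id,
        "runtime": runtime,
        "model": model,
        "system_prompt": system_prompt,
        "sandbox": {"persistent_filesystem": True},
        "mcp_servers": mcp_servers,
        "skills": skills,
        # Reference secrets by name rather than embedding their values.
        "secrets": secret_names,
        "features": features,
    }


request = build_deployment_request(
    customer_id="cust_123",
    runtime="hermes",
    model="example-model",
    system_prompt="You help marketers produce short product videos.",
    skills=[
        skill(inline="When asked for a storyboard, list shots before generating."),
        skill(path="skills/ugc-video.md"),
        skill(url="https://example.com/skills/ad-copy.md"),
    ],
    mcp_servers=["https://example.com/mcp/media-models"],
    secret_names=["META_ADS_TOKEN"],
    features={"automations": True, "memory": True},
)
body = json.dumps(request, indent=2)
```

Per the article, the response to such a call would carry a deployment ID, a workspace path, and an SSE endpoint for chat; the exact response fields are not given, so they are left out here.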

This is not one company talking to itself. The same conclusion shows up in the product roadmaps of larger players, which is often a strong adoption signal. LangChain has launched Managed Deep Agents, a hosted runtime where developers bring a system prompt, MCP tools, skills, and subagent definitions and receive a chat-ready agent. Anthropic has launched Claude Managed Agents, which similarly bundles the agent and its infrastructure into one API call. When framework and model vendors themselves start offering the managed runtime, the message is that the orchestration loop is becoming a commodity layer rather than a moat.

The evidence underneath the claim

Strip away the marketing and there is a real engineering observation here. Most agent applications need the same nine things: session management, tool calling, memory, some form of self-learning, automations, a persistent filesystem, sandboxed deployment, skills, and MCP server wiring. The prismvideos post lists these and points out that numbers one through seven on its list are present in essentially every agent product. If that inventory is universal, then building it yourself means building the undifferentiated part of your stack.

The comparison table in the essay is the most concrete piece of evidence, and also the place to apply the most skepticism, because it was written by a vendor ranking itself first. By their accounting, the Hermes-based offering supports image and video input, automations, dreaming, a persistent-goal loop they call the Ralph Wiggum loop, and steering. LangChain Managed Deep Agents, they say, lacks automations, built-in self-learning, and persistent goals. Claude Managed Agents has self-learning in research preview but, per the post, does not expose automations or persistent goals and cannot accept video inputs, a limitation the author attributes to the underlying models rather than the runtime. Provider lock-in is the one axis where the open-runtime and LangChain options claim an advantage over the Anthropic-hosted version, since a model-agnostic harness lets you swap the model underneath.

The broader signal feeding all of this is consumer expectation. The author makes the point that as Claude, ChatGPT, and Manus add memory and longer-horizon behavior, business customers start asking for the same capabilities in B2B software. When the Claude app gains memory, a CEO sees it and wants it in the internal tool. That ratchet is real, and it explains why teams feel they cannot afford to fall behind on harness features that arrive every few months.

The counter-perspectives

The consensus forming around managed agents deserves the same scrutiny the author applies to custom harnesses. A few objections are worth taking seriously.

First, there is the risk you are trading one dependency for a deeper one. Deleting your harness and routing every customer through per-container Hermes instances means your product's reliability, latency, and security posture now ride on an open-source project and your own orchestration of it. The author celebrates getting memory, skills, and automations "for free," but free features you do not understand are features you cannot debug at three in the morning. A custom harness is more work, and it is also a thing you fully control.

Second, the lock-in argument cuts both ways. The post praises model-agnostic runtimes for avoiding provider lock-in, which is fair. But standardizing on a particular harness, its skill format, its memory model, and its automation primitives is its own kind of lock-in. The essay even predicts that a new harness will arrive after Hermes with some must-have feature, and that the current hot item is a built-in learning loop. If the harness layer churns that fast, betting your architecture on this season's winner carries the same obsolescence risk the author warns about for teams that build their own.

Third, the per-customer container model has real cost and operational weight. Running a Docker container per user, each holding a persistent filesystem and a long-lived agent, is a different scaling problem than a stateless API. For a media generation product where customers run occasional heavy jobs it may pencil out. For a high-volume, low-margin chat feature it might not. The single-API-call demo hides a fleet of containers someone has to keep warm, patch, and isolate.
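Whether the container fleet "pencils out" is a back-of-envelope calculation, which can be sketched as below. The article gives no prices or usage figures, so every input here is a parameter you would have to supply from your own cloud bill and traffic data; the function names and the 730-hour month are assumptions of this example.

```python
def monthly_fleet_cost(
    customers: int,
    warm_fraction: float,
    cost_per_container_hour: float,
    hours_per_month: float = 730.0,
) -> float:
    """Estimate monthly compute spend for a per-customer container fleet.

    warm_fraction is the average share of customers whose container is
    running at any given moment (0.0 = all stopped, 1.0 = all always on).
    """
    if customers < 0:
        raise ValueError("customers must be non-negative")
    if not 0.0 <= warm_fraction <= 1.0:
        raise ValueError("warm_fraction must be between 0 and 1")
    return customers * warm_fraction * cost_per_container_hour * hours_per_month


def compute_share_of_revenue(
    monthly_cost: float,
    customers: int,
    revenue_per_customer: float,
) -> float:
    """Fraction of monthly revenue consumed by container compute."""
    revenue = customers * revenue_per_customer
    if revenue <= 0:
        raise ValueError("revenue must be positive")
    return monthly_cost / revenue
```

The variable that moves the answer most is `warm_fraction`: a product with occasional heavy jobs can stop idle containers and keep it low, while an always-on chat feature pushes it toward 1.0 and pays for every customer around the clock.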

The strongest version of the author's case is not actually about any specific runtime. It is the claim that an AI startup almost never gets rich by having the best harness for a narrow use case, and that durable value comes from integrating with a customer's proprietary data and learning their preferences. That argument stands on its own regardless of whether you adopt Hermes, LangChain, Claude Managed Agents, or keep your hand-built loop. If the harness is genuinely becoming a commodity, the rational move is to spend your scarce engineering attention above it, on the connectors, the domain skills, and the data relationships a competitor cannot copy by shipping a new feature flag.

That is the part of the essay that will age well even if the named products do not. The specific table of who supports dreaming and steering today will be stale within a quarter. The underlying observation, that the agent is becoming a primitive you assemble rather than a system you author from scratch, looks like a direction the whole field is moving, with LangChain and Anthropic now validating it through their own managed offerings. The open question the community has not settled is how much control teams are willing to surrender to get there, and whether the answer changes the first time a managed runtime has a bad week.
