William Cotton argues that new domain-specific languages face a chicken-and-egg problem: LLMs write good code in Python and Rust because there are decades of training data and mature tooling, and new languages have neither. His proposed fix leans on documentation, browser runtimes, and language servers rather than waiting for a corpus to accumulate.
William Cotton's essay lays out a problem that anyone shipping a new programming language in 2026 has to confront: large language models are very good at writing code in languages they have seen millions of examples of, and almost useless at languages they have not. The gap is widening, not closing, and that creates a structural disadvantage for anything new.
What's claimed
The core argument is that LLMs have gotten dramatically better at established languages for two reasons that compound each other. First, there is the sheer volume of Python, Rust, Ruby, and similar code scraped into training sets. Second, and Cotton emphasizes this more, there is the tooling that surrounds those languages. Type checkers, linters, language servers, compilers, interpreters, and test harnesses all give an agent immediate feedback. A type checker catches a hallucinated method call before the code ever runs, so the model gets corrected inside the loop rather than producing something that silently fails.
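The in-loop correction described above can be illustrated with a minimal sketch. It is not any real type checker: `KNOWN_METHODS`, the `pipeline` receiver, and `check_method_calls` are hypothetical names invented for the example, and the check is a simple name lookup over Python's `ast` module rather than real type inference.

```python
import ast

# Hypothetical API surface for an illustrative object named `pipeline`.
KNOWN_METHODS = {"fetch", "transform", "render"}


def check_method_calls(source: str, receiver: str = "pipeline") -> list[str]:
    """Return one diagnostic per call to an unknown method on `receiver`."""
    diagnostics = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and isinstance(node.func.value, ast.Name)
            and node.func.value.id == receiver
            and node.func.attr not in KNOWN_METHODS
        ):
            diagnostics.append(
                f"line {node.lineno}: unknown method "
                f"'{node.func.attr}' on '{receiver}'"
            )
    return diagnostics


# Code as an agent might generate it: the second call is hallucinated.
generated = "pipeline.fetch()\npipeline.summon()"
for message in check_method_calls(generated):
    print(message)
```

An agent that receives the diagnostic can regenerate the offending line before anything runs, which is the feedback a language without tooling cannot offer.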
The consequence is a feedback loop. Models write more code in well-supported languages, that code becomes training data, and the next generation of models gets even better at those same languages. A brand-new DSL starts with none of this: no corpus, no tooling, no feedback signal. Cotton's question is direct. What does a new language do to become viable when the dominant way people now write code, prompting an agent, structurally favors the incumbents?
What's actually new
The honest answer in the piece is that the fundamentals have not changed much. Good documentation, good marketing, and good tooling mattered before LLMs and they still matter. What is new is how each of those needs to be shaped to serve an agent as a user, not just a human.
The most concrete idea is generating an agent-facing context file directly from the language binary. Cotton points to a command like webpipe init --codex, which emits an AGENTS.md template that an agent reads before writing any code. This is the same pattern that has spread across the ecosystem over the past year, where projects ship a markdown file describing conventions and APIs so a coding agent has ground truth instead of guessing from a stale training snapshot.
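A rough sketch of the idea, assuming the language binary keeps a table of its own built-ins that it can render on demand. This is not Web Pipe's actual output or implementation: the `Builtin` record, `render_agents_md`, and the "ToyLang" entries are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Builtin:
    """One entry in a hypothetical language's built-in table."""
    name: str
    summary: str
    example: str


def render_agents_md(language: str, version: str, builtins: list[Builtin]) -> str:
    """Render an agent-facing context file from the language's own metadata."""
    lines = [f"# AGENTS.md for {language} {version}", "", "## Built-ins", ""]
    # Sort by name so the generated file is stable across runs.
    for b in sorted(builtins, key=lambda b: b.name):
        lines.append(f"- `{b.name}`: {b.summary} Example: `{b.example}`")
    return "\n".join(lines) + "\n"


# Hypothetical built-ins for an invented language called ToyLang.
TOY_BUILTINS = [
    Builtin("render", "Emit HTML from a template.", "render page"),
    Builtin("fetch", "Load a URL.", "fetch https://example.com"),
]

print(render_agents_md("ToyLang", "0.1.0", TOY_BUILTINS))
```

Because the file is generated from the same table the binary uses, it cannot describe built-ins the installed version lacks, which is the advantage over a hand-written guide or a stale training snapshot.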
Cotton is candid that his own example, an experimental web application DSL called Web Pipe, starts with an unfair advantage. Web Pipe embeds languages the models already know, including jq, Lua, JavaScript, and SQL. The pipeline syntax wraps familiar pieces, so the model is learning a composition layer rather than an entirely foreign language. He reports getting working demo applications from one-shot prompts in Codex using only the generated AGENTS.md as guidance. That result is worth reading skeptically, since a DSL built mostly from languages the model already understands is close to a best case, and it says little about how a language with genuinely novel semantics would fare.
The tooling argument
The technical heart of the piece is about diagnostics and where they live. Cotton's recommendation is to put compile-time errors, runtime errors, and lint feedback everywhere an agent or human might encounter them, and to make that feedback identical across surfaces.
The pattern he describes is a single binary that acts as both the runtime and the language server. Keeping those in one place means the diagnostic feedback stays consistent between what you see when you run code and what the editor reports while you type. He goes a step further and suggests separating the diagnostic engine from the LSP layer itself. If the diagnostics are a standalone component, you can compile them to WebAssembly and drop them into a browser editor like Monaco. The payoff is that the same red squiggles show up whether you are in a terminal, an IDE, or a web page, with no second implementation to drift out of sync.
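The separation can be sketched as one diagnostic function with a thin adapter per surface. The toy language, `KNOWN_STEPS`, and every function name below are assumptions made for illustration, and the WebAssembly compilation step is not shown. The only real convention used is the shape of a Language Server Protocol diagnostic (zero-based `range`, `severity` 1 for an error).

```python
from dataclasses import dataclass

# Hypothetical toy language: every non-blank line starts with a known step.
KNOWN_STEPS = {"fetch", "filter", "render"}


@dataclass(frozen=True)
class Diagnostic:
    line: int   # 0-based line index
    start: int  # 0-based column where the problem starts
    end: int    # 0-based column just past the problem
    message: str


def diagnose(source: str) -> list[Diagnostic]:
    """The single diagnostic engine; every surface calls this and nothing else."""
    found = []
    for lineno, text in enumerate(source.splitlines()):
        stripped = text.strip()
        if not stripped:
            continue
        step = stripped.split()[0]
        if step not in KNOWN_STEPS:
            start = text.index(step)
            found.append(
                Diagnostic(lineno, start, start + len(step), f"unknown step '{step}'")
            )
    return found


def to_cli(d: Diagnostic, filename: str) -> str:
    """Terminal surface: 1-based file:line:column, as compilers usually print."""
    return f"{filename}:{d.line + 1}:{d.start + 1}: error: {d.message}"


def to_lsp(d: Diagnostic) -> dict:
    """Editor surface: an LSP-shaped diagnostic (offsets assume ASCII text)."""
    return {
        "range": {
            "start": {"line": d.line, "character": d.start},
            "end": {"line": d.line, "character": d.end},
        },
        "severity": 1,  # 1 means Error in the Language Server Protocol
        "source": "toy-lang",
        "message": d.message,
    }


program = "fetch users\n  fliter active\nrender"
for d in diagnose(program):
    print(to_cli(d, "demo.pipe"))
    print(to_lsp(d))
```

Only the adapters differ between surfaces, so a new rule added to `diagnose` appears in the terminal and the editor at once, and there is no second implementation to keep in sync.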
This connects to his point about landing pages. WASM has made it cheap to ship a real runtime in the browser, so Cotton argues new languages should target a browser runtime alongside the CLI rather than treating the command line as the only home. An interactive editor at the top of a project page lets a visitor run code in seconds, an approach he uses on another of his projects, Datafarm. The faster someone can execute something, the lower the barrier to adoption, and the same WASM diagnostics that power the browser editor are the ones the agent context depends on.
Limitations
The essay is a set of tactics from one practitioner, not evidence that the strategy works at scale. Every example comes from Cotton's own projects, and both happen to be unusually well suited to LLM assistance because they reuse familiar embedded languages. A DSL with original control flow, an unusual type system, or domain semantics the model has never encountered would not get the same free lift from an AGENTS.md file, and the piece does not test that harder case.
There is also an unresolved tension. Cotton's whole framing says LLMs reinforce languages that already have large corpora and mature tooling. His proposed countermeasures, an agent context file and a shared diagnostic engine, help an agent use a language in the moment, but they do not generate the training corpus that gives incumbents their durable edge. A model still has not internalized the language; it is being handed a cheat sheet at runtime. Whether that is enough to reach the volume of real-world usage that eventually feeds back into training is exactly the open question, and the article cannot answer it yet.
What the piece does well is reframe the work. For a new language, the agent is now a first-class user, and the documentation and diagnostics you build for humans need a second audience in mind. Cotton expects an explosion of new DSLs over the next few years precisely because the cost of covering these bases keeps dropping. That prediction is plausible. But the same forces that make launching a language easier also make it easier for a hundred others to launch alongside yours, so survival may end up being a marketing and ecosystem problem as much as a technical one.