System Definition Brings Software Engineering to AI Coding

AI assistants ship code that runs and still gets the engineering wrong. Sebastian Martinez Torregrosa argues the fix is borrowed from before the AI era: define the system first, then let the machine fill it in, and test against the definition rather than the output.


The demo always works. You describe a feature, the assistant returns forty lines, you paste them in, and the thing runs. That is exactly the problem. Running is the lowest bar a piece of software can clear, and it is the only bar most AI coding tools are built to hit.

Sebastian Martinez Torregrosa, a principal product marketing manager at SUSE with a background spanning engineering, sales, and SAP, makes a sharp distinction in his recent writing: working code and correct engineering are not the same thing. An AI can produce a function that shows the expected visible behavior while quietly violating every constraint that matters at scale. The code compiles. The architecture rots.

The gap between output and intent

Large language models generate code by predicting what plausibly comes next given a prompt. That is a statement about plausibility, not about correctness inside a specific system. When a model writes a database query, it has no durable knowledge of your connection pooling rules, your tenant isolation boundaries, or the retry semantics your team agreed on six months ago. It produces something that looks like the right shape.

For a throwaway script, the shape is enough. For a service that other services depend on, the shape is a liability. The model optimizes locally, one prompt at a time, while the cost of a bad decision is global and shows up weeks later as a production incident no one can trace back to the autocomplete that caused it.

This is the failure mode Martinez Torregrosa is pointing at. Teams measure AI coding tools by whether the generated snippet runs, then act surprised when the accumulated snippets form a codebase nobody understands and nobody can safely change.

System definition as the missing layer

The proposed answer is not a better model or a longer prompt. It is a discipline that predates generative AI entirely: define the system before you build it, in terms precise enough to be checked.

A system definition states what the software is supposed to be, not just what it should do in one case. It captures the contracts between components, the invariants that must always hold, the boundaries that data is not allowed to cross. In conventional software engineering this lived in design documents, interface specifications, and the heads of senior engineers who reviewed every pull request. AI coding skips that layer because the prompt feels like a complete specification. It almost never is.

When you have an explicit system definition, the relationship with the AI changes. The model is no longer the author of the system. It becomes an implementer working inside boundaries someone else set. You can hand it a contract and ask it to satisfy that contract, and you can verify the result against the contract rather than against your gut feeling that the code looks fine.
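A contract only helps if it can be checked. The following is a minimal sketch that assumes a hypothetical multi-tenant service where `fetch_orders(tenant_id)` must never return another tenant's rows. The sketch writes that contract as a check that any implementation, human- or machine-written, can be run against. The `Order` type, the in-memory `ORDERS` fixture, and both implementations are illustrative assumptions, not part of Martinez Torregrosa's writing.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Order:
    order_id: int
    tenant_id: str


# Hypothetical contract: given a tenant id, return only that tenant's orders.
FetchOrders = Callable[[str], Iterable[Order]]


def check_tenant_isolation(fetch: FetchOrders, tenants: Iterable[str]) -> list[str]:
    """Return contract violations; an empty list means the contract held."""
    violations = []
    for tenant in tenants:
        for order in fetch(tenant):
            if order.tenant_id != tenant:
                violations.append(
                    f"order {order.order_id} belongs to {order.tenant_id!r}, "
                    f"returned for {tenant!r}"
                )
    return violations


# In-memory fixture standing in for a real database (assumption for the sketch).
ORDERS = [Order(1, "acme"), Order(2, "acme"), Order(3, "globex")]


def fetch_orders_correct(tenant_id: str) -> list[Order]:
    # Filters by tenant, as the contract requires.
    return [o for o in ORDERS if o.tenant_id == tenant_id]


def fetch_orders_leaky(tenant_id: str) -> list[Order]:
    # Plausible-looking but wrong: ignores the tenant boundary entirely.
    return list(ORDERS)


if __name__ == "__main__":
    print(check_tenant_isolation(fetch_orders_correct, ["acme", "globex"]))
    print(check_tenant_isolation(fetch_orders_leaky, ["acme", "globex"]))
```

Both implementations run and return orders. Only the contract check tells them apart: it reports nothing for the filtered version and three cross-tenant rows for the leaky one.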

Testing the definition, not the snippet

The practical mechanism Martinez Torregrosa describes is what he calls system-definition tests. Ordinary unit tests check that a function returns the expected value for an input. System-definition tests check that the generated code respects the structure it was supposed to live in.

Does the new module talk to the database only through the approved access layer, or did the assistant open a raw connection because that was the most probable next token? Does the API handler enforce the authorization boundary, or did it return data because the prompt did not happen to mention authorization? These are not questions about output values. They are questions about whether the engineering matches the intended architecture.
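The first of those questions can be turned into an executable check. The sketch below is a hypothetical example rather than a tool the article names. It uses Python's standard `ast` module to flag imports of raw database drivers anywhere outside an approved access layer. The package name `app.db` and the driver list are assumptions; substitute your own.

```python
import ast

# Assumptions for illustration: the approved access layer is the hypothetical
# package "app.db", and these driver modules are off-limits everywhere else.
ACCESS_LAYER = "app.db"
RAW_DRIVERS = {"sqlite3", "psycopg2", "pymysql"}


def raw_db_imports(module_name: str, source: str) -> list[str]:
    """List imports of raw database drivers outside the approved access layer."""
    if module_name == ACCESS_LAYER or module_name.startswith(ACCESS_LAYER + "."):
        return []  # The access layer itself is allowed to use drivers.
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]
        else:
            continue
        for name in names:
            # Compare the top-level package, so "psycopg2.extras" also matches.
            if name.split(".")[0] in RAW_DRIVERS:
                violations.append(f"{module_name}:{node.lineno} imports {name}")
    return violations


if __name__ == "__main__":
    generated = (
        "import sqlite3\n"
        "\n"
        "def load_user(user_id):\n"
        "    conn = sqlite3.connect('app.sqlite')\n"
        "    return conn.execute('SELECT * FROM users WHERE id = ?', (user_id,)).fetchone()\n"
    )
    print(raw_db_imports("app.handlers.users", generated))  # flagged
    print(raw_db_imports("app.db.session", generated))      # allowed
```

A static check like this only sees literal `import` statements. A driver loaded dynamically, for example through `importlib`, would slip past it, so treat it as one rule among several rather than a complete guarantee.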


This reframing matters because it gives AI-generated code a verification surface that scales. You cannot eyeball ten thousand lines of machine-written code per day. You can write executable rules that say what the system must never do, and run every generated change against them. The tests encode the senior engineer's judgment so the judgment does not have to be present for every commit.
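To run such rules on every generated change, rather than on whatever a reviewer happens to notice, they can be collected into a list and applied to each changed file. This is a minimal sketch under stated assumptions. The naming convention (`handle_*` functions are API handlers) and the `authorize()` call are hypothetical stand-ins for whatever authorization boundary your system actually defines.

```python
import ast
from typing import Callable

# A rule takes a file path and its parsed syntax tree and returns violations.
Rule = Callable[[str, ast.Module], list[str]]


def handlers_must_authorize(path: str, tree: ast.Module) -> list[str]:
    """Hypothetical rule: every function named handle_* must call authorize()."""
    violations = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name.startswith("handle_"):
            calls_authorize = any(
                isinstance(inner, ast.Call)
                and isinstance(inner.func, ast.Name)
                and inner.func.id == "authorize"
                for inner in ast.walk(node)
            )
            if not calls_authorize:
                violations.append(f"{path}:{node.lineno} {node.name} never calls authorize()")
    return violations


RULES: list[Rule] = [handlers_must_authorize]


def check_change(changed_files: dict[str, str]) -> list[str]:
    """Run every rule over every changed Python file and collect violations."""
    violations = []
    for path, source in changed_files.items():
        tree = ast.parse(source, filename=path)
        for rule in RULES:
            violations.extend(rule(path, tree))
    return violations


def gate(changed_files: dict[str, str]) -> int:
    """Return an exit code: 0 if the change respects every rule, 1 otherwise."""
    return 1 if check_change(changed_files) else 0


if __name__ == "__main__":
    change = {
        "api/orders.py": (
            "def handle_get_order(request, order_id):\n"
            "    authorize(request.user, 'orders:read')\n"
            "    return load_order(order_id)\n"
            "\n"
            "def handle_list_orders(request):\n"
            "    return list_orders()\n"
        ),
    }
    for violation in check_change(change):
        print(violation)
    print("exit code:", gate(change))
```

The rule checks that the call is present, not that it happens before any data is returned; a stricter definition would need a stricter rule. The non-zero value from `gate` is what a CI job or pre-review hook could use to block the change.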

Why this lands now

The timing is not accidental. Adoption of AI coding assistants has moved past novelty into daily workflow for a large share of working developers, and the volume of generated code is now high enough that informal review cannot keep up. Organizations that bolted these tools onto existing processes are discovering that velocity without structure produces technical debt faster than any team in history could have produced it by hand.

The broader pattern here is familiar to anyone who watched earlier waves of automation. A new tool removes a constraint, output explodes, and then the discipline that the old constraint quietly enforced has to be rebuilt deliberately. Compilers did not eliminate the need for design. Continuous integration did not eliminate the need for tests. AI code generation does not eliminate the need to define what you are building. It makes that definition more urgent, because the speed of generation outruns the speed of human comprehension.

There is a skeptical reading worth holding onto. System definition is more work upfront, and the entire pitch of AI coding has been that it removes upfront work. Teams chasing the demo-grade speedup will resist writing contracts and definition tests precisely because it feels like friction the tool was supposed to delete. The argument only wins if the cost of undefined systems becomes visible enough to hurt, and for many teams that bill arrives quietly, as slowed-down feature work and rising incident rates that no one attributes to the real cause.

What Martinez Torregrosa is really proposing is that software engineering, the actual discipline rather than the act of typing code, becomes more valuable as code generation gets cheaper. When producing a plausible implementation costs almost nothing, the scarce skill is knowing what the system should be and being able to check that the machine built that system and not something that merely runs. The definition is the work now. The code was always the easy part.

For teams trying this in practice, the starting point is modest: pick the contracts that already cause the most pain when violated, write them down as executable checks, and run AI-generated changes against them before review. The definition does not have to be complete to be useful. It has to be enforced.