LLM‑generated snippets can spin up a login page in minutes, but without the systematic decisions that engineers make (defining invariants, handling edge cases, and modeling failure modes), those snippets quickly become fragile. This article breaks down what engineering adds beyond raw code, shows where AI‑generated output typically fails, and offers a practical workflow for integrating LLM assistance without sacrificing system integrity.

Why “Vibe Coding” Falls Short of Real Engineering

What the hype promises

A large language model can produce a Flask app that lets users sign up and log in in a handful of seconds. The output looks complete: routes, a SQLite schema, and a few HTML templates. For a demo, that is impressive.

What engineering actually delivers

Before the first line of Python is typed, engineers answer a set of questions that shape the whole system:

Phase	Typical questions
Problem framing	Who are the users? What are the business goals?
Requirements engineering	Which invariants must hold? What are the edge cases?
System modelling	How does data flow? What state transitions exist?
Architectural design	Where are the boundaries? What are the failure modes?
Non‑functional specs	What latency, reliability, and security levels are required?
Risk identification	Which dependencies could break the system?
Interface design	What contracts do components expose?
Planning	How is work broken into deliverable units?

These steps happen before any code appears. They are the decisions that keep a system coherent when it meets real traffic, regulatory scrutiny, or evolving feature sets.

The engineering work that “vibe coding” skips

Decision area	Engineer’s answer	LLM output
Invariants	Email addresses must be unique across the user table.	No check for uniqueness; duplicate rows are possible.
Identity rules	A user is identified by a UUID, not by mutable fields.	Uses email as the sole identifier.
Constraints	Passwords must meet complexity requirements.	Generates a plain‑text password field without validation.
Failure modes	Account lockout after repeated failed logins.	No handling of brute‑force attempts.
Coupling & sequencing	Registration must precede email verification.	Sends a welcome email before the account is persisted.
State transitions	Account can be active, suspended, or deleted.	Only a single boolean `active` flag.
Interfaces & contracts	API returns standardized error objects.	Returns raw exception messages.
Boundaries	System never stores passwords in clear text.	Stores raw passwords in the SQLite DB.
Error handling	All database errors are logged and mapped to user‑friendly responses.	Uncaught exceptions reach the client as generic 500 responses or, in debug mode, as full tracebacks.

Missing any of these decisions can turn a working demo into a production nightmare.
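
To make the state-transition row concrete, account status can be modeled as an explicit state machine rather than a boolean. The sketch below is illustrative only: the names `AccountState`, `ALLOWED_TRANSITIONS`, and `transition` are hypothetical, and the specific rules (for example, treating a deleted account as terminal) are assumptions rather than anything the table prescribes.

```python
from enum import Enum


class AccountState(Enum):
    ACTIVE = "active"
    SUSPENDED = "suspended"
    DELETED = "deleted"


# Hypothetical transition rules: the states each state may move to.
# DELETED is treated as terminal here; a real system may decide otherwise.
ALLOWED_TRANSITIONS = {
    AccountState.ACTIVE: {AccountState.SUSPENDED, AccountState.DELETED},
    AccountState.SUSPENDED: {AccountState.ACTIVE, AccountState.DELETED},
    AccountState.DELETED: set(),
}


class InvalidTransition(Exception):
    """Raised when a requested state change is not permitted."""


def transition(current: AccountState, target: AccountState) -> AccountState:
    """Return the new state, or raise if the move is not allowed."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise InvalidTransition(f"{current.value} -> {target.value} is not allowed")
    return target
```

A single `active` flag cannot distinguish a suspended account from a deleted one, so it cannot express a rule such as "deleted accounts never come back". An explicit table of allowed moves makes that rule reviewable and testable.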

A concrete example: missing uniqueness

Prompt: “Add user accounts to a website so people can log in.”
The model returns a Flask app with a users table that has a plain email column but no UNIQUE constraint. If two people sign up with the same address, the database stores both rows. Later, a password‑reset request that filters by email matches both rows, so a single reset can change the password on both accounts, potentially locking out legitimate users and violating privacy regulations. The problem surfaces only once the system is in use, not when the code is first run.
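
A minimal sketch of the missing constraint, using Python's built-in `sqlite3` module. The table layout, the `register` helper, and the UUID primary key are assumptions made for illustration; the point is only that a UNIQUE constraint lets the database itself reject the second row.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE users (
        id TEXT PRIMARY KEY,
        email TEXT NOT NULL UNIQUE,
        password_hash TEXT NOT NULL
    )
    """
)


def register(conn, email, password_hash):
    """Insert a user and return the new id.

    Returns None if a constraint rejects the row, for example
    because the email address is already taken.
    """
    user_id = str(uuid.uuid4())  # identity does not depend on mutable fields
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO users (id, email, password_hash) VALUES (?, ?, ?)",
                (user_id, email, password_hash),
            )
    except sqlite3.IntegrityError:
        return None
    return user_id
```

With this schema, a second `register` call for the same address returns `None` instead of silently creating a duplicate row, so a later reset query by email can match at most one account.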

Why AI‑generated code stalls in production

Hidden assumptions – The model assumes a “happy path” where inputs are well‑formed and unique.
No domain model – It does not build an internal representation of entities, relationships, or business rules.
No risk analysis – Threats such as injection attacks, race conditions, or data loss are never considered.
No contract enforcement – Generated APIs lack versioning, schema validation, or clear error semantics.
No performance budgeting – The code may make a blocking DB call on every request, which becomes a bottleneck under load.

When these gaps are later filled by hand, developers spend time retrofitting checks, rewriting data models, and adding layers of validation—exactly the work that should have been done up front.
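
As a rough illustration of contract enforcement, the sketch below validates a registration payload and returns one predictable error shape instead of a raw exception message. Everything in it is an assumption for the sake of the example: the `FieldError` structure, the error codes, the deliberately loose email pattern, and the minimum password length are hypothetical, not a recommended policy.

```python
import re
from dataclasses import dataclass, asdict

# Deliberately loose pattern (an assumption): something@something.tld
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
MIN_PASSWORD_LENGTH = 12  # hypothetical policy value


@dataclass
class FieldError:
    field: str
    code: str
    message: str


def validate_registration(payload: dict) -> list[FieldError]:
    """Collect every validation problem instead of stopping at the first."""
    errors = []
    email = payload.get("email")
    password = payload.get("password")
    if not isinstance(email, str) or not EMAIL_RE.match(email):
        errors.append(
            FieldError("email", "invalid_email", "Enter a valid email address.")
        )
    if not isinstance(password, str) or len(password) < MIN_PASSWORD_LENGTH:
        errors.append(
            FieldError(
                "password",
                "weak_password",
                f"Use at least {MIN_PASSWORD_LENGTH} characters.",
            )
        )
    return errors


def error_response(errors: list[FieldError]) -> dict:
    """Shape the errors into one predictable object for API clients."""
    return {"status": "error", "errors": [asdict(e) for e in errors]}
```

Because missing or malformed fields are handled explicitly, the "happy path" assumption is no longer hidden: a client that sends an empty payload gets a structured list of problems rather than a stack trace.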

A practical workflow that combines LLM speed with engineering rigor

Define invariants and constraints – write them as a short checklist before you ask the model for code, and capture the checklist in a Markdown file or a ticket.
Prompt with explicit requirements – include the invariants, identity rules, and failure‑mode expectations in the prompt. Example: “Generate a Flask registration endpoint that enforces unique email addresses, stores passwords with bcrypt, and returns JSON error objects on validation failure.”
Review the generated schema – verify that database constraints (e.g., UNIQUE, NOT NULL) match the checklist.
Add automated tests – target the edge cases you listed; the tests become the safety net for anything the model missed.
Integrate static analysis – run tools such as bandit (security checks) and mypy (static type checking) in the CI pipeline.
Iterate – if the model fails to satisfy a requirement, refine the prompt or write the missing piece manually.

By treating the LLM as a code‑completion tool rather than a substitute for design, teams keep the speed advantage while preserving system integrity.
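
The schema review and the edge-case tests in this workflow can themselves be automated. The following is a sketch under stated assumptions: `GENERATED_SCHEMA` stands in for whatever schema the model produced, and the helper names `unique_columns` and `not_null_columns` are hypothetical. It uses only the standard library (`sqlite3` and `unittest`) and SQLite's `PRAGMA` introspection statements.

```python
import sqlite3
import unittest

# Stand-in for the schema the model generated (hypothetical).
GENERATED_SCHEMA = """
CREATE TABLE users (
    id TEXT PRIMARY KEY,
    email TEXT NOT NULL UNIQUE,
    password_hash TEXT NOT NULL
);
"""


def unique_columns(conn, table):
    """Return columns covered by a single-column unique index.

    The table name is interpolated directly, so pass trusted constants only.
    """
    cols = set()
    for row in conn.execute(f"PRAGMA index_list({table})").fetchall():
        index_name, is_unique = row[1], row[2]
        if is_unique:
            info = conn.execute(f"PRAGMA index_info({index_name})").fetchall()
            if len(info) == 1:
                cols.add(info[0][2])  # third field is the column name
    return cols


def not_null_columns(conn, table):
    """Return columns declared NOT NULL (fourth field of table_info)."""
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return {row[1] for row in rows if row[3]}


class SchemaChecklistTest(unittest.TestCase):
    def setUp(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.executescript(GENERATED_SCHEMA)

    def tearDown(self):
        self.conn.close()

    def test_email_is_unique(self):
        self.assertIn("email", unique_columns(self.conn, "users"))

    def test_required_columns_are_not_null(self):
        required = {"email", "password_hash"}
        self.assertTrue(required.issubset(not_null_columns(self.conn, "users")))

    def test_duplicate_email_is_rejected(self):
        insert = "INSERT INTO users (id, email, password_hash) VALUES (?, ?, ?)"
        self.conn.execute(insert, ("1", "a@example.com", "h1"))
        with self.assertRaises(sqlite3.IntegrityError):
            self.conn.execute(insert, ("2", "a@example.com", "h2"))
```

Saved as a test module, this can be run with `python -m unittest`. If a regenerated schema drops a constraint from the checklist, the corresponding test fails in CI instead of the gap surfacing in production.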

Bottom line

LLM‑generated snippets are great for prototypes, learning, or scaffolding. They do not replace the systematic decisions that make a system safe, maintainable, and scalable. The real value of AI in software lies in augmenting engineers—handling boilerplate, suggesting patterns, and surfacing alternatives—while the engineer remains responsible for defining invariants, constraints, and failure handling.


