A developer’s sigh of relief after a successful deployment often masks underlying fragility. This article explains why “it works” is a weak guarantee, illustrates common patterns of incidental correctness, and shows how systematic testing and refactoring can turn shaky code into dependable services.
When “It Works” Is Not Enough

The Problem: “It works” masks hidden brittleness
A teammate announces “it works” and the team breathes easy. The bug is gone, the feature is live, the demo passes. That moment feels like a win, but the statement hides a critical gap: “works” describes a single observation, not a contract.
Reliability, predictability, and testability are properties that survive variation—different loads, data shapes, or deployment environments. Without them, the code is incidentally correct: it produces the right output for the inputs it has seen, but nothing guarantees it will keep doing so when conditions change.
Typical manifestations of incidental correctness
| Symptom | Why it looks correct | What breaks under pressure |
|---|---|---|
| Two bugs cancel each other | Function A returns a wrong value that Function B expects | Fix either bug and the whole flow collapses |
| Edge‑case blind spot | Current test set covers only happy‑path inputs | Unexpected formats, larger payloads, or different locales produce silent errors |
| Implicit index reliance | Query returns rows in the expected order because a specific index exists | Dropping the index changes ordering, causing downstream failures |
| Timing assumptions | A retry loop usually succeeds within three attempts | Spike in downstream latency exhausts retries, leading to timeouts |
| Eventual‑consistency window | Reads typically happen after writes have propagated | High write traffic makes stale reads visible to users |
Each of these cases passes a manual “run‑it‑once” check, yet none offers a guarantee that the behavior will hold when traffic grows, data evolves, or the environment shifts.
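The first row of the table can be made concrete with a small sketch. The function names and the price format below are hypothetical, chosen only to illustrate two bugs that cancel each other for the one input anyone tried.

```python
from decimal import Decimal


def parse_price_cents(text: str) -> int:
    # Bug 1: despite the name, this returns whole dollars, not cents.
    return int(float(text))


def format_price(cents: int) -> str:
    # Bug 2: treats its argument as dollars, which happens to undo bug 1.
    return f"${cents:.2f}"


def parse_price_cents_fixed(text: str) -> int:
    # Fixes only bug 1: exact decimal arithmetic, result really is in cents.
    return int(Decimal(text) * 100)


# The single observed input looks fine.
print(format_price(parse_price_cents("12.00")))        # $12.00

# A different data shape exposes the silent truncation.
print(format_price(parse_price_cents("12.50")))        # $12.00

# Fixing one bug without the other breaks the whole flow.
print(format_price(parse_price_cents_fixed("12.00")))  # $1200.00
```

The “run‑it‑once” check with 12.00 passes, yet the pair of functions has no contract that either one could be tested against in isolation.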
The danger of building on shaky foundations
When a module behaves incidentally, developers downstream treat it as a stable primitive. They compose new functions, add features, and embed the original code deep into the call graph. The hidden assumption becomes a structural strut; any change that alters the accidental behavior can cause cascading failures.
Because the original contract was never explicit, the cost of fixing the problem later explodes:
- Discovery often follows an incident – customers notice the failure before the team does.
- Refactoring touches many dependents – every caller must be examined, tested, and possibly rewritten.
- Knowledge loss – the engineers who understood the quirk may have moved on, leaving a knowledge gap.
The result is a module that everyone avoids touching, not because it is inherently complex, but because its true behavior is opaque.
The Solution: Characterization tests followed by intentional refactoring
- Capture current behavior – Write characterization tests that lock in what the code actually does today, even if the behavior is undocumented. These tests become a safety net for any later changes.
- Identify the intended contract – Decide what the function should guarantee (e.g., idempotent, order‑preserving, locale‑independent).
- Replace accidental tricks with explicit logic:
  - Swap a lucky regex for a well‑named parser.
  - Add an explicit ORDER BY clause instead of relying on an index.
  - Introduce a proper money type that handles currency precision rather than floating‑point arithmetic.
  - Use a deterministic lock or a version token to eliminate race conditions.
- Run the test suite – The characterization tests confirm that the refactor did not break existing callers, while new unit/integration tests verify the intended contract.
- Iterate – As the codebase stabilizes, replace the temporary tests with more focused specifications.
With this workflow, the code evolves from “it works” to “it works by design.”
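A minimal sketch of the workflow, assuming a hypothetical legacy function legacy_normalize_username whose behavior was never specified. The pinned cases record what the code does today, including a surprising one, and the same cases are then run against an explicit rewrite. Only the standard library is used; the names and inputs are illustrative.

```python
import re


def legacy_normalize_username(raw: str) -> str:
    # Hypothetical legacy code: a "lucky regex" nobody fully understands.
    return re.sub(r"\W+", "_", raw.strip()).lower()


# Characterization cases: observed outputs, not necessarily desired ones.
PINNED_CASES = [
    ("  Alice Smith ", "alice_smith"),
    ("bob@example.com", "bob_example_com"),
    ("", ""),
    ("!!", "_"),  # surprising, but callers may depend on it
]


def check_characterization(fn) -> None:
    # Fails loudly if an implementation drifts from the pinned behavior.
    for raw, expected in PINNED_CASES:
        actual = fn(raw)
        assert actual == expected, f"{fn.__name__}({raw!r}) = {actual!r}"


def normalize_username(raw: str) -> str:
    # Explicit rewrite: each run of separator characters becomes one "_".
    out = []
    previous_was_separator = False
    for ch in raw.strip():
        if ch.isalnum() or ch == "_":
            out.append(ch.lower())
            previous_was_separator = False
        elif not previous_was_separator:
            out.append("_")
            previous_was_separator = True
    return "".join(out)


check_characterization(legacy_normalize_username)  # locks in today's behavior
check_characterization(normalize_username)         # the refactor preserves it
```

The pinned cases only prove agreement on the inputs listed. Deciding whether "!!" should really map to "_" belongs to the contract step, and the rewrite may differ from the regex on inputs outside the list.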
Trade‑offs and when “it works” may be acceptable
| Situation | Reason to defer refactor | Risk level |
|---|---|---|
| Low‑traffic internal tool | Limited impact, no immediate budget | Low – but documentation of the fragility is still needed |
| Critical customer‑facing service | High load, many downstream dependencies | High – invest in tests and refactor now |
| Prototype for a short‑lived hackathon | Time constraints outweigh long‑term stability | Medium – clearly label the code as experimental |
Even when resources are tight, the hidden assumption should be written down. A brief note such as “relies on index ordering; add ORDER BY before scaling” makes it visible to future maintainers.
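As a rough illustration, such a note can live next to the query it describes, and the eventual fix is small. The sketch below uses Python’s built‑in sqlite3 module with a hypothetical events table; the table, column, and function names are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, created_at TEXT, name TEXT)"
)
conn.executemany(
    "INSERT INTO events (created_at, name) VALUES (?, ?)",
    [("2024-03-02", "deploy"), ("2024-03-01", "build"), ("2024-03-03", "rollback")],
)


def event_names_fragile(connection: sqlite3.Connection) -> list:
    # FRAGILE: no ORDER BY, so row order is whatever the engine chooses.
    # Relies on index ordering; add ORDER BY before scaling.
    return [row[0] for row in connection.execute("SELECT name FROM events")]


def event_names(connection: sqlite3.Connection) -> list:
    # Explicit contract: oldest first, ties broken by id.
    query = "SELECT name FROM events ORDER BY created_at, id"
    return [row[0] for row in connection.execute(query)]


print(event_names(conn))  # ['build', 'deploy', 'rollback']
```

SQL does not guarantee the order of rows returned by a SELECT without ORDER BY, so only the second function states an ordering that callers can rely on.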
Turning the defensive “it works” into an engineering decision
When a teammate says “it works” as a reason to avoid change, ask for the missing context:
- What exact guarantees does the code provide?
- Which edge cases have been verified?
- Are there tests that would fail if the accidental property changed?
- What is the expected cost of a future failure?
If the answer is “no tests, no contract, just a lucky run,” the appropriate response is to allocate time for characterization tests before any refactor.
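One way to answer the third question is a test that exercises the assumption directly. The sketch below takes the retry example from the symptoms table: a hypothetical call_with_retries helper with an explicit attempt budget, and a fake downstream that fails a configurable number of times. All names are illustrative, not taken from any particular library.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RetryBudgetExceeded(Exception):
    """Raised when every allowed attempt has failed."""


def call_with_retries(op: Callable[[], T], max_attempts: int = 3) -> T:
    # The budget is an explicit parameter rather than an unstated habit.
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        try:
            return op()
        except ConnectionError as exc:
            last_error = exc
    raise RetryBudgetExceeded(f"gave up after {max_attempts} attempts") from last_error


def flaky(failures_before_success: int) -> Callable[[], str]:
    # Test double: fails the given number of times, then succeeds.
    calls = {"count": 0}

    def op() -> str:
        calls["count"] += 1
        if calls["count"] <= failures_before_success:
            raise ConnectionError("downstream unavailable")
        return "ok"

    return op


# Within budget: two failures, third attempt succeeds.
assert call_with_retries(flaky(2)) == "ok"

# Beyond budget: the failure mode is named and testable, not a silent timeout.
try:
    call_with_retries(flaky(3))
except RetryBudgetExceeded:
    pass
else:
    raise AssertionError("expected RetryBudgetExceeded")
```

If someone later lowers max_attempts or the downstream starts failing more often, the second case documents what callers will see.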
Bottom line
“It works” is the floor of software quality. A system built only on that floor will surprise you when its hidden assumptions are violated. By investing in tests that capture current behavior and then refactoring toward explicit contracts, teams raise the ceiling, delivering software that not only runs today but continues to run under the unknown conditions of tomorrow.
If you’re interested in practical steps for adding characterization tests to an existing codebase, check out the Testing Pyramid guide and the property‑based testing library Hypothesis.
