The Illusion of Correctness: Why Passing Tests Doesn't Guarantee Your Numerical Software Works
The Hidden Perils of Numerical Software Validation
In a provocative blog post titled "Ship broken?", computer scientist and performance optimization expert Daniel Lemire delivers a sobering critique of modern software development practices. His central thesis challenges a fundamental assumption: that software passing test suites is functionally correct. Lemire argues this false sense of security is particularly dangerous for numerical software handling floating-point operations.
"We are stuck with the uncomfortable truth: we cannot fully test our numerical code," Lemire states, highlighting the mathematical impossibility of exhaustively testing even moderately complex functions across all possible floating-point inputs. This limitation stems from the astronomical number of potential input combinations—far exceeding what any test suite could realistically cover.
The consequences ripple across domains:
- AI/ML systems risk producing invalid results due to untested edge cases in matrix operations
- Scientific computing faces reproducibility crises when undiscovered numerical errors skew results
- Financial software could generate catastrophic miscalculations from unhandled rounding behaviors (a small illustration follows this list)
- Safety-critical systems (aerospace, medical devices) inherit hidden mathematical vulnerabilities
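As a familiar illustration of the rounding point (again not specific to the article), decimal fractions such as 0.1 are not exactly representable in binary floating point, so seemingly obvious identities fail:

#include <stdio.h>

int main(void) {
    // 0.1, 0.2 and 0.3 cannot be represented exactly in binary floating point.
    printf("%d\n", 0.1 + 0.2 == 0.3);  // prints 0: the comparison is false
    printf("%.17g\n", 0.1 + 0.2);      // prints 0.30000000000000004
    return 0;
}

Code that accumulates many such sums, as billing or accounting software does, can drift by amounts that matter.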
Lemire identifies three systemic failures:
- False equivalence: Developers wrongly assume passing tests equals correctness
- Combinatorial impossibility: a single 64-bit floating-point argument already spans 2⁶⁴ bit patterns, and a two-argument function faces 2¹²⁸ input combinations, making exhaustive testing infeasible
- Toolchain limitations: Compiler optimizations and hardware variations can unexpectedly alter floating-point behaviors
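The toolchain point is easy to demonstrate with a minimal sketch (not from the article): floating-point addition is not associative, so any optimization or hardware path that reorders operations, such as vectorizing a sum, reassociating under -ffast-math, or contracting into FMA instructions, can legitimately change the computed result.

#include <stdio.h>

int main(void) {
    double a = 1e16, b = -1e16, c = 1.0;
    // Mathematically both expressions equal 1, but floating-point addition
    // is not associative: the 1.0 is lost when it is added to -1e16 first.
    printf("%g\n", (a + b) + c);  // prints 1
    printf("%g\n", a + (b + c));  // prints 0
    return 0;
}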
Another pitfall that test suites rarely exercise is exact floating-point comparison:

// Dangerous: exact equality rarely holds after floating-point arithmetic.
if (a == b) { /* ... */ }
// Safer: compare against a relative tolerance instead (fabs and fmax are in <math.h>).
if (fabs(a - b) <= eps * fmax(fabs(a), fabs(b))) { /* ... */ }

Instead of relying solely on testing, Lemire advocates for:
- Formal verification methods for critical algorithms
- Community-maintained databases of edge cases and failure modes
- Fuzzers such as AFL and libFuzzer, alongside dynamic-analysis tools such as Valgrind, for stress-testing numerical code (a sketch of a fuzzing harness follows this list)
- Conservative compiler flags (such as -ffloat-store) to prevent optimization-induced errors
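For the fuzzing suggestion, a minimal sketch of a libFuzzer harness might look like the following; the naive_midpoint routine and its invariant are hypothetical (reused from the earlier sketch) and are not taken from the article.

#include <assert.h>
#include <math.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical routine under test (same as the earlier sketch).
static double naive_midpoint(double a, double b) { return (a + b) / 2.0; }

// libFuzzer entry point: interpret the fuzzer-supplied bytes as two doubles
// and check an invariant that should hold for all finite inputs.
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    if (size < 2 * sizeof(double)) return 0;
    double a, b;
    memcpy(&a, data, sizeof a);
    memcpy(&b, data + sizeof a, sizeof b);
    if (!isfinite(a) || !isfinite(b)) return 0;
    // The midpoint of two finite doubles should itself be finite; the fuzzer
    // will quickly find overflowing inputs for which it is not.
    assert(isfinite(naive_midpoint(a, b)));
    return 0;
}

Built with clang -fsanitize=fuzzer, such a harness lets the fuzzer search the floating-point input space for invariant violations instead of relying on hand-picked test values.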
The article serves as a wake-up call for developers working with mathematical computations: "We must acknowledge that our software might be broken even when all tests pass." As numerical software underpins increasingly critical systems, Lemire's analysis elevates the conversation from mere testing to verifiable correctness—a crucial distinction with far-reaching implications for software reliability.
Source: Ship broken? by Daniel Lemire