When faced with an algorithmic problem that scales from manageable to astronomical, developers often hit a wall. Advent of Code 2023's Day 5 challenge delivered precisely this scenario: a deceptively simple seed-mapping problem where Part A processed 20 seeds, while Part B expanded those into 1,975,502,102 seed values. At one calculation per second, brute force would take over 60 years. This computational cliff forced a deep reevaluation of approaches, revealing profound lessons in performance engineering.

The Problem: Seeds and Transformations

At its core, the problem involves transforming input values through layered mappings. Each map entry defines a target start, a source start, and a length. If an input value falls within an entry's source range, it is shifted to the corresponding position in the target range; otherwise, it passes through unchanged. The challenge applies seven such layers in sequence, from seed through soil and so on down to location.
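
As a concrete illustration of this rule, here is a minimal Python sketch of one layer applied to a single value (function and variable names are illustrative, not taken from the author's code):

    def apply_layer(value, entries):
        """Apply one mapping layer to a single value.

        Each entry is (target_start, source_start, length), mirroring the
        puzzle's input format. A value outside every source range passes
        through unchanged.
        """
        for target_start, source_start, length in entries:
            if source_start <= value < source_start + length:
                # Shift by the same offset the entry applies to its source range.
                return target_start + (value - source_start)
        return value

    # The sample's seed-to-soil layer: entries "50 98 2" and "52 50 48".
    soil_map = [(50, 98, 2), (52, 50, 48)]
    assert apply_layer(79, soil_map) == 81  # 79 lies in [50, 98), shifted by +2
    assert apply_layer(14, soil_map) == 14  # outside both source ranges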

Part A processes individual seeds (e.g., 79 14 55 13), while Part B reads them as start/length pairs, i.e. the half-open ranges [79, 93) and [55, 68). The explosion from 4 values to 27 (in the sample data), or from 20 to nearly 2 billion (in the full data), creates a computational nightmare. As the author noted, "whatever way you came up for solving this for part A is probably going to take a very long time to compute the final value."

Brute-Force: The Obvious Trap

The initial approach is straightforward: iterate through each seed, apply all transformations, and track the minimum result. This works for Part A but crumbles under Part B's scale. "If calculating each seed took 1 second," the author calculated, "Part A would take 27 seconds, part B would take 60+ years."

Even with parallelization across CPU cores, memory becomes a bottleneck: naively materializing billions of intermediate values can demand terabytes of RAM, rendering the approach impractical for most systems.

The Breakthrough: Range-Based Transformation

The key insight emerged from reframing the problem: instead of processing individual seeds, transform entire ranges at once. By analyzing how seed ranges intersect with map boundaries, the problem reduces to 13 logical cases (two are shown below, with a code sketch after the list). For example:
- A seed range entirely within a map source transforms to a new target range.
- A seed range partially overlapping a map splits into transformed and unchanged segments.
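
Here is the promised sketch of the splitting step in Python, a simplified, hypothetical rendering of the idea rather than the author's 13-case code:

    def apply_layer_to_ranges(ranges, entries):
        """Transform half-open [start, end) ranges through one mapping layer.

        Each intersection with an entry's source range is shifted into the
        target range; leftover pieces are re-queued so later entries (or the
        pass-through rule) can handle them.
        """
        pending = list(ranges)
        done = []
        for target_start, source_start, length in entries:
            source_end = source_start + length
            next_pending = []
            for start, end in pending:
                # Overlap between [start, end) and [source_start, source_end).
                lo, hi = max(start, source_start), min(end, source_end)
                if lo < hi:  # some part of the range is mapped
                    offset = target_start - source_start
                    done.append((lo + offset, hi + offset))
                    if start < lo:  # unmapped piece on the left
                        next_pending.append((start, lo))
                    if hi < end:    # unmapped piece on the right
                        next_pending.append((hi, end))
                else:
                    next_pending.append((start, end))
            pending = next_pending
        return done + pending  # untouched ranges pass through unchanged

    # The sample's two seed ranges through the seed-to-soil layer:
    print(apply_layer_to_ranges([(79, 93), (55, 68)], [(50, 98, 2), (52, 50, 48)]))
    # -> [(81, 95), (57, 70)]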

This approach compresses the 2 billion values into just 10 ranges, slashing computational complexity. "Recoding the problem in python got a correct solution in 0.6s compared to hours and hours brute force," the author reported. Even in faster languages like Rust or C, range-based processing proved 10x faster than brute force.

Language Showdown: Speed, Safety, and Trade-offs

The author implemented solutions across 10 languages, revealing stark performance and usability differences:

  • C: Unmatched speed (1m20s with -flto optimization) but verbose. OpenMP parallelization reduced time to 30s, while OpenCL achieved 10s on 28 cores. Memory management complexity was a major hurdle.

  • C++: Modern features (like std::optional) initially slowed execution 10x due to runtime overhead. Switching to pointers restored parity with C. The STL's std::vector simplified data handling.

  • Rust: "Parallelisation was a dream" with the rayon crate. A single code change enabled multiprocessing, cutting Rust's time to 10s on 28 cores. Cargo's package management and memory safety were key advantages.

  • Python: Quick to prototype but slowest overall. A ctypes-based FFI to a C library bridged the gap, demonstrating the power of interoperability (see the sketch after this list). Poetry could mitigate packaging friction.

  • Zig: Promising as a C successor, but with a steep learning curve. Build-system complexity and allocator friction were drawbacks, though performance rivaled Rust's.

  • C# & Java: Java proved slower to work with than C#, due in part to Maven's complexity. Nullable types were frustratingly verbose.

  • F#: Functional design automatically parallelized but consumed all memory with 2 billion values. Imperative code resolved this, but the language's syntax was a barrier.

  • Go: Pythonic syntax and garbage collection were wins, but visibility rules (capitalized functions) felt arbitrary.

  • CUDA: GPU acceleration solved the problem in under 10s on an RTX 2000 series (sub-second on a 3080). Setup complexity and debugging challenges were significant.
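
To illustrate the ctypes-based FFI mentioned in the Python bullet, here is a minimal sketch; the library name and function signature are hypothetical stand-ins, not the author's actual interface:

    import ctypes

    # Hypothetical shared library exposing the hot loop in C:
    #     long long apply_maps(long long seed);
    lib = ctypes.CDLL("./libseedmap.so")
    lib.apply_maps.argtypes = [ctypes.c_longlong]
    lib.apply_maps.restype = ctypes.c_longlong

    def location_for(seed: int) -> int:
        """Delegate the per-seed transformation chain to compiled C code."""
        return lib.apply_maps(seed)

    print(min(location_for(s) for s in (79, 14, 55, 13)))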

Critical Lessons

  1. Premature Optimization is Evil, But Design is Sacred: As Knuth warned, optimize only after profiling. However, initial design choices—like range-based processing—determined success. "No solution is perfect," the author cautioned. "Kill your darlings" if a better approach emerges.

  2. Profiling Uncovers Hidden Costs: Python's cProfile revealed that a simple Seed class added 20% overhead. In C++, std::optional created a 10x performance penalty. Always measure before optimizing (a minimal cProfile invocation is sketched after this list).

  3. Parallelization Has Limits: While multiprocessing (Python), OpenMP (C/C++), and rayon (Rust) accelerated Part A, Part B required algorithmic compression first. GPU acceleration (CUDA/OpenCL) offered extreme speedups but demanded domain-specific knowledge.

  4. Language Choice Depends on Context: For raw speed, C/Rust/Zig excel. For rapid prototyping, Python/Go shine. For safety, Rust's ownership model is unmatched. "I spend most of my time in C, C++ or Rust," the author noted, "but this problem showed where each shines."
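
For the profiling habit in lesson 2, a minimal cProfile run looks like this (solve_part_a is a hypothetical stand-in for the real solver):

    import cProfile
    import pstats

    def solve_part_a(seeds):
        # Stand-in workload; substitute the actual seed-mapping pipeline.
        return min(seeds)

    # Record a profile, then print the ten most expensive call sites.
    cProfile.run("solve_part_a(range(10_000_000))", "profile.out")
    pstats.Stats("profile.out").sort_stats("cumulative").print_stats(10)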

The Bigger Picture: Beyond the Puzzle

While few developers face billion-value computations daily, the lessons resonate widely. "Search and sorting algorithms are so good that even with millions or billions of entries we can get results fast," the author reflected. "But we've got techniques to cache, index and store data to make retrieval almost instant."

The true value lies in the process: "Having a set of 'pet' problems that you can use as a litmus test for a language is a great idea." This deep dive honed skills in profiling, interop, parallelization, and low-level optimization, all transferable to real-world bottlenecks in databases, simulations, or ML pipelines.

As computational demands grow, such extreme problem-solving becomes increasingly relevant. Whether optimizing a database query, a rendering pipeline, or a machine learning model, the principles remain clear: rethink the algorithm, leverage parallelism wisely, and never underestimate the cost of abstraction.

"I've documented as much as I can so that hopefully in the future I can come back to this and say 'Oh yea, I've solved that problem before.'"


Source: Rich Clubb's Blog: Spending Way Too Much Time