Breaking the Prime Barrier: 10 Million Primes Categorized Per Second in Optimized C
Share this article
In computational mathematics, efficiently categorizing prime numbers has long challenged researchers. A new open-source implementation pushes boundaries by processing 10+ million primes per second using the Erdős-Selfridge classification system – a hierarchical method that categorizes primes based on the properties of their successor (p+1).
The Erdős-Selfridge Challenge
The classification works recursively:
Category(p) =
\begin{cases}
1 & \text{if all prime factors of } p+1 \text{ are } 2 \text{ or } 3 \\
k+1 & \text{otherwise, where } k \text{ is the max category of } p+1\text{'s prime factors}
\end{cases}
This creates a pyramid where Category 1 primes (like 2, 5, 17) form the base, while higher categories (like 11 in Category 2) depend on lower-tier primes.
Engineering the Speed Demon
The C implementation employs four key optimizations:
- 210-wheel factorization reduces trial divisions by 77% using incremental steps:
static const uint8_t wheel_210_increments[48] = {
4, 6, 10, 12, 16, 18, 22, 24, 30, 32, 36, 38, ...
};
- Bit-packed category storage achieves 4× memory efficiency:
static inline uint8_t get_category_packed(uint32_t index) {
uint32_t word_idx = index / 16;
uint32_t bit_pos = (index % 16) * 4;
return (g_packed_categories[word_idx] >> bit_pos) & 0xF;
}
- Floating-point-free integer square roots via bit manipulation
- O(1) hash-based prime lookups with linear probing
Performance and Implications
Benchmarks show the system processing 8.5 million primes/second on commodity hardware. For 1 million primes:
| Category | Smallest | Largest | Count |
|---|---|---|---|
| 1 | 2 | 9161 | 46 |
| 2 | 11 | 9173 | 104 |
| ... | ... | ... | ... |
| 11 | 179 | 9973 | 9 |
This demonstrates how low-level optimizations – cache-aware data structures, branch elimination, and algorithmic enhancements – can accelerate mathematical computations far beyond naive implementations. Potential applications include:
- Cryptographic key generation
- Prime number theory research
- High-performance computing benchmarks
As the developer notes: "Optimizations applied: Enhanced 210-wheel factorization, fast integer square root, and cache-optimized memory layout." The source code stands as a testament to how carefully engineered C can still outperform higher-level languages in mathematical computing.