Breaking the Prime Barrier: 10 Million Primes Categorized Per Second in Optimized C

In computational mathematics, efficiently categorizing prime numbers has long challenged researchers. A new open-source implementation pushes boundaries by processing 10+ million primes per second using the Erdős-Selfridge classification system – a hierarchical method that categorizes primes based on the properties of their successor (p+1).

The Erdős-Selfridge Challenge

The classification works recursively:

Category(p) = 
\begin{cases} 
1 & \text{if all prime factors of } p+1 \text{ are } 2 \text{ or } 3 \\
k+1 & \text{otherwise, where } k \text{ is the max category of } p+1\text{'s prime factors}
\end{cases}

This creates a pyramid where Category 1 primes (like 2, 5, 17) form the base, while higher categories (like 11 in Category 2) depend on lower-tier primes.

Engineering the Speed Demon

The C implementation employs four key optimizations:

210-wheel factorization reduces trial divisions by 77% using incremental steps:

static const uint8_t wheel_210_increments[48] = {
  4, 6, 10, 12, 16, 18, 22, 24, 30, 32, 36, 38, ...
};

Bit-packed category storage achieves 4× memory efficiency:

static inline uint8_t get_category_packed(uint32_t index) {
  uint32_t word_idx = index / 16;
  uint32_t bit_pos = (index % 16) * 4;
  return (g_packed_categories[word_idx] >> bit_pos) & 0xF;
}

Floating-point-free integer square roots via bit manipulation
O(1) hash-based prime lookups with linear probing

Performance and Implications

Benchmarks show the system processing 8.5 million primes/second on commodity hardware. For 1 million primes:

Category	Smallest	Largest	Count
1	2	9161	46
2	11	9173	104
...	...	...	...
11	179	9973	9

This demonstrates how low-level optimizations – cache-aware data structures, branch elimination, and algorithmic enhancements – can accelerate mathematical computations far beyond naive implementations. Potential applications include:

Cryptographic key generation
Prime number theory research
High-performance computing benchmarks

As the developer notes: "Optimizations applied: Enhanced 210-wheel factorization, fast integer square root, and cache-optimized memory layout." The source code stands as a testament to how carefully engineered C can still outperform higher-level languages in mathematical computing.

#PrimeFactorization #AlgorithmOptimization #HighPerformanceC

Breaking the Prime Barrier: 10 Million Primes Categorized Per Second in Optimized C

Share this article

The Erdős-Selfridge Challenge

Engineering the Speed Demon

Performance and Implications

Share this article