Breaking the Prime Barrier: 10 Million Primes Categorized Per Second in Optimized C
#Privacy

Breaking the Prime Barrier: 10 Million Primes Categorized Per Second in Optimized C

LavX Team
2 min read

A high-performance C implementation of the Erdős-Selfridge prime classification algorithm achieves unprecedented speeds through wheel factorization, cache optimization, and bit-packing. This computational breakthrough demonstrates how algorithmic innovations can dramatically accelerate mathematical operations critical for cryptography and number theory.

Article Image

In computational mathematics, efficiently categorizing prime numbers has long challenged researchers. A new open-source implementation pushes boundaries by processing 10+ million primes per second using the Erdős-Selfridge classification system – a hierarchical method that categorizes primes based on the properties of their successor (p+1).

The Erdős-Selfridge Challenge

The classification works recursively:

Category(p) = 
\begin{cases} 
1 & \text{if all prime factors of } p+1 \text{ are } 2 \text{ or } 3 \\
k+1 & \text{otherwise, where } k \text{ is the max category of } p+1\text{'s prime factors}
\end{cases}

This creates a pyramid where Category 1 primes (like 2, 5, 17) form the base, while higher categories (like 11 in Category 2) depend on lower-tier primes.

Engineering the Speed Demon

The C implementation employs four key optimizations:

  1. 210-wheel factorization reduces trial divisions by 77% using incremental steps:
static const uint8_t wheel_210_increments[48] = {
  4, 6, 10, 12, 16, 18, 22, 24, 30, 32, 36, 38, ...
};
  1. Bit-packed category storage achieves 4× memory efficiency:
static inline uint8_t get_category_packed(uint32_t index) {
  uint32_t word_idx = index / 16;
  uint32_t bit_pos = (index % 16) * 4;
  return (g_packed_categories[word_idx] >> bit_pos) & 0xF;
}
  1. Floating-point-free integer square roots via bit manipulation
  2. O(1) hash-based prime lookups with linear probing

Performance and Implications

Benchmarks show the system processing 8.5 million primes/second on commodity hardware. For 1 million primes:

Category Smallest Largest Count
1 2 9161 46
2 11 9173 104
... ... ... ...
11 179 9973 9

This demonstrates how low-level optimizations – cache-aware data structures, branch elimination, and algorithmic enhancements – can accelerate mathematical computations far beyond naive implementations. Potential applications include:

  • Cryptographic key generation
  • Prime number theory research
  • High-performance computing benchmarks

As the developer notes: "Optimizations applied: Enhanced 210-wheel factorization, fast integer square root, and cache-optimized memory layout." The source code stands as a testament to how carefully engineered C can still outperform higher-level languages in mathematical computing.

Comments

Loading comments...