An exploration of xorshift128, a fast pseudorandom number generator whose internal state is trivially recoverable from just four outputs, illustrating the critical distinction between statistical quality and cryptographic security in RNG design.
Random number generators occupy a peculiar position in computing. We need them to behave unpredictably, yet they are entirely deterministic machines. The art lies in making that determinism invisible to casual observation while remaining computationally infeasible to reverse. Some generators, like xorshift128, fail spectacularly at the latter requirement despite passing the former with flying colors.
The xorshift128 algorithm, implemented in the code above, maintains four 32-bit integers as its internal state. Each call to the generator performs a series of bit shifts and XOR operations, then rotates the state forward like a conveyor belt: the output becomes the new value of a, a becomes b, b becomes c, and c becomes d. The old value of d serves as the seed for the transformation.
What makes this particular implementation remarkable is not its speed or elegance, but how directly the internal state maps to observable outputs. The four most recent values, read in reverse order, are the state. If you observe four consecutive outputs, you know exactly what the generator will produce next. There is no transformation to undo, no mathematical puzzle to solve. The state is simply laid bare.
The chart in the original post illustrates this beautifully. The relationship between outputs and state is one-to-one and immediate. This is not a security vulnerability waiting to be exploited through clever analysis; it is an architectural choice, almost an afterthought in the generator's design.
This predictability stands in stark contrast to the generator's statistical properties. Xorshift128, like Mersenne Twister and lehmer64 before it, produces numbers that pass standard statistical tests with ease. The sequences it generates appear random to any reasonable battery of tests. The bits tumble and mix in ways that satisfy the demands of Monte Carlo simulations, game development, and statistical sampling. For these applications, xorshift128 is perfectly adequate. Its speed is excellent, its memory footprint minimal, and its output passes every test you might reasonably run against it.
Yet this same generator would be catastrophic for any security-sensitive application. Imagine using xorshift128 to generate encryption keys, session tokens, or nonces. An attacker who observes even a handful of outputs could predict every future value with certainty. The randomness is an illusion, maintained only for those who do not know where to look.
This reveals a fundamental tension in random number generation. The properties that make a generator useful for simulation—speed, simplicity, verified statistical quality—are precisely the properties that make it unsuitable for cryptography. A cryptographically secure PRNG (CSPRNG) must satisfy additional requirements: its internal state must be infeasible to recover from outputs, and its outputs must be indistinguishable from true randomness even to an adversary who has seen many previous outputs.
Generators like ChaCha20 satisfy these requirements through careful design and substantial computational cost. They are slower than xorshift128 by an order of magnitude, and they consume more memory. These tradeoffs are acceptable when security is at stake, but they would needlessly slow down a Monte Carlo simulation running millions of iterations.
The PCG64 generator mentioned in the post occupies an interesting middle ground. Its state recovery is far more complex than xorshift128's trivial exposure, requiring sophisticated mathematical analysis and considerable compute time. However, it remains predictable in principle, which disqualifies it from cryptographic use. The author notes that recovering PCG64's state requires "thousands of hours of compute time"—impressive, but not impossible for a determined adversary with sufficient resources.
The lesson for practitioners is straightforward: choose your random number generator based on the threat model. For games, simulations, and statistical work, any of the generators discussed here will serve well. For anything security-related, reach for a CSPRNG designed for that purpose. The performance penalty is real but manageable, and the alternative is catastrophic.
The broader insight is that randomness is not a binary property but a spectrum of guarantees. A generator can be predictable to one adversary (someone with modest computational resources) but unpredictable to another (someone willing to spend millions on specialized hardware). Understanding where your requirements fall on this spectrum is essential to making good engineering decisions.


Comments
Please log in or register to join the discussion