The Case for Seeds as the Universal Storage Format for ML-KEM Keys

Tech Essays Reporter

With NIST's ML-KEM specification now finalized, the cryptography community faces a critical implementation decision: whether to store private keys as seeds or in the expanded format. This analysis examines why the 64-byte seed format offers compelling advantages in size, security, and simplicity over the much larger expanded keys.

The recent publication of NIST's final ML-KEM specification (FIPS 203) marks a significant milestone in post-quantum cryptography standardization. Among the various technical considerations, one implementation choice stands out as particularly consequential: how we choose to store and handle private decapsulation keys. In this evolving landscape, I believe the cryptography engineering community should converge on a single, optimal approach: using seeds as the universal storage format for ML-KEM keys.

The contrast between these two approaches couldn't be more striking. A seed represents a private key in a compact 64-byte format, while an expanded decapsulation key takes 1,632, 2,400, or 3,168 bytes for ML-KEM-512, ML-KEM-768, and ML-KEM-1024 respectively. One might expect those extra bytes to buy something, such as faster loading or stronger guarantees, yet a closer look shows seeds to be clearly superior.
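The sizes follow directly from the parameter k (2, 3, or 4 for ML-KEM-512, ML-KEM-768, and ML-KEM-1024): an expanded decapsulation key is 768k + 96 bytes, while the seed stays at 64 bytes regardless of parameter set. The short Python sketch below tabulates this; the function and dictionary names are illustrative, not taken from any library.

```python
# Illustrative size calculator for ML-KEM private key formats (sizes per FIPS 203).
SEED_SIZE = 64  # the seed is two 32-byte values, d and z

PARAMETER_SETS = {"ML-KEM-512": 2, "ML-KEM-768": 3, "ML-KEM-1024": 4}


def expanded_dk_size(k: int) -> int:
    """Bytes in an expanded decapsulation key: dkPKE || ek || H(ek) || z."""
    dk_pke = 384 * k      # ByteEncode12(s): k polynomials, 256 coefficients x 12 bits = 384 bytes each
    ek = 384 * k + 32     # ByteEncode12(t) || rho
    h_ek = 32             # SHA3-256 hash of ek
    z = 32                # implicit-rejection secret
    return dk_pke + ek + h_ek + z  # = 768k + 96


if __name__ == "__main__":
    for name, k in PARAMETER_SETS.items():
        size = expanded_dk_size(k)
        print(f"{name}: expanded key {size} bytes vs seed {SEED_SIZE} bytes ({size / SEED_SIZE:.1f}x)")
```

Running it prints 1632, 2400, and 3168 bytes for the expanded keys, against a constant 64 for the seed.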

The most immediate advantage of seeds lies in their simplicity. A seed, by its very nature, is always valid: there are no complex validation routines to run and no consistency checks between different components of the key. This stands in stark contrast to the expanded format, which FIPS 203, Section 7.3, requires to pass the hash check H(dk[384k : 768k + 32]) == dk[768k + 32 : 768k + 64], where H is SHA3-256, verifying that the pre-computed hash stored in the key matches the encapsulation key embedded alongside it.
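For illustration, here is a minimal Python sketch of the length (type) check and hash check on an expanded key, using only the byte offsets from the formula above. The function name is hypothetical, and the demo fills the key with random bytes, so it exercises only the layout and hash logic, not real key material.

```python
import hashlib
import hmac
import os


def check_expanded_dk(dk: bytes, k: int) -> bool:
    """Sketch of the FIPS 203 Section 7.3 type and hash checks on an expanded decapsulation key."""
    # Type check: an expanded key must be exactly 768k + 96 bytes long.
    if len(dk) != 768 * k + 96:
        return False
    ek = dk[384 * k : 768 * k + 32]                 # embedded encapsulation key
    stored_hash = dk[768 * k + 32 : 768 * k + 64]   # pre-computed H(ek)
    # Hash check: H is SHA3-256. compare_digest is a conservative constant-time comparison.
    return hmac.compare_digest(hashlib.sha3_256(ek).digest(), stored_hash)


if __name__ == "__main__":
    k = 3  # ML-KEM-768
    # Random filler stands in for real key material; only the layout matters here.
    ek = os.urandom(384 * k + 32)
    dk = os.urandom(384 * k) + ek + hashlib.sha3_256(ek).digest() + os.urandom(32)
    print(check_expanded_dk(dk, k))               # True
    tampered = bytearray(dk)
    tampered[384 * k] ^= 1                        # flip one bit of the embedded ek
    print(check_expanded_dk(bytes(tampered), k))  # False
```

A seed-based loader never needs this function: it computes H(ek) itself from the ek it just derived, so there is no stored hash to disagree with.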

Beyond this basic validation, the expanded format introduces a labyrinth of potential edge cases. The expanded decapsulation key consists of ByteEncode₁₂(s) || ByteEncode₁₂(t) || ρ || H(ekPKE) || z, where s and t are vectors of polynomials in NTT representation. Each polynomial has 256 coefficients, which are field elements between 0 and 3,328 (integers modulo q = 3,329) but are encoded in 12 bits, enough room for values up to 4,095. What happens when an encoded coefficient is 3,329 or larger? The specification clearly requires rejecting such values in encapsulation keys (the modulus check in Section 7.2), but it is ambiguous about what to do when they appear within a decapsulation key's components. This entire class of validation questions simply evaporates when using seeds, because the implementation derives all the necessary values internally, guaranteeing their validity.
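To make the ambiguity concrete, the sketch below unpacks the 12-bit encoding without reducing modulo q, so out-of-range values stay visible, and applies the same range test that Section 7.2 mandates for encapsulation keys. The helper names are hypothetical. Running `modulus_check` over the s portion of an expanded key (dk[0:384k]) is exactly the kind of check whose requirement the specification leaves unclear.

```python
Q = 3329  # ML-KEM modulus q


def decode_12bit_raw(data: bytes) -> list[int]:
    """Unpack little-endian 12-bit values, two per 3 bytes, WITHOUT reducing mod q.

    FIPS 203's ByteDecode12 reduces each value modulo q; keeping the raw
    values here exposes the out-of-range encodings that reduction would hide.
    """
    if len(data) % 3 != 0:
        raise ValueError("length must be a multiple of 3")
    out = []
    for i in range(0, len(data), 3):
        b0, b1, b2 = data[i], data[i + 1], data[i + 2]
        out.append(b0 | ((b1 & 0x0F) << 8))   # low byte + low nibble of the middle byte
        out.append((b1 >> 4) | (b2 << 4))     # high nibble of the middle byte + last byte
    return out


def encode_12bit(values: list[int]) -> bytes:
    """Pack 12-bit values (0..4095) two per 3 bytes, little-endian."""
    out = bytearray()
    for a, b in zip(values[0::2], values[1::2]):
        out += bytes([a & 0xFF, (a >> 8) | ((b & 0x0F) << 4), b >> 4])
    return bytes(out)


def modulus_check(encoded: bytes) -> bool:
    """Equivalent to ByteEncode12(ByteDecode12(x)) == x: every raw value must be below q."""
    return all(v < Q for v in decode_12bit_raw(encoded))


if __name__ == "__main__":
    print(modulus_check(encode_12bit([0, 3328])))   # True: both values are valid field elements
    print(modulus_check(encode_12bit([3329, 0])))   # False: 3329 fits in 12 bits but is not < q
```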

Some might argue that the expanded format offers performance benefits, but this proves to be a misconception. At first glance the numbers seem to favor it: my measurements show that expanding a seed on an M2 processor takes 40µs, while loading the 2,400-byte ML-KEM-768 expanded key over a Gigabit connection takes only 19µs. That comparison, however, overlooks that loading an expanded key still requires the hash check and the expansion of matrix A from ρ, operations that seed expansion would have performed anyway, so the expanded format saves only part of those 40µs. The remaining difference becomes even less significant given the typical usage patterns of private keys: either they're ephemeral (in which case expansion happens once) or they're reused frequently (in which case the cost is amortized).
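The 19µs figure is pure serialization time: 2,400 bytes is 19,200 bits, and at 10⁹ bits per second that comes to 19.2µs. The hypothetical helper below reproduces the arithmetic, ignoring latency and framing overhead; the 40µs seed-expansion number is a measurement and is not something this sketch computes.

```python
def transfer_time_us(n_bytes: int, bits_per_second: float = 1e9) -> float:
    """Serialization time in microseconds, ignoring latency and protocol overhead."""
    return n_bytes * 8 / bits_per_second * 1e6


if __name__ == "__main__":
    for label, size in [("64-byte seed", 64), ("ML-KEM-768 expanded key", 2400)]:
        print(f"{label}: {transfer_time_us(size):.1f} µs at 1 Gbit/s")
```

This prints 0.5 µs for the seed and 19.2 µs for the expanded key.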

The expanded format also represents an awkward compromise between wire format and in-memory representation. It doesn't include the full expanded matrix A but only its seed ρ, creating a hybrid state where some values are pre-expanded while others remain compressed. It is too large to be a compact wire format, yet too incomplete to serve as a ready-to-use in-memory representation, leaving us with a solution that excels at neither.
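Because only ρ is stored, a loader still has to regenerate all k² entries of A with SHAKE128 rejection sampling, the same work a seed-based loader does. Below is a simplified Python rendering of FIPS 203's SampleNTT and the matrix expansion, assuming Python's `hashlib.shake_128` as the XOF. It restarts the stream with a longer output whenever it runs short, which is wasteful but keeps the sketch short; it is neither constant-time nor optimized, and the function names are illustrative.

```python
import hashlib

Q = 3329     # ML-KEM modulus
N = 256      # coefficients per polynomial
BLOCK = 168  # SHAKE128 rate in bytes


def sample_ntt(xof_input: bytes) -> list[int]:
    """Simplified SampleNTT (FIPS 203): rejection-sample 256 coefficients below q from SHAKE128."""
    n_bytes = 3 * BLOCK
    while True:
        # SHAKE128 output is prefix-consistent, so asking for more bytes just extends the stream.
        stream = hashlib.shake_128(xof_input).digest(n_bytes)
        coeffs: list[int] = []
        for pos in range(0, n_bytes, 3):
            c0, c1, c2 = stream[pos], stream[pos + 1], stream[pos + 2]
            d1 = c0 + 256 * (c1 % 16)   # low 12 bits of the 3-byte chunk
            d2 = c1 // 16 + 16 * c2     # high 12 bits
            if d1 < Q:
                coeffs.append(d1)
            if d2 < Q and len(coeffs) < N:
                coeffs.append(d2)
            if len(coeffs) == N:
                return coeffs
        n_bytes += BLOCK  # too many rejections: start over with one more block


def expand_matrix(rho: bytes, k: int) -> list[list[list[int]]]:
    """Regenerate the k x k matrix A from its 32-byte seed: A[i][j] = SampleNTT(rho || j || i)."""
    if len(rho) != 32:
        raise ValueError("rho must be 32 bytes")
    return [[sample_ntt(rho + bytes([j, i])) for j in range(k)] for i in range(k)]


if __name__ == "__main__":
    A = expand_matrix(bytes(32), 3)        # ML-KEM-768: 9 polynomials, 2,304 coefficients
    print(len(A), len(A[0]), len(A[0][0]))  # 3 3 256
```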

From an implementation perspective, the benefits of standardizing on seeds become increasingly apparent. By supporting only the seed format, we eliminate the need for complex parsing and validation logic, reducing the potential for implementation errors. This approach aligns with the principle that edge cases should either be common enough to test with random inputs or so rare that they can safely trigger program termination, a philosophy that ML-KEM, with its well-defined parameter spaces, appears to satisfy.
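By contrast, input handling for a seed-only API reduces to a length check, since every 64-byte string is a valid seed. The sketch below assumes the seed is laid out as d ‖ z, the two 32-byte inputs of FIPS 203's ML-KEM.KeyGen_internal, and shows the first derivation step of K-PKE.KeyGen; everything downstream (ρ, σ, A, s, t, H(ek)) is computed rather than parsed. The helper names are hypothetical.

```python
import hashlib

SEED_SIZE = 64


def parse_seed(seed: bytes) -> tuple[bytes, bytes]:
    """Split a 64-byte ML-KEM seed into (d, z). Length is the only thing that can be wrong."""
    if len(seed) != SEED_SIZE:
        raise ValueError(f"ML-KEM seed must be {SEED_SIZE} bytes, got {len(seed)}")
    return seed[:32], seed[32:]


def derive_rho_sigma(d: bytes, k: int) -> tuple[bytes, bytes]:
    """First step of K-PKE.KeyGen in FIPS 203: (rho, sigma) = G(d || k), where G is SHA3-512."""
    g = hashlib.sha3_512(d + bytes([k])).digest()
    return g[:32], g[32:]


if __name__ == "__main__":
    d, z = parse_seed(bytes(64))          # any 64 bytes, even all zeros, form a valid seed
    rho, sigma = derive_rho_sigma(d, 3)   # k = 3 for ML-KEM-768
    print(rho.hex(), sigma.hex())
```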

The question of interoperability looms large. If implementations support both formats, we risk creating compatibility issues and maintaining dual code paths unnecessarily. The expanded format, despite its explicit allowance in FIPS 203, introduces complexity without commensurate benefits. By agreeing to use seeds exclusively, we can streamline implementations and reduce the potential for subtle bugs that might emerge from handling multiple formats.

Fortunately, we're already seeing movement in this direction. My implementation at filippo.io/mlkem768, and likely the Go standard library, are adopting this approach, relegating expanded key parsing to internal functions rather than exposing it in public APIs. This strategy lets test vectors in frameworks like CCTV and Wycheproof focus on seed inputs, while edge cases can still be reproduced, when necessary, by brute-forcing seeds that trigger them.
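As an illustration of the brute-forcing idea, the sketch below searches for a seed that exercises one rejection-sampling edge case: matrix entry A[0][0] of ML-KEM-768 needing more than three SHAKE128 blocks (504 bytes) of output, a path that implementations squeezing a fixed amount up front must handle. It assumes the d ‖ z seed layout and reuses the simplified SampleNTT logic from FIPS 203. The function names and the counter-based candidate generator are hypothetical and are not meant to describe how CCTV or Wycheproof actually produce their vectors. Only a small fraction of seeds qualify, but each candidate is cheap to test.

```python
import hashlib
from typing import Optional

Q = 3329
BLOCK = 168  # SHAKE128 rate in bytes


def xof_bytes_for_a00(d: bytes, k: int = 3) -> int:
    """How many SHAKE128 output bytes SampleNTT consumes to produce matrix entry A[0][0]."""
    rho = hashlib.sha3_512(d + bytes([k])).digest()[:32]  # (rho, sigma) = G(d || k)
    xof_input = rho + bytes([0, 0])                        # rho || j || i with i = j = 0
    n_bytes = 3 * BLOCK
    while True:
        stream = hashlib.shake_128(xof_input).digest(n_bytes)  # prefix-consistent XOF output
        accepted = 0
        for pos in range(0, n_bytes, 3):
            c0, c1, c2 = stream[pos], stream[pos + 1], stream[pos + 2]
            d1 = c0 + 256 * (c1 % 16)
            d2 = c1 // 16 + 16 * c2
            if d1 < Q:
                accepted += 1
            if d2 < Q and accepted < 256:
                accepted += 1
            if accepted == 256:
                return pos + 3
        n_bytes += BLOCK  # ran out: retry with one more block of output


def find_unlucky_seed(threshold: int = 3 * BLOCK, max_tries: int = 100_000) -> Optional[bytes]:
    """Brute-force a 64-byte seed (d || z) whose A[0][0] needs more than `threshold` XOF bytes."""
    for counter in range(max_tries):
        # Deterministic candidates, so any seed found is reproducible.
        seed = hashlib.sha3_512(b"unlucky seed " + counter.to_bytes(8, "little")).digest()
        if xof_bytes_for_a00(seed[:32]) > threshold:
            return seed
    return None


if __name__ == "__main__":
    seed = find_unlucky_seed()
    if seed is not None:
        print(seed.hex(), xof_bytes_for_a00(seed[:32]))
```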

Looking ahead, standardization efforts like draft-ietf-lamps-kyber-certificates-03 will play a crucial role in cementing this approach. The OID assignments already made by NIST for all ML-KEM parameters suggest that the ecosystem is moving toward standardization, though the eventual key format specification remains to be seen. The hope is that it will settle on the 64-byte seed format, providing the clarity and consistency our implementations need.


The Fanal forest in Madeira, with its ancient, twisted trees, serves as a reminder that sometimes the most elegant solutions are those that have stood the test of time. In the rapidly evolving field of post-quantum cryptography, adopting seeds as the universal storage format for ML-KEM keys represents such an elegant solution—simple, secure, and efficient.

As we implement these post-quantum algorithms, we have an opportunity to learn from past cryptographic engineering practices. The transition to specifying APIs in terms of bytes over the past two decades represents some of the most significant progress in our field. By standardizing on seeds, we continue this tradition of clarity and precision, ensuring that implementations remain robust and interoperable.

The path forward seems clear: let's agree to use seeds as ML-KEM keys, eliminate the complexity of expanded formats, and build implementations that are simpler, more secure, and more maintainable. In doing so, we strengthen the entire post-quantum ecosystem and accelerate the adoption of these critical cryptographic technologies.
