A comprehensive analysis of floating-point number serialization algorithms, examining their performance evolution, implementation trade-offs, and the surprising discovery that none of the current implementations generate the shortest possible decimal strings.
When serializing data to JSON or CSV, or when logging, we convert numbers to strings. Floating-point numbers are stored in binary, but we need them as decimal strings. The first formally published algorithm is Steele and White's Dragon schemes (specifically Dragon4) in 1990. Since then, faster methods have emerged: Grisu3, Ryū, Schubfach, Grisu-Exact, and Dragonbox. Since C++17, the standard library provides a function called std::to_chars for this purpose.
A common objective is to generate the shortest string that still uniquely identifies the original number. We recently published Converting Binary Floating-Point Numbers to Shortest Decimal Strings. We examine the full conversion, from the floating-point number to the string. In practice, the conversion involves two steps: we take the number and compute the significand and the power of 10 (step 1), and then we generate the string (step 2). E.g., for the number pi, you might need to compute 31415927 and -7 (step 1) before generating the string 3.1415927. The string generation requires placing the dot at the right location and switching to the exponential notation when needed.
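Step 2 can be illustrated with a minimal Python sketch. The function name `format_decimal` and the thresholds for switching to exponential notation are assumptions chosen for illustration (they mimic the behaviour of Python's own `repr`), not the rules used by any of the implementations discussed here.

```python
def format_decimal(significand: int, exponent: int) -> str:
    """Render significand * 10**exponent as a decimal string (step 2 only)."""
    sign = "-" if significand < 0 else ""
    digits = str(abs(significand))
    # Position of the decimal point, counted from the start of `digits`.
    point = len(digits) + exponent
    # Assumed thresholds: use fixed notation for "moderate" magnitudes.
    if -4 < point <= 16:
        if point <= 0:
            # Value below 1: leading zeros after the dot.
            return sign + "0." + "0" * (-point) + digits
        if point >= len(digits):
            # Integer value: trailing zeros, no dot.
            return sign + digits + "0" * (point - len(digits))
        return sign + digits[:point] + "." + digits[point:]
    # Exponential notation: one digit before the dot, exponent padded to two digits.
    sci_exp = point - 1
    mantissa = digits[0] + ("." + digits[1:] if len(digits) > 1 else "")
    return f"{sign}{mantissa}e{sci_exp:+03d}"


print(format_decimal(31415927, -7))  # the pi example from the text
print(format_decimal(11, -5))        # fixed notation
print(format_decimal(11, -6))        # exponential notation
```

Step 1, computing the shortest significand and exponent, is where the algorithms named above differ. This sketch takes its result as given.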
The generation of the string is relatively cheap and was probably a negligible cost for older schemes, but as the software got faster, it became a more important component (using 20% to 35% of the time). The results vary quite a bit depending on the numbers being converted. But we find that two implementations tend to do best: Dragonbox by Jeon and Schubfach by Giulietti. The Ryū implementation by Adams is close behind or just as fast. All of these techniques are about 10 times faster than the original Dragon4 from 1990. A tenfold gain in performance over three decades is equivalent to a gain of about 8% per year, entirely due to better implementations and algorithms. Efficient algorithms use between 200 and 350 instructions for each string generated. We find that the standard function std::to_chars under Linux uses more instructions than needed (up to nearly twice as many). So there is room to improve common implementations. The popular C++ library fmt is also slightly less efficient than the fastest implementations.
A fun fact is that we found that none of the available functions generates the shortest possible string. The std::to_chars C++ function renders the number 0.00011 as 0.00011 (7 characters), while the shorter scientific form 1.1e-4 (6 characters) would do. But, by convention, when switching to the scientific notation, it is required to pad the exponent to two digits (so 1.1e-04, which is again 7 characters). Beyond this technicality, we found that no implementation always generates the shortest string. All our code, datasets, and raw results are open-source. The benchmarking suite is at https://github.com/fastfloat/float_serialization_benchmark, and the test data is at https://github.com/fastfloat/float-data.
Reference: Converting Binary Floating-Point Numbers to Shortest Decimal Strings: An Experimental Review, Software: Practice and Experience (to appear)

The Historical Context: From Dragon4 to Modern Algorithms
The journey of floating-point to string conversion algorithms began in 1990 with Steele and White's Dragon4 scheme. This algorithm established the foundation for all subsequent work in this domain. The Dragon4 algorithm was designed to produce correctly rounded decimal representations of binary floating-point numbers, ensuring that the conversion was both accurate and reversible.
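The reversibility requirement can be made concrete with a brute-force Python sketch. It is not Dragon4 or any of the algorithms discussed here: it simply tries correctly rounded candidates with 1 to 17 significant digits and keeps the first one that parses back to the same double. The function name `shortest_roundtrip_digits` is hypothetical. In rare boundary cases a shorter string that is not the nearest decimal could also round-trip, and this search would miss it.

```python
def shortest_roundtrip_digits(x: float) -> str:
    """Return the first correctly rounded candidate that parses back to x."""
    for precision in range(17):  # precision p yields p + 1 significant digits
        candidate = format(x, f".{precision}e")
        if float(candidate) == x:
            return candidate
    # Only reached for NaN, which never compares equal to itself.
    raise ValueError("input does not round-trip (NaN?)")


for value in (0.1, 0.5, 0.1 + 0.2, 3.141592653589793):
    text = shortest_roundtrip_digits(value)
    print(f"{value!r} -> {text}")
```

Seventeen significant digits are always enough to identify a 64-bit float, so the loop terminates for every finite input. The point of the fast algorithms is to reach the same kind of answer without trial and error.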
Over the subsequent decades, researchers have developed increasingly sophisticated algorithms that build upon and improve the original Dragon4 approach. The progression from Dragon4 to modern algorithms represents a fascinating case study in algorithmic optimization and the evolution of computational techniques.
The Modern Landscape of Conversion Algorithms
Several modern algorithms have emerged as leaders in the field of floating-point to string conversion:
Dragonbox by Jeon represents one of the most recent and efficient approaches. This algorithm builds upon the lessons learned from previous implementations while introducing novel optimizations that reduce the computational overhead of string generation.
Schubfach by Giulietti is another highly competitive algorithm that has demonstrated excellent performance characteristics across various test scenarios. The algorithm's name is German for "drawer" or "pigeonhole", a reference to the pigeonhole principle (Schubfachprinzip).
Ryū by Ulf Adams has gained significant popularity due to its balance of performance and implementation simplicity. The algorithm's design philosophy emphasizes both speed and maintainability, making it an attractive choice for many applications.
Grisu3 and Grisu-Exact represent earlier attempts to improve upon Dragon4. Grisu3 cannot handle every input on its own and must fall back to a slower method for a small fraction of numbers; Grisu-Exact addresses this limitation.
Performance Analysis and Benchmarking
Our comprehensive benchmarking study revealed several important insights about the performance characteristics of these algorithms. The most striking finding is that modern algorithms are approximately 10 times faster than the original Dragon4 implementation from 1990. This represents an average annual improvement rate of about 8%, which is remarkable considering that this progress stems entirely from better algorithms and implementations rather than hardware advances.
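The 8% figure follows from compounding. The short Python check below assumes the tenfold gain is spread evenly over 30 years ("three decades"); the function name `annual_growth` is hypothetical.

```python
def annual_growth(speedup: float, years: int) -> float:
    """Constant yearly rate r such that (1 + r) ** years == speedup."""
    return speedup ** (1 / years) - 1


rate = annual_growth(10.0, 30)  # assumption: 10x over 30 years
print(f"{rate:.1%} per year")
```

Stretching the same gain over 35 years instead would lower the rate to a little under 7%, so "about 8%" depends on the assumed time span.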
Modern efficient algorithms typically require between 200 and 350 instructions per string generated. This relatively small instruction count is crucial for achieving high performance, especially in scenarios where large volumes of floating-point numbers need to be converted to strings.
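For readers who want a rough feel for per-number cost, here is a minimal timing sketch in Python. It measures wall-clock time rather than instruction counts, interpreter overhead dominates the result, and it is unrelated to the benchmarking suite used in the study. The helper `ns_per_conversion` and the choice of random inputs are assumptions for illustration.

```python
import random
import timeit


def ns_per_conversion(convert, values, repeat=5):
    """Best-of-`repeat` wall-clock time per call to `convert`, in nanoseconds."""
    def run():
        for v in values:
            convert(v)
    best = min(timeit.repeat(run, number=1, repeat=repeat))
    return best * 1e9 / len(values)


random.seed(1234)  # fixed seed so every run converts the same inputs
values = [random.random() for _ in range(10_000)]

shortest = ns_per_conversion(repr, values)  # shortest round-trip string
fixed17 = ns_per_conversion(lambda v: format(v, ".17g"), values)  # always 17 digits
print(f"repr: {shortest:.0f} ns, .17g: {fixed17:.0f} ns per number")
```

As the text notes, results vary with the numbers being converted, so any serious comparison should use several input distributions.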
Implementation Analysis: std::to_chars and fmt
Our study examined the performance of standard library implementations, particularly focusing on std::to_chars under Linux. We found that this implementation uses up to nearly twice as many instructions as needed, indicating room for optimization in commonly used libraries.
The popular C++ library fmt, while widely adopted and generally well-regarded, was found to be slightly less efficient than the optimal implementations we studied. This suggests that even popular, well-maintained libraries may benefit from algorithmic improvements in this domain.
The Shortest String Problem
One of the most intriguing findings of our research is that none of the available implementations consistently generate the shortest possible string representation for floating-point numbers. This discovery has important implications for applications where string size matters, such as network protocols, storage systems, and logging frameworks.
For example, the std::to_chars function converts the number 0.00011 to "0.00011" (7 characters), when the shorter scientific notation "1.1e-4" (6 characters) would be equally valid and more compact. However, the convention of padding exponents to two digits in scientific notation (resulting in "1.1e-04", again 7 characters) removes that advantage and complicates the pursuit of the absolute shortest representation.
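The example can be checked in a few lines of Python. The sketch confirms that all three spellings parse to the same double and compares their lengths. Python's `repr` is used here only as a convenient stand-in for a shortest-digits formatter; it is not std::to_chars.

```python
x = 0.00011
candidates = ["0.00011", "1.1e-4", "1.1e-04"]

for s in candidates:
    # Every candidate must identify exactly the same double.
    assert float(s) == x
    print(f"{s!r}: {len(s)} characters")

shortest = min(candidates, key=len)
print("shortest:", shortest)
print("repr gives:", repr(x))
```

The unpadded form saves one character here, but only if the consumer accepts one-digit exponents.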
Practical Implications and Future Directions
The findings from this research have several practical implications for software developers and system architects:
Performance Optimization: Applications that perform extensive floating-point to string conversions can benefit significantly from using modern algorithms like Dragonbox or Schubfach instead of relying on standard library implementations.
Storage Efficiency: The inability of current implementations to always generate the shortest possible strings represents an opportunity for optimization in storage-constrained environments.
Protocol Design: Network protocols that transmit floating-point numbers as strings could potentially achieve better compression and efficiency by implementing custom conversion logic.
Library Development: The identified inefficiencies in std::to_chars and fmt suggest opportunities for library maintainers to improve their implementations.
Open Source Resources
All the code, datasets, and raw results from our study are available as open-source resources:
- Benchmarking Suite: https://github.com/fastfloat/float_serialization_benchmark
- Test Data: https://github.com/fastfloat/float-data
These resources provide a foundation for further research and allow developers to benchmark their own implementations against the state of the art.
Conclusion
The evolution of floating-point to string conversion algorithms demonstrates how focused algorithmic research can yield substantial performance improvements over time. From the original Dragon4 in 1990 to modern implementations like Dragonbox and Schubfach, the field has seen a tenfold improvement in performance, driven entirely by better algorithms and implementations rather than hardware advances.
The discovery that no current implementation consistently generates the shortest possible string representation opens new avenues for research and optimization. As applications continue to process increasingly large volumes of floating-point data, the importance of efficient and compact string representations will only grow.
This research underscores the ongoing relevance of fundamental algorithmic research in computer science and its practical impact on real-world software systems. The availability of open-source benchmarking tools and datasets ensures that this work can serve as a foundation for future improvements in this critical area of software engineering.
