GNU Awk 5.4 introduces the new MinRX regex engine as default, delivers ~9% faster file reading, and adds Arabic translations while improving Windows and OpenVMS support.
GNU Awk (Gawk) 5.4 has been released, bringing performance improvements and new features to this essential text-processing utility. The most notable change is the adoption of the new MinRX regular expression matcher as the default engine; the previous matchers remain available but are no longer used by default.
MinRX: A New Default Regex Engine
The MinRX matcher, written by Mike Haertel (the original developer behind GNU grep), is now the default regular expression engine in Gawk 5.4. This new matcher is fully POSIX compliant, addressing limitations in the existing GNU matchers. While the old regex and DFA engines remain available for backward compatibility, MinRX will be used by default in all new Gawk 5.4 installations.
Performance Boost: ~9% Faster File Reading
Gawk 5.4 speeds up reading from regular disk files. The developers removed the timeout checks that were previously performed for such files, which makes reading them approximately 9% faster. The gain matters most to users who process large log files or datasets.
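To check the effect on your own workload, one option is to time several full passes over the same file with each gawk build. The following is a minimal sketch, not a rigorous benchmark. The `AWK_BIN` variable and the helper names `make_input`, `count_records`, and `time_passes` are assumptions made for illustration, and the timing relies on `date +%s`, which only has one-second resolution.

```shell
#!/bin/sh
# AWK_BIN is an assumption: point it at the gawk binary you want to measure.
# It falls back to whatever "awk" is on PATH.
AWK_BIN="${AWK_BIN:-awk}"

# Build a throwaway input file with a known number of lines.
make_input() {
    seq 1 "$1" > "$2"
}

# Read the whole file once and print the number of records seen.
count_records() {
    "$AWK_BIN" 'END { print NR }' "$1"
}

# Run several full passes over a file and print the elapsed
# wall-clock time in whole seconds (coarse, 1 s resolution).
time_passes() {
    file=$1
    passes=$2
    start=$(date +%s)
    i=0
    while [ "$i" -lt "$passes" ]; do
        count_records "$file" > /dev/null
        i=$((i + 1))
    done
    end=$(date +%s)
    echo $((end - start))
}

input=$(mktemp)
make_input 100000 "$input"
count_records "$input"    # prints the record count of the sample file
echo "elapsed seconds for 5 passes: $(time_passes "$input" 5)"
rm -f "$input"
```

For a meaningful comparison, use a file large enough that one pass takes several seconds, run the script once per gawk binary by setting `AWK_BIN`, and repeat the runs so that disk caching affects both builds equally.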
Enhanced Windows and Cross-Platform Support
The Windows ecosystem receives notable improvements in this release. The MinGW port now supports UTF-8 encoded non-ASCII text, while the Cygwin port has achieved full UTF-8 support. These changes make Gawk more versatile for international text processing on Windows platforms.
OpenVMS support has also been improved, extending Gawk's reach to this specialized operating system used in enterprise environments.
Build System and Code Quality Enhancements
Gawk 5.4 introduces a --enable-o3 build option, allowing users to compile Gawk with -O3 compiler optimizations for potentially better performance. The C codebase now has assertions enabled, improving code reliability and debugging capabilities.
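As a rough sketch, building from an unpacked source tree with this option would follow the usual GNU configure-and-make sequence. The helper name `build_gawk_o3` and its directory argument are hypothetical, and the option is spelled as given above; running `./configure --help` in the source tree shows the exact form that your copy accepts.

```shell
#!/bin/sh
# Hypothetical helper: configure, build, and test gawk with -O3
# from an already unpacked source tree (defaults to the current directory).
build_gawk_o3() {
    src_dir=${1:-.}
    (
        # Work in a subshell so the caller's working directory is unchanged.
        cd "$src_dir" || exit 1
        # --enable-o3 is the option named in the release coverage (assumed spelling).
        ./configure --enable-o3 && make && make check
    )
}
```

Calling `build_gawk_o3 path/to/gawk-source` runs the three steps in order and stops at the first one that fails.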
BSD support has been enhanced, and the build system has been refined to better handle various platform-specific requirements.
Internationalization and Documentation
This release marks the first time Gawk includes Arabic translations, expanding its accessibility to Arabic-speaking users. The manual and documentation have been updated with new policies explicitly forbidding ad hominem attacks on mailing lists and strongly discouraging discussions of proprietary software, reflecting the GNU project's commitment to free software principles.
Technical Improvements
Additional technical changes include:
- Support for multi-byte characters in the ordchr extension
- Changes to track the POSIX 2024 specification
- Changes to how persistent memory is used
Gawk remains one of the most powerful text-processing tools available, combining grep-style pattern matching and sed-style stream editing with the programming constructs of a full language. This release continues that tradition while modernizing the core regex engine and improving performance.
The new version is available for download from GNU.org, with source code and binaries for various platforms. Users of previous Gawk versions are encouraged to upgrade to benefit from the performance improvements and new features, particularly the more compliant and potentially faster MinRX regex engine.

For system administrators, data analysts, and developers who rely on Gawk for text processing tasks, version 5.4 represents a meaningful upgrade that balances backward compatibility with forward-looking improvements to core functionality.
