MYRA: Java’s Off‑Heap Engine for Sub‑Microsecond Latency
A recent post by Rohan Ray introduces MYRA (Memory Yielded, Rapid Access), a production‑grade Java library stack built on the Foreign Function & Memory (FFM) API, which was finalized in JDK 22 after several preview rounds. The goal is clear: deliver deterministic, sub‑microsecond latency for high‑frequency trading, market data feeds, and other real‑time systems while preserving Java’s safety guarantees.
The Problem with Legacy Off‑Heap Approaches
Historically, Java developers have turned to sun.misc.Unsafe or JNI wrappers to access off‑heap memory. Both approaches suffer from stability and portability issues:
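- Unsafe is an unsupported internal API with no compatibility guarantees, and its memory‑access methods have been deprecated for removal since JDK 23, so code built on it faces forced refactoring.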
- JNI incurs heavy boilerplate, increases the attack surface, and can introduce hard‑to‑debug memory corruption.
FFM offers a standardized, safe alternative that exposes the same low‑level capabilities but within the JVM’s bounds‑checking and type‑safety mechanisms.
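To make the safety claim concrete, here is a minimal sketch using only the standard `java.lang.foreign` API (JDK 22 or later). It is not taken from MYRA; the class and method names are illustrative assumptions. It allocates off‑heap memory in a confined arena, reads a value back, and shows that an out‑of‑bounds access raises an exception instead of corrupting memory.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

// Illustrative only: FfmBoundsDemo is a hypothetical name, not part of MYRA.
class FfmBoundsDemo {

    /** Writes a long off-heap and reads it back; the memory is freed when the arena closes. */
    static long roundTrip(long value) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment segment = arena.allocate(ValueLayout.JAVA_LONG); // 8 bytes, off-heap
            segment.set(ValueLayout.JAVA_LONG, 0, value);
            return segment.get(ValueLayout.JAVA_LONG, 0);
        }
    }

    /** Returns true if a read past the end of the segment is rejected with an exception. */
    static boolean outOfBoundsIsRejected() {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment segment = arena.allocate(ValueLayout.JAVA_LONG);
            segment.get(ValueLayout.JAVA_LONG, 8); // offset 8 lies past the 8-byte segment
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("round trip: " + roundTrip(42L));
        System.out.println("out-of-bounds rejected: " + outOfBoundsIsRejected());
    }
}
```

The try‑with‑resources block gives deterministic deallocation: the segment becomes invalid as soon as the arena closes, without waiting for a garbage collection cycle.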
MYRA’s Core Design Principles
The stack is built around four pillars that directly address the performance‑memory trade‑off:
- Zero GC – All data lives off‑heap in memory arenas; the garbage collector never touches the critical path.
- Zero Allocation – Reusable, stateless flyweight views replace object churn; the hot path never allocates on the heap.
- Zero Copy – Structured layouts allow direct reads/writes to raw memory, eliminating serialization overhead.
- Ultra‑Low Latency – Targeting < 30 µs mean latency with controlled tail behavior.
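The zero‑allocation and zero‑copy pillars can be sketched with the standard FFM API alone. The example below is an assumption‑laden illustration, not MYRA's actual codec: the `ORDER` layout, its field names, and the `OrderView` class are hypothetical. A single reusable view object is repositioned over successive records, so reading or writing a field touches raw off‑heap memory directly and creates no per‑record objects.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemoryLayout;
import java.lang.foreign.MemoryLayout.PathElement;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.StructLayout;
import java.lang.foreign.ValueLayout;

// Illustrative only: layout, field names, and classes are hypothetical, not MYRA's API.
class OrderFlyweightDemo {

    // A 16-byte record: price at offset 0, quantity at offset 8.
    static final StructLayout ORDER = MemoryLayout.structLayout(
            ValueLayout.JAVA_LONG.withName("price"),
            ValueLayout.JAVA_LONG.withName("quantity"));
    static final long PRICE = ORDER.byteOffset(PathElement.groupElement("price"));
    static final long QUANTITY = ORDER.byteOffset(PathElement.groupElement("quantity"));

    /** Reusable view: holds only a segment reference and the base offset of one record. */
    static final class OrderView {
        private MemorySegment segment;
        private long base;

        /** Points this view at record number {@code index}; no allocation takes place. */
        OrderView wrap(MemorySegment segment, long index) {
            this.segment = segment;
            this.base = index * ORDER.byteSize();
            return this;
        }

        long price() { return segment.get(ValueLayout.JAVA_LONG, base + PRICE); }
        long quantity() { return segment.get(ValueLayout.JAVA_LONG, base + QUANTITY); }

        OrderView price(long v) { segment.set(ValueLayout.JAVA_LONG, base + PRICE, v); return this; }
        OrderView quantity(long v) { segment.set(ValueLayout.JAVA_LONG, base + QUANTITY, v); return this; }
    }

    /** Fills {@code count} records off-heap, then sums price * quantity using one view. */
    static long totalNotional(int count) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment orders = arena.allocate(ORDER, count); // contiguous array of records
            OrderView view = new OrderView();                    // allocated once, reused below
            for (int i = 0; i < count; i++) {
                view.wrap(orders, i).price(100 + i).quantity(10);
            }
            long total = 0;
            for (int i = 0; i < count; i++) {
                view.wrap(orders, i);
                total += view.price() * view.quantity();
            }
            return total;
        }
    }

    public static void main(String[] args) {
        System.out.println(totalNotional(3));
    }
}
```

In a real pipeline the segment would typically wrap a network or file buffer, so decoding amounts to repositioning the view over bytes that are already in memory.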
Each principle is realized through a set of six libraries:
- roray-ffm-utils – Memory arenas and native resource handling.
- myra-codec – Zero‑copy serialization.
- myra-transport – Linux io_uring‑based networking.
- express-rpc – A lightweight RPC framework.
- jia-cache – Off‑heap caching.
- (Future) – Additional utilities for metrics and diagnostics.
Benchmarks that Matter
The author reports on two key performance dimensions: serialization and networking.
Serialization
Using a realistic order‑book snapshot workload on a c6a.4xlarge instance with JDK 25, MYRA outperforms competitors in decode throughput:
| Codec | Decode (ops/s) | Encode (ops/s) |
|---|---|---|
| MYRA | 4,150,079 | 1,911,781 |
| SBE | 2,204,557 | 4,990,071 |
| FlatBuffers | 1,968,855 | 1,045,843 |
| Kryo | 1,322,754 | 1,342,611 |
| Avro | 454,553 | 466,816 |
The decode‑dominance of MYRA suits read‑heavy workloads common in trading and market data pipelines.
Networking
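A ping‑pong latency test on an ARM64 Graviton instance shows MYRA’s io_uring‑based transport outpacing Netty. In the reported figures, however, the plain NIO baseline posts a lower mean latency and higher throughput than both: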
| Transport | Mean Latency (µs) | Throughput (ops/s) |
|---|---|---|
| MYRA_TOKEN | 28.70 | 34,843 |
| Netty | 39.34 | 25,417 |
| NIO | 13.22 | 75,645 |
The token‑based completion tracking strikes a balance between latency and consistency, delivering 27 % lower latency and 37 % higher throughput than Netty.
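The percentages quoted for Netty follow directly from the table. The short sketch below (class and method names are illustrative) reproduces the arithmetic. It also checks an observation about the data rather than a claim from the article: each throughput figure is close to the reciprocal of the mean latency, which is what one would expect if the ping‑pong test keeps a single request in flight.

```java
// Illustrative arithmetic check of the reported transport figures.
class TransportMath {

    /** Percentage by which {@code candidate} is lower than {@code baseline}. */
    static double percentLower(double baseline, double candidate) {
        return (baseline - candidate) / baseline * 100.0;
    }

    /** Percentage by which {@code candidate} is higher than {@code baseline}. */
    static double percentHigher(double baseline, double candidate) {
        return (candidate - baseline) / baseline * 100.0;
    }

    /** Throughput implied by one round trip at a time: 1 second / mean latency. */
    static double opsPerSecond(double meanLatencyMicros) {
        return 1_000_000.0 / meanLatencyMicros;
    }

    public static void main(String[] args) {
        // Values copied from the table: MYRA_TOKEN 28.70 µs / 34,843 ops/s, Netty 39.34 µs / 25,417 ops/s.
        System.out.printf("latency reduction vs Netty: %.1f%%%n", percentLower(39.34, 28.70));
        System.out.printf("throughput gain vs Netty:   %.1f%%%n", percentHigher(25_417, 34_843));
        System.out.printf("implied MYRA throughput:    %.0f ops/s%n", opsPerSecond(28.70));
    }
}
```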
Why Java + FFM Beats C/C++/Rust for Most Real‑World Use Cases
The article addresses a common question: Why not write the entire stack in C++ or Rust?
| Factor | C/C++ | Rust | Java + FFM (MYRA) |
|---|---|---|---|
| Memory Safety | Undefined behavior, manual checks | Ownership model, steep learning curve | Bounds‑checked, no segfaults |
| Performance Tuning | Manual SIMD, architecture‑specific code | Zero‑cost abstractions, but still requires expertise | Off‑heap access with deterministic behavior |
| Talent Pool | Scarce, high cost | Niche, crypto‑centric | Broad Java community |
| Tooling | gdb, perf, valgrind | rustc, cargo‑watch | JDK profilers, Flight Recorder |
| Ecosystem | Limited to low‑level libs | Fragmented async runtimes | Mature Maven, Spring, Loom |
While C++ may still win in absolute raw speed for ultra‑low latency (1 µs budgets), the performance gap is often within 10‑15 %. For systems that process millions of messages per second, the trade‑off in developer velocity and safety favors Java + MYRA.
Implications for the Industry
MYRA’s approach signals a shift in how latency‑critical Java applications are built:
- Deterministic Off‑Heap – Eliminates GC pauses in the critical path, a long‑standing pain point for HFT firms.
- Standardized API – Because the stack is built on the officially supported FFM API rather than on JDK internals, it is far less exposed to breakage from future JDK releases.
- Open‑Source, No‑Enterprise Lock‑In – The author commits to a fully open‑source model, encouraging community contributions and reducing vendor lock‑in.
Developers in finance, ad‑tech, game servers, and IoT can now prototype high‑performance pipelines with the safety and tooling of the JVM, potentially reducing time‑to‑market by months.
Next Steps
The stack is slated for a public open‑source release by Christmas 2025. The author plans to continue optimizations, documentation, and community engagement. For those interested in exploring FFM or building low‑latency Java services, the repository is available at github.com/mvp-express.
Source: https://www.roray.dev/blog/myra-stack/