A recent thesis from Uppsala University argues that Java's weak-reference overhead is fundamentally a representation problem rather than just a pipeline-optimization challenge, and proposes a field-annotation approach that cuts ZGC's major-collection time by 41% in a single-object benchmark.

Java's WeakReference has long served as the language's mechanism for holding references without preventing garbage collection. When the GC determines an object is only weakly reachable, it clears the referent field and optionally enqueues the reference so the application can respond. This notification mechanism is optional: many uses, like caches and interning maps, never register a queue. Yet ZGC's reference-processing pipeline treats all weak references uniformly, creating overhead that scales linearly with the number of weak references allocated.
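The two usage styles described above can be seen with the standard java.lang.ref API. The sketch is illustrative only. GC timing is nondeterministic, so it calls Reference.enqueue() to emulate the collector's outcome. It assumes a recent JDK, where enqueue() clears the referent before enqueuing. The class name WeakRefModes is made up for this example.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

public class WeakRefModes {
    /** Summarizes how a queue-less and a queue-registered reference behave. */
    static String demo() {
        Object value = new Object();

        // Queue-less: the style used by most caches and interning maps.
        // The GC may clear it, but no notification is ever delivered.
        WeakReference<Object> plain = new WeakReference<>(value);

        // Queue-registered: once cleared, the reference is placed on 'queue'
        // so the application can react, e.g. by evicting a stale map entry.
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        WeakReference<Object> notified = new WeakReference<>(value, queue);

        // Emulate clearing with enqueue(): on recent JDKs it clears the referent
        // and enqueues the reference only if a queue was registered.
        boolean plainQueued = plain.enqueue();       // false: no queue registered
        boolean notifiedQueued = notified.enqueue(); // true
        Reference<?> polled = queue.poll();          // the 'notified' reference

        return "plainQueued=" + plainQueued
                + " notifiedQueued=" + notifiedQueued
                + " polledIsNotified=" + (polled == notified)
                + " bothCleared=" + (plain.get() == null && notified.get() == null);
    }

    public static void main(String[] args) {
        // prints "plainQueued=false notifiedQueued=true polledIsNotified=true bothCleared=true"
        System.out.println(demo());
    }
}
```

Note that the real collector does not call enqueue(). It clears and enqueues references directly, which is exactly the work the thesis sets out to reduce.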
Fredrik Hammarberg's master's thesis at Uppsala University, completed in collaboration with Oracle's GC team in Stockholm, takes a systematic look at whether this overhead can be reduced through pipeline modifications or avoided more fundamentally through a different representation of weak semantics. The work proposes four orthogonal mechanisms and benchmarks them against baseline ZGC, revealing results that challenge assumptions about where optimization efforts should focus.
The Pipeline Problem
When ZGC discovers weak references during marking, every reference follows the same path: linked into an intrusive per-thread list via the hidden discovered field, transferred to the ReferenceHandler thread through the pending list, and iterated regardless of whether it will actually be enqueued. The per-reference processing cost includes a load barrier to read the referent, a virtual call to determine the reference type, and a CAS operation via a ZGC barrier to atomically set the field to a colored null.
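To make the uniform path concrete, here is a hypothetical, heavily simplified Java model of it. It is not HotSpot code; the real collector is written in C++. The Ref class, the single-worker list, and the assumption that every discovered referent is dead are simplifications made for illustration. The key point is that process() never consults hasQueue.

```java
/**
 * Hypothetical Java model of the baseline pipeline (not HotSpot code).
 * Assumes every discovered referent turns out to be dead.
 */
public class UniformPipeline {
    // Stand-in for a java.lang.ref.Reference: the 'discovered' field threads
    // the object itself into a list, so no separate list nodes are needed.
    static final class Ref {
        Object referent;
        final boolean hasQueue;
        Ref discovered;

        Ref(Object referent, boolean hasQueue) {
            this.referent = referent;
            this.hasQueue = hasQueue;
        }
    }

    private Ref discoveredHead; // per-worker list (a single worker is modelled)
    private Ref pendingHead;    // list handed to the ReferenceHandler thread

    /** Marking found a weak reference: push it onto the intrusive list. */
    void discover(Ref r) {
        r.discovered = discoveredHead;
        discoveredHead = r;
    }

    /**
     * Clears every discovered reference and moves it to the pending list,
     * whether or not it has a queue. Returns how many references were handed off.
     */
    int process() {
        int handedOff = 0;
        Ref r = discoveredHead;
        while (r != null) {
            Ref next = r.discovered; // pointer chase: typically a new cache line
            r.referent = null;       // real ZGC: load barrier, virtual call, CAS
            r.discovered = pendingHead;
            pendingHead = r;         // queue-less references are handed off too
            handedOff++;
            r = next;
        }
        discoveredHead = null;
        return handedOff;
    }

    public static void main(String[] args) {
        UniformPipeline p = new UniformPipeline();
        p.discover(new Ref(new Object(), false)); // cache entry, no queue
        p.discover(new Ref(new Object(), false)); // interning map, no queue
        p.discover(new Ref(new Object(), true));  // wants notification
        System.out.println(p.process());          // 3: all take the same path
    }
}
```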
This uniform treatment creates a mismatch between the optional nature of the callback mechanism and the unconditional work performed for it. The issue has been noted in the OpenJDK tracker (JDK-8029205) but remains unaddressed. For workloads that allocate millions of weak references, this becomes a measurable bottleneck.
Four Approaches to Reduction
The thesis evaluates four mechanisms, each targeting different aspects of the overhead.
Skip-Enqueue Separation (sep) routes queue-less weak references to a separate per-worker list during marking, bypassing the pending list entirely. References on this list are processed and cleared by GC threads directly without involvement from the ReferenceHandler thread. The implementation adds a queue check at discovery time, creating two specialized processing paths.
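The sep idea can be expressed in the same style of hypothetical model, again simplified and not HotSpot code. It adds a single branch at discovery time so that queue-less references never reach the pending list. SeparatedPipeline and its method names are invented for this sketch, and all referents are assumed dead.

```java
/**
 * Hypothetical model of the 'sep' idea (not HotSpot code): a queue check at
 * discovery time splits references into two lists. Assumes all referents are dead.
 */
public class SeparatedPipeline {
    static final class Ref {
        Object referent;
        final boolean hasQueue;
        Ref discovered;

        Ref(Object referent, boolean hasQueue) {
            this.referent = referent;
            this.hasQueue = hasQueue;
        }
    }

    private Ref queuedHead;    // destined for the ReferenceHandler via the pending list
    private Ref queueLessHead; // cleared by GC threads; never leaves the collector

    /** The extra branch here is the price of having two specialized paths. */
    void discover(Ref r) {
        if (r.hasQueue) {
            r.discovered = queuedHead;
            queuedHead = r;
        } else {
            r.discovered = queueLessHead;
            queueLessHead = r;
        }
    }

    /** GC-worker path: clear queue-less references in place. Returns the count. */
    int clearQueueLess() {
        int cleared = 0;
        Ref r = queueLessHead;
        while (r != null) {
            Ref next = r.discovered;
            r.referent = null;
            r.discovered = null; // unlinked: nothing downstream will see it
            cleared++;
            r = next;
        }
        queueLessHead = null;
        return cleared;
    }

    /**
     * Clears queue-registered references and detaches their list; in the real
     * pipeline this list would become the pending list. Returns its length.
     */
    int handOffQueued() {
        int n = 0;
        for (Ref r = queuedHead; r != null; r = r.discovered) {
            r.referent = null;
            n++;
        }
        queuedHead = null;
        return n;
    }

    public static void main(String[] args) {
        SeparatedPipeline p = new SeparatedPipeline();
        p.discover(new Ref(new Object(), false));
        p.discover(new Ref(new Object(), false));
        p.discover(new Ref(new Object(), true));
        System.out.println(p.clearQueueLess()); // 2: handled entirely by GC threads
        System.out.println(p.handOffQueued());  // 1: the only reference sent onward
    }
}
```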
Dynamic Array (dyn) replaces the intrusive linked list with a contiguous array allocated on the C heap. The linked list has poor cache locality: its elements are scattered across the heap, so following the chain typically requires loading a new cache line per reference. The array allows sequential, index-based iteration, so consecutive entries share cache lines and the working set stays in L1/L2 cache far more effectively. It also retains its capacity between cycles, avoiding reallocation when reference populations are stable.
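A minimal sketch of the dyn idea follows, assuming a structure-of-arrays layout chosen only for illustration. The real variant stores raw addresses in C-heap memory, which Java cannot express, so Object arrays stand in for them. DiscoveredArray and its methods are hypothetical.

```java
import java.util.Arrays;

/**
 * Hypothetical model of the 'dyn' idea (not HotSpot code): discovered references
 * live in a contiguous, growable array instead of an intrusive linked list.
 */
public class DiscoveredArray {
    private Object[] refs = new Object[4];      // the Reference objects
    private Object[] referents = new Object[4]; // referent values captured at discovery
    private int size;

    void add(Object ref, Object referent) {
        if (size == refs.length) {
            // Geometric growth keeps appends amortized O(1).
            refs = Arrays.copyOf(refs, size * 2);
            referents = Arrays.copyOf(referents, size * 2);
        }
        refs[size] = ref;
        referents[size] = referent;
        size++;
    }

    /** Sequential, index-based scan: no pointer chasing between entries. */
    int countNonNullReferents() {
        int n = 0;
        for (int i = 0; i < size; i++) {
            if (referents[i] != null) {
                n++;
            }
        }
        return n;
    }

    /** Ends a GC cycle: drops the entries but keeps the allocated capacity. */
    void reset() {
        Arrays.fill(refs, 0, size, null);
        Arrays.fill(referents, 0, size, null);
        size = 0;
    }

    int size() { return size; }

    int capacity() { return refs.length; }

    public static void main(String[] args) {
        DiscoveredArray a = new DiscoveredArray();
        for (int cycle = 0; cycle < 2; cycle++) {
            for (int i = 0; i < 10; i++) {
                a.add(new Object(), new Object());
            }
            // prints size=10 capacity=16 for both cycles
            System.out.println("cycle " + cycle + ": size=" + a.size() + " capacity=" + a.capacity());
            a.reset();
        }
    }
}
```

In this model the second cycle reuses the capacity of 16 grown during the first, so it never reallocates. This is the retained-capacity behavior, and it is also why the array's auxiliary memory is not released between cycles.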
Optimised Clear Path (clear_path) simplifies the three operations required to clear a referent. The load barrier is eliminated when combined with the dynamic array, since the referent address and value are pre-loaded at discovery. The virtual call is eliminated because the reference type is statically known at the call site. The CAS is replaced with a plain store, since the only concurrent operations on a referent field are clearing or enqueueing it, both of which set it to null.
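The reasoning behind dropping the CAS can be illustrated in plain Java with a VarHandle. This is a hypothetical model, not ZGC's barrier code. Ref stands in for a Reference object, and ClearRace simply shows that when every racing writer stores null, the field ends up null regardless of ordering. Under that condition a compare-and-set adds cost without adding safety.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

/** Hypothetical illustration (not HotSpot code): all writers store null, so CAS is redundant. */
public class ClearRace {
    static final class Ref {
        Object referent;

        Ref(Object referent) { this.referent = referent; }
    }

    private static final VarHandle REFERENT;

    static {
        try {
            REFERENT = MethodHandles.lookup().findVarHandle(Ref.class, "referent", Object.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    /** Baseline style: atomically replace the expected referent with null. */
    static boolean clearWithCas(Ref r, Object expected) {
        return REFERENT.compareAndSet(r, expected, (Object) null);
    }

    /** clear_path style: an unconditional plain store. */
    static void clearPlain(Ref r) {
        r.referent = null;
    }

    /** Races several threads doing plain clears; returns the final field value. */
    static Object racePlainClears(int threads) {
        Ref r = new Ref(new Object());
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> clearPlain(r));
            workers[i].start();
        }
        try {
            for (Thread t : workers) {
                t.join(); // join() makes each worker's store visible to this thread
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return r.referent;
    }

    public static void main(String[] args) {
        Object obj = new Object();
        Ref r = new Ref(obj);
        clearPlain(r);                                  // one writer clears first...
        System.out.println(clearWithCas(r, obj));       // false: ...so the CAS just fails
        System.out.println(r.referent == null);         // true either way
        System.out.println(racePlainClears(8) == null); // true: every ordering ends in null
    }
}
```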
Weak Fields (weak_fields) takes a fundamentally different approach. Rather than wrapping a weak pointer in a separate object, weak semantics is expressed directly as a field annotation. The @weak annotation is recognized by the class-file parser and stored in fieldInfo metadata. At GC time, ZGC's marking closure checks each reference field against its metadata and diverts annotated fields to a per-worker array rather than treating them as strong references.
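The sketch below declares a stand-in @weak annotation to show what the source-level difference might look like, and compares the two encodings side by side. The annotation's actual declaration, retention, and package in the prototype are not described here, so all of them are assumptions. On a stock JVM the annotation is inert and the field remains a strong reference. Reflection serves only as a rough analogue of the fieldInfo check that ZGC's marking closure performs.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.ref.WeakReference;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

public class WeakFieldsSketch {
    // Hypothetical stand-in for the prototype's @weak annotation. On a stock JVM
    // it has no effect on the GC: an annotated field is still a strong reference.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface weak {}

    // Baseline encoding: every weak edge costs an extra WeakReference object.
    static final class BaselineEntry {
        final String key;
        final WeakReference<Object> value;

        BaselineEntry(String key, Object value) {
            this.key = key;
            this.value = new WeakReference<>(value);
        }
    }

    // weak_fields encoding: the field itself is marked weak; no wrapper object.
    static final class WeakFieldEntry {
        final String key;
        @weak Object value;

        WeakFieldEntry(String key, Object value) {
            this.key = key;
            this.value = value;
        }
    }

    /** Lists reference fields whose metadata marks them weak (reflection as a stand-in). */
    static List<String> weakFieldNames(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Field f : type.getDeclaredFields()) {
            if (!f.getType().isPrimitive() && f.isAnnotationPresent(weak.class)) {
                names.add(f.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(weakFieldNames(BaselineEntry.class));  // []
        System.out.println(weakFieldNames(WeakFieldEntry.class)); // [value]
    }
}
```

The lowercase annotation name mirrors the @weak spelling used in the thesis description rather than usual Java naming conventions.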
The Superadditivity Effect
The interaction between mechanisms produces results greater than the sum of their individual contributions. The clear_path variant alone reduces non-strong processing time by 7%, and the dyn variant alone by 36%. Combined in clear_path_dyn, they achieve an 81% reduction. This superadditivity arises because the dynamic array removes the pointer-chasing bottleneck that would otherwise persist when the CAS is eliminated, and the pre-loaded data lets the clear logic run without any barrier overhead.
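A quick calculation shows what the two individual results would predict if they simply stacked. Two naive models are used here for illustration: adding the savings, or treating them as independent multiplicative reductions.

```latex
\begin{aligned}
\text{additive estimate:}\quad & 7\% + 36\% = 43\% \\
\text{multiplicative estimate:}\quad & 1 - (1 - 0.07)(1 - 0.36) = 1 - 0.5952 \approx 40.5\% \\
\text{measured (clear\_path\_dyn):}\quad & 81\%
\end{aligned}
```

Both estimates land near 40%, roughly half the measured reduction, which is what makes the combination superadditive.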
The sep variant shows that the enqueueing stage itself is not the dominant bottleneck: routing queue-less references away from the ReferenceHandler thread has negligible effect in queue-less benchmarks. However, it might improve branch prediction for workloads with a mix of queue-less and queue-registered references.
The Representation Revelation
The 81% reduction in targeted phase time translates to only an 8% reduction in total collection time under conditions engineered to maximize the effect. Non-strong processing accounts for just 14.7% of baseline major-collection time in the single-object benchmark and 4.5% in the multi-object benchmark. Even dramatic phase-level improvements yield modest system-level gains.
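This is Amdahl's law at work. Suppose a phase takes a fraction f of collection time, is reduced by r, and nothing else changes. Then the collection as a whole shrinks by about f·r. The calculation below plugs in the baseline shares above. For the multi-object case it assumes, purely for illustration, that the same 81% phase reduction applied.

```latex
\begin{aligned}
\Delta T_{\text{single}} &\approx f\,r = 0.147 \times 0.81 \approx 0.119 \quad (\approx 11.9\%) \\
\Delta T_{\text{multi}}  &\approx f\,r = 0.045 \times 0.81 \approx 0.036 \quad (\approx 3.6\%)
\end{aligned}
```

Even in the favorable case, the ceiling is around 12%, the same modest order as the reported 8%.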
The weak_fields variant tells a different story. By eliminating millions of WeakReference objects from the heap entirely, it reduces major-collection time by 41% and old-generation time by 37% in the single-object benchmark. The improvement spans every phase, including concurrent mark, relocation, and young-generation collection. Because the WeakReference objects themselves must be marked, promoted, and relocated in every GC cycle, removing them reduces GC work across the board.
Java heap occupancy drops from approximately 1,720 MB to 806 MB, a 53% reduction that directly reflects the absence of the WeakReference objects. The design also matches how weak semantics is implemented in other languages: Go's weak.Pointer, C++'s std::weak_ptr, and .NET's WeakReference<T> all treat weak reachability and cleanup notification as separate concerns.
Trade-offs and Limitations
The dynamic array introduces meaningful auxiliary memory costs. In the single-object benchmark with 20 million simultaneous entries, the clear_path_dyn variant requires 1,268 MB of auxiliary GC memory compared to 120 MB for the baseline, a 957% increase. The all variant uses approximately 30% less auxiliary memory than clear_path_dyn because it does not need to store the reference address in each array entry, making it the more attractive pipeline variant overall.
The weak_fields implementation currently touches 27 files compared to at most 10 for the most complex pipeline variant and remains a prototype. Several paths forward are described in the thesis, including integration of one or more pipeline variants into the OpenJDK project. The full codebase is available as a fork of OpenJDK with all four mechanisms implemented as source-file overlays under patches/, along with build and benchmarking scripts.
Implications for Language Design
The thesis suggests that weak-reference overhead behaves more like a representation problem than a pipeline problem. Reducing it meaningfully requires reconsidering how weak semantics is encoded in the language, not merely how the resulting objects are processed once allocated. This observation extends beyond Java: the pattern of wrapping weak pointers in separate objects creates overhead that no amount of pipeline optimization can fully eliminate.
The @weak field annotation approach aligns Java's implementation with patterns established in other languages, where weak reachability is a property of a reference rather than a wrapper object. The callback-free cost model, where weak references incur minimal overhead unless explicitly registered for notification, matches how most weak references are actually used in practice.
For the broader Java ecosystem, the thesis demonstrates that language-level expressiveness and runtime performance are not always in tension. Sometimes the more expressive representation, annotating a field rather than wrapping it in a class, also happens to be the more efficient one.
The thesis is available at Uppsala University's DiVA portal. The complete codebase and measurement dataset are published on Zenodo.
