An independent implementation of the PlanB algorithm for IPv6 LPM that addresses limitations in the original reference implementation, offering a portable, performant solution with practical features like dynamic FIB and Python bindings.
planb-lpm: A Practical Implementation of IPv6 Longest-Prefix Match with SIMD Optimization
The planb-lpm project presents a clean-room implementation of the PlanB algorithm for IPv6 longest-prefix match (LPM) optimization. This implementation addresses several practical limitations of the original research code while maintaining the core algorithmic innovations from the PlanB paper by Zhang et al.
The Problem: IPv6 LPM Challenges
Longest-prefix matching is a fundamental operation in networking, used to determine the most specific route for an IP address in a forwarding information base (FIB). As IPv6 adoption grows, the scale of FIBs has expanded dramatically, making efficient LPM lookup critical for router performance. Traditional software implementations struggle with the memory access patterns and computational requirements of large IPv6 routing tables.
The PlanB algorithm, presented at NSDI '26, offers a novel approach using a linearized B+-tree structure optimized for SIMD processing, particularly AVX-512 instructions. While the research demonstrated impressive performance improvements over existing structures, the reference implementation had significant limitations for practical deployment.
What's Actually New in planb-lpm
The planb-lpm repository addresses several key shortcomings of the original reference implementation:
Portability: The reference code was Linux- and AVX-512-only. This implementation works across platforms with automatic detection of AVX-512 support and a transparent scalar fallback path.
Licensing: The original code had no license, making it unusable in commercial or open-source projects. planb-lpm uses the MIT license.
Dynamic FIB: The reference implementation only supported static FIBs. This implementation adds an lpm6::Dynamic class using the paper's rebuild-and-swap model with std::atomic<std::shared_ptr>, providing wait-free lookups.
Language Bindings: Python bindings via pybind11 are included, making the algorithm accessible to Python-based network tools and simulations.
Testing Infrastructure: The project includes correctness tests that verify the tree against a brute-force LPM reference implementation, ensuring algorithmic correctness.
Sample Data Generation: A script to generate synthetic FIBs and traces is provided, enabling reproducible benchmarks.
Algorithm Overview
The PlanB algorithm transforms the IPv6 LPM problem into a predecessor search problem:
- Each prefix/length is expanded into a [start, end) interval on the upper 64 bits of the IPv6 address space.
- The 2·N boundary points are sorted, with prefix nesting resolved using a stack.
- The sorted boundaries are packed into the leaves of a 9-ary linearized B+-tree (8 keys per node, 64-byte cache-line aligned).
- Internal nodes hold separator keys copied from the leaves at strides of 9^L · 8.
- Lookup uses AVX-512 instructions: at each internal node, a single vpcmpuq + popcnt operation selects the child, with the same process at the leaf level to locate the predecessor edge.
This approach transforms the irregular memory access patterns of traditional trie structures into a predictable, SIMD-friendly predecessor search.
Performance Analysis
The project provides comprehensive benchmarks on a consumer laptop (Intel i5-1035G7, Ice Lake, 4C/8T, AVX-512) that reveal interesting insights:
Single-Core Performance
For a 100,000-prefix FIB:
- Single lookup: 48 MLPS (million lookups per second)
- Batched lookup (M=8): 66 MLPS (~1.4× improvement)
- Batched lookup (M=32): 76 MLPS (~1.6× improvement)
Batch size 8 captures most of the available gain and offers the best throughput-to-complexity trade-off; larger batch sizes show diminishing returns due to register pressure and resource contention.
Comparison to Baseline
Against a conventional Patricia trie implementation:
- PlanB: 50 MLPS (median)
- Patricia: 2.4 MLPS (median)
- Advantage: ~20×
This significant improvement demonstrates the effectiveness of the linearized B+-tree approach over traditional pointer-chasing structures.
Memory Footprint
For a 100,000-prefix FIB:
- PlanB algorithmic cost: 4.82 MB (~50 bytes per prefix)
- PlanB RSS delta: 5.05 MB
- Patricia algorithmic cost: 6.04 MB (~60 bytes per prefix)
- Patricia RSS delta: 9.02 MB
The flat, cache-aligned layout of PlanB provides better memory efficiency at this scale, though the advantage diminishes at larger FIB sizes due to the discrete depth steps in the B+-tree structure.
Real-World vs. Synthetic Data
Benchmarking against a real RIPE RIS RIB (254,197 prefixes) revealed interesting differences:
- Real BGP single lookup: 67.5 MLPS
- Synthetic same-size FIB: 20 MLPS
- Real BGP batch<8>: 137.2 MLPS
- Synthetic same-size FIB batch<8>: 29 MLPS
The real BGP table performed roughly 3-5× better than the synthetic equivalent, attributed to spatial locality in real BGP advertisements that concentrates lookups in frequently accessed tree regions.
Multi-Core Scaling
On a 4-core/8-thread system:
- 1 thread: 130 MLPS
- 2 threads: 278 MLPS (1.07× efficiency)
- 4 threads: 396 MLPS (0.76× efficiency)
- 8 threads: 626 MLPS (0.60× efficiency)
Superlinear scaling at 2 threads suggests better latency hiding with multiple memory streams. Efficiency decreases at higher thread counts due to shared resource contention.
Limitations
The implementation has several limitations worth noting:
Hardware Discrepancy: Benchmarks were conducted on a consumer laptop (Intel i5-1035G7) rather than the server-class hardware used in the paper (12-core Xeon 8331C or AMD Zen5). This creates a 5-15× single-core throughput gap and affects multi-core scaling characteristics.
Baseline Comparison: The Patricia trie used as a baseline is a conventional pointer-chasing structure, not one of the paper's actual baselines (PopTrie, CP-Trie, Neurotrie, HBS). The paper reported 1.6-14× advantages over these more sophisticated structures.
Data Distribution: Synthetic FIBs use length-weighted distributions that don't perfectly match real BGP address distributions. As shown, this can significantly impact performance characteristics.
Trace Characteristics: Benchmarks use uniform-random 64-bit address traces, whereas real network traffic concentrates on long prefixes, which would further differentiate the performance characteristics.
Practical Applications
The implementation is particularly suited for:
Research and Simulation: As a fast LPM backend for network simulators like ns-3 or custom packet-level simulators.
Control-Plane Tooling: SDN controllers and route-monitoring services that need to answer "which prefix covers this address?" over large FIBs without a full routing stack.
Network Analysis: IPv6 scanners, traffic classifiers, and log-enrichment pipelines that tag flows with their covering prefix.
Educational Purposes: A compact, readable implementation of a modern SIMD lookup structure suitable for networking and computer-architecture courses.
Update and Rebuild Performance
The dynamic FIB implementation uses a batched update model where N pending prefix changes are coalesced into one full rebuild:
- 100,000-prefix rebuild: ~19 ms
- Amortized cost per update (coalesced per 1K): ~14 µs
- Amortized cost per update (coalesced per 10K): ~1.4 µs
This rebuild cost is competitive with the paper's reported figures, suggesting the implementation faithfully captures the algorithmic approach.
Conclusion
The planb-lpm project successfully bridges the gap between academic research and practical implementation. By addressing the limitations of the original PlanB reference code, it provides a robust, portable solution for IPv6 LPM optimization that can be readily integrated into production systems, research tools, and educational materials.
While the benchmarks on consumer hardware don't reach the peak performance reported in the paper, they demonstrate significant improvements over traditional approaches and validate the core algorithmic innovations. The inclusion of practical features like dynamic FIB support, Python bindings, and comprehensive testing makes this implementation immediately useful for a wide range of applications.
For developers working on IPv6 routing, network simulation, or high-performance network applications, planb-lpm offers a well-implemented, well-documented solution that brings cutting-edge research into practical reach.

