Arm-backed AI startup Positron claims its Asimov accelerator will deliver five times the tokens per dollar and one-fifth the power of Nvidia's Rubin GPUs by using LPDDR5x memory instead of expensive HBM, while leveraging CXL for massive memory expansion up to 2.3TB per chip.
Positron, the Arm-backed AI startup, is making bold claims about its next-generation Asimov accelerators, positioning them as a direct challenge to Nvidia's upcoming Rubin GPUs. The company asserts that its inference chip will deliver five times as many tokens per dollar while consuming one-fifth the power of those parts.

The key to Positron's approach lies in its unconventional memory choice. Unlike the company's previous-generation Atlas systems, which used high-bandwidth memory (HBM), the Asimov accelerators employ LPDDR5x – the same memory type found in laptops and mobile devices. This decision allows for significant cost savings and higher memory capacity, with each chip supporting 864GB of on-package memory, expandable to 2.3TB using Compute Express Link (CXL).
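For a sense of scale, here's a rough sizing sketch of what fits in those capacities. The model sizes and bytes-per-parameter figures below are illustrative assumptions, not anything Positron has published:

```python
# Rough sizing sketch: which model weights fit in the quoted capacities.
# Model sizes and bytes-per-parameter are illustrative assumptions, not specs.

def weight_gb(params_billion: float, bytes_per_param: float) -> float:
    """Approximate memory needed for the weights alone, in GB."""
    return params_billion * bytes_per_param  # (1e9 params * bytes) / 1e9 bytes-per-GB

ON_PACKAGE_GB = 864     # LPDDR5x on-package, per the article
WITH_CXL_GB   = 2300    # ~2.3 TB with CXL expansion, per the article

models = {
    "70B @ FP8 (1 byte/param)":    weight_gb(70, 1),
    "405B @ FP8 (1 byte/param)":   weight_gb(405, 1),
    "405B @ BF16 (2 bytes/param)": weight_gb(405, 2),
    "1T @ FP8 (1 byte/param)":     weight_gb(1000, 1),
}

for name, gb in models.items():
    pkg = "fits" if gb <= ON_PACKAGE_GB else "no"
    cxl = "fits" if gb <= WITH_CXL_GB else "no"
    print(f"{name:30s} ~{gb:5.0f} GB | on-package: {pkg:4s} | with CXL: {cxl}")
```

Note that this counts weights only – KV-Cache and activations claim their own share of that capacity.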
Memory Architecture: LPDDR5x vs HBM
The trade-off is clear: while LPDDR5x offers lower cost and higher capacity compared to HBM, it comes with significantly reduced bandwidth. Nvidia's Rubin GPUs pack 288GB of HBM4 capable of 22 TB/s peak bandwidth, whereas Asimov tops out at around 3 TB/s. However, Positron claims its chips can actually utilize 90 percent of that bandwidth in real-world scenarios, compared to the 30 percent utilization typically seen with HBM-based GPUs.
This bandwidth utilization claim is crucial to Positron's value proposition. The company argues that while Rubin's memory remains 2.4x faster even accounting for utilization differences, the combination of lower cost, higher capacity, and better bandwidth efficiency makes Asimov competitive for inference workloads.
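Taking the quoted figures at face value, the arithmetic behind that 2.4x gap is simple enough to check (the utilization numbers are Positron's claims rather than independent measurements):

```python
# Effective-bandwidth comparison using the figures quoted above.
# Peak numbers and utilization claims come from Positron, not measurements.

rubin_peak_tbs  = 22.0   # HBM4 peak bandwidth, TB/s
rubin_util      = 0.30   # typical HBM utilization cited in the article
asimov_peak_tbs = 3.0    # LPDDR5x peak bandwidth, TB/s
asimov_util     = 0.90   # utilization Positron claims for Asimov

rubin_effective  = rubin_peak_tbs * rubin_util     # ~6.6 TB/s
asimov_effective = asimov_peak_tbs * asimov_util   # ~2.7 TB/s

print(f"Rubin effective:  {rubin_effective:.1f} TB/s")
print(f"Asimov effective: {asimov_effective:.1f} TB/s")
print(f"Rubin advantage:  {rubin_effective / asimov_effective:.1f}x")  # ~2.4x
```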
Memory Expansion Strategy
The CXL expansion capability is particularly noteworthy. Positron plans to use the expanded memory pool to store key-value caches (KV-Cache), which hold the attention keys and values computed for previously processed tokens during inference. This approach theoretically mitigates much of the complexity and overhead associated with KV-Cache offloading, a critical consideration for large language model inference.
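A back-of-envelope estimate shows why a multi-terabyte pool is attractive for KV-Cache storage. The model shape below is hypothetical – roughly a 70B-class model with grouped-query attention – and not something Positron has disclosed:

```python
# Back-of-envelope KV-Cache sizing for a hypothetical 70B-class model
# using grouped-query attention. All shape parameters are assumptions.

def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, batch, bytes_per_elem=2):
    # 2x for keys and values, stored per layer, per KV head, per token.
    return 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per_elem

# Hypothetical model shape: 80 layers, 8 KV heads, 128-dim heads, 16-bit cache.
per_user = kv_cache_bytes(layers=80, kv_heads=8, head_dim=128,
                          seq_len=128_000, batch=1)
print(f"KV-Cache per user at 128K context: ~{per_user/1e9:.0f} GB")

# How many such contexts fit, ignoring the space the weights themselves take?
print(f"Contexts in 864 GB on-package: {864e9 // per_user:.0f}")
print(f"Contexts in 2.3 TB with CXL:   {2.3e12 // per_user:.0f}")
```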
However, the CXL memory expansion faces bandwidth limitations of its own. The chip's 32 lanes of PCIe 6.0 connectivity restrict expansion bandwidth to approximately 256 GB/s, which could become a bottleneck for certain workloads despite the massive memory capacity.
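That 256 GB/s figure follows directly from the lane count, assuming PCIe 6.0 signaling at roughly 8 GB/s per lane per direction before protocol overhead:

```python
# Why 32 lanes caps CXL expansion at roughly 256 GB/s.
# Assumes PCIe 6.0 signaling (~8 GB/s per lane per direction, pre-overhead).

lanes = 32
gb_per_lane = 8.0                 # approximate PCIe 6.0 per-lane throughput, GB/s
cxl_bw = lanes * gb_per_lane      # ~256 GB/s to the CXL-attached pool

on_package_bw = 3_000             # ~3 TB/s to on-package LPDDR5x, in GB/s
print(f"CXL expansion bandwidth: {cxl_bw:.0f} GB/s")
print(f"On-package memory is ~{on_package_bw / cxl_bw:.0f}x faster than the CXL tier")
```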
Compute Architecture
While memory architecture dominates the discussion, the compute capabilities are equally important. The Asimov features a 512x512 systolic array running at 2 GHz, supporting multiple data types including TF32, FP16/BF16, FP8, NVFP4, and Int4. The array is fed by Armv9 cores and can be reconfigured to different dimensions (128x512 or 512x128) depending on workload requirements.
Interestingly, Positron hasn't disclosed teraFLOPS figures, suggesting the company believes raw compute performance is less critical than memory capacity and interconnect bandwidth for inference workloads. This aligns with the broader industry trend of optimizing for inference rather than training performance.
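That said, the disclosed array geometry allows a rough upper bound. Assuming one multiply-accumulate per cell per cycle – a common arrangement for systolic arrays, though Positron hasn't confirmed it – the peak works out to roughly a petaFLOP at 16-bit precision:

```python
# Rough peak-throughput bound from the disclosed array geometry.
# Assumes one MAC (2 FLOPs) per cell per cycle - an assumption, not a spec.

rows, cols    = 512, 512
clock_hz      = 2e9      # 2 GHz, per the article
flops_per_mac = 2        # multiply + accumulate

peak_flops = rows * cols * flops_per_mac * clock_hz
print(f"Peak (dense, 16-bit, assumed 1 MAC/cell/cycle): {peak_flops/1e15:.2f} PFLOPS")
# Lower-precision formats (FP8, FP4/Int4) would typically multiply this further,
# but Positron hasn't said how the array handles them.
```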
Scale-Out Architecture
Each Asimov accelerator includes 16 Tbps of chip-to-chip bandwidth, translating to 2 TB/s per chip. Four Asimov chips combine to form the Titan compute platform, which functions more like a compute blade in Nvidia's NVL72 racks than a standalone system.
The scale-up capability is impressive: up to 4,096 Titan systems can be combined into a single scale-up domain with more than 32 petabytes of memory. Positron achieves this using a pure chip-to-chip mesh topology rather than the switched scale-up fabrics employed by Nvidia or AMD.
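The headline memory figure is consistent with the per-chip numbers quoted earlier, assuming every chip in the domain carries the full 2.3TB CXL-expanded complement – an assumption, since the breakdown hasn't been spelled out:

```python
# Aggregate memory in the maximum scale-up domain, from the per-chip figures.
# Assumes every chip carries the full CXL-expanded 2.3 TB (not explicitly stated).

titans          = 4096
chips_per_titan = 4
tb_per_chip_expanded = 2.3
tb_per_chip_onpkg    = 0.864

expanded_pb = titans * chips_per_titan * tb_per_chip_expanded / 1000   # ~37.7 PB
onpkg_pb    = titans * chips_per_titan * tb_per_chip_onpkg    / 1000   # ~14.2 PB

print(f"On-package only:    {onpkg_pb:.1f} PB")
print(f"With CXL expansion: {expanded_pb:.1f} PB")  # consistent with 'more than 32 PB'
```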
This mesh approach eliminates power-hungry packet switches but introduces its own challenges. Unlike switched fabrics that can be easily reconfigured, mesh topologies are less flexible. Google has addressed similar challenges using optical circuit switches that physically change chip connections, while Amazon has moved toward switched fabrics with Trainium 3 for better inference workload scalability.
Market Positioning and Competition
Positron's strategy directly challenges Nvidia's dominance in the AI accelerator market. By focusing on inference workloads and leveraging cost-effective memory solutions, the company aims to capture market share from organizations seeking lower-cost alternatives to Nvidia's premium offerings.
The timing is strategic, coming amid growing concerns about AI infrastructure costs and the search for more efficient inference solutions. Other companies are pursuing similar strategies – Microsoft's Maia 200 promises Blackwell-level performance for two-thirds the power, while various startups explore optical computing and other novel approaches to AI acceleration.
Technical Challenges and Considerations
Several technical challenges could impact Asimov's real-world performance:
- Bandwidth limitations: While LPDDR5x offers good cost-performance, the 3 TB/s peak bandwidth may prove insufficient for certain workloads, particularly when combined with CXL expansion constraints (see the decode-throughput sketch after this list).
- Memory hierarchy complexity: Managing data movement between on-package LPDDR5x, CXL-expanded memory, and compute units adds complexity that could impact performance.
- Interconnect scalability: The chip-to-chip mesh approach may face scaling limitations as cluster sizes grow, particularly compared to switched fabric alternatives.
- Software ecosystem: Positron will need to build a robust software stack to compete with Nvidia's mature CUDA ecosystem.
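On the first point, a crude memory-bound decode model illustrates when 3 TB/s starts to pinch: at small batch sizes every generated token requires streaming the full weight set from memory, so per-chip throughput is roughly effective bandwidth divided by weight bytes. The model size and precision below are illustrative assumptions:

```python
# Crude memory-bound decode estimate: at small batch sizes each generated token
# re-reads the full weight set, so throughput ~ effective bandwidth / weight bytes.
# Model size and precision are illustrative assumptions, and KV-Cache traffic is ignored.

effective_bw  = 3.0e12 * 0.90   # ~2.7 TB/s, using Positron's utilization claim
weight_bytes  = 405e9 * 1.0     # hypothetical 405B-parameter model at FP8

tokens_per_s_batch1 = effective_bw / weight_bytes
print(f"Memory-bound decode, batch 1: ~{tokens_per_s_batch1:.1f} tokens/s per chip")

# Larger batches amortize the weight reads across users, so real deployments push
# batch size up until compute or KV-Cache bandwidth becomes the limit instead.
```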
Industry Context
The AI accelerator market is experiencing rapid innovation as companies seek alternatives to Nvidia's dominant position. Recent developments include:
- Intel's Xeon workstation refresh targeting memory-intensive workloads
- Bill Gates-backed startups exploring optical transistors to revive Moore's Law
- AI networking startups like Upscale AI raising significant capital to challenge Nvidia's NVSwitch
- Microsoft's Maia 200 offering competitive performance at lower power consumption
Future Outlook
Asimov is expected to begin shipping next year, giving Positron a relatively short timeline to prove its technology. The company's success will depend on delivering on its performance claims while building a compelling ecosystem around its hardware.
The use of laptop-class memory in high-performance AI accelerators represents a significant departure from industry norms. If successful, this approach could influence future AI hardware designs, particularly for inference-focused applications where memory capacity often trumps raw bandwidth.
For now, Positron's bold claims and unconventional approach make Asimov one of the most interesting AI accelerator developments to watch in the coming year. The real test will come when production systems are deployed and benchmarked against established solutions from Nvidia, AMD, and other competitors.
