Intel Releases llm-scaler-vllm 0.14.0-b8, Talks Up 1.49x Performance With BMG-G31

Intel's latest llm-scaler-vllm update confirms BMG-G31 GPU support and reports a 1.49x geometric mean performance advantage for BMG-G31 over G21 in AI inference workloads.

Intel kicked off March 2026 by releasing llm-scaler-vllm 0.14.0-b8, the latest version of its Docker-based solution for running vLLM on Intel Battlemage GPUs for AI inference. The update improves performance and expands model support, while also confirming the existence of the previously elusive BMG-G31 "Big Battlemage" GPU.

Performance Improvements and Technical Updates

The new release is rebased on upstream vLLM 0.14 and upgrades PyTorch to version 2.10, along with the latest oneAPI components. Intel oneDNN optimizations raise INT4 throughput by up to 25% compared with the prior release.
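To put the 25% figure in per-token terms: a 25% throughput gain means each token takes 1/1.25 = 0.8 of the previous time, which is a 20% reduction rather than 25%. The following is a minimal Python sketch of that conversion; the function name is illustrative and is not part of llm-scaler-vllm.

```python
def time_reduction_from_throughput_gain(gain_fraction: float) -> float:
    """Return the fractional reduction in time per token for a throughput gain.

    gain_fraction=0.25 means throughput is 1.25x the baseline.
    """
    if gain_fraction <= -1.0:
        raise ValueError("throughput gain must be greater than -100%")
    return 1.0 - 1.0 / (1.0 + gain_fraction)


# The release's best-case INT4 figure: up to 25% more throughput.
reduction = time_reduction_from_throughput_gain(0.25)
print(f"{reduction:.0%} less time per token")
```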

Expanded Model Support

llm-scaler-vllm 0.14.0-b8 adds official support for several new models:

Qwen3-VL-Reranker-2B/8B
Qwen3-VL-Embedding-2B/8B
GLM-4.7-Flash
Ministral models
DeepSeek-OCR-2
Qwen3-Coder-Next

BMG-G31 Validation and Performance

The most significant revelation in this release is validated support for the BMG-G31 "Big Battlemage" GPU. Intel has not officially announced this GPU, and rumors of its cancellation have circulated, yet the open-source software enablement continues.

Intel's announcement compares BMG-G31 against the G21, with results measured on a B70 system that was not in a golden configuration:

1.49x geometric mean performance under SLA constraints
1.13x geometric mean at fixed batch size
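A geometric mean is the usual way to summarize a set of speedup ratios, because it treats a 2x gain and a 0.5x loss as cancelling out. The sketch below shows how such a figure could be computed. The per-workload speedups are made up for illustration, since the article does not include Intel's per-model data, and the function name is an assumption.

```python
import math


def geometric_mean(ratios):
    """Geometric mean of positive ratios, computed in log space for stability."""
    ratios = list(ratios)
    if not ratios:
        raise ValueError("need at least one ratio")
    if any(r <= 0 for r in ratios):
        raise ValueError("ratios must be positive")
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical per-workload speedups (G31 throughput / G21 throughput).
# These values are illustrative only and are NOT Intel's measurements.
example_speedups = [1.2, 1.5, 1.8]

gm = geometric_mean(example_speedups)
am = sum(example_speedups) / len(example_speedups)
print(f"geometric mean: {gm:.3f}x, arithmetic mean: {am:.3f}x")
```

The geometric mean never exceeds the arithmetic mean, so it is the more conservative of the two summaries. Python's standard library offers the same calculation as statistics.geometric_mean.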

The announcement notes that throughput should be better on systems with a golden BKC (best known configuration) setup, suggesting the B70 test system was held back by allreduce operations for small message sizes.

Arc Pro B70 Connection

The performance data seemingly confirms that the rumored Arc Pro B70 is indeed based on the BMG-G31. Whether BMG-G31 will appear in any consumer Intel Arc Graphics card remains uncertain.

Technical Implications

The 1.49x geometric mean advantage over G21 under SLA constraints is particularly relevant to production AI inference workloads, where quality-of-service guarantees are critical. A gain of this size could make Intel's Battlemage GPUs more competitive in the AI accelerator market.
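The article does not say how Intel defines "under SLA constraints". One common reading is the best throughput among configurations that still meet a latency target, which would explain why that ratio (1.49x) can exceed the fixed-batch ratio (1.13x): a faster GPU can run a larger batch within the same latency budget. The sketch below illustrates that reading only. All measurements, names, and the 100 ms target are hypothetical.

```python
from typing import NamedTuple, Optional, Sequence


class Run(NamedTuple):
    batch_size: int
    throughput_tok_s: float  # tokens per second
    latency_ms: float        # latency measured for this configuration


def best_under_sla(runs: Sequence[Run], max_latency_ms: float) -> Optional[Run]:
    """Return the highest-throughput run whose latency meets the SLA, or None."""
    eligible = [r for r in runs if r.latency_ms <= max_latency_ms]
    return max(eligible, key=lambda r: r.throughput_tok_s, default=None)


# Hypothetical measurements for two GPUs; these are NOT Intel's data.
gpu_a = [Run(8, 400.0, 40.0), Run(16, 700.0, 70.0), Run(32, 1000.0, 130.0)]
gpu_b = [Run(8, 450.0, 35.0), Run(16, 800.0, 55.0), Run(32, 1200.0, 95.0)]

sla_ms = 100.0  # assumed latency target
a = best_under_sla(gpu_a, sla_ms)
b = best_under_sla(gpu_b, sla_ms)

# Under the SLA, GPU B fits a larger batch than GPU A, which widens the gap.
sla_ratio = b.throughput_tok_s / a.throughput_tok_s
# At a fixed batch size of 16, both GPUs run the same configuration.
fixed_ratio = gpu_b[1].throughput_tok_s / gpu_a[1].throughput_tok_s
print(f"under SLA: {sla_ratio:.2f}x, fixed batch: {fixed_ratio:.2f}x")
```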

The continued software development around BMG-G31, despite the lack of hardware announcements, suggests Intel is still planning to bring this GPU to market, even if the timeline remains unclear.

For more details on the Intel llm-scaler-vllm update, see the GitHub release announcement.
