Intel's latest llm-scaler-vllm update confirms BMG-G31 GPU support and reports a 1.49x geometric-mean performance advantage over BMG-G21 for AI inferencing workloads.
Intel kicked off March 2026 by releasing llm-scaler-vllm 0.14.0-b8, the newest version of its Docker-based solution for running vLLM on Intel Battlemage GPUs for AI inferencing. The update improves performance and expands model support, and it also confirms the existence of the previously elusive BMG-G31 "Big Battlemage" GPU.
Performance Improvements and Technical Updates
The new release rebases against upstream vLLM 0.14 and upgrades PyTorch to version 2.10, along with the latest oneAPI components. Thanks to Intel oneDNN optimizations, INT4 throughput improves by up to 25% compared to the prior release.
Expanded Model Support
llm-scaler-vllm 0.14.0-b8 adds official support for several new models:
- Qwen3-VL-Reranker-2B/8B
- Qwen3-VL-Embedding-2B/8B
- GLM-4.7-Flash
- Ministral models
- DeepSeek-OCR-2
- Qwen3-Coder-Next
BMG-G31 Validation and Performance
The most significant revelation in this release is validated support for the BMG-G31 "Big Battlemage" GPU. Intel has never officially announced this GPU, and rumors of its cancellation have circulated, yet the open-source software enablement continues.
Intel's announcement provides performance metrics comparing BMG-G31 to BMG-G21, measured on a B70 system that was not in a "golden" configuration:
- 1.49x geometric mean performance under SLA constraints
- 1.13x geometric mean at fixed batch size
The announcement notes that throughput should be better on systems with a golden BKC (best known configuration) setup, suggesting that the B70 system used for testing was limited by allreduce operations at small message sizes.
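For readers unfamiliar with the metric, a geometric mean of speedups is the n-th root of the product of the per-workload ratios, which keeps one outlier workload from dominating the summary. The sketch below illustrates the calculation only. The ratios in `hypothetical_speedups` are made up for the example and are not taken from Intel's announcement, which does not list per-model figures here.

```python
import math


def geometric_mean(ratios):
    """Return the geometric mean of a sequence of positive speedup ratios."""
    ratios = list(ratios)
    if not ratios:
        raise ValueError("need at least one ratio")
    if any(r <= 0 for r in ratios):
        raise ValueError("ratios must be positive")
    # Average in log space, then exponentiate: exp(mean(log(r))).
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical per-workload speedups (illustrative only, not Intel's data).
hypothetical_speedups = [1.2, 1.5, 1.8]
print(f"geometric mean: {geometric_mean(hypothetical_speedups):.2f}x")
```

Because the mean is taken over ratios, a summary figure such as 1.49x says nothing about how widely individual models vary around it.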
Arc Pro B70 Connection
The performance data appears to confirm that the rumored Arc Pro B70 is indeed based on the BMG-G31. However, whether BMG-G31 will appear in any consumer Intel Arc Graphics card remains uncertain.
Technical Implications
The 1.49x geometric mean improvement under SLA constraints is particularly noteworthy for production AI inferencing workloads, where quality-of-service guarantees are critical. An improvement of this size could make Intel's Battlemage GPUs more competitive in the AI accelerator market.
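One plausible reason the SLA-constrained figure (1.49x) exceeds the fixed-batch figure (1.13x) is that a faster GPU can run a larger batch while still meeting the same latency target. The sketch below illustrates that interpretation only. Intel's exact methodology is not described in this article, so the `Run` record, the helper names, the latency limit, and every number in it are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One benchmark measurement (hypothetical record layout)."""
    batch_size: int
    throughput_tok_s: float
    latency_ms: float


def best_under_sla(runs, max_latency_ms):
    """Return the highest-throughput run that meets the latency limit, or None."""
    eligible = [r for r in runs if r.latency_ms <= max_latency_ms]
    if not eligible:
        return None
    return max(eligible, key=lambda r: r.throughput_tok_s)


def speedup_under_sla(baseline, candidate, max_latency_ms):
    """Ratio of best SLA-compliant throughput: candidate over baseline."""
    base = best_under_sla(baseline, max_latency_ms)
    cand = best_under_sla(candidate, max_latency_ms)
    if base is None or cand is None:
        raise ValueError("no run meets the SLA on at least one system")
    return cand.throughput_tok_s / base.throughput_tok_s


def speedup_fixed_batch(baseline, candidate, batch_size):
    """Throughput ratio when both systems use the same batch size."""
    base = next(r for r in baseline if r.batch_size == batch_size)
    cand = next(r for r in candidate if r.batch_size == batch_size)
    return cand.throughput_tok_s / base.throughput_tok_s


# Made-up measurements for two GPUs (not Intel's data).
baseline_runs = [Run(8, 800.0, 40.0), Run(16, 1200.0, 90.0)]
candidate_runs = [Run(8, 900.0, 30.0), Run(16, 1500.0, 45.0)]

print(speedup_fixed_batch(baseline_runs, candidate_runs, batch_size=8))
print(speedup_under_sla(baseline_runs, candidate_runs, max_latency_ms=50.0))
```

In this toy data the baseline can only meet the 50 ms limit at batch size 8, while the candidate still meets it at batch size 16, so the SLA-constrained ratio comes out larger than the fixed-batch ratio.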
The continued software development around BMG-G31, despite the absence of any hardware announcement, suggests Intel still plans to bring this GPU to market, even if the timeline remains unclear.
For more details on the Intel llm-scaler-vllm update, see the GitHub release announcement.
