Open Image Denoise 2.5 Brings Real Speed-Ups to Intel Arc GPUs and AMX-FP16 CPUs

Intel's denoising library that powers Blender and other renderers just shipped a release that cuts both runtime and memory use on Arc Battlemage hardware. I put it on a B70 to see what the XMX optimizations actually buy you.

Intel shipped Open Image Denoise 2.5 last week, and for anyone running a render box with Arc graphics, this is the kind of release worth installing on your workstation the same day. OIDN is the open-source denoising library that Blender's Cycles and a long list of other ray-tracing renderers lean on to clean up noisy path-traced output without you having to crank sample counts into the thousands. When the denoiser gets faster, your effective time-to-final-frame drops, and that matters whether you're rendering a single hero shot or batching an animation overnight.


What actually changed in 2.5

The headline improvements target two pieces of Intel silicon specifically. On the GPU side, Arc cards with XMX (Xe Matrix Extensions) units get new optimizations that deliver both higher throughput and lower memory consumption. On the CPU side, OIDN previously demonstrated big gains by tapping AMX with the BF16 data path, and 2.5 extends that work to AMX-FP16. Lower memory usage is the underrated half of this. Denoising large frames at 4K already eats VRAM, and if you're sharing the card with the renderer's own scene data, every megabyte the denoiser hands back is a tile you don't have to split or a resolution you don't have to drop.
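To put a rough number on "eats VRAM", here is a back-of-the-envelope sketch of the input and output buffers alone for a 4K denoise with color, albedo, and normal passes. It assumes three channels per buffer and either 32-bit or 16-bit floats per channel; those layouts are my assumptions for illustration, and the figure excludes whatever scratch memory the network itself needs, which is where a library-side saving would presumably show up.

```python
# Rough I/O buffer sizes for a 4K denoise (color + albedo + normal in, one image out).
# Assumptions (not from the release notes): 3 channels per buffer, packed tightly,
# stored as either 32-bit or 16-bit floats. Internal network scratch memory is NOT counted.

def buffer_bytes(width: int, height: int, channels: int = 3, bytes_per_channel: int = 4) -> int:
    """Size in bytes of one tightly packed image buffer."""
    return width * height * channels * bytes_per_channel

def to_mib(num_bytes: int) -> float:
    """Convert bytes to mebibytes."""
    return num_bytes / (1024 * 1024)

WIDTH, HEIGHT = 3840, 2160
NUM_BUFFERS = 4  # color, albedo, normal, output

fp32_one = buffer_bytes(WIDTH, HEIGHT, bytes_per_channel=4)
fp16_one = buffer_bytes(WIDTH, HEIGHT, bytes_per_channel=2)

print(f"one FP32 buffer:   {to_mib(fp32_one):.1f} MiB")                # 94.9 MiB
print(f"four FP32 buffers: {to_mib(NUM_BUFFERS * fp32_one):.1f} MiB")  # 379.7 MiB
print(f"four FP16 buffers: {to_mib(NUM_BUFFERS * fp16_one):.1f} MiB")  # 189.8 MiB
```

Even before the network allocates anything, the guided 4K case is holding a few hundred megabytes of image data, which is why a lower denoiser footprint matters on a card that is also holding the scene.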

The FP16 path is the interesting architectural detail. AMX (Advanced Matrix Extensions) is Intel's on-die matrix multiply accelerator, the same general idea as the GPU's XMX blocks but sitting in the CPU cores of recent Xeon processors. BF16 and FP16 are both 16-bit formats, but they trade off range versus precision differently: BF16 keeps the 8-bit exponent of FP32 and leaves only 7 bits for the mantissa (wide range, coarse precision), while FP16 uses a 5-bit exponent and a 10-bit mantissa (finer precision, narrower range). For a denoising network where the weights and activations sit in a predictable numeric band, FP16 can hold more precision per element, and getting AMX to chew through FP16 efficiently means the CPU-only denoise path stays competitive on hardware that supports it.
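The range-versus-precision trade-off is easy to see with nothing but the Python standard library. This is a minimal sketch, not anything OIDN does internally: FP16 uses the IEEE half-precision `'e'` format from `struct`, and BF16 is emulated by keeping the top 16 bits of an FP32 value (truncation rather than round-to-nearest, a simplification on my part). The helper names are hypothetical.

```python
import struct

def round_to_fp16(x: float) -> float:
    """Round x to IEEE half precision (5-bit exponent, 10-bit mantissa)."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses values beyond FP16's range; report them as infinity.
        return float("inf") if x > 0 else float("-inf")

def truncate_to_bf16(x: float) -> float:
    """Emulate BF16 by keeping the top 16 bits of the FP32 encoding.

    Simplification: this truncates instead of rounding to nearest.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]

# Precision near 1.0: FP16 resolves steps of 2**-10, BF16 only 2**-7.
print(round_to_fp16(1.001))       # 1.0009765625
print(truncate_to_bf16(1.001))    # 1.0

# Range: FP16 tops out at 65504, BF16 shares FP32's exponent range.
print(round_to_fp16(70000.0))     # inf
print(truncate_to_bf16(70000.0))  # 69632.0
```

Near 1.0 the FP16 value lands much closer to the input, while the large value survives only in BF16, which is the trade-off the paragraph above describes.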

Benchmarking on the Arc Pro B70

I wanted to see the GPU gains firsthand, so I set up a clean comparison on an Intel Arc Pro B70, the BMG-G31 Battlemage part. The only variable I changed was the OIDN version itself. Same GPU, same driver stack, same oneAPI SYCL runtime, same input frames. Swap the library, rerun, compare. That's the honest way to attribute a delta to the software.
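The swap-rerun-compare method boils down to comparing two sets of per-run timings. Here is a minimal sketch of that comparison, assuming you have collected several timings in milliseconds for each library build; the function names are hypothetical, and using the median is my choice to blunt the effect of a stray slow run, not something the OIDN tooling prescribes.

```python
from statistics import median

def median_ms(samples: list[float]) -> float:
    """Median of a non-empty list of per-run timings in milliseconds."""
    if not samples:
        raise ValueError("need at least one timing sample")
    return median(samples)

def speedup(baseline_ms: list[float], candidate_ms: list[float]) -> float:
    """Median baseline time divided by median candidate time.

    A result above 1.0 means the candidate build is faster.
    """
    return median_ms(baseline_ms) / median_ms(candidate_ms)
```

Feeding it the timings from the older build as `baseline_ms` and the 2.5 build as `candidate_ms` gives a single ratio that is only meaningful because everything else in the stack was held constant.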

The run that tells the story is the RT.hdr_alb_nrm.3840x2160 workload, the 4K case that feeds the denoiser the noisy beauty pass plus albedo and normal auxiliary buffers. That's the realistic configuration for production denoising, since the albedo and normal guides let the network preserve edges and texture detail instead of smearing them. At 3840x2160 the network has the most pixels to process, so it's also where any per-element optimization compounds hardest.

Version 2.5 came out fastest in that comparison, and the speed-up on Battlemage is substantial rather than a rounding-error bump. For Blender users running an Arc card as their compute device, this lands as free performance on hardware you already own. No new GPU, no driver gymnastics, just a newer library build.

The CPU side, and the part that didn't move

Not everything got faster, and that's worth saying plainly. On the desktop CPU side I tested an Arrow Lake Refresh part, the Core Ultra 7 270K Plus, and observed no change in performance between 2.4 and 2.5. That's the expected result: the AMX-FP16 work only pays off on cores that actually expose AMX, and the AMX feature set has lived on Intel's server Xeon lineup rather than mainstream desktop client chips. If you're denoising on an Arrow Lake desktop, you're running the AVX code path, and 2.5 left that path alone.

This is the kind of detail that separates a useful benchmark from a marketing slide. The release notes say "AMX-FP16 improvements," and on the right Xeon that's real, but on a 270K Plus desktop it's a no-op. Knowing which of your boxes benefit before you schedule a maintenance window saves you from chasing a regression that was never going to appear.
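One way to sort your boxes ahead of time on Linux is to look at the CPU feature flags the kernel reports. The sketch below parses `/proc/cpuinfo`-style text and returns any AMX-related flags; it assumes the kernel names them with an `amx` prefix (for example `amx_tile`, `amx_bf16`, `amx_fp16`), and it only tells you what the CPU advertises, not which code path OIDN will pick at runtime. The function name is hypothetical.

```python
import os

def amx_flags(cpuinfo_text: str) -> set[str]:
    """Return the AMX-related feature flags from /proc/cpuinfo-style text.

    Assumption: the kernel lists features on a line starting with "flags"
    and names AMX features with an "amx" prefix.
    """
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            _, _, value = line.partition(":")
            return {flag for flag in value.split() if flag.startswith("amx")}
    return set()

if __name__ == "__main__":
    path = "/proc/cpuinfo"  # Linux only
    if os.path.exists(path):
        with open(path) as f:
            found = amx_flags(f.read())
        print("AMX flags:", sorted(found) if found else "none reported")
```

An empty result on a desktop chip matches the behaviour described above: with no AMX exposed, there is nothing for the AMX-FP16 work to accelerate.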

Build recommendations

If you're speccing or already running a render node, here's how I'd weigh it:

Arc Battlemage owners (B-series, including Arc Pro B70): Update to 2.5 immediately. You get the throughput gain and the lower memory ceiling, which together let you push larger frames or denoise more aggressively in the same VRAM budget.
Xeon with AMX: The FP16 path is the reason to look at 2.5 for CPU-side denoising, especially if you were previously on the BF16 path and want the precision headroom.
Arrow Lake / mainstream desktop CPUs: Update for the maintenance hygiene, but don't expect a CPU denoise speed-up. Your gains, if any, come from putting the work on an Arc GPU instead.

The broader pattern here is Intel treating OIDN as a first-class showcase for both XMX and AMX, the matrix engines on its GPUs and CPUs respectively. Denoising is an ideal demonstration workload because it's a fixed, well-understood neural network running on every frame of a render, so any matrix-engine optimization shows up as a clean, repeatable number. For a homelab that does any rendering, that makes OIDN one of the better real-world proxies for whether Intel's matrix hardware is actually earning its transistors.

The project lives at OpenImageDenoise.org, with source and release builds on the GitHub repository if you want to compile against your own oneAPI toolchain rather than wait for a distro package.