AMD's Lemonade SDK Adds NVIDIA CUDA Support, Bringing Cross-Vendor Local AI Serving to 10.7

AMD's open-source Lemonade local AI server now runs on its biggest competitor's silicon. The 10.7 release wires up Llama.cpp's CUDA back-end across Windows and Linux, adds a unified benchmarking command, and exposes a native Prometheus endpoint for monitoring.

AMD has shipped Lemonade 10.7, and the headline addition is one that crosses a competitive line most vendors prefer to defend: native NVIDIA CUDA support. Lemonade is AMD's local AI server stack, designed to serve large language models and other workloads across CPUs, GPUs, and NPUs through OpenAI-, Anthropic-, and Ollama-compatible APIs. With this release, the same server experience now extends to NVIDIA GPUs, the hardware that holds the dominant share of the discrete accelerator market.

Twitter image

What shipped in 10.7

The CUDA integration is the centerpiece. Lemonade 10.7 wires up the Llama.cpp CUDA back-end on both Windows and Linux, adding proper NVIDIA GPU detection inside Lemonade along with the surrounding integration work needed to route inference to the correct device. For image generation, the release adds the stable-diffusion.cpp CUDA back-end on Linux, and it brings stable-diffusion.cpp Vulkan support to both Windows and Linux. That Vulkan path matters because it is vendor-neutral: a single back-end that can target AMD, NVIDIA, and Intel GPUs without per-vendor binary stacks.

Beyond the back-end expansion, 10.7 adds support for LMX-Omni models, a native Prometheus endpoint for real-time statistics monitoring, and a set of smaller enhancements. The Prometheus endpoint is a practical touch for anyone running Lemonade as more than a desktop toy. It lets operators scrape token throughput, latency, and utilization metrics into the same observability tooling they already use for everything else.

The other notable addition is a new lemonade bench command aimed at apples-to-apples LLM benchmarking. The pitch is consistent measurement across four very different execution paths: Llama.cpp, FastFlowLM, vLLM, and AMD's Ryzen AI software. Comparable numbers across those back-ends are genuinely hard to produce by hand, because each one has its own warmup behavior, batching defaults, and quantization assumptions. A standardized harness inside the tool itself is the kind of plumbing that makes published performance figures trustworthy.

The hardware matrix

Lemonade builds on FastFlowLM, vLLM, Llama.cpp, and other open-source components to assemble a fairly wide hardware footprint. {{IMAGE:2}} AMD's own silicon is the obvious target: Ryzen AI NPUs, Radeon and Instinct GPU accelerators, and x86_64 CPUs. On top of that, the project already supported Apple Metal GPUs and AArch64 CPUs. Adding CUDA fills in the one large gap that remained, which means a developer can now write against Lemonade's API surface once and deploy across essentially the full range of consumer and datacenter accelerators in circulation.

That breadth is the strategic point. The value of a local AI server is not the inference kernel, which Llama.cpp and vLLM already provide, but the consistent API and device abstraction layered on top. By making that layer run on NVIDIA hardware, AMD positions Lemonade as a portability tool rather than a hardware lock-in mechanism. A team prototyping on the NVIDIA GPUs they happen to own today can move the same workload to Ryzen AI or Instinct parts later without rewriting their integration.

Why a chip vendor ships software for a rival's GPUs

It looks counterintuitive for AMD to spend engineering effort enabling its competitor's products, but the logic holds up. Adoption of a software framework compounds. The more places Lemonade runs, the more developers build against it, and the more those developers eventually evaluate AMD silicon as a deployment target because the path is already paved. Restricting the framework to AMD-only hardware would cap its install base at AMD's current market share, which on the discrete GPU side is the smaller slice. Supporting CUDA trades a small amount of competitive purity for a much larger potential funnel.

There is also a defensive angle. NVIDIA's CUDA ecosystem is the gravitational center of GPU compute, and frameworks that ignore it tend to stay niche. By meeting developers where the hardware already is, AMD keeps Lemonade relevant in mixed-vendor environments, which describe most real organizations. Few shops run a single GPU vendor end to end across workstations, CI, and production.

Phoronix's Michael Larabel flagged the new lemonade bench command as the feature he is most interested in testing, with plans to use it for future cross-vendor performance coverage. That is the natural follow-up question this release raises: once the same server and the same benchmark harness run on AMD and NVIDIA hardware alike, the comparisons get a lot cleaner, and a lot more interesting for buyers weighing accelerators on price and throughput rather than software availability.

Lemonade 10.7 downloads and the full changelog are available through the project's GitHub repository.