Vortex 3.0 Adds a Full 3D Pipeline to the Open-Source RISC-V GPU

Hardware Reporter

Georgia Tech's open-source RISC-V GPU project just grew a fixed-function graphics stack, tensor core sparsity, and a Mesa Vulkan back-end. It still runs in simulation or on FPGAs, which means you can actually inspect every stage of the pipeline instead of treating the GPU as a black box.

Most of the silicon in a modern graphics card is a sealed box. You feed it shaders and command buffers, you read back frames and counters, and everything in between is proprietary RTL you will never see. Vortex is the opposite of that. It is a fully open-source GPU design built on RISC-V, developed by the team at Georgia Tech's College of Computing, and version 3.0 just landed with a change that reframes the whole project: it now has a real 3D graphics pipeline.

What Vortex actually is

Vortex is a soft GPU. Up through the 2.x series it was an OpenCL-compatible GPGPU implementation, meaning it executed compute kernels but had no fixed-function graphics hardware. You run it one of two ways: entirely in software, using either the cycle-level simulator or the RTL simulator, or on hardware, by synthesizing the RTL onto an FPGA. The project supports both AMD-Xilinx and Altera (Intel) FPGA targets, so a board like an Alveo U50 or a Stratix 10 dev kit becomes a working, inspectable GPU you can clock, probe, and rewrite.

That distinction matters for anyone who measures things. On a retail GPU your performance counters are whatever the vendor decided to expose. On Vortex every warp scheduler decision, every cache miss, every memory transaction is visible because you have the source. The trade-off is obvious and steep: FPGA clock rates sit in the low hundreds of MHz against the 2+ GHz of shipping silicon, and you are working with a handful of cores rather than thousands. Vortex is not competing with an RTX card on frames per second. It is competing on transparency, and on that axis it wins outright.

The 3.0 graphics stack

The headline addition is a fixed-function graphics path. Vortex 3.0 introduces a rasterizer and texture units, the two pieces of dedicated hardware that separate a graphics GPU from a pure compute array. Rasterization (turning triangles into covered pixel fragments) and texture sampling (filtered memory fetches with addressing and interpolation baked into hardware) are exactly the workloads you do not want to emulate in software shaders if you care about throughput. Putting them in fixed function is the same architectural decision every commercial GPU makes, and now you can read the implementation.
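To make those two jobs concrete, here is a minimal software sketch of what a rasterizer and a texture unit compute. It is not taken from the Vortex RTL: the function names (`edge`, `rasterize`, `sample_bilinear`) are hypothetical, the coverage test uses plain edge functions without the top-left fill rule real hardware applies, and the sampler assumes clamp-to-edge addressing on a single-channel texture.

```python
import math

def edge(ax, ay, bx, by, px, py):
    """Edge function: signed area term for point P relative to edge A->B."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)

def rasterize(tri, width, height):
    """Return the set of (x, y) pixels whose centers lie inside the triangle."""
    (x0, y0), (x1, y1), (x2, y2) = tri
    area = edge(x0, y0, x1, y1, x2, y2)
    if area == 0:
        return set()  # degenerate triangle covers nothing
    covered = set()
    for y in range(height):
        for x in range(width):
            px, py = x + 0.5, y + 0.5  # sample at the pixel center
            w0 = edge(x1, y1, x2, y2, px, py)
            w1 = edge(x2, y2, x0, y0, px, py)
            w2 = edge(x0, y0, x1, y1, px, py)
            # Inside when all three edge tests share the sign of the area,
            # which makes the test independent of winding order.
            if area > 0:
                inside = w0 >= 0 and w1 >= 0 and w2 >= 0
            else:
                inside = w0 <= 0 and w1 <= 0 and w2 <= 0
            if inside:
                covered.add((x, y))
    return covered

def sample_bilinear(texture, u, v):
    """Bilinear filter over a list-of-rows texture; u and v are in [0, 1]."""
    h, w = len(texture), len(texture[0])
    # Map normalized coordinates to texel space (texel centers at +0.5).
    x, y = u * w - 0.5, v * h - 0.5
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0

    def texel(ix, iy):
        # Clamp-to-edge addressing (an assumption for this sketch).
        ix = min(max(ix, 0), w - 1)
        iy = min(max(iy, 0), h - 1)
        return texture[iy][ix]

    top = texel(x0, y0) * (1 - fx) + texel(x0 + 1, y0) * fx
    bottom = texel(x0, y0 + 1) * (1 - fx) + texel(x0 + 1, y0 + 1) * fx
    return top * (1 - fy) + bottom * fy

fragments = rasterize([(0, 0), (4, 0), (0, 4)], 4, 4)
print(len(fragments))  # 10 covered pixels on a 4x4 grid
print(sample_bilinear([[0.0, 1.0], [0.0, 1.0]], 0.5, 0.5))  # 0.5
```

Every covered pixel costs three multiply-subtract tests, and every filtered sample costs four texel fetches plus three interpolations. That per-pixel arithmetic is the work a fixed-function block takes off the shader cores.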

To expose that pipeline to real applications, 3.0 ships a Mesa back-end. The new driver is called vortexpipe, and it plugs into Mesa through the Lavapipe Vulkan path. Lavapipe is normally Mesa's software Vulkan rasterizer; here it becomes the front-end that feeds Vortex's actual hardware stages. That gives you a standard Vulkan entry point into an open GPU, a combination that has been essentially unavailable in a design you can fully synthesize.

On the compute side, 3.0 adds HIP support by way of chipStar, the project that maps HIP and CUDA-style code onto SPIR-V and OpenCL/Level Zero. So a single design now answers to OpenCL, Vulkan, and HIP.

{{IMAGE:2}}

Beyond graphics: the compute and scheduling changes

The 3.0 release is not just a graphics bolt-on. Several of the additions target matrix and scheduling throughput, which is where the interesting architecture research lives:

| Feature | What it does |
| --- | --- |
| Tensor core structured sparsity | Skips structured-zero operands in matrix math, the same idea behind NVIDIA's 2:4 sparsity, now in open RTL |
| Warp group-level matrix multiply | Coordinates a group of warps on one large GEMM instead of per-warp tiles |
| Hardware kernel scheduler | Moves kernel dispatch decisions into hardware rather than host-driven launches |
| Command processor architecture | A front-end that ingests and sequences command streams, like a real GPU's CP |
| Async barriers | Lets warps synchronize without stalling the whole group, improving occupancy |
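The structured-sparsity row is the easiest of these to illustrate in software. The sketch below follows the general 2:4 idea (at most two non-zeros in every group of four weights), not Vortex's actual encoding, which this article does not describe. The helper names `compress_2of4` and `sparse_dot` are hypothetical.

```python
def compress_2of4(row):
    """Pack a 2:4-sparse row into (values, indices), two entries per group of 4."""
    if len(row) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    values, indices = [], []
    for base in range(0, len(row), 4):
        group = row[base:base + 4]
        kept = [i for i, v in enumerate(group) if v != 0]
        if len(kept) > 2:
            raise ValueError("group violates 2:4 sparsity")
        # Pad with zero-valued slots so every group stores exactly two entries.
        for i in range(4):
            if len(kept) == 2:
                break
            if i not in kept:
                kept.append(i)
        kept.sort()
        for i in kept:
            values.append(group[i])
            indices.append(i)  # 2-bit position within the group
    return values, indices

def sparse_dot(values, indices, dense):
    """Dot product that touches only the stored half of the sparse operand."""
    total = 0
    for k, (v, idx) in enumerate(zip(values, indices)):
        base = (k // 2) * 4  # two stored entries per group of four
        total += v * dense[base + idx]
    return total

row = [1, 0, 2, 0, 0, 0, 0, 3]
dense = [1, 2, 3, 4, 5, 6, 7, 8]
values, indices = compress_2of4(row)
print(values, indices)                    # [1, 2, 0, 3] [0, 2, 0, 3]
print(sparse_dot(values, indices, dense)) # 31, same as the dense dot product
```

The compressed operand holds half as many values plus a small index per value, so the multiply-accumulate count per row is halved. Whether that turns into a proportional cycle saving depends on the hardware around it, which is exactly what an open design lets you measure.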

The command processor is the quietly significant one. A discrete GPU does not get fed one kernel at a time by the CPU; it pulls from command buffers through a dedicated processor that handles scheduling and state. Adding that to Vortex closes part of the gap between a teaching/research core and the structure of production hardware, and it pairs naturally with the new hardware kernel scheduler.
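As a rough mental model of that front-end, consider the toy below. It is an illustration only: the class name, the opcodes (`SET_STATE`, `DISPATCH`), and the packet layout are all invented for this sketch and do not correspond to Vortex's command format.

```python
from collections import deque

class CommandProcessor:
    """Toy front-end: drains a command queue, tracking state between launches."""

    def __init__(self):
        self.queue = deque()   # stands in for a command buffer in memory
        self.state = {}        # current pipeline state
        self.launches = []     # (kernel, state snapshot) in execution order

    def submit(self, *commands):
        """Host side: append (opcode, argument) packets and return immediately."""
        self.queue.extend(commands)

    def run(self):
        """Device side: consume packets in order without host involvement."""
        while self.queue:
            op, arg = self.queue.popleft()
            if op == "SET_STATE":
                self.state.update(arg)
            elif op == "DISPATCH":
                # Snapshot the state so later SET_STATE packets cannot
                # retroactively change an earlier launch.
                self.launches.append((arg, dict(self.state)))
            else:
                raise ValueError(f"unknown opcode: {op}")
        return self.launches

cp = CommandProcessor()
cp.submit(
    ("SET_STATE", {"blend": False}),
    ("DISPATCH", "kernel_a"),
    ("SET_STATE", {"blend": True}),
    ("DISPATCH", "kernel_b"),
)
print(cp.run())
```

The host queues four packets in one call and the processor sequences them on its own, which is the structural difference from a CPU that launches each kernel individually.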

Why a homelab cares

There is no power-draw chart to publish here, because draw depends entirely on which FPGA you map it to and at what clock. That is the point. If you are the kind of builder who wants to know why a workload is bound, Vortex lets you measure the cause and not just the symptom. Want to know how structured sparsity changes your GEMM cycle count? Synthesize it with and without and read the counters. Want to see exactly when async barriers stop stalling your warps? The scheduler is right there in the source.
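One way to turn raw counters into an answer to "why is this bound?" is to compare achieved compute and memory throughput against the design's peaks. The sketch below assumes you have already extracted a cycle count, an operation count, and a byte count from a run. The function name, the parameter names, and the 0.5 utilization threshold are all hypothetical choices for illustration, and the example numbers are placeholders rather than measurements.

```python
def classify_bound(cycles, flops, bytes_moved,
                   peak_flops_per_cycle, peak_bytes_per_cycle,
                   busy_threshold=0.5):
    """Label a run by whichever resource sits closest to its peak."""
    compute_util = flops / (cycles * peak_flops_per_cycle)
    memory_util = bytes_moved / (cycles * peak_bytes_per_cycle)
    if max(compute_util, memory_util) < busy_threshold:
        # Neither resource is saturated, so stalls or latency dominate.
        label = "latency-bound"
    elif compute_util >= memory_util:
        label = "compute-bound"
    else:
        label = "memory-bound"
    return label, compute_util, memory_util

# Placeholder counter values, not real Vortex measurements.
label, cu, mu = classify_bound(
    cycles=1000, flops=6000, bytes_moved=64000,
    peak_flops_per_cycle=8, peak_bytes_per_cycle=64,
)
print(label, cu, mu)  # memory-bound 0.75 1.0
```

Running the same classification on builds with and without a feature shows whether the feature moved the bottleneck or only shifted the numbers.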

The practical recommendation depends on what you are after. If you only want to run OpenCL or Vulkan workloads, the software simulators are the cheapest way in and need no hardware at all, just patience, since cycle-level simulation is slow. If you want real timing data, an AMD-Xilinx or Altera FPGA board is the move, and the existing Vortex build flow already targets both vendors. For anyone teaching GPU architecture or prototyping accelerator ideas, having Vulkan, HIP, and OpenCL all land on one auditable design removes a huge amount of guesswork.

Vortex 3.0 source, build instructions, and the FPGA flows are all on the project's GitHub; the project itself is hosted through Georgia Tech's College of Computing. It remains one of the few places where the phrase "open-source GPU" means the entire stack from the Vulkan driver down to the rasterizer RTL, and 3.0 is the first version where that claim covers graphics as well as compute.
