Accelerate brings GPU‑ready array programming to Haskell – what the library actually offers

Accelerate provides an embedded DSL for regular multi‑dimensional arrays in Haskell, compiling them on the fly to LLVM‑based CPU or CUDA GPU code. The project ships a core library, several back‑ends, and a growing ecosystem of I/O, FFT, and BLAS extensions, but it remains limited to regular arrays, requires explicit compilation steps, and still lacks many higher‑level abstractions found in mainstream data‑parallel frameworks.

What the announcement claims
The Accelerate project advertises a high‑performance embedded language for array computations that can be compiled at runtime to run on multicore CPUs or NVIDIA GPUs. The repository lists a core package (accelerate) plus a set of back‑ends (accelerate-llvm-native, accelerate-llvm-ptx) and a collection of I/O and algorithmic extensions (FFT, BLAS, image formats, etc.). The documentation highlights a syntax that mirrors ordinary Haskell list code—e.g. a dot‑product expressed with fold, zipWith, and (*)—while promising automatic off‑loading to the GPU via run functions.
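As a concrete illustration of that style, here is a self‑contained plain‑Haskell dot product with the same shape as the README's Accelerate version; the `Acc`‑typed spelling shown in the comments requires the `accelerate` package:

```haskell
-- Plain-Haskell list version of the dot product that Accelerate's API mirrors.
-- With the accelerate package, the same shape is written over Acc arrays:
--   dotp :: Acc (Vector Float) -> Acc (Vector Float) -> Acc (Scalar Float)
--   dotp xs ys = A.fold (+) 0 (A.zipWith (*) xs ys)
dotp :: [Float] -> [Float] -> Float
dotp xs ys = foldr (+) 0 (zipWith (*) xs ys)

main :: IO ()
main = print (dotp [1, 2, 3] [4, 5, 6])  -- 1*4 + 2*5 + 3*6 = 32.0
```

The point of the shared shape is that code written against familiar list combinators translates almost verbatim to the array DSL.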
What is actually new
| Feature | Description | Status |
|---|---|---|
| Embedded DSL | Uses higher‑order abstract syntax (HOAS) to build an AST of array operations that is later lowered to LLVM IR. | Stable, part of the `accelerate` core. |
| CPU back‑end | `accelerate-llvm-native` generates multi‑threaded code for any LLVM‑compatible CPU. | Production‑ready, used in the examples. |
| CUDA back‑end | `accelerate-llvm-ptx` emits PTX for NVIDIA GPUs (compute capability ≥ 3.0). | Works for many kernels; requires a CUDA‑capable device. |
| Ecosystem packages | I/O adapters (`accelerate-io-*`), numeric libraries (`accelerate-fft`, `accelerate-blas`), and domain‑specific extensions (e.g. `colour-accelerate`). | Individually maintained; some lag behind core releases. |
| Example suite | `accelerate-examples` ships a Mandelbrot renderer, ray tracer, N‑body simulation, PageRank, and more. | Demonstrates feasibility, but often tuned for small‑scale benchmarks. |
The core novelty lies in the type‑safe separation of description and execution: the Acc type marks an expression as compilable, and the back‑ends guarantee that generated code respects Haskell’s purity guarantees. The library also introduces a HOAS‑to‑de Bruijn conversion that simplifies variable handling inside the DSL, a technique described in a separate technical note.
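The HOAS‑to‑de Bruijn idea can be shown with a tiny self‑contained sketch. This is an illustration of the general technique only, not Accelerate's actual conversion (the real one is typed and is described in the technical note the project references): each binder records the nesting depth at which it was introduced, and a variable's de Bruijn index is the distance between the use site's depth and its binding depth.

```haskell
-- HOAS terms: binding is represented by a Haskell function, so there is
-- no named variable to manage. HVar is internal plumbing for the conversion,
-- carrying the depth at which the binder was introduced.
data HOAS
  = HVar Int                -- internal: tagged with its binding depth
  | HApp HOAS HOAS
  | HLam (HOAS -> HOAS)     -- binding via a Haskell function

-- First-order terms with de Bruijn indices.
data DB
  = DVar Int                -- 0 = innermost enclosing binder
  | DApp DB DB
  | DLam DB
  deriving (Eq, Show)

toDB :: HOAS -> DB
toDB = go 0
  where
    go d (HVar lvl)  = DVar (d - lvl - 1)          -- distance to the binder
    go d (HApp f x)  = DApp (go d f) (go d x)
    go d (HLam body) = DLam (go (d + 1) (body (HVar d)))

-- Example: \x -> \y -> x y  becomes  DLam (DLam (DApp (DVar 1) (DVar 0)))
example :: DB
example = toDB (HLam (\x -> HLam (\y -> HApp x y)))
```

Feeding each binder a depth‑tagged `HVar` is what lets the conversion recover indices without ever naming variables, which is exactly the bookkeeping problem HOAS pushes onto the host language.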
Practical implications
- Performance – Benchmarks published by the authors show speed‑ups of 5–10× on GPUs for embarrassingly parallel kernels (e.g. vector addition, matrix multiplication). Real‑world workloads that require irregular memory access or dynamic control flow still suffer from the regular‑array restriction.
- Portability – Because the same Haskell source can target both CPU and GPU, developers can prototype on a laptop and later switch to a GPU cluster without rewriting code. However, back‑end selection is explicit (you import `run` from either the Native or the PTX back‑end module), so automatic device discovery is not yet built in.
- Tooling – The project integrates with standard Haskell tooling (Cabal, Stack, GHCup). Debugging generated LLVM/PTX code is possible but requires stepping outside the usual Haskell REPL workflow.
- Ecosystem fit – Accelerate fills a niche between low‑level CUDA bindings (e.g. the `cuda` package) and high‑level data‑frame libraries (e.g. `Frames`). It is particularly attractive for researchers who already use Haskell for algorithmic prototyping and need a way to evaluate performance on GPUs.
Limitations and open problems
- Regular arrays only – The DSL assumes dense, rectangular data layouts. Irregular structures (sparse matrices, graphs) need to be flattened or handled via the experimental `accelerate-io-repa` bridge, which adds overhead.
- Compilation latency – The “online compilation” step can take several hundred milliseconds for non‑trivial kernels, making Accelerate unsuitable for fine‑grained, per‑iteration JIT scenarios.
- Feature gaps – Advanced CUDA features such as shared memory tiling, warp‑level primitives, or dynamic parallelism are not exposed. Users must rely on the library’s optimizer, which may not generate optimal kernels for all patterns.
- Maturity of extensions – Packages like `accelerate-fft` and `accelerate-blas` wrap external libraries but often lag behind the latest vendor releases, limiting their usefulness for cutting‑edge scientific code.
- Community size – Development is driven by a small academic team; issue triage and PR reviews can be slow, and documentation beyond the Haddock API is sparse.
Where to start
- Add `accelerate` to your project's dependencies (the `build-depends` field of your `.cabal` file, or the `dependencies` section of `package.yaml` when using Stack).
- Choose a back‑end: `accelerate-llvm-native` for CPU, `accelerate-llvm-ptx` for CUDA.
- Write a computation over the `Acc` type, e.g. the dot‑product example from the README.
- Execute with `run` from `Data.Array.Accelerate.LLVM.Native` (CPU) or `Data.Array.Accelerate.LLVM.PTX` (GPU); `run1` is a variant that compiles a one‑argument function once so it can be applied repeatedly without re‑paying compilation latency.
- Explore the `accelerate-examples` repository for ready‑made kernels; the Mandelbrot example is a good sanity check for GPU output.
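Putting those steps together, a minimal program might look like the following sketch. It assumes `accelerate` and `accelerate-llvm-native` are installed and building on your toolchain; swapping the `run` import for `Data.Array.Accelerate.LLVM.PTX` retargets the same code to the GPU:

```haskell
import Data.Array.Accelerate             as A
import Data.Array.Accelerate.LLVM.Native (run)  -- swap for ...LLVM.PTX on GPU

-- The README's dot product over embedded (Acc) arrays.
dotp :: Acc (Vector Float) -> Acc (Vector Float) -> Acc (Scalar Float)
dotp xs ys = A.fold (+) 0 (A.zipWith (*) xs ys)

main :: IO ()
main = do
  -- Build host-side arrays and embed them with `use`.
  let xs = fromList (Z :. 3) [1, 2, 3] :: Vector Float
      ys = fromList (Z :. 3) [4, 5, 6] :: Vector Float
  -- `run` compiles the expression at runtime and executes it.
  print (run (dotp (use xs) (use ys)))
```

Note that compilation happens inside `run` each time it is called; for repeated invocations of the same kernel, prefer the `run1` form mentioned above.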
Outlook
Accelerate demonstrates that Haskell can generate performant GPU code without abandoning its pure functional core. The library’s design—type‑safe DSL, LLVM back‑ends, and a modular extension system—offers a solid foundation for future work, such as adding support for AMD GPUs via ROCm, exposing more low‑level CUDA controls, or integrating with heterogeneous runtimes like SYCL. Until those extensions arrive, users should treat Accelerate as a research‑grade tool: excellent for prototyping regular, data‑parallel algorithms, but not yet a drop‑in replacement for production‑grade GPU libraries.
