Unikernel‑First Email Search: OCaml, Solo5, and the Blame Project
A New Kind of Email Search Engine
The Blame project, announced on 2025‑12‑04, showcases a fully OCaml‑based unikernel that hosts a web application for searching email archives. Built on Solo5 and the OCaml 5 compiler, Blame removes the need for the Mirage toolchain and relies on a custom workflow that compiles C sources with -ffreestanding, statically links the binary, and uses the Miou scheduler.
From Archive to Search
Blame is a synthesis of two earlier efforts: an archive system that packs emails into a block‑aligned file, and a search engine that pre‑computes word frequencies for each document. The archive is created with the blaze tool:
opam install blaze
blaze pack make -o pack.pack --align 512 <<EOF
001.eml
002.eml
003.eml
EOF
The resulting pack.pack is mounted as a block device by the Solo5 tender.
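To make the effect of block alignment concrete, here is a minimal sketch of how entries could be laid out so that each one starts on a 512-byte boundary. It assumes that --align rounds every entry's starting offset up to the next multiple of the given size; the real pack format also carries headers and an index, and the names align_up and layout are hypothetical.

```ocaml
(* Round [n] up to the next multiple of [align] (assumed to be > 0). *)
let align_up ~align n = (n + align - 1) / align * align

(* Given entry sizes in bytes, compute the (offset, size) of each entry when
   every entry starts on an [align]-byte boundary. This is an illustrative
   layout only, not the actual on-disk format. *)
let layout ~align sizes =
  let _, entries =
    List.fold_left
      (fun (offset, acc) size ->
        (align_up ~align (offset + size), (offset, size) :: acc))
      (0, []) sizes
  in
  List.rev entries

let () =
  layout ~align:512 [ 700; 100; 1024 ]
  |> List.iter (fun (offset, size) ->
         Printf.printf "offset=%d size=%d\n" offset size)
```

With a layout like this, a reader can fetch any entry by reading whole 512-byte sectors from the block device, without an intermediate filesystem.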
Building the Unikernel
Unlike the Mirage toolchain, Blame's build has no dependency resolution step of its own. Instead, opam resolves the required libraries, and a small script pulls the source for each dependency so that it can be compiled with the modified OCaml compiler.
git clone https://git.robur.coop/robur/blame.git
cd blame
opam pin add --yes --no-action .
opam install --deps-only blame
./source.sh
# …
dune build
The output is a statically linked executable that Solo5 can run:
file _build/solo5/main.exe
# _build/solo5/main.exe: ELF 64-bit LSB executable, x86-64 …
Running on Solo5
A Solo5 unikernel requires a CPU, memory, a network tap, and a block device. The following commands set up a bridge and a tap interface, then launch the unikernel with 512 MB of RAM:
sudo ip link add name service type bridge
sudo ip addr add 10.0.0.1/24 dev service
sudo ip tuntap add name tap0 mode tap
sudo ip link set tap0 master service
sudo ip link set service up
sudo ip link set tap0 up
solo5-hvt --mem=512 --net:service=tap0 --block:archive=pack.pack \
-- _build/solo5/main.exe --ipv4=10.0.0.2/24 --color=always
The web application is now reachable at http://10.0.0.2/.
A Web Framework for Unikernels
Blame uses Vifu, the unikernel variant of Vif, a lightweight OCaml web framework built on top of Miou. It provides typed routes, JSON serialization, and a DSL for matching paths with regular expressions. Its design mirrors that of the vif framework used by the builds.robur.coop website, but it is specialized for the Solo5 environment.
let run _ cidr gateway port =
  let devices =
    [ Mnet.stackv4 ~name:"service" ?gateway cidr ] in
  Mkernel.run devices @@ fun (daemon, tcpv4, _udpv4) () ->
  (* ... *)
  Vifu.run ~cfg tcpv4 routes ()
The framework is intentionally minimal: it avoids heavy dependencies such as cstruct and reduces the number of functors to keep the binary small.
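The idea behind typed routes can be illustrated without the framework itself. The sketch below is not the framework's API: it is a self-contained approximation using a GADT, in which the shape of a path determines the type of the handler that may be attached to it. The names path, matches, mail_route and dispatch, and the /mail/<id> route, are all hypothetical.

```ocaml
(* A path is a sequence of literal segments and typed captures.
   [('f, 'r) path] reads: "a handler of type 'f which, once it has received
   every captured value, returns 'r". *)
type ('f, 'r) path =
  | End : ('r, 'r) path
  | Lit : string * ('f, 'r) path -> ('f, 'r) path
  | Int : ('f, 'r) path -> (int -> 'f, 'r) path
  | Str : ('f, 'r) path -> (string -> 'f, 'r) path

(* Match URL segments against a path, feeding captures to the handler. *)
let rec matches : type f r. (f, r) path -> string list -> f -> r option =
 fun path segments handler ->
  match (path, segments) with
  | End, [] -> Some handler
  | Lit (s, rest), x :: xs when String.equal s x -> matches rest xs handler
  | Int rest, x :: xs -> (
      match int_of_string_opt x with
      | Some n -> matches rest xs (handler n)
      | None -> None)
  | Str rest, x :: xs -> matches rest xs (handler x)
  | _ -> None

(* Hypothetical route: /mail/<int> *)
let mail_route = Lit ("mail", Int End)

let dispatch target =
  let segments =
    String.split_on_char '/' target |> List.filter (fun s -> s <> "")
  in
  matches mail_route segments (fun id -> Printf.sprintf "email #%d" id)

let () =
  match dispatch "/mail/42" with
  | Some body -> print_endline body
  | None -> print_endline "404"
```

Because the handler's type is derived from the route, attaching a handler that expects a string to mail_route is rejected at compile time rather than failing at request time.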
Compiling Frontend Assets
Because a unikernel has no filesystem, any JavaScript must be embedded in the binary. The build pipeline uses js_of_ocaml to compile the frontend, then ocaml-crunch to serialize the resulting .js file into an OCaml module that the unikernel can statically link.
# Compile the frontend
js_of_ocaml script.ml -o script.bc.js
# Serialize into OCaml code
ocaml-crunch --with-comments -f script_js:_build/default/script.bc.js > documents.ml
The documents.ml module exposes a byte array script_js, which the server serves as application/javascript.
The process of embedding JavaScript into an OCaml unikernel is a prime example of how build tooling can adapt to the constraints of a minimal runtime.
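As a rough illustration of how the server side can use such a generated module, the sketch below looks up an embedded asset by request path and returns it with its content type. The Documents module here is a hand-written stand-in for the generated file, the value is modelled as an OCaml string, and the /script.js path and the lookup function are assumptions made for the example.

```ocaml
(* Hypothetical stand-in for the generated module: in the real build this
   file is produced automatically and holds the full js_of_ocaml output. *)
module Documents = struct
  let script_js = "console.log('hello from the unikernel');"
end

(* Map a request path to an embedded asset and its content type.
   The path "/script.js" is an assumption for illustration. *)
let lookup path =
  match path with
  | "/script.js" -> Some ("application/javascript", Documents.script_js)
  | _ -> None

let () =
  match lookup "/script.js" with
  | Some (content_type, body) ->
      Printf.printf "%s (%d bytes)\n" content_type (String.length body)
  | None -> print_endline "404"
```

Since the asset is an ordinary OCaml value, serving it involves no file I/O at all: the bytes live in the unikernel image itself.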
Memory‑Efficient Search
A key challenge for any email search engine is memory consumption. Blame’s strategy is to keep the unikernel’s footprint below 256 MB and push the heavy lifting to the client.
- The backend stores only the pre‑computed word frequencies and a small index.
- The frontend downloads a stem representation of each email (length, mail ID, blob ID, and token counts) via a JSON API.
- The browser then performs the scoring of search queries locally, avoiding any GC pressure on the unikernel.
This approach allows Blame to handle archives of 30 k emails (≈150 MB) while keeping the total memory usage under 140 MB, as measured with memtrace.
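The client-side scoring step can be sketched as follows. The article does not name the scoring function, so this example assumes a plain TF-IDF ranking; the stem record, its field types, and the functions document_frequency, score and search are hypothetical and only follow the fields listed above.

```ocaml
(* Hypothetical shape of a stem, following the fields listed above. *)
type stem = {
  mail_id : string;
  blob_id : string;
  length : int; (* total number of tokens in the email *)
  counts : (string * int) list; (* token -> number of occurrences *)
}

(* Number of stems in which [token] occurs at least once. *)
let document_frequency stems token =
  List.length (List.filter (fun s -> List.mem_assoc token s.counts) stems)

(* TF-IDF score of one stem for a query given as a list of tokens.
   This is an assumed scoring function, not necessarily the project's. *)
let score stems query stem =
  let n = float_of_int (List.length stems) in
  List.fold_left
    (fun acc token ->
      match List.assoc_opt token stem.counts with
      | None -> acc
      | Some count ->
          let tf = float_of_int count /. float_of_int (max 1 stem.length) in
          let df = float_of_int (max 1 (document_frequency stems token)) in
          acc +. (tf *. log (1. +. (n /. df))))
    0. query

(* Rank mail IDs by decreasing score, dropping stems that match nothing. *)
let search stems query =
  stems
  |> List.map (fun s -> (score stems query s, s))
  |> List.filter (fun (sc, _) -> sc > 0.)
  |> List.sort (fun (a, _) (b, _) -> compare b a)
  |> List.map (fun (_, s) -> s.mail_id)

let example_stems =
  [
    { mail_id = "001"; blob_id = "a"; length = 4;
      counts = [ ("ocaml", 2); ("unikernel", 1); ("mail", 1) ] };
    { mail_id = "002"; blob_id = "b"; length = 5;
      counts = [ ("mail", 3); ("search", 2) ] };
    { mail_id = "003"; blob_id = "c"; length = 2; counts = [ ("garden", 2) ] };
  ]

let () =
  search example_stems [ "ocaml"; "mail" ] |> List.iter print_endline
```

On this toy data the query ranks 001 ahead of 002 and omits 003. Code of this kind can be compiled with js_of_ocaml and run in the browser, so the ranking work never touches the unikernel's heap.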
Demonstration
[Animated GIF: the Blame web application in action]
The animation shows a user typing a query, the browser fetching the relevant stems, and the client computing relevance scores locally.
Lessons Learned
- Unikernel Development Made Enjoyable – By eliminating the Mirage monorepo and adopting a more straightforward opam‑centric workflow, developers can iterate faster.
- Stable Memory Footprint – Unlike Mirage‑tcpip, the Solo5 stack shows no leaks; the GC maintains a stable ratio between live data and available space.
- Client‑Side Offloading – Offloading heavy data processing to the browser sidesteps the limitations of the OCaml GC in a constrained environment.
Future Directions
Blame remains experimental, primarily because it depends on still‑maturing libraries such as mnet, utcp, and vifu. Nevertheless, the project demonstrates that a full‑stack web application can run as a tiny, statically linked unikernel while still delivering a rich user experience.
The team plans to extend the archive to the Caml‑List mailing list, which contains over 65 k emails, and to refine the search algorithm to support more complex queries.
Source: https://blog.robur.coop/articles/2025-04-12-ptt-search-webapp.html