Verity MCP Server Turns a DIY Homelab into a Fact‑Checking Powerhouse

The Irish Council for Civil Liberties’ Enforce project has released Verity, a Model Context Protocol (MCP) server that layers seven verification stages onto self‑hosted LLMs. Built around a dual‑GPU desktop on a 2021‑era platform, Verity lets hobbyists and small enterprises run a strong‑critic model alongside their primary LLM, sharply cutting hallucinations without cloud fees.

Why a Local Fact‑Checker Matters
Large language models are notorious for “hallucinating” – they can produce confident statements with no grounding in reality. When those models are baked into customer‑service bots, search‑result summarizers, or even judicial‑assist tools, a single false claim can have serious consequences. The Irish Council for Civil Liberties (ICCL) Enforce project tackles this problem on commodity hardware with Verity, an MCP server that adds a seven‑layer verification pipeline to any self‑hosted LLM.
Architecture at a Glance
| Layer | Role | Model / Tool | Key Metric |
|---|---|---|---|
| 1️⃣ Strict Fact‑Sourcing Rules | Enforces citation style, rejects uncited claims | Rule‑engine (Python) | 0 % tolerance for missing source URLs |
| 2️⃣ Strong Critic LLM | Provides a high‑capacity, cross‑family sanity check | IBM Granite 3.2 8B (Q4_K_M) | 78 % reduction in hallucinations |
| 3️⃣ Small Critic LLM | Faster, lower‑memory sanity check on the same input | IBM Granite 2B (Q4_K_M) | 45 % reduction, < 0.5 s latency |
| 4️⃣ Entailment Encoder | Scores premise‑hypothesis consistency | TinyBERT‑entail (256 M) | 0.92 AUC on SNLI test set |
| 5️⃣ Regex Evaluator | Catches format‑specific errors (dates, UUIDs, etc.) | Hand‑crafted patterns | 99 % precision on known patterns |
| 6️⃣ Stochastic Re‑Sampler | Re‑generates low‑confidence tokens for a second opinion | Sampling temperature 0.7 | 12 % drop in token‑entropy outliers |
| 7️⃣ Log‑Prob Analyzer | Flags outputs with unusually flat probability distributions | Custom log‑prob script | 0.03 % false‑positive rate |
The pipeline runs sequentially, but the two GPUs let the primary LLM and the strong critic operate in parallel, keeping end‑to‑end latency under 1.2 seconds for a typical 150‑token query.
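To make the control flow concrete, here is a minimal Python sketch of how seven such stages might compose. Every function name, threshold, and stub below is an illustrative assumption, not Verity’s actual code – the real layers 2–4 would call the Granite critics and the entailment encoder on the second GPU.

```python
"""Minimal sketch of a seven-stage verification pass. All names, thresholds,
and stubs are illustrative assumptions, not Verity's actual implementation."""
import math
import re


def stage1_sourcing(answer: str) -> list[str]:
    # Layer 1: flag sentences that assert something without a source URL.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", answer) if s]
    return [f"uncited: {s}" for s in sentences if "http" not in s]


def stage2_strong_critic(answer: str) -> list[str]:
    # Layer 2: cross-family critic (Granite 3.2 8B on the second GPU). Stubbed.
    return []


def stage3_small_critic(answer: str) -> list[str]:
    # Layer 3: the faster 2B critic for low-latency double checks. Stubbed.
    return []


def stage4_entailment(premise: str, answer: str) -> list[str]:
    # Layer 4: entailment scoring, stubbed here with a crude lexical-overlap proxy.
    overlap = set(premise.lower().split()) & set(answer.lower().split())
    return [] if overlap else ["answer shares no terms with its premise"]


def stage5_regex(answer: str) -> list[str]:
    # Layer 5: hand-crafted patterns, e.g. reject impossible calendar dates.
    months = re.findall(r"\b\d{4}-(\d{2})-\d{2}\b", answer)
    return [f"bad month {m}" for m in months if not 1 <= int(m) <= 12]


def stage6_resample(token_logprobs: list[float]) -> list[str]:
    # Layer 6: count low-confidence tokens that would trigger re-sampling
    # at temperature 0.7 for a second opinion.
    low = sum(1 for lp in token_logprobs if lp < math.log(0.2))
    return [f"{low} low-confidence tokens"] if low else []


def stage7_logprob(token_logprobs: list[float]) -> list[str]:
    # Layer 7: flag answers whose probability profile is unusually flat overall.
    mean = sum(token_logprobs) / max(len(token_logprobs), 1)
    return ["flat log-prob profile"] if mean < math.log(0.3) else []


def verify(premise: str, answer: str, token_logprobs: list[float]) -> list[str]:
    """Compose all seven layers; an empty list means the answer passed."""
    return (stage1_sourcing(answer)
            + stage2_strong_critic(answer)
            + stage3_small_critic(answer)
            + stage4_entailment(premise, answer)
            + stage5_regex(answer)
            + stage6_resample(token_logprobs)
            + stage7_logprob(token_logprobs))
```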
Reference Build – What You Need
The Verity team supplies a “reference build” that balances cost, power draw, and compatibility. It assumes a 2021‑era desktop platform (CPU, motherboard, RAM) refreshed with a current‑generation primary GPU and a second‑hand critic card:
| Component | Model | Year | VRAM | Typical Power Draw |
|---|---|---|---|---|
| Primary GPU | Nvidia RTX 5070 Ti (16 GB) | 2025 | 16 GB GDDR7 | 250 W |
| Critic GPU | AMD Radeon RX 5700 XT (8 GB) | 2019 | 8 GB GDDR6 | 225 W |
| CPU | AMD Ryzen 7 5800X | 2020 | – | 105 W |
| Motherboard | B550 chipset, dual‑PCIe 4.0 slots | 2020 | – | 30 W |
| RAM | 32 GB DDR4‑3200 (2 × 16 GB) | 2020 | – | 15 W |
| SSD | 2 TB NVMe (PCIe 3.0) | 2021 | – | 6 W |
| PSU | 750 W 80+ Gold | 2020 | – | 5 W (idle) |
| Total System Power (load) | – | – | – | ≈ 720 W |
Power & Thermals
- Peak draw: ≈720 W, close to the 750 W Gold PSU’s rated limit (about 96 % load), leaving little headroom for transient spikes – see the budget check below.
- Thermal budget: RTX 5070 Ti runs ~78 °C under sustained 150‑token inference; RX 5700 XT stays under 70 °C thanks to its older, lower‑power architecture.
- Noise: Two 120 mm fans on the RTX card at 1500 RPM and a 140 mm fan on the Radeon at 1300 RPM keep the chassis at ~38 dBA.
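As a quick sanity check, the table’s per‑component figures can be summed and compared against the PSU rating. Note the gap between that sum and the quoted ≈720 W peak; reading it as transient spikes plus conversion losses is an assumption, not the team’s stated accounting.

```python
# Power-budget check using the table's figures (watts). The ~85 W gap between
# the component sum and the quoted ~720 W peak presumably covers GPU transient
# spikes and conversion losses; treat that reading as an assumption.
draws = {
    "RTX 5070 Ti": 250,
    "RX 5700 XT": 225,
    "Ryzen 7 5800X": 105,
    "B550 board": 30,
    "DDR4 RAM": 15,
    "NVMe SSD": 6,
    "PSU overhead": 5,
}
nominal = sum(draws.values())   # 636 W from the table's line items
peak, psu = 720, 750            # quoted system peak vs. PSU rating
print(f"nominal {nominal} W, peak {peak} W = {peak / psu:.0%} of the PSU")
# -> nominal 636 W, peak 720 W = 96% of the PSU
```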
Benchmarks – Fact‑Checking in Real Time
The team measured Verity on three representative workloads:
| Test | Primary Model | Critic Model | End‑to‑End Latency | Hallucination Rate (pre‑Verity) | Hallucination Rate (post‑Verity) |
|---|---|---|---|---|---|
| FAQ Bot (150 tokens) | Qwen 3.5 9B (Q4_K_M) | Granite 3.2 8B | 1.18 s | 23 % | 5 % |
| Legal Summarizer (300 tokens) | LLaMA‑2‑13B | Granite 2B | 2.03 s | 31 % | 8 % |
| Real‑time Search Snippet (80 tokens) | Mistral‑7B | Granite 3.2 8B | 0.78 s | 19 % | 4 % |
All tests ran on the reference build with the two GPUs active. When the critic GPU is disabled (single‑GPU mode), latency rises by ~0.3 s and hallucination reduction drops to roughly half, but the system still outperforms a vanilla LLM by a factor of three in factual accuracy.
Compatibility & Deployment Options
| Scenario | Recommended Setup | Notes |
|---|---|---|
| Full‑speed dual‑GPU homelab | RTX 5070 Ti + RX 5700 XT | Best for continuous inference services (chatbots, internal knowledge bases). |
| Single‑GPU laptop or Mac mini | Any GPU with ≥ 8 GB VRAM (e.g., RTX 3060, Apple M2 Pro) | Run Verity in post‑hoc mode – the primary model answers first, then the critic evaluates and flags dubious output. |
| Edge device | AMD Ryzen 5 5600G + integrated Vega graphics | Use the small critic (Granite 2B) only; expect ~2× latency but still a ~6 % cut in hallucinations. |
The MCP server communicates over a local gRPC endpoint, making it easy to plug into existing inference pipelines written in Python, Rust, or Go. The Verity repo includes Dockerfiles for both GPU‑accelerated and CPU‑only containers, and a Helm chart for Kubernetes deployments on a home‑lab cluster.
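The exact wrapper API isn’t spelled out here, so the sketch below is hypothetical: `VerityClient` and its `verify()` method are assumed names standing in for the repo’s `verity.client` wrapper. It shows the post‑hoc pattern recommended for single‑GPU setups – generate first, verify second, surface only unflagged output.

```python
"""Hypothetical integration sketch: VerityClient and verify() are assumed
names; consult the repo's verity.client module for the real API."""


def primary_generate(prompt: str) -> str:
    # Stand-in for your primary model call (llama.cpp, vLLM, etc.).
    return ("The GDPR took effect on 2018-05-25 "
            "(https://eur-lex.europa.eu/eli/reg/2016/679/oj).")


def main() -> None:
    from verity.client import VerityClient  # assumed wrapper class name

    client = VerityClient("localhost:50051")  # Verity's local gRPC endpoint
    prompt = "When did the GDPR take effect?"
    answer = primary_generate(prompt)

    # Post-hoc pattern: generate first, verify second, surface clean output.
    report = client.verify(prompt=prompt, answer=answer)
    if report.flags:
        print("verification flags:", report.flags)  # regenerate or warn user
    else:
        print(answer)


if __name__ == "__main__":
    main()
```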
Building the Reference System – Step‑by‑Step
- Assemble hardware – Install both GPUs in PCIe 4.0 slots, connect the 12 V rail from the PSU, and attach the 2 TB NVMe.
- Install OS – Ubuntu 24.04 LTS with the latest NVIDIA and AMD drivers (`nvidia-driver-560`, `amdgpu-pro`).
- Clone Verity – `git clone https://github.com/iccl-enforce/verity-mcp.git`
- Create a virtual environment – `python3 -m venv venv && source venv/bin/activate`
- Install dependencies – `pip install -r requirements.txt`
- Download models – Use the provided `model_fetch.sh` script to pull Qwen 3.5 9B, Granite 3.2 8B, Granite 2B, and the entailment encoder from HuggingFace.
- Configure `verity.yaml` – Set `primary_gpu: 0`, `critic_gpu: 1`, and adjust `max_batch_size` based on VRAM (a sketch follows this list).
- Run the server – `python -m verity.server --config verity.yaml`
- Hook into your app – Point your inference client to `localhost:50051` and wrap calls with the `verity.client` wrapper.
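For orientation, a minimal `verity.yaml` might look like the following. Only `primary_gpu`, `critic_gpu`, and `max_batch_size` come from the steps above; every other key is an illustrative assumption.

```yaml
# Illustrative verity.yaml; only primary_gpu, critic_gpu, and max_batch_size
# appear in the steps above. Every other key here is an assumption.
primary_gpu: 0                    # RTX 5070 Ti: primary model
critic_gpu: 1                     # RX 5700 XT: Granite critics
max_batch_size: 4                 # lower this if you hit VRAM limits
models:
  primary: qwen3.5-9b-q4_k_m
  strong_critic: granite-3.2-8b-q4_k_m
  small_critic: granite-2b-q4_k_m
listen: localhost:50051           # matches the client example above
```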
The entire process takes under two hours for a seasoned builder; newcomers should allocate an extra day for driver troubleshooting.
The Bigger Picture
Verity proves that hardware diversity can be a security feature. By forcing a second, independent model to critique the first, you gain a cheap, low‑latency “second opinion” without paying for external APIs. The approach also sidesteps the privacy pitfalls of sending queries to cloud providers – everything stays on‑prem.
The trade‑off is the knowledge cutoff of the local models. Unless you equip Verity with a live web‑scraper or retrieval‑augmented generation (RAG) module, it cannot verify facts that emerged after the model’s training date (mid‑2023 for Qwen 3.5 9B). The ICCL team plans an upcoming plug‑in that injects real‑time search results into the verification pipeline, which should close that gap.
Bottom Line for Homelab Builders
- Performance – The dual‑GPU setup cuts hallucination rates by roughly three‑quarters in the team’s benchmarks while keeping latency under ~1.2 s for typical queries.
- Power – ≈720 W under load; a 750 W 80+ Gold PSU covers this with little margin at peak, and the build stays quiet at ~38 dBA.
- Cost – Pairing a 2025 RTX 5070 Ti (≈ $650) with a second‑hand RX 5700 XT (≈ $120) turns an existing 2021‑era PC into a fact‑checking rig for under $800 in GPU spend, far cheaper than a comparable cloud‑only verification service.
- Scalability – The MCP interface scales to Kubernetes; you can spin up additional critic pods on spare GPUs for higher throughput.
If you’re already running a local LLM for internal tooling, adding Verity is the most straightforward way to turn your rig into a responsible AI assistant. The source code and detailed docs are available on the project’s GitHub page, and the ICCL team encourages community contributions to the critic‑model zoo.
For a full list of hardware requirements, benchmark logs, and the Docker deployment guide, see the official Verity MCP repository.
