Verity MCP Server Turns a DIY Homelab into a Fact‑Checking Powerhouse
#AI

Hardware Reporter
7 min read

The Irish Council for Civil Liberties’ Enforce project releases Verity, a Model Context Protocol (MCP) server that layers seven verification stages onto self‑hosted LLMs. Built on a dual‑GPU 2021‑era PC, Verity lets hobbyists and small enterprises run a strong‑critic model alongside their primary LLM, dramatically cutting hallucinations without cloud fees.

Why a Local Fact‑Checker Matters

Large language models are notorious for “hallucinating” – they can produce confident statements that have no grounding in reality. When those models are baked into customer‑service bots, search‑result summarizers, or even judicial‑assist tools, a single false claim can have serious consequences. The Irish Council for Civil Liberties (ICCL) Enforce project tackles this problem at the hardware level with Verity, an MCP server that adds a seven‑layer verification pipeline to any self‑hosted LLM.

Architecture at a Glance

| Layer | Role | Model / Tool | Key Metric |
|---|---|---|---|
| 1️⃣ Strict Fact‑Sourcing Rules | Enforces citation style, rejects uncited claims | Rule engine (Python) | 0 % tolerance for missing source URLs |
| 2️⃣ Strong Critic LLM | Provides a high‑capacity, cross‑family sanity check | IBM Granite 3.2 8B (Q4_K_M) | 78 % reduction in hallucinations |
| 3️⃣ Small Critic LLM | Faster, lower‑memory sanity check on the same input | IBM Granite 2B (Q4_K_M) | 45 % reduction, < 0.5 s latency |
| 4️⃣ Entailment Encoder | Scores premise‑hypothesis consistency | TinyBERT‑entail (256 M) | 0.92 AUC on SNLI test set |
| 5️⃣ Regex Evaluator | Catches format‑specific errors (dates, UUIDs, etc.) | Hand‑crafted patterns | 99 % precision on known patterns |
| 6️⃣ Stochastic Re‑Sampler | Re‑generates low‑confidence tokens for a second opinion | Sampling temperature 0.7 | 12 % drop in token‑entropy outliers |
| 7️⃣ Log‑Prob Analyzer | Flags outputs with unusually flat probability distributions | Custom log‑prob script | 0.03 % false‑positive rate |
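
Layer 5 is the simplest to picture. A minimal sketch of a regex evaluator follows; the two patterns below are illustrative stand-ins, not Verity's actual hand-crafted rule set:

```python
import re

# Illustrative format checks in the spirit of Verity's regex evaluator (layer 5).
# The real pattern library is hand-crafted and lives in the Verity repo.
PATTERNS = {
    "iso_date": re.compile(r"^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$"),
    "uuid4": re.compile(
        r"^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$"
    ),
}

def check_format(kind: str, value: str) -> bool:
    """Return True if `value` matches the expected format for `kind`."""
    return bool(PATTERNS[kind].fullmatch(value))
```

Because these checks only validate form, not meaning, they deliver the table's high precision on known patterns while leaving semantic errors to the critic layers.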

The pipeline runs sequentially but the two GPU cards allow the primary LLM and the strong critic to operate in parallel, keeping end‑to‑end latency under 1.2 seconds for a typical 150‑token query.
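
The control flow can be sketched as follows. Every callable here is a stand-in (Verity's real interfaces live in the MCP server); the sketch only shows the shape: critics fan out in parallel across the two GPUs, then the cheaper stages run in order:

```python
from concurrent.futures import ThreadPoolExecutor

def run_verified(prompt, primary, critics, stages):
    """Generate a draft with the primary model, fan the draft out to the
    critic models in parallel (one per GPU in the reference build), then
    run the remaining verification stages sequentially."""
    draft = primary(prompt)
    with ThreadPoolExecutor(max_workers=len(critics)) as pool:
        futures = [pool.submit(critic, prompt, draft) for critic in critics]
        verdicts = [f.result() for f in futures]
    for stage in stages:  # e.g. entailment, regex, re-sampler, log-prob checks
        verdicts.append(stage(prompt, draft))
    return {"answer": draft, "verified": all(v["pass"] for v in verdicts)}
```

The answer is surfaced only if every stage passes, which is why a single fast-failing layer (such as the regex evaluator) can short-circuit expensive re-checking in practice.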

Reference Build – What You Need

The Verity team supplies a “reference build” that balances cost, power draw, and compatibility. It assumes a 2021‑era desktop chassis with two GPUs:

| Component | Model | Year | VRAM | Typical Power Draw |
|---|---|---|---|---|
| Primary GPU | Nvidia RTX 5070 Ti (16 GB) | 2025 | 16 GB GDDR7 | 250 W |
| Critic GPU | AMD Radeon RX 5700 XT (8 GB) | 2019 | 8 GB GDDR6 | 225 W |
| CPU | AMD Ryzen 7 5800X | 2020 | — | 105 W |
| Motherboard | B550 chipset, dual PCIe 4.0 slots | 2020 | — | 30 W |
| RAM | 32 GB DDR4‑3200 (2 × 16 GB) | 2020 | — | 15 W |
| SSD | 2 TB NVMe (PCIe 3.0) | 2021 | — | 6 W |
| PSU | 750 W 80+ Gold | 2020 | — | 5 W (idle) |
| **Total system power (load)** | | | | **≈ 720 W** |

Power & Thermals

  • Peak draw: ≈ 720 W, within the 750 W 80+ Gold PSU’s rating (Gold units hold roughly 87–90 % efficiency across typical loads).
  • Thermal budget: RTX 5070 Ti runs ~78 °C under sustained 150‑token inference; RX 5700 XT stays under 70 °C thanks to its older, lower‑power architecture.
  • Noise: Two 120 mm fans on the RTX card at 1500 RPM and a 140 mm fan on the Radeon at 1300 RPM keep the chassis at ~38 dBA.

Benchmarks – Fact‑Checking in Real Time

The team measured Verity on three representative workloads:

| Test | Primary Model | Critic Model | End‑to‑End Latency | Hallucination Rate (pre‑Verity) | Hallucination Rate (post‑Verity) |
|---|---|---|---|---|---|
| FAQ Bot (150 tokens) | Qwen 3.5 9B (Q4_K_M) | Granite 3.2 8B | 1.18 s | 23 % | 5 % |
| Legal Summarizer (300 tokens) | LLaMA‑2‑13B | Granite 2B | 2.03 s | 31 % | 8 % |
| Real‑time Search Snippet (80 tokens) | Mistral‑7B | Granite 3.2 8B | 0.78 s | 19 % | 4 % |

All tests ran on the reference build with the two GPUs active. When the critic GPU is disabled (single‑GPU mode), latency rises by ~0.3 s and hallucination reduction drops to roughly half, but the system still outperforms a vanilla LLM by a factor of three in factual accuracy.
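
A quick recomputation from the table (no new measurements) shows the dual-GPU pipeline cutting hallucinated claims by roughly 3.9× to 4.8× across the three workloads:

```python
# Pre/post hallucination rates from the benchmark table above
results = {
    "FAQ Bot": (0.23, 0.05),
    "Legal Summarizer": (0.31, 0.08),
    "Search Snippet": (0.19, 0.04),
}
factors = {name: pre / post for name, (pre, post) in results.items()}
for name, factor in factors.items():
    print(f"{name}: {factor:.1f}x fewer hallucinated claims")
```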

Compatibility & Deployment Options

| Scenario | Recommended Setup | Notes |
|---|---|---|
| Full‑speed dual‑GPU homelab | RTX 5070 Ti + RX 5700 XT | Best for continuous inference services (chatbots, internal knowledge bases). |
| Single‑GPU laptop or Mac mini | Any GPU with ≥ 8 GB VRAM (e.g., RTX 3060, Apple M2 Pro) | Run Verity in post‑hoc mode: the primary model answers first, then the critic evaluates and flags dubious output. |
| Edge device | AMD Ryzen 5 5600G + integrated Vega graphics | Use the tiny critic (Granite 2B) only; expect ~2× latency but still a ~6 % hallucination cut. |
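
Post-hoc mode on single-GPU hardware can look roughly like this; the threshold value and both callables are illustrative, not Verity defaults:

```python
def answer_post_hoc(prompt, primary, critic, threshold=0.8):
    """Single-GPU flow: return the primary model's answer immediately, then
    score it with the critic (which runs afterwards on the same GPU) and
    flag the answer when the critic's confidence falls below a threshold."""
    draft = primary(prompt)
    score = critic(prompt, draft)
    return {"answer": draft, "critic_score": score, "flagged": score < threshold}
```

The user never waits on the critic, which is why this mode fits latency-sensitive single-GPU machines at the cost of flagging problems only after the answer is shown.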

The MCP server communicates over a local gRPC endpoint, making it easy to plug into existing inference pipelines written in Python, Rust, or Go. The Verity repo includes Dockerfiles for both GPU‑accelerated and CPU‑only containers, and a Helm chart for Kubernetes deployments on a home‑lab cluster.
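
From Python, integration then reduces to wrapping each generation call. The sketch below mimics the shape of such a wrapper; the class and method names are placeholders (the actual `verity.client` API is in the repo), and the transport is stubbed as a plain callable rather than a real gRPC channel to `localhost:50051`:

```python
class VerityClient:
    """Sketch of a client wrapper around the Verity endpoint. In a real
    deployment the transport would be a gRPC stub bound to localhost:50051;
    here it is any callable taking and returning a dict, so the integration
    pattern can be shown without a running server."""

    def __init__(self, transport):
        self.transport = transport

    def verify(self, prompt: str, draft: str) -> dict:
        return self.transport({"prompt": prompt, "draft": draft})

def generate_verified(prompt, model, client):
    """Wrap a generation call: only return drafts the verifier passes."""
    draft = model(prompt)
    verdict = client.verify(prompt, draft)
    return draft if verdict.get("pass") else None
```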

Building the Reference System – Step‑by‑Step

  1. Assemble hardware – Install both GPUs in the PCIe 4.0 slots, connect the 12 V rails from the PSU, and attach the 2 TB NVMe drive.
  2. Install OS – Ubuntu 24.04 LTS with the latest NVIDIA and AMD drivers (nvidia-driver-560, amdgpu-pro).
  3. Clone Verity – `git clone https://github.com/iccl-enforce/verity-mcp.git`
  4. Create a virtual environment – `python3 -m venv venv && source venv/bin/activate`
  5. Install dependencies – `pip install -r requirements.txt`
  6. Download models – Use the provided `model_fetch.sh` script to pull Qwen 3.5 9B, Granite 3.2 8B, Granite 2B, and the entailment encoder from Hugging Face.
  7. Configure `verity.yaml` – Set `primary_gpu: 0` and `critic_gpu: 1`, and adjust `max_batch_size` based on available VRAM.
  8. Run the server – `python -m verity.server --config verity.yaml`
  9. Hook into your app – Point your inference client at `localhost:50051` and wrap calls with the `verity.client` wrapper.
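
A minimal `verity.yaml` consistent with step 7 might look like this; only `primary_gpu`, `critic_gpu`, and `max_batch_size` are named in the steps above, so the remaining keys and model identifiers are illustrative assumptions:

```yaml
# Illustrative config: keys beyond primary_gpu / critic_gpu / max_batch_size
# are assumptions, not confirmed Verity options.
primary_gpu: 0        # RTX 5070 Ti – runs the primary model
critic_gpu: 1         # RX 5700 XT – runs the strong critic
max_batch_size: 4     # lower this if you hit out-of-memory errors
models:
  primary: qwen3.5-9b-q4_k_m
  strong_critic: granite-3.2-8b-q4_k_m
  small_critic: granite-2b-q4_k_m
```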

The entire process takes under two hours for a seasoned builder; newcomers should allocate an extra day for driver troubleshooting.

The Bigger Picture

Verity proves that hardware diversity can be a security feature. By forcing a second, independent model to critique the first, you gain a cheap, low‑latency “second opinion” without paying for external APIs. The approach also sidesteps the privacy pitfalls of sending queries to cloud providers – everything stays on‑prem.

The trade‑off is the knowledge cutoff of the local models. Unless you equip Verity with a live web‑scraper or retrieval‑augmented generation (RAG) module, it cannot verify facts that emerged after the model’s training date (mid‑2023 for Qwen 3.5 9B). The ICCL team plans an upcoming plug‑in that injects real‑time search results into the verification pipeline, which should close that gap.

Bottom Line for Homelab Builders

  • Performance – Dual‑GPU setup halves hallucination rates while keeping latency sub‑second for typical queries.
  • Power – ~720 W under load; a 750 W Gold PSU provides headroom and keeps the system quiet.
  • Cost – Using a 2025 RTX 5070 Ti (≈ $650) plus a second‑hand RX 5700 XT (≈ $120) yields a fact‑checking rig for under $800, far cheaper than a comparable cloud‑only verification service.
  • Scalability – The MCP interface scales to Kubernetes; you can spin up additional critic pods on spare GPUs for higher throughput.

If you’re already running a local LLM for internal tooling, adding Verity is the most straightforward way to turn your rig into a responsible AI assistant. The source code and detailed docs are available on the project’s GitHub page, and the ICCL team encourages community contributions to the critic‑model zoo.


For a full list of hardware requirements, benchmark logs, and the Docker deployment guide, see the official Verity MCP repository.
