Verity MCP Server Turns a DIY Homelab into a Fact‑Checking Powerhouse

The Irish Council for Civil Liberties’ Enforce project has released Verity, a Model Context Protocol (MCP) server that layers seven verification stages onto self‑hosted LLMs. Built around a dual‑GPU desktop on a 2021‑era platform, Verity lets hobbyists and small enterprises run a strong‑critic model alongside their primary LLM, sharply cutting hallucinations without cloud fees.

Why a Local Fact‑Checker Matters
Large language models are notorious for “hallucinating” – they can produce confident statements with no grounding in reality. When those models are baked into customer‑service bots, search‑result summarizers, or even judicial‑assist tools, a single false claim can have serious consequences. The Irish Council for Civil Liberties (ICCL) Enforce project tackles this problem on commodity hardware with Verity, an MCP server that adds a seven‑layer verification pipeline to any self‑hosted LLM.
Architecture at a Glance
| Layer | Role | Model / Tool | Key Metric |
|---|---|---|---|
| 1️⃣ Strict Fact‑Sourcing Rules | Enforces citation style, rejects uncited claims | Rule‑engine (Python) | 0 % tolerance for missing source URLs |
| 2️⃣ Strong Critic LLM | Provides a high‑capacity, cross‑family sanity check | IBM Granite 3.2 8B (Q4_K_M) | 78 % reduction in hallucinations |
| 3️⃣ Small Critic LLM | Faster, lower‑memory sanity check on the same input | IBM Granite 2B (Q4_K_M) | 45 % reduction, < 0.5 s latency |
| 4️⃣ Entailment Encoder | Scores premise‑hypothesis consistency | TinyBERT‑entail (256 M) | 0.92 AUC on SNLI test set |
| 5️⃣ Regex Evaluator | Catches format‑specific errors (dates, UUIDs, etc.) | Hand‑crafted patterns | 99 % precision on known patterns |
| 6️⃣ Stochastic Re‑Sampler | Re‑generates low‑confidence tokens for a second opinion | Sampling temperature 0.7 | 12 % drop in token‑entropy outliers |
| 7️⃣ Log‑Prob Analyzer | Flags outputs with unusually flat probability distributions | Custom log‑prob script | 0.03 % false‑positive rate |
The pipeline runs sequentially, but the two GPUs let the primary LLM and the strong critic operate in parallel, keeping end‑to‑end latency under 1.2 seconds for a typical 150‑token query.
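To make the control flow concrete, here is a minimal Python sketch of how seven such stages might compose. Every function name, threshold, and stub below is an illustrative assumption, not Verity’s actual code – the real layers 2–4 would call the Granite critics and the entailment encoder on the second GPU.

```python
"""Minimal sketch of a seven-stage verification pass. All names, thresholds,
and stubs are illustrative assumptions, not Verity's actual implementation."""
import math
import re


def stage1_sourcing(answer: str) -> list[str]:
    # Layer 1: flag sentences that assert something without a source URL.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", answer) if s]
    return [f"uncited: {s}" for s in sentences if "http" not in s]


def stage2_strong_critic(answer: str) -> list[str]:
    # Layer 2: cross-family critic (Granite 3.2 8B on the second GPU). Stubbed.
    return []


def stage3_small_critic(answer: str) -> list[str]:
    # Layer 3: the faster 2B critic for low-latency double checks. Stubbed.
    return []


def stage4_entailment(premise: str, answer: str) -> list[str]:
    # Layer 4: entailment scoring, stubbed here with a crude lexical-overlap proxy.
    overlap = set(premise.lower().split()) & set(answer.lower().split())
    return [] if overlap else ["answer shares no terms with its premise"]


def stage5_regex(answer: str) -> list[str]:
    # Layer 5: hand-crafted patterns, e.g. reject impossible calendar dates.
    months = re.findall(r"\b\d{4}-(\d{2})-\d{2}\b", answer)
    return [f"bad month {m}" for m in months if not 1 <= int(m) <= 12]


def stage6_resample(token_logprobs: list[float]) -> list[str]:
    # Layer 6: count low-confidence tokens that would trigger re-sampling
    # at temperature 0.7 for a second opinion.
    low = sum(1 for lp in token_logprobs if lp < math.log(0.2))
    return [f"{low} low-confidence tokens"] if low else []


def stage7_logprob(token_logprobs: list[float]) -> list[str]:
    # Layer 7: flag answers whose probability profile is unusually flat overall.
    mean = sum(token_logprobs) / max(len(token_logprobs), 1)
    return ["flat log-prob profile"] if mean < math.log(0.3) else []


def verify(premise: str, answer: str, token_logprobs: list[float]) -> list[str]:
    """Compose all seven layers; an empty list means the answer passed."""
    return (stage1_sourcing(answer)
            + stage2_strong_critic(answer)
            + stage3_small_critic(answer)
            + stage4_entailment(premise, answer)
            + stage5_regex(answer)
            + stage6_resample(token_logprobs)
            + stage7_logprob(token_logprobs))
```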
Reference Build – What You Need
The Verity team supplies a “reference build” that balances cost, power draw, and compatibility. It assumes a 2021‑era desktop platform (CPU, motherboard, RAM) refreshed with a current‑generation primary GPU and a second‑hand critic card:
| Component | Model | Year | VRAM | Typical Power Draw |
|---|---|---|---|---|
| Primary GPU | Nvidia RTX 5070 Ti (16 GB) | 2025 | 16 GB GDDR7 | 250 W |
| Critic GPU | AMD Radeon RX 5700 XT (8 GB) | 2019 | 8 GB GDDR6 | 225 W |
| CPU | AMD Ryzen 7 5800X | 2020 | – | 105 W |
| Motherboard | B550 chipset, dual‑PCIe 4.0 slots | 2020 | – | 30 W |
| RAM | 32 GB DDR4‑3200 (2 × 16 GB) | 2020 | – | 15 W |
| SSD | 2 TB NVMe (PCIe 3.0) | 2021 | – | 6 W |
| PSU | 750 W 80+ Gold | 2020 | – | 5 W (idle) |
| Total System Power (load) | – | – | – | ≈ 720 W |
Power & Thermals
- Peak draw: ≈720 W, close to the 750 W Gold PSU’s rated limit (about 96 % load), leaving little headroom for transient spikes – see the budget check below.
- Thermal budget: RTX 5070 Ti runs ~78 °C under sustained 150‑token inference; RX 5700 XT stays under 70 °C thanks to its older, lower‑power architecture.
- Noise: Two 120 mm fans on the RTX card at 1500 RPM and a 140 mm fan on the Radeon at 1300 RPM keep the chassis at ~38 dBA.
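As a quick sanity check, the table’s per‑component figures can be summed and compared against the PSU rating. Note the gap between that sum and the quoted ≈720 W peak; reading it as transient spikes plus conversion losses is an assumption, not the team’s stated accounting.

```python
# Power-budget check using the table's figures (watts). The ~85 W gap between
# the component sum and the quoted ~720 W peak presumably covers GPU transient
# spikes and conversion losses; treat that reading as an assumption.
draws = {
    "RTX 5070 Ti": 250,
    "RX 5700 XT": 225,
    "Ryzen 7 5800X": 105,
    "B550 board": 30,
    "DDR4 RAM": 15,
    "NVMe SSD": 6,
    "PSU overhead": 5,
}
nominal = sum(draws.values())   # 636 W from the table's line items
peak, psu = 720, 750            # quoted system peak vs. PSU rating
print(f"nominal {nominal} W, peak {peak} W = {peak / psu:.0%} of the PSU")
# -> nominal 636 W, peak 720 W = 96% of the PSU
```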
Benchmarks – Fact‑Checking in Real Time
The team measured Verity on three representative workloads:
| Test | Primary Model | Critic Model | End‑to‑End Latency | Hallucination Rate (pre‑Verity) | Hallucination Rate (post‑Verity) |
|---|---|---|---|---|---|
| FAQ Bot (150 tokens) | Qwen 3.5 9B (Q4_K_M) | Granite 3.2 8B | 1.18 s | 23 % | 5 % |
| Legal Summarizer (300 tokens) | LLaMA‑2‑13B | Granite 2B | 2.03 s | 31 % | 8 % |
| Real‑time Search Snippet (80 tokens) | Mistral‑7B | Granite 3.2 8B | 0.78 s | 19 % | 4 % |
All tests ran on the reference build with the two GPUs active. When the critic GPU is disabled (single‑GPU mode), latency rises by ~0.3 s and hallucination reduction drops to roughly half, but the system still outperforms a vanilla LLM by a factor of three in factual accuracy.
Compatibility & Deployment Options
| Scenario | Recommended Setup | Notes |
|---|---|---|
| Full‑speed dual‑GPU homelab | RTX 5070 Ti + RX 5700 XT | Best for continuous inference services (chatbots, internal knowledge bases). |
| Single‑GPU laptop or Mac mini | Any GPU with ≥ 8 GB VRAM (e.g., RTX 3060, Apple M2 Pro) | Run Verity in post‑hoc mode – the primary model answers first, then the critic evaluates and flags dubious output. |
| Edge device | AMD Ryzen 5 5600G + integrated Vega graphics | Use the small critic (Granite 2B) only; expect ~2× latency but still a ~6 % cut in hallucinations. |
The MCP server communicates over a local gRPC endpoint, making it easy to plug into existing inference pipelines written in Python, Rust, or Go. The Verity repo includes Dockerfiles for both GPU‑accelerated and CPU‑only containers, and a Helm chart for Kubernetes deployments on a home‑lab cluster.
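The exact wrapper API isn’t spelled out here, so the sketch below is hypothetical: `VerityClient` and its `verify()` method are assumed names standing in for the repo’s `verity.client` wrapper. It shows the post‑hoc pattern recommended for single‑GPU setups – generate first, verify second, surface only unflagged output.

```python
"""Hypothetical integration sketch: VerityClient and verify() are assumed
names; consult the repo's verity.client module for the real API."""


def primary_generate(prompt: str) -> str:
    # Stand-in for your primary model call (llama.cpp, vLLM, etc.).
    return ("The GDPR took effect on 2018-05-25 "
            "(https://eur-lex.europa.eu/eli/reg/2016/679/oj).")


def main() -> None:
    from verity.client import VerityClient  # assumed wrapper class name

    client = VerityClient("localhost:50051")  # Verity's local gRPC endpoint
    prompt = "When did the GDPR take effect?"
    answer = primary_generate(prompt)

    # Post-hoc pattern: generate first, verify second, surface clean output.
    report = client.verify(prompt=prompt, answer=answer)
    if report.flags:
        print("verification flags:", report.flags)  # regenerate or warn user
    else:
        print(answer)


if __name__ == "__main__":
    main()
```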
Building the Reference System – Step‑by‑Step
- Assemble hardware – Install both GPUs in PCIe 4.0 slots, connect the 12 V rail from the PSU, and attach the 2 TB NVMe.
- Install OS – Ubuntu 24.04 LTS with the latest NVIDIA and AMD drivers (`nvidia-driver-560`, `amdgpu-pro`).
- Clone Verity – `git clone https://github.com/iccl-enforce/verity-mcp.git`
- Create a virtual environment – `python3 -m venv venv && source venv/bin/activate`
- Install dependencies – `pip install -r requirements.txt`
- Download models – Use the provided `model_fetch.sh` script to pull Qwen 3.5 9B, Granite 3.2 8B, Granite 2B, and the entailment encoder from HuggingFace.
- Configure `verity.yaml` – Set `primary_gpu: 0`, `critic_gpu: 1`, and adjust `max_batch_size` based on VRAM (a sketch follows this list).
- Run the server – `python -m verity.server --config verity.yaml`
- Hook into your app – Point your inference client to `localhost:50051` and wrap calls with the `verity.client` wrapper.
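For orientation, a minimal `verity.yaml` might look like the following. Only `primary_gpu`, `critic_gpu`, and `max_batch_size` come from the steps above; every other key is an illustrative assumption.

```yaml
# Illustrative verity.yaml; only primary_gpu, critic_gpu, and max_batch_size
# appear in the steps above. Every other key here is an assumption.
primary_gpu: 0                    # RTX 5070 Ti: primary model
critic_gpu: 1                     # RX 5700 XT: Granite critics
max_batch_size: 4                 # lower this if you hit VRAM limits
models:
  primary: qwen3.5-9b-q4_k_m
  strong_critic: granite-3.2-8b-q4_k_m
  small_critic: granite-2b-q4_k_m
listen: localhost:50051           # matches the client example above
```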
The entire process takes under two hours for a seasoned builder; newcomers should allocate an extra day for driver troubleshooting.
The Bigger Picture
Verity proves that hardware diversity can be a security feature. By forcing a second, independent model to critique the first, you gain a cheap, low‑latency “second opinion” without paying for external APIs. The approach also sidesteps the privacy pitfalls of sending queries to cloud providers – everything stays on‑prem.
The trade‑off is the knowledge cutoff of the local models. Unless you equip Verity with a live web‑scraper or retrieval‑augmented generation (RAG) module, it cannot verify facts that emerged after the model’s training date (mid‑2023 for Qwen 3.5 9B). The ICCL team plans an upcoming plug‑in that injects real‑time search results into the verification pipeline, which should close that gap.
Bottom Line for Homelab Builders
- Performance – The dual‑GPU setup cuts hallucination rates by roughly three‑quarters in the team’s benchmarks while keeping latency under ~1.2 s for typical queries.
- Power – ≈720 W under load; a 750 W 80+ Gold PSU covers this with little margin at peak, and the build stays quiet at ~38 dBA.
- Cost – Pairing a 2025 RTX 5070 Ti (≈ $650) with a second‑hand RX 5700 XT (≈ $120) turns an existing 2021‑era PC into a fact‑checking rig for under $800 in GPU spend, far cheaper than a comparable cloud‑only verification service.
- Scalability – The MCP interface scales to Kubernetes; you can spin up additional critic pods on spare GPUs for higher throughput.
If you’re already running a local LLM for internal tooling, adding Verity is the most straightforward way to turn your rig into a responsible AI assistant. The source code and detailed docs are available on the project’s GitHub page, and the ICCL team encourages community contributions to the critic‑model zoo.
For a full list of hardware requirements, benchmark logs, and the Docker deployment guide, see the official Verity MCP repository.
