Sherlock: Open Source Tool for Monitoring LLM API Traffic and Token Usage
#Regulation


Startups Reporter

Sherlock provides developers with real-time visibility into LLM API interactions, tracking token consumption, context window usage, and prompt history through an intuitive terminal dashboard.


Developers working with large language models have a new debugging tool in Sherlock, an open source proxy that intercepts and visualizes LLM API traffic. The tool addresses a critical pain point in AI development: understanding exactly how token budgets are consumed during interactions with models like Claude, GPT, and Gemini.

At its core, Sherlock operates as a transparent MITM proxy that sits between your application and LLM APIs. It captures every API request and response, then displays key metrics in a terminal-based dashboard that updates in real time. The system requires zero code changes, working through the standard proxy environment variables that most LLM clients respect.
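Because it relies on standard proxy environment variables, pointing an application at Sherlock is typically a matter of exporting two variables before launching it. The listen address below is an assumption for illustration; check Sherlock's own documentation for the actual port:

```shell
# Assumed listen address -- consult Sherlock's docs for the real default.
export HTTP_PROXY=http://localhost:8080
export HTTPS_PROXY=http://localhost:8080
# Most LLM SDKs (built on requests/httpx) pick these up automatically.
echo "proxying via $HTTPS_PROXY"
```

Any client started in this shell will route its API traffic through the proxy with no code changes.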

Key features include:

  1. Token Tracking: Precise counts for prompt and completion tokens across all API calls
  2. Context Window Monitoring: A visual "fuel gauge" showing cumulative usage against configured limits
  3. Prompt Archiving: Automatic saving of every interaction in both human-readable markdown and raw JSON formats
  4. Multi-Provider Support: Currently handles Anthropic's Claude API, with OpenAI and Google Gemini integrations planned
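The token-tracking feature boils down to reading the usage block that the provider already includes in each response. As a sketch of the kind of accounting involved, here is how prompt and completion counts can be pulled from a captured Anthropic Messages API response body (the field names follow Anthropic's documented response shape; the numbers are illustrative, and `token_counts` is a hypothetical helper, not Sherlock's actual code):

```python
import json

# Illustrative captured response body; "usage" follows Anthropic's
# documented Messages API shape.
raw = json.dumps({
    "model": "claude-3-5-sonnet-20241022",
    "usage": {"input_tokens": 1204, "output_tokens": 356},
})

def token_counts(body: str) -> tuple[int, int]:
    """Extract (prompt, completion) token counts from a response body."""
    usage = json.loads(body).get("usage", {})
    return usage.get("input_tokens", 0), usage.get("output_tokens", 0)

prompt_tokens, completion_tokens = token_counts(raw)
print(prompt_tokens, completion_tokens)  # 1204 356
```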

For teams managing tight token budgets, Sherlock's context window visualization provides immediate feedback on usage patterns. The color-coded display shifts from green to red as consumption approaches configured limits, helping prevent unexpected overages.
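A color-coded gauge like the one described is straightforward to sketch: map cumulative usage to a fraction of the configured limit and pick a color band. The thresholds and rendering below are assumptions for illustration, not Sherlock's actual values:

```python
def gauge(used: int, limit: int, width: int = 20) -> str:
    """Render a text fuel gauge; 70%/90% color bands are assumed."""
    frac = min(used / limit, 1.0)
    filled = int(frac * width)
    color = "green" if frac < 0.7 else "yellow" if frac < 0.9 else "red"
    return f"[{'#' * filled}{'-' * (width - filled)}] {frac:.0%} ({color})"

print(gauge(140_000, 200_000))  # [##############------] 70% (yellow)
```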

The prompt archiving system creates searchable records of every API interaction, stored in ~/.sherlock/prompts/. This historical data proves invaluable for:

  • Debugging unexpected model responses
  • Auditing prompt engineering experiments
  • Recreating successful interactions
  • Calculating precise cost projections
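Writing each interaction out in both formats is a small amount of code. The sketch below shows one way such an archiver could work; the timestamped filename scheme and markdown layout are assumptions, not Sherlock's actual on-disk format:

```python
import json
import pathlib
from datetime import datetime, timezone

def archive(prompt: str, completion: str,
            base_dir: str = "~/.sherlock/prompts") -> pathlib.Path:
    """Save one interaction as markdown + JSON (filename scheme assumed)."""
    base = pathlib.Path(base_dir).expanduser()
    base.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    # Raw JSON copy for programmatic search and replay.
    (base / f"{stamp}.json").write_text(
        json.dumps({"prompt": prompt, "completion": completion}, indent=2)
    )
    # Human-readable markdown copy for browsing.
    md_path = base / f"{stamp}.md"
    md_path.write_text(f"## Prompt\n\n{prompt}\n\n## Completion\n\n{completion}\n")
    return md_path
```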

Installation follows standard Python package procedures, requiring Python 3.10+ and Node.js for full functionality. The setup process automatically handles certificate generation for SSL interception, with clear instructions for installing the MITM proxy CA in your system's trust store.

What makes Sherlock particularly useful for development workflows is its ability to track usage across entire sessions. Developers can see how token consumption accumulates during extended interactions, helping identify optimization opportunities in multi-turn conversations.
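Session-level tracking amounts to summing per-turn counts against the configured limit. A minimal sketch of such an accumulator (a hypothetical helper, not Sherlock's internals):

```python
from dataclasses import dataclass, field

@dataclass
class SessionTracker:
    """Accumulate per-turn token usage across a session (assumed design)."""
    limit: int
    turns: list[tuple[int, int]] = field(default_factory=list)

    def record(self, prompt_tokens: int, completion_tokens: int) -> None:
        self.turns.append((prompt_tokens, completion_tokens))

    @property
    def total(self) -> int:
        return sum(p + c for p, c in self.turns)

    def remaining(self) -> int:
        return max(self.limit - self.total, 0)

session = SessionTracker(limit=200_000)
session.record(1_200, 350)   # turn 1
session.record(1_600, 420)   # turn 2: note the growing prompt size
print(session.total, session.remaining())  # 3570 196430
```

Watching `total` grow turn by turn is what surfaces optimization opportunities, such as prompts that re-send ever-larger conversation histories.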

The project's MIT license and active development community suggest strong potential for expansion. Planned features include:

  • Expanded provider support (OpenAI, Gemini)
  • Team collaboration features
  • Cost projection calculators
  • Integration with observability platforms

For developers building on LLM APIs, Sherlock fills a crucial gap in the toolchain: providing the same level of visibility we expect when working with traditional web APIs. By surfacing the hidden costs of prompt engineering, it enables more informed development decisions and helps prevent budget surprises.
