A new CLI tool eliminates the token cost of tool schemas by converting MCP servers and OpenAPI specs into discoverable command-line interfaces that load tools on-demand.
mcp2cli is a command-line tool that turns any MCP server or OpenAPI specification into a discoverable CLI at runtime, with no code generation. It addresses a significant inefficiency in AI agent workflows: tool schemas that consume thousands of tokens on every interaction.
The Problem: Tool Schema Token Waste

When AI agents connect to multiple MCP servers or OpenAPI endpoints, they typically inject the full JSON schemas of all available tools into the system prompt on every turn. For a setup with 6 MCP servers exposing 84 tools, this can consume approximately 15,540 tokens before a conversation even begins. The cost repeats on every message, whether or not the model actually uses those tools.
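The arithmetic behind that figure can be sketched as follows. Note that the ~185-token-per-tool average is inferred from the numbers above (15,540 / 84), not a figure the project publishes:

```python
# Back-of-the-envelope cost of injecting full tool schemas on every turn.
TOOLS = 84
AVG_SCHEMA_TOKENS = 185  # inferred from 15,540 tokens / 84 tools; an assumption

per_turn = TOOLS * AVG_SCHEMA_TOKENS
print(per_turn)           # 15540 tokens before the conversation even starts

# The overhead repeats every turn, so a 25-turn session pays it 25 times:
turns = 25
print(per_turn * turns)   # 388500 tokens spent on schemas alone
```

This repeated, fixed overhead is what the on-demand loading approach below is designed to avoid.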
Anthropic recognized this issue and built Tool Search into their API, allowing tools to be marked with defer_loading: true so they're discovered via a search index instead of loading all schemas upfront. This typically cuts token usage by 85%, but still injects full JSON schemas when tools are fetched.
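A deferred tool definition in that scheme looks roughly like this. The tool name and schema here are hypothetical, and the exact field placement is a sketch based only on the flag named above:

```json
{
  "name": "get_weather",
  "description": "Fetch the current weather for a city",
  "input_schema": {
    "type": "object",
    "properties": { "city": { "type": "string" } }
  },
  "defer_loading": true
}
```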
How mcp2cli Works

Unlike code-generation approaches, mcp2cli reads schemas at runtime and builds a CLI on the fly. Point it at a spec URL or an MCP server and the CLI exists immediately. When a server adds new endpoints, they appear on the next invocation, with no rebuild step.
The tool supports both MCP servers (via HTTP/SSE or stdio transport) and OpenAPI specifications (JSON or YAML, local or remote). For compact discovery, --list returns summaries at roughly 16 tokens per tool versus roughly 121 tokens for a native schema, and --help returns human-readable text that is typically cheaper than raw JSON.
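Using the per-tool averages above, the discovery savings for the earlier 84-tool setup work out like this (a sketch; the per-tool figures are the article's, and reusing the 84-tool count is an assumption):

```python
# Compare compact --list discovery against injecting native schemas,
# using the quoted per-tool averages (~16 vs ~121 tokens).
TOOLS = 84
LIST_TOKENS_PER_TOOL = 16     # mcp2cli --list summary
SCHEMA_TOKENS_PER_TOOL = 121  # native JSON schema

list_cost = TOOLS * LIST_TOKENS_PER_TOOL      # 1344 tokens
schema_cost = TOOLS * SCHEMA_TOKENS_PER_TOOL  # 10164 tokens

reduction = 1 - list_cost / schema_cost
print(f"{reduction:.1%}")     # 86.8% fewer tokens for discovery
```

The deeper savings quoted below come from the fact that full schemas are never loaded at all unless a tool is actually invoked.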
Token Savings in Practice

Real measurements using the cl100k_base tokenizer show dramatic reductions:
- A 120-tool MCP platform over 25 turns: 357,169 tokens saved (99% reduction)
- A 30-tool MCP server over 10 turns: 34,576 tokens saved (95.2% reduction)
- A multi-server setup (80 tools) over 20 turns: 189,472 tokens saved (97.7% reduction)
Key Features
- Zero code generation - works with any spec immediately
- Provider-agnostic - works with Claude, GPT, Gemini, or local models
- OpenAPI support alongside MCP servers
- Configurable caching with TTL control
- TOON output encoding for token-efficient LLM consumption
- Built-in AI agent skill for tools like Claude Code and Cursor
The tool is available via pip install mcp2cli, or can be run directly with uvx mcp2cli --help. It's MIT-licensed and ships with 96 tests covering various scenarios, including token-savings verification.
For developers tired of paying the full schema tax on every AI interaction, mcp2cli offers a practical solution that dramatically reduces context costs while maintaining full functionality.