GitHub’s engineering team introduced a daily token‑usage auditor and optimiser that prune unused Model Context Protocol (MCP) tools, replace MCP calls with GitHub CLI commands, and surface costly runs. This audit‑optimise loop delivers up to a 62 % reduction in effective token (ET) consumption across several CI‑integrated LLM agents, offering a practical blueprint for teams seeking predictable AI‑driven CI costs.

GitHub Cuts Agent Workflow Token Spend by Up to 62 % Using Daily Audits and MCP Pruning

GitHub has published a detailed post‑mortem of the work it did to shrink token usage in the agentic workflows that run inside its own repositories. By adding a daily audit‑optimisation loop, pruning unused Model Context Protocol (MCP) tools, and swapping MCP calls for GitHub CLI invocations, the team recorded effective‑token (ET) reductions of up to 62 %. The results matter for any organization that runs large‑language‑model (LLM) agents in continuous‑integration (CI) pipelines, where hidden token consumption can become a significant portion of the cloud bill.

What’s new?

token-usage.jsonl artefact – Every workflow run now writes a token-usage.jsonl file that records input, output, and cache tokens for Claude, Copilot, and Codex CLIs in a single normalised format.
Effective Tokens (ET) metric – GitHub weights output tokens by 4×, cache reads by 0.1×, then applies a model‑specific multiplier (Haiku 0.25×, Sonnet 1×, Opus 5×). A 10 % ET drop translates directly to a 10 % cost reduction, regardless of the model tier.
Daily Token Usage Auditor – Aggregates token consumption per workflow, flags anomalies, and surfaces the most expensive jobs.
Daily Token Optimiser – Reads the flagged workflow’s source and recent logs, opens a GitHub issue, and proposes concrete fixes (e.g., tool removal, CLI substitution).
MCP pruning – Unused tool schemas, which can add 10–15 KB per turn, are stripped from the request payload.
CLI‑based data fetch – Pull‑request diffs and file contents are now retrieved via the gh CLI, either pre‑downloaded or proxied through a transparent HTTP layer that keeps auth tokens out of the agent’s environment.
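To make the ET metric concrete, here is a minimal sketch of the weighting described above. It assumes fresh input tokens count at 1× (the article only states the output, cache‑read, and model multipliers), and the `Usage` class and its field names are hypothetical rather than GitHub's actual schema.

```python
from dataclasses import dataclass

# Output and cache-read weights come from the article.
# The 1x weight on fresh input tokens is an assumption.
INPUT_WEIGHT = 1.0
OUTPUT_WEIGHT = 4.0
CACHE_READ_WEIGHT = 0.1

# Model-tier multipliers quoted in the article.
MODEL_MULTIPLIER = {"haiku": 0.25, "sonnet": 1.0, "opus": 5.0}


@dataclass
class Usage:
    """Hypothetical per-run token counts (not GitHub's real record layout)."""
    input_tokens: int
    output_tokens: int
    cache_read_tokens: int


def effective_tokens(usage: Usage, model_tier: str) -> float:
    """Weight each token class, then scale by the model-tier multiplier."""
    weighted = (
        INPUT_WEIGHT * usage.input_tokens
        + OUTPUT_WEIGHT * usage.output_tokens
        + CACHE_READ_WEIGHT * usage.cache_read_tokens
    )
    return MODEL_MULTIPLIER[model_tier] * weighted


if __name__ == "__main__":
    run = Usage(input_tokens=10_000, output_tokens=500, cache_read_tokens=20_000)
    for tier in MODEL_MULTIPLIER:
        print(tier, effective_tokens(run, tier))
```

Because the model multiplier scales the whole weighted sum, the same percentage change in the weighted sum produces the same percentage change in ET on any tier.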

Developer experience: how the loop works

Run – An agentic workflow executes as usual. The proxy logs every token exchange to token-usage.jsonl.
Audit – At the end of the day, the Auditor parses all artefacts, groups them by workflow, and calculates ET totals. Runs that exceed a configurable threshold are flagged.
Issue creation – For each flagged run, the Optimiser clones the workflow definition, inspects recent logs, and automatically opens a GitHub issue titled "Token optimisation for <workflow‑name>".
Recommendation – The issue body contains a diff‑style suggestion (e.g., - tool: github/mcp/issue‑comment → + # removed) and a short rationale.
Human review – Engineers review the issue, apply the proposed change, and the next run benefits from the reduced context.

The entire cycle is packaged in the gh‑aw extension for the GitHub CLI (gh aw audit and gh aw optimise). Because the auditing and optimising agents themselves appear in the daily report, teams can see the impact of each optimisation within a single CI dashboard.
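The audit step can be sketched roughly as follows. This is not the gh‑aw implementation: the record fields (`workflow`, `model_tier`, `input_tokens`, `output_tokens`, `cache_read_tokens`) are hypothetical, the 1× input weight is an assumption, and the threshold is treated here as an absolute ET total per workflow.

```python
import json
from collections import defaultdict

# Model-tier multipliers quoted in the article.
MODEL_MULTIPLIER = {"haiku": 0.25, "sonnet": 1.0, "opus": 5.0}


def record_et(rec: dict) -> float:
    """Effective tokens for one record (input weight of 1x is an assumption)."""
    weighted = (
        rec["input_tokens"]
        + 4 * rec["output_tokens"]
        + 0.1 * rec["cache_read_tokens"]
    )
    return MODEL_MULTIPLIER[rec["model_tier"]] * weighted


def audit(jsonl_text: str, threshold: float) -> list[tuple[str, float]]:
    """Sum ET per workflow and return those above the threshold, costliest first."""
    totals: dict[str, float] = defaultdict(float)
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines in the artefact
        rec = json.loads(line)
        totals[rec["workflow"]] += record_et(rec)
    flagged = [(name, et) for name, et in totals.items() if et > threshold]
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Illustrative records only; these are not real measurements.
SAMPLE = "\n".join(json.dumps(r) for r in [
    {"workflow": "auto-triage", "model_tier": "sonnet",
     "input_tokens": 8000, "output_tokens": 400, "cache_read_tokens": 10000},
    {"workflow": "auto-triage", "model_tier": "sonnet",
     "input_tokens": 6000, "output_tokens": 300, "cache_read_tokens": 5000},
    {"workflow": "smoke", "model_tier": "haiku",
     "input_tokens": 4000, "output_tokens": 100, "cache_read_tokens": 0},
    {"workflow": "security-guard", "model_tier": "opus",
     "input_tokens": 1000, "output_tokens": 200, "cache_read_tokens": 0},
])

if __name__ == "__main__":
    for name, et in audit(SAMPLE, threshold=5000):
        print(f"{name}: {et:.0f} ET")
```

Sorting the flagged workflows by total ET mirrors the auditor's job of surfacing the most expensive jobs first, so a reviewer can start at the top of the list.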

User impact: measurable savings and trade‑offs

Workflow (production)	ET reduction	Comment
Auto‑Triage Issues	62 %	Removed 12 KB of unused MCP schema per turn; switched diff fetch to `gh pr diff`.
Security Guard	43 %	Pruned 8 unused tools, replaced file‑read calls with `gh api`.
Smoke Claude	59 %	Combined MCP pruning with CLI‑based content fetch.
Daily Community Attribution	37 %	Mostly due to audit‑driven caching improvements.
Contribution Check	5 % increase	ET rose rather than fell; the increase was traced to larger PRs, not a regression.

The most common inefficiency was unused MCP tools. Because LLM APIs are stateless, each request carries the full tool schema. In GitHub’s internal test suite, a server exposing 40 tools added roughly 10–15 KB of JSON per turn. Stripping tools that never get called shaved 8–12 KB off the context, which directly lowered the token count for every subsequent turn.
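A rough sketch of the pruning idea, assuming tool definitions shaped like MCP tool entries (`name`, `description`, `inputSchema`) and a set of tool names observed in recent logs. The synthetic tools and the helper names are illustrative, not GitHub's code.

```python
import json


def make_demo_tools(count: int) -> list[dict]:
    """Build synthetic tool definitions for illustration only."""
    return [
        {
            "name": f"tool_{i}",
            "description": f"Synthetic tool number {i} used to pad the payload.",
            "inputSchema": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        }
        for i in range(count)
    ]


def prune_tools(tools: list[dict], used_names: set[str]) -> list[dict]:
    """Keep only the tools that actually appear in usage logs."""
    return [tool for tool in tools if tool["name"] in used_names]


def payload_bytes(tools: list[dict]) -> int:
    """Size of the serialised tool list, as sent with every stateless request."""
    return len(json.dumps(tools).encode("utf-8"))


if __name__ == "__main__":
    tools = make_demo_tools(40)
    used = {"tool_0", "tool_7"}  # hypothetical: names seen in recent logs
    pruned = prune_tools(tools, used)
    print("before:", payload_bytes(tools), "bytes")
    print("after: ", payload_bytes(pruned), "bytes")
```

Since the schema is resent on every turn, the byte difference is paid once per turn, which is why the saving compounds over a multi‑turn run.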

When pruning stops helping

The Daily Community Attribution workflow carried eight unused tools, yet removing them produced no measurable ET change. GitHub explained that the tool manifests were a tiny fraction of the overall prompt size, so the savings fell below the measurement granularity; the workflow’s 37 % reduction came mostly from caching improvements instead. This highlights a practical limit: once the context payload is dominated by the actual code or issue description, tool‑schema trimming yields diminishing returns.
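One way to estimate in advance whether pruning is worth the effort is to bound the saving by the schema's share of the prompt. The sketch below uses bytes as a stand‑in for tokens, which is an approximation, and the sizes are illustrative rather than GitHub's measurements.

```python
def max_savings_fraction(schema_bytes: int, total_prompt_bytes: int) -> float:
    """Upper bound on the per-turn saving if every schema byte were removed."""
    if total_prompt_bytes <= 0:
        raise ValueError("total_prompt_bytes must be positive")
    return schema_bytes / total_prompt_bytes


if __name__ == "__main__":
    # Illustrative sizes only: the same 12 KB schema in a small and a large prompt.
    for total in (20_000, 600_000):
        share = max_savings_fraction(12_000, total)
        print(f"{total} byte prompt: at most {share:.1%} saved")
```

When the bound is only a few percent, normal run‑to‑run variation in prompt size can easily hide the effect, which matches the behaviour reported for this workflow.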

Compatibility and version notes

The audit‑optimise loop works with Claude 3 Haiku, Sonnet, and Opus, OpenAI GPT‑4, and GitHub Copilot CLIs.

It requires GitHub CLI 2.38+ and the gh‑aw extension (available via gh extension install github/gh-aw).
The proxy that writes token-usage.jsonl is part of the GitHub Actions runner v2.34; older self‑hosted runners need a manual install of the proxy binary.

Broader implications for CI‑integrated LLM agents

GitHub’s approach shows that observability at the proxy layer can be combined with autonomous optimisation agents to keep token spend transparent. Two patterns emerge for teams looking to replicate the results:

Centralised token logging – Capture every LLM request and response in a machine‑readable artefact. Normalising across providers (Claude, OpenAI, Cohere) lets you compare apples‑to‑apples.
Automated remediation – Instead of a manual cost‑review meeting, let a bot surface the exact code change needed. The GitHub issue workflow provides a familiar review loop and audit trail.
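As a sketch of what normalising across providers can involve, the snippet below maps two provider usage payloads onto one record. The field names follow the usage objects documented for Anthropic's Messages API and OpenAI's Chat Completions API at the time of writing, but treat them as assumptions and check current documentation; `NormalisedUsage` and the converter functions are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NormalisedUsage:
    provider: str
    input_tokens: int        # fresh (non-cached) input tokens
    output_tokens: int
    cache_read_tokens: int


def from_anthropic(usage: dict) -> NormalisedUsage:
    """Assumes input_tokens already excludes tokens read from the cache."""
    return NormalisedUsage(
        provider="anthropic",
        input_tokens=usage.get("input_tokens", 0),
        output_tokens=usage.get("output_tokens", 0),
        cache_read_tokens=usage.get("cache_read_input_tokens") or 0,
    )


def from_openai(usage: dict) -> NormalisedUsage:
    """Assumes prompt_tokens includes cached tokens, so they are subtracted."""
    details = usage.get("prompt_tokens_details") or {}
    cached = details.get("cached_tokens") or 0
    return NormalisedUsage(
        provider="openai",
        input_tokens=usage.get("prompt_tokens", 0) - cached,
        output_tokens=usage.get("completion_tokens", 0),
        cache_read_tokens=cached,
    )


if __name__ == "__main__":
    print(from_anthropic(
        {"input_tokens": 1200, "output_tokens": 300, "cache_read_input_tokens": 5000}
    ))
    print(from_openai(
        {"prompt_tokens": 6200, "completion_tokens": 300,
         "prompt_tokens_details": {"cached_tokens": 5000}}
    ))
```

The point of the subtraction is that providers disagree on whether cached tokens are counted inside the input total; without reconciling that, a cross‑provider comparison would double‑count cache reads for one side.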

Other ecosystems already provide pieces of this puzzle. Anthropic and OpenAI expose prompt‑caching APIs, while LangChain offers callback‑based token tracking. GitHub’s contribution is the closed loop: the same system that records usage also proposes the fix, reducing the friction between insight and action.

What to try next

Enable token‑usage artefacts in your own GitHub Actions by adding the actions/token‑usage step.
Install the gh‑aw extension and run gh aw audit --threshold 0.1 to get a first‑pass report.
Audit your MCP tool set – List all tools in your server (gh api /mcp/tools) and compare against actual usage logs.
Replace heavyweight MCP calls with gh CLI equivalents where possible. For example, gh pr view --json files can replace a GET /repos/:owner/:repo/pulls/:number/files call inside the agent.
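The pre‑download pattern can be sketched as a small step that runs before the agent starts and writes the diff to a file the agent reads. It assumes the gh CLI is installed and authenticated in that step; the function names and destination path are hypothetical.

```python
import subprocess
from pathlib import Path


def gh_diff_command(pr_number: int, repo: str) -> list[str]:
    """Build the gh invocation that prints a pull request's diff."""
    return ["gh", "pr", "diff", str(pr_number), "--repo", repo]


def prefetch_diff(pr_number: int, repo: str, dest: Path) -> Path:
    """Run gh outside the agent and save the diff for the agent to read.

    Requires an authenticated gh CLI; raises CalledProcessError on failure.
    """
    result = subprocess.run(
        gh_diff_command(pr_number, repo),
        check=True,
        capture_output=True,
        text=True,
    )
    dest.write_text(result.stdout, encoding="utf-8")
    return dest


if __name__ == "__main__":
    # Only prints the command; calling prefetch_diff needs gh and network access.
    print(" ".join(gh_diff_command(42, "octo-org/example-repo")))
```

A workflow step would call something like `prefetch_diff(42, "octo-org/example-repo", Path("pr.diff"))` and then point the agent at `pr.diff`, so no tool schema or auth token for that fetch has to enter the agent's context.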

By treating token consumption as a first‑class metric, just like CPU or memory, you can keep AI‑driven CI pipelines predictable, cost‑effective, and easier to maintain.

Mark Silvester is a Platform and Architecture Manager at Griffiths Waite, a Birmingham‑based consultancy. He focuses on cloud‑native DevOps, AI‑augmented engineering, and cost‑aware architecture.

#DevOps #LLMs #Cloud #AI #Infrastructure
