When AI Security Scanners Reach the Long Tail of Software

Perfetto's trace processor maintainer received 17 real security bugs in 10 weeks from an internal AI scanner, a glimpse of what happens when automated review finally reaches the vast middle of the software world that was never important enough for human attention.

Lalit Maganti, who maintains the trace processor inside Google's Perfetto tracing project, has spent the last several weeks doing something most open-source maintainers rarely get to do: closing a steady stream of real security bugs that nobody had the time or tooling to find before. Over roughly ten weeks he received 21 reports, 17 of which were genuine issues, all surfaced by an AI-based security scanner run by a central team somewhere inside the company. His account of the experience is one of the more grounded field reports we have on what automated vulnerability discovery actually does once it stops being a demo and starts filing tickets against working engineers.

The central claim worth taking seriously is not that AI found bugs. Fuzzers have been finding bugs in C++ parsers for a decade. The claim is about where the bugs were found. For years, the economics of security research pushed nearly all serious attention toward a short list of high-stakes targets: kernels, cryptographic libraries, password managers, the handful of components whose failure cascades into everyone else's systems. Trace processor is not on that list. It is a C++ library that parses recorded traces, usually ones you collected yourself and process offline, so the input is data you mostly trust. It is security-relevant, in the sense that some people do feed it traces from bug reports or dogfood users, but it is not security-critical. That distinction is exactly why it went nine years without dedicated security review.

The argument

Maganti's thesis is that AI scanning changes the economics of attention rather than the ceiling of capability. The bugs that remained in trace processor after years of internal fuzzing were the ones living deep in the internals, reachable only through a precisely crafted byte sequence that random mutation would almost never stumble into. A fuzzer throws arbitrary inputs at a surface and waits for a crash; it has no model of the code's intent. An AI scanner that can read the source can reason about which sequence of conditions would walk execution into a fixed-size stack buffer whose bounds check only fires in debug builds. One of the bugs he describes fits that description exactly: a metadata name pulled straight from the trace and copied into a stack buffer, with the only guard compiled out of release builds. A long enough name overflows the buffer. The fix was trivial: swap the stack buffer for a std::string. The code path runs only once or twice per trace, so the extra heap allocation costs effectively nothing.
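The shape of that bug, and of its fix, can be sketched in a few lines of C++. This is a hypothetical reconstruction rather than Perfetto's code: the function names, the 64-byte buffer, and the use of the standard `assert` (standing in for whatever debug-only check the real code used) are all assumptions. What carries over is the mechanism: `assert` compiles to nothing when `NDEBUG` is defined, as it commonly is in release builds.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <string_view>

// BEFORE (hypothetical): a name read from the trace is copied into a fixed
// stack buffer. The only guard is assert(), which disappears when NDEBUG is
// defined, so in a release build a name of 64 or more bytes writes past the
// end of `out`.
void CopyNameIntoStackBuffer(std::string_view name, char (&out)[64]) {
  assert(name.size() < sizeof(out));
  std::memcpy(out, name.data(), name.size());
  out[name.size()] = '\0';
}

// AFTER (hypothetical): std::string sizes its heap buffer to the input, so
// there is no fixed bound left to overflow. On a path that runs once or
// twice per trace, the allocation is negligible.
std::string CopyName(std::string_view name) {
  return std::string(name);
}
```

The first function is only correct when the debug check happens to be compiled in; the second is correct for any input length in any build mode.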

The breakdown of the 17 real issues tells its own story. Ten were bounds-checking failures: arbitrary data flowing into fixed-size buffers or being used as unchecked array indices. Five were use-after-free conditions involving back-pointers or hashmap keys outliving the objects they referenced. One was unbounded recursion on deeply nested input, a classic stack overflow. One was a missing allowlist check on a rare code path. These are, with a few exceptions, mechanical defects. They are the accumulated residue of nine years of a team writing fast in the early days and never circling back, the kind of latent fault that every large C++ codebase carries and that no one budgets time to hunt when the project does not sit on a production attack path.
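The hashmap-key variety of use-after-free is easy to reproduce in miniature. The sketch below is hypothetical (`NameInterner` and its methods are illustrative names, not Perfetto APIs): a map keyed by `std::string_view` stores pointers into strings it does not own, so once those strings are freed, later lookups can compare against freed memory. Having the map own its keys ties each key's lifetime to its entry.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <unordered_map>

// BEFORE (comments only, since running it is undefined behavior):
//   std::unordered_map<std::string_view, int> ids;
//   {
//     std::string name = /* bytes decoded from the trace */;
//     ids.emplace(name, 0);   // key points into name's buffer
//   }                         // name is destroyed; the stored key now dangles
//   ids.find("foo");          // may compare against freed memory
//
// AFTER (hypothetical): the map stores std::string keys, so each key lives
// exactly as long as its entry, whatever happens to the caller's buffer.
class NameInterner {
 public:
  // Returns a stable id for `name`, assigning the next id on first sight.
  int Intern(std::string_view name) {
    auto [it, inserted] = ids_.try_emplace(std::string(name), next_id_);
    if (inserted) ++next_id_;
    return it->second;
  }

  std::size_t size() const { return ids_.size(); }

 private:
  std::unordered_map<std::string, int> ids_;
  int next_id_ = 0;
};
```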

Why the mechanical nature matters more than it seems

There is a quietly important observation buried in the middle of the post: the same property that makes these bugs findable by AI also makes them fixable by AI. Maganti hands the well-written report to a coding agent and gets a 10-to-20 line pull request back within ten minutes, which he then reviews line by line. The loop closes almost entirely inside automated tooling, with the human acting as the reviewer of record rather than the investigator. When the discovery cost and the remediation cost both collapse at the same time, the entire calculus of whether a marginal-risk project deserves security attention inverts. The work that was never worth a human's afternoon becomes worth a human's ten-minute review.

But he is careful not to oversell this. A few of the reports pointed at design problems rather than implementation slips. The most instructive was a use-after-free where a state object held a back-pointer that could dangle when certain trace data caused a parent to be freed before its child. The immediate patch, a callback that nulls the back-pointer on free, works, but he candidly calls it a horrible hack that makes object lifetimes impossible to reason about. The proper fix meant restructuring ownership so the lifetime was correct by construction. And here the story turns reflective: this was a cleanup he had known about and intended to do for nearly a year but had never found the justification for. The security report supplied the push. The bug was a symptom of a deeper architectural weakness, and the scanner, by surfacing the symptom, forced the cure. Security findings, in this framing, function as a proxy signal for hacky or poorly architected code, which means the value of scanning exceeds the literal list of vulnerabilities it produces.
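What "correct by construction" can mean for a back-pointer is sketched below. Everything here is hypothetical: `Parent`, `Child`, and `AddChild` are illustrative names, and the article does not describe the real restructuring. The idea is that if a child can only be created by, and stored inside, its parent, then the parent necessarily outlives it, so the raw back-pointer cannot dangle and no free-time callback is needed.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

class Parent;

class Child {
 public:
  // Valid for the Child's whole lifetime: only Parent can construct a Child,
  // and Parent destroys all its children as part of its own destruction.
  Parent* parent() const { return parent_; }
  const std::string& name() const { return name_; }

 private:
  friend class Parent;  // Parent is the only factory.
  Child(Parent* parent, std::string name)
      : parent_(parent), name_(std::move(name)) {}

  Parent* parent_;  // Non-owning back-pointer.
  std::string name_;
};

class Parent {
 public:
  Parent() = default;
  // Copying (and, implicitly, moving) is disabled: either would leave
  // children pointing at the old Parent's address.
  Parent(const Parent&) = delete;
  Parent& operator=(const Parent&) = delete;

  Child& AddChild(std::string name) {
    // `new` is used directly because Child's constructor is private.
    children_.push_back(std::unique_ptr<Child>(new Child(this, std::move(name))));
    return *children_.back();
  }

  std::size_t child_count() const { return children_.size(); }

 private:
  // unique_ptr keeps each Child at a stable address when the vector grows,
  // and children_ is destroyed inside ~Parent, so no Child outlives it.
  std::vector<std::unique_ptr<Child>> children_;
};
```

The stopgap, by contrast, keeps independently owned objects and relies on every free path remembering to null the pointer, which is why it makes lifetimes hard to reason about.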

Implications for the rest of the ecosystem

Maganti sketches a useful three-way split for predicting who experiences this shift and how. Projects with untrusted input that are also security-critical, the curls and kernels and OpenSSLs of the world, will see many complex reports with a higher false-positive rate, because the obvious bugs in their hot paths were already picked clean by years of human attention. Projects with untrusted input that were never seriously audited, his own category, get a wave of mechanical bugs at a manageable pace and low individual stress. Projects with no untrusted input at all, internal tools and math libraries operating only on trusted data, will barely notice. This taxonomy is more useful than the usual framing because it predicts the texture of the experience, not just its presence. The maintainer of a sleepy parser and the maintainer of a TLS library are about to have very different relationships with the same technology.
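The taxonomy amounts to two questions asked in a fixed order. The toy encoding below (the enum and function names are hypothetical) makes that ordering explicit: whether a project takes untrusted input is decided first, and its audit history only matters once the answer is yes.

```cpp
// Toy encoding of the three-way split; all names are illustrative only.
enum class ScanExperience {
  kComplexReportsHigherFalsePositives,  // e.g. curl, kernels, OpenSSL
  kWaveOfMechanicalBugs,                // untrusted input, never seriously audited
  kBarelyNoticeable,                    // only ever operates on trusted data
};

ScanExperience PredictScanExperience(bool takes_untrusted_input,
                                     bool long_audited_and_critical) {
  // Input trust is checked first: trusted-data-only projects barely notice.
  if (!takes_untrusted_input) return ScanExperience::kBarelyNoticeable;
  // Years of human attention already picked the obvious hot-path bugs clean.
  if (long_audited_and_critical) {
    return ScanExperience::kComplexReportsHigherFalsePositives;
  }
  return ScanExperience::kWaveOfMechanicalBugs;
}
```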

Maganti also observes that report quality has risen sharply in recent months, which matches what Daniel Stenberg of curl and Linux kernel maintainers have reported. Stenberg, who has been publicly critical of low-effort AI bug reports drowning his triage queue, has acknowledged that the newer generation of tooling produces genuinely useful findings. Maganti's reports arrived well described, often with the attacker model already worked out and a minimal fix proposed. That quality jump is what separates a useful scanner from a denial-of-service attack on a maintainer's attention, and it is the variable most likely to determine whether this wave is remembered as help or as noise.

The counter-perspectives the author raises against himself

What makes the piece credible is that Maganti spends real effort undercutting his own optimism. He names three conditions that make his situation manageable and that do not generalize: he is paid full-time to maintain the project, someone else absorbs the cost of running the scans, and an upstream human lightly filters the reports before they reach him. Strip any of those away and the experience degrades. A volunteer maintaining a parser in their spare time, with no upstream triage, facing the same drip of three or four reports on a busy day, is in a materially worse position. He identifies this as the genuine gap: most open-source projects cannot afford a dedicated team running scans on their behalf, and telling a maintainer to stand up their own pipeline when their security risk is marginal simply restricts the benefit to the most motivated few.

He also stops short of assuming the stream will stay gentle. His instinct is that the current wave will taper to zero: the scanner appears to sweep each part of the codebase for a day or two before moving on, repeats are rare, and the 17 issues represent a one-time harvest across nine years of accumulated code. New code is being added more slowly now than in the project's prolific early years, so the scanner should outpace fresh defects. But he flags the real unknown: whether future models will start surfacing the harder design-level problems rather than the mechanical ones. Those are the bugs that took him days and architectural restructuring to fix, not ten minutes and a type swap. A scanner that graduates from finding std::string substitutions to finding ownership-model flaws would change the stress profile entirely, and he is honest that he has no way to predict it.

The Rust question hangs over all of this, and he addresses it in a footnote with the weary precision of someone who has heard it many times. Yes, a binary parser of untrusted data should ideally be written in a memory-safe language. No, rewriting a large library embedded in hundreds of downstream tools, many in environments without a Rust toolchain, maintained by a team with no Rust expertise, is not a practical answer to a marginal security risk. The mechanical bugs the scanner found are overwhelmingly the kind safe Rust rules out: the borrow checker rejects the use-after-frees at compile time, and out-of-bounds accesses become runtime panics instead of memory corruption. That is an argument for new projects and a non-argument for this one.

What lingers after reading is less a verdict than a snapshot, which is how Maganti explicitly frames it. The long tail of software, the enormous middle that is relevant but not critical, has lived in a kind of benign neglect because attention was the scarce resource and it flowed only to the top. When the cost of attention falls toward zero, that neglect becomes visible, and nine years of latent defects surface in ten weeks. Whether the ecosystem builds the upstream infrastructure to make this manageable for the maintainers who lack a Google-scale team behind them is the open question, and it is the one that will decide whether this becomes a broad improvement in software safety or a privilege reserved for the already-resourced.