AI in Vulnerability Management: Promising Assistant, Not Yet a Pro
Vulnerability management is an unrelenting race against attackers, where delays in detection can mean catastrophic breaches. Seeking to accelerate this process, Intruder’s security engineers embarked on an experiment: Could AI reliably generate accurate vulnerability checks using the popular Nuclei framework? Their findings reveal a nuanced reality—AI shows promise as a productivity booster but falters without rigorous human oversight.
The Chatbot Failure and Agentic Breakthrough
Initial attempts using LLM chatbots like ChatGPT, Claude, and Gemini yielded chaotic results. Prompts to create Nuclei templates returned invalid syntax, references to features that don't exist in Nuclei, and unreliable matchers. "The outputs were messy," the team noted, highlighting a critical gap in basic reliability.
The tide turned with an agentic approach, leveraging tools like Cursor’s AI agent. Unlike static chatbots, this system uses curated knowledge bases, follows strict rules, and references real Nuclei templates. By indexing a repository of high-quality examples and enforcing coding standards, the AI’s output improved dramatically. Templates began resembling those crafted by human engineers, though constant supervision was still required. This shift reframed AI’s role: not a full automator, but a force multiplier that speeds up repetitive tasks, freeing engineers for complex analysis.
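To give a sense of what "strict rules" can mean in practice, the snippet below is a hypothetical sketch of the kind of standards a team might encode for such an agent. Intruder's actual rule set and format were not published, so treat this purely as illustration.

```yaml
# Hypothetical sketch of agent rules for Nuclei template generation.
# Intruder's actual rules and their format were not published.
nuclei_template_rules:
  - "Use only fields documented in the official Nuclei template schema; never invent syntax."
  - "Model new templates on the indexed examples in the local nuclei-templates repository."
  - "Never match on an HTTP status code alone; require a unique body or header signal as well."
  - "Validate every template with `nuclei -validate` and fix all errors before presenting it."
  - "Request a vulnerable and a non-vulnerable test target before finalizing matchers."
```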
Where AI Excels: Scaling Attack Surface Checks
Agentic AI proved particularly adept at scaling checks for overlooked vulnerabilities. For instance, identifying exposed admin panels—a simple yet labor-intensive task—became far more efficient. The AI rapidly generated templates for niche products missing from major scanners, expanding coverage for large-scale environments. One standout success was detecting unsecured Elasticsearch instances. Engineers provided the agent with:
- A concise task description (e.g., "Confirm unauthenticated data access via endpoints X and Y").
- Vulnerable and non-vulnerable test targets.
- Curated rules for multi-request flows.
The result? A robust Nuclei template that reliably flags exposed data, with minimal hands-on effort from the engineers. "The agent handled the heavy lifting," the team reported, emphasizing gains in speed and coverage.
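The final template itself isn't reproduced in the article, but a minimal sketch of an unauthenticated Elasticsearch check in Nuclei's YAML template format could look like the following. The endpoint and matcher strings here are illustrative stand-ins, not the "endpoints X and Y" used in the research.

```yaml
# Illustrative sketch only: detects an Elasticsearch API that answers
# unauthenticated requests. Endpoint and strings are examples, not
# Intruder's actual check.
id: exposed-elasticsearch-example

info:
  name: Exposed Elasticsearch Instance (example)
  author: example
  severity: high
  description: Elasticsearch API reachable without authentication.

http:
  - method: GET
    path:
      - "{{BaseURL}}/_cluster/health"

    matchers-condition: and
    matchers:
      - type: status
        status:
          - 200

      - type: word
        part: body
        words:
          - '"cluster_name"'
          - '"status"'
        condition: and
```

A check along these lines reports a finding only when the status code and the distinctive response fields all match, rather than flagging anything that happens to return a 200.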
Persistent Pitfalls: Why Humans Can’t Step Back
Despite progress, the experiment exposed critical limitations:
1. Weak Matchers: The AI often defaulted to superficial matchers, for example omitting unique identifiers such as favicon hashes, which risked false positives. Human intervention was essential to enforce stronger validation (see the sketch after this list).
2. Token Optimization Woes: Tools like Cursor truncate curl outputs to save resources, accidentally excluding vital data for matchers.
3. Tool Ignorance: The agent occasionally overlooked Nuclei flags (e.g., -l for host lists), reinventing inefficient workflows instead of leveraging built-in features.
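To make the matcher problem concrete, the snippet below contrasts the kind of weak matcher an agent tends to produce with a stronger, multi-signal alternative. The field names follow Nuclei's template syntax; the example strings are invented for illustration.

```yaml
# Weak default: fires on any endpoint that returns HTTP 200.
# matchers:
#   - type: status
#     status:
#       - 200

# Stronger, reviewer-enforced version: also requires a product-specific
# string in the response body before reporting a finding.
matchers-condition: and
matchers:
  - type: status
    status:
      - 200
  - type: word
    part: body
    words:
      - "ExampleAdmin Console"   # invented marker string for illustration
# A favicon-hash check via Nuclei's DSL helpers could tighten this further.
```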
These issues underscore a non-negotiable truth: AI cannot yet replicate the contextual judgment of security experts. As Intruder’s engineers stress, hype about "fully automated" vulnerability checks is premature—and dangerous. False negatives or positives in security scans have real-world consequences, from wasted resources to undetected breaches.
The Path Forward: Augmentation, Not Replacement
AI’s value lies in amplifying human capability, not replacing it. For security teams, this means integrating agentic tools into workflows to handle templatizable tasks—like scaling checks for common exposures—while engineers focus on novel threats and validation. The journey ahead involves refining rules, expanding knowledge bases, and tempering expectations. In vulnerability management, AI is a powerful co-pilot, but the cockpit still needs a skilled pilot.
Source: Sponsored and written by Intruder, based on research by Benjamin Marr. Original article: Can We Trust AI To Write Vulnerability Checks? Here's What We Found