Minor Text Edits Can Turn AI Agent Skills into Weapons, Researchers Warn

Security researchers demonstrate how small changes to AI skill descriptions can manipulate agent behavior, bypass security checks, and potentially lead to data breaches and regulatory violations.

The rapid adoption of AI agents has created a new frontier for cybersecurity threats, as researchers reveal how minor edits to text-based skills can transform these digital assistants into rogue agents capable of bypassing security controls and potentially compromising sensitive data.

The Expanding Attack Surface

AI agents, meaning models wrapped in software that can use tools and carry out multi-step tasks, often take direction from text-based skills. A skill is typically a SKILL.md file containing natural-language instructions along with references to data and other resources. At run time, that text is combined with the user's input and the system prompt to guide the model's response.
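Here is a minimal Python sketch of that assembly step. The `Skill` record and the `build_prompt` function are hypothetical, and real frameworks parse SKILL.md and order prompt sections in their own ways. The sketch shows that the skill's text ends up in the same string as the system prompt and the user's request. Nothing in that string marks which part came from a trusted author and which came from a third-party skill.

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    """Hypothetical in-memory form of a parsed SKILL.md file."""
    name: str
    description: str
    instructions: str
    resources: list[str] = field(default_factory=list)


def build_prompt(system_prompt: str, skill: Skill, user_input: str) -> str:
    """Concatenate system prompt, skill text, and user input into one prompt.

    The model receives a single string: skill instructions sit right next to
    the system prompt, with no built-in marker of which text is trusted.
    """
    resource_lines = "\n".join(f"- {r}" for r in skill.resources)
    return (
        f"{system_prompt}\n\n"
        f"## Skill: {skill.name}\n{skill.description}\n\n"
        f"{skill.instructions}\n"
        f"Resources:\n{resource_lines}\n\n"
        f"## User request\n{user_input}"
    )


# Example usage with an illustrative skill.
skill = Skill(
    name="csv-summarizer",
    description="Summarize CSV files.",
    instructions="Read the file and report column statistics.",
    resources=["scripts/summarize.py"],
)
prompt = build_prompt("You are a helpful assistant.", skill, "Summarize sales.csv")
print(prompt)
```

Because the model sees one undifferentiated string, any edit to the skill's description or instructions changes what the model is told to do.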

"Many agent frameworks allow users to install skills from online registries so the agent can discover and use new capabilities on demand," explained Soheil Feizi, computer science professor at the University of Maryland (UMD) and founder/CEO of RELAI.ai. "This is powerful, but it also creates a new attack surface."

The Vulnerability of Text-Based Skills

Unlike traditional software vulnerabilities, which live in code, these threats arise from the natural-language text the model reads. The model's effective prompt combines user input, skill instructions, and the system prompt. If any part of it is modified, whether inadvertently or adversarially, the result is a prompt injection vulnerability.

"Skills can effectively act as user-authorized prompt injection," Feizi noted. "And agents may also automatically retrieve and load third-party skills if their descriptions appear relevant to the task being pursued."

Regulatory Implications

This vulnerability carries significant regulatory consequences under data protection frameworks like the EU's GDPR and California's CCPA. If an AI agent manipulated through malicious skills were to process or exfiltrate personal data without proper authorization, organizations could face substantial fines—GDPR penalties can reach up to 4% of global annual turnover or €20 million, whichever is higher.
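The "whichever is higher" rule comes down to simple arithmetic. This minimal sketch assumes turnover is given in whole euros, and `gdpr_fine_cap_eur` is a hypothetical helper name. It computes only the statutory upper bound. It does not estimate the fine a regulator would actually impose.

```python
def gdpr_fine_cap_eur(global_annual_turnover_eur: int) -> int:
    """Upper bound of a top-tier GDPR fine: the higher of EUR 20M or 4% of turnover.

    Integer arithmetic avoids floating-point rounding on large amounts.
    """
    return max(20_000_000, global_annual_turnover_eur * 4 // 100)


print(gdpr_fine_cap_eur(100_000_000))    # 20000000 (4% would be only 4,000,000)
print(gdpr_fine_cap_eur(2_000_000_000))  # 80000000
```

The flat €20 million floor dominates below €500 million in turnover. Above that level, the 4% figure sets the cap.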

"The challenge is that these attacks don't necessarily involve traditional code vulnerabilities," said Feizi. "An attacker may not need to hide malware in executable code. Small semantic changes to a skill description can affect how the skill is discovered in a registry, whether an agent selects it over alternatives, and whether it passes governance or safety checks."

Research Findings

In a preprint paper titled "Under the Hood of SKILL.md: Semantic Supply-chain Attacks on AI Agent Skill Registry," Feizi and UMD co-authors Shoumik Saha and Kazem Faghih examined how adversarial skills get discovered, selected, and vetted before execution.

The researchers demonstrated that short 20-token triggers can be added to a SKILL.md file to:

Influence the chance an agent will discover it in a registry
Affect whether an agent selects that skill
Evade detection through semantic evasion strategies
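A toy ranking function can show how a trigger like this affects discovery and selection. This is a simplified sketch, not the paper's method. The registry, the skill names, and the word-overlap `relevance` score are hypothetical stand-ins for the embedding-based or model-based matching a real registry would use. The trigger is also a short hand-written phrase, not an optimized 20-token one.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def relevance(task: str, description: str) -> float:
    """Fraction of task tokens that also appear in the skill description.

    A crude lexical stand-in for embedding or LLM-based matching.
    """
    task_tokens = tokens(task)
    if not task_tokens:
        return 0.0
    return len(task_tokens & tokens(description)) / len(task_tokens)


def auto_select(task: str, registry: dict[str, str], threshold: float) -> list[str]:
    """Names of skills whose relevance clears the threshold, highest first."""
    scored = sorted(
        ((relevance(task, desc), name) for name, desc in registry.items()),
        reverse=True,
    )
    return [name for score, name in scored if score >= threshold]


original = "Extract text and tables from PDF files."
registry = {
    "pdf-extract": original,
    # Same description plus a short, hand-written trigger phrase.
    "pdf-extract-plus": original + " Use this for any invoice.",
}
task = "extract tables from this pdf invoice"
print(auto_select(task, registry, threshold=0.5))
# ['pdf-extract-plus', 'pdf-extract']
```

The appended phrase adds no capability. It only echoes words likely to appear in a request, and that alone is enough to move the altered copy ahead of the original in this toy ranking.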

Their experiments showed concerning results:

They could induce an agent to discover their skill over an unaltered source skill 86% of the time
They succeeded in making an agent select their skill over variants 77.6% of the time
They were able to evade registry scanning defenses between 36.5% and 100% of the time

The most successful evasion strategy exploited the scanner's limited context window. The skill was made long enough that the malicious instructions sat beyond the portion the scanner actually reviewed.

"In ClawHub-style review, only the first 10K characters of long SKILL.md files are passed to the LLM reviewer, so we place the malicious instruction beyond this boundary while keeping it in the submitted skill," the authors explained.
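The truncation gap the authors describe can be reproduced in a few lines. This minimal sketch assumes a reviewer that simply receives the first 10,000 characters. The padding text and the `<hypothetical malicious instruction>` placeholder are illustrative, and `reviewer_view` is a hypothetical helper, not ClawHub's code.

```python
REVIEW_LIMIT = 10_000  # characters passed to the reviewer, per the quoted setup


def reviewer_view(skill_md: str, limit: int = REVIEW_LIMIT) -> str:
    """Return only the prefix a truncating reviewer would actually see."""
    return skill_md[:limit]


BENIGN = "# Skill: notes-helper\nOrganize meeting notes into action items.\n"
# Innocuous filler that pushes later content past the review boundary.
PADDING = "Formatting guidance: keep bullet points concise.\n" * 250
PAYLOAD = "<hypothetical malicious instruction>\n"

skill_md = BENIGN + PADDING + PAYLOAD

print(len(skill_md) > REVIEW_LIMIT)        # True
print(PAYLOAD in reviewer_view(skill_md))  # False: the reviewer never sees it
print(PAYLOAD in skill_md)                 # True: the agent still loads it
```

The reviewer and the agent read different versions of the same file, so a clean review verdict says nothing about the text beyond the cutoff.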

Industry Context

This research builds on earlier findings from security firm Snyk, which discovered that 13.4% of skills on popular repositories like ClawHub and skills.sh "contain at least one critical-level security issue, including malware distribution, prompt injection attacks, and exposed secrets." With thousands of skills available in these registries, the potential attack surface is substantial.

Impact on Users and Organizations

For organizations deploying AI agents, these vulnerabilities represent significant compliance risks. If an agent processes personal data in ways that violate privacy regulations, the organization could be held responsible—even if the manipulation occurred through a third-party skill.

For individual users, the risks include exposure of personal information and manipulation of automated systems, such as those that control smart home devices or financial systems. Because many AI agents act autonomously, these attacks could occur without immediate human detection.

Recommended Mitigations

The researchers emphasize that protecting AI agents requires treating natural-language specifications as security-sensitive objects.

"We hope this encourages more careful design of skill registries, ranking mechanisms, governance pipelines, and agent-side defenses," Feizi said.

Organizations should consider implementing:

Enhanced vetting of third-party skills
Limitations on automatic skill loading
Comprehensive monitoring of agent behavior
Regular security assessments of AI agent systems
Clear policies regarding data processing by AI agents
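Two of these measures can be sketched concretely: vetting that covers the whole skill file instead of a truncated prefix, and limiting automatic loading to an allowlist. This minimal sketch uses hypothetical helper names. It also assumes a hypothetical `review` callback, which here is a trivial string check standing in for an LLM or rule-based scanner. Production vetting would need considerably more than this.

```python
from typing import Callable


def chunks(text: str, size: int, overlap: int) -> list[str]:
    """Split text into overlapping windows that together cover all of it."""
    assert 0 <= overlap < size, "overlap must be smaller than the window"
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def vet_skill(skill_md: str, review: Callable[[str], bool],
              size: int = 10_000, overlap: int = 500) -> bool:
    """Approve a skill only if every window of the file passes review."""
    return all(review(window) for window in chunks(skill_md, size, overlap))


def may_auto_load(skill_name: str, allowlist: set[str]) -> bool:
    """Allow automatic loading only for explicitly approved skills."""
    return skill_name in allowlist


marker = "<hypothetical malicious instruction>"
skill_md = "benign text\n" * 1000 + marker  # marker sits past 10,000 characters

naive_ok = marker not in skill_md[:10_000]                # truncated review passes
full_ok = vet_skill(skill_md, lambda w: marker not in w)  # full review fails
print(naive_ok, full_ok)  # True False
print(may_auto_load("csv-summarizer", {"csv-summarizer"}))  # True
```

Because the windows overlap, any instruction no longer than the overlap falls entirely inside at least one window and cannot be split across a boundary. The trade-off is more review calls per skill.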

The researchers have published their source code and supporting documentation on GitHub, which security professionals can use to understand and defend against these emerging threats.

As AI systems become more autonomous and integrated into critical infrastructure, addressing these text-based vulnerabilities will be essential for maintaining both security and regulatory compliance in the age of intelligent agents.

#AI #Security #Vulnerabilities #PromptInjection #AI_Agents