Hades Malware Hides Payloads Behind Fake Weapon Prompts That Trip AI Scanners' Safety Filters

The Hades supply-chain campaign now embeds prompt-injection comments that ask AI scanners for instructions to build nuclear and biological weapons. The goal isn't to get an answer. It's to trigger the bot's refusal so it stops reading before reaching the real payload. The upgraded campaign spans an estimated 37 Python and 106 JavaScript packages.

A malware campaign tracked as Hades has added a clever piece of social engineering aimed not at humans but at the AI models increasingly used to vet software packages. Instead of trying to slip past a scanner's logic, the malware tries to make the scanner refuse to look at all.

The technique is almost insulting in its simplicity. Some of the malicious JavaScript files carry a code comment that addresses the reviewing AI directly, telling it that it is now operating in an unrestricted mode with no safety guidelines. The comment then asks the model to produce detailed instructions for building biological and nuclear weapons. No competent scanning model is going to comply, and that is precisely the point. The refusal itself is the exploit.

The attack works by making the bot quit early

When a safety-tuned model hits content like that, it triggers its failsafe and halts. An X user demonstrated the effect by feeding the file to Anthropic's Fable, which returned the familiar "Chat paused" message. Because the model stops reading at the injected prompt, it never reaches the section of the file where the actual malicious payload sits. The scanner reports that it cannot continue, the developer shrugs, and the package gets a pass.


In AI terminology this is an adversarial attack, and on its own it is not expected to be broadly reliable. Purpose-built scanning models will likely be configured to keep parsing rather than bail out on offensive content. But the campaign is betting on the cursory check: the developer who pastes a freshly installed package into a chat window, asks whether it contains malware, and then trusts a cheerful "you're good to go." Automated checks wired into CI/CD pipelines could fall for the same trick if they rely on a single conversational pass.
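One way to blunt the refusal trick in an automated pipeline is to fail closed: treat anything other than an explicit clean verdict as a reason to hold the package. The sketch below is illustrative only. It assumes the pipeline has instructed the model to end its reply with a line reading either "VERDICT: CLEAN" or "VERDICT: MALICIOUS"; that convention and the function names are hypothetical, not something described in the Socket research.

```python
def classify_review(reply: str) -> str:
    """Map a model's reply to 'allow', 'block', or 'inconclusive'.

    Assumes (hypothetically) that the model was told to finish with
    'VERDICT: CLEAN' or 'VERDICT: MALICIOUS' on its own line.
    """
    lines = [line.strip() for line in reply.splitlines() if line.strip()]
    last_line = lines[-1] if lines else ""
    if last_line == "VERDICT: MALICIOUS":
        return "block"
    if last_line == "VERDICT: CLEAN":
        return "allow"
    # Refusals, "Chat paused" notices, truncated or empty replies all land here.
    return "inconclusive"


def may_install(reply: str) -> bool:
    """Fail closed: only an explicit clean verdict lets the package through."""
    return classify_review(reply) == "allow"


if __name__ == "__main__":
    print(may_install("No suspicious behavior found.\nVERDICT: CLEAN"))
    print(may_install("Chat paused"))
```

This only closes the refusal path. An injected comment that talks the model into printing a clean verdict would still get through, which is why the non-conversational checks still matter.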

The security firm Socket, which documented the campaign in its blog and threat research, notes that every other analysis method still functions normally. Pattern matching, static parsing of the source, entropy checks that flag randomized blocks likely to conceal a payload, and dynamic execution inside a sandbox all see straight through the ploy. The prompt-injection trick only defeats one narrow class of review, but the attackers seem to operate on the principle that any incremental gain in evasion is worth the few lines of comment it costs them.
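To make two of those methods concrete, here is a minimal sketch of pattern matching and an entropy check working on raw source text, with no language model involved. The phrase list, the 200-character minimum, and the 4.5 bits-per-character threshold are assumptions chosen for illustration, not values taken from Socket's tooling.

```python
import math
import re
from collections import Counter

# Hypothetical phrases of the kind the injected comments reportedly use.
INJECTION_HINTS = re.compile(
    r"no safety guidelines|unrestricted mode|ignore (all )?previous instructions",
    re.IGNORECASE,
)
# Long unbroken runs of base64-like characters are candidate payload blobs.
LONG_RUN = re.compile(r"[A-Za-z0-9+/=_-]{200,}")
ENTROPY_THRESHOLD = 4.5  # bits per character; an assumed cutoff


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def scan_source(source: str) -> list[str]:
    """Return a list of findings; an empty list means nothing was flagged."""
    findings = []
    if INJECTION_HINTS.search(source):
        findings.append("prompt-injection phrasing")
    for match in LONG_RUN.finditer(source):
        if shannon_entropy(match.group()) >= ENTROPY_THRESHOLD:
            findings.append("high-entropy block")
            break
    return findings
```

Because the scan never interprets the comment as an instruction, the weapons prompt becomes a signal in its own right rather than a reason to stop.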

A campaign that keeps leveling up

The prompt injection is the eye-catching addition, but it sits alongside a set of more conventional upgrades that make Hades harder to catch. The malware now ships a self-wipe trigger that fires under various conditions, with sandbox detection being a common one. If it senses it is being executed in an isolated analysis environment, it destroys itself before revealing its behavior.

The authors have also split their operation across packages. In some cases the loader and the payload live in two separate packages that are typically installed together, a separation most scanners do not expect to reconstruct. They have leaned harder on precompiled binaries, which raise no suspicion in performance-sensitive Python packages where compiled extensions are routine. And critically, more of the malicious behavior now triggers only when a package is imported and run in the target's code rather than at install time. That shift defeats scanners that watch installation hooks but never actually exercise the imported module.

The loot list has grown considerably

Earlier versions of the campaign focused mainly on CI/CD credentials. The current iteration is far greedier. It now harvests npm, PyPI, RubyGems, JFrog, and Kubernetes service account tokens, AWS temporary credentials, SSH keys, Docker configurations, shell histories, .env files, and configuration files for AI developer tools. That last category is notable given the target population.

Bruno Ferreira

Hades and its relatives concentrate on development packages used in scientific and machine-learning work. As of this writing the expanded campaign covers an estimated 37 Python and 106 JavaScript packages, including several typo-squats designed to catch fat-fingered installs. One example swaps "requests" for "rsquests," a single substituted character that a hurried engineer can easily miss.
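A name like "rsquests" sits one edit away from the real thing, which is easy to check mechanically. The following is a rough sketch using plain Levenshtein distance against a short list of known package names. The list and the distance threshold are placeholders; a real check would draw on a much larger set of popular names.

```python
# Hypothetical allowlist; in practice this would be far longer.
KNOWN_PACKAGES = ("requests", "numpy", "pandas")


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions, and substitutions cost 1."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost)
            )
        previous = current
    return previous[-1]


def likely_typosquat(name, known=KNOWN_PACKAGES, max_distance=1):
    """Return the known package this name imitates, or None if none is close."""
    candidate = name.lower()
    if candidate in known:
        return None
    for legit in known:
        if edit_distance(candidate, legit) <= max_distance:
            return legit
    return None


if __name__ == "__main__":
    print(likely_typosquat("rsquests"))
```

Plain Levenshtein distance counts a swap of two adjacent letters as two edits, so catching that style of typo-squat would need a higher threshold or a distance measure that treats transpositions as a single edit.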

Why the target audience is the soft spot

The uncomfortable subtext is that the people most likely to install these packages are not necessarily the people most equipped to spot the swap. Scientific and AI engineers command high salaries and deep domain expertise, but security hygiene around package authorship, name verification, and credential handling is frequently an afterthought rather than a habit. A typo-squatted dependency only works because someone trusted the name without checking the source.

The defenses against this campaign are not exotic. Pin dependencies to known-good versions, verify package names and maintainers before installing, scan with tools that parse and execute rather than merely chat, and run untrusted code in disposable sandboxes. The prompt-injection trick is a reminder that as more of the review pipeline gets handed to language models, attackers will probe the seams of those models' behavior, including their refusals, with the same persistence they once reserved for antivirus signatures. For teams that ship software through public registries, the Socket research and standard supply-chain hardening guidance are the practical starting points.
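The first of those defenses, pinning, is also the easiest to audit automatically. As a small illustration, the sketch below flags lines in a pip-style requirements file that are not pinned to an exact version with "==". It assumes the simple one-requirement-per-line form and makes no attempt to handle every syntax pip accepts; the function name is made up for this example.

```python
import re

# An exact pin: a name, optional extras, "==", and a version with no wildcard.
PINNED = re.compile(
    r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]*\])?\s*==\s*[^\s;*]+(\s|;|$)"
)


def unpinned_requirements(text: str) -> list[str]:
    """Return requirement lines that are not pinned to one exact version."""
    loose = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        if not PINNED.match(line):
            loose.append(line)
    return loose


if __name__ == "__main__":
    sample = "requests==2.31.0\nnumpy>=1.20\npandas\n"
    print(unpinned_requirements(sample))
```

Pinning alone does not help if the pinned name is itself a typo-squat, so a check like this complements name verification and does not replace it.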