Researchers demonstrate how inaudible ultrasonic commands can commandeer speech‑activated assistants, exposing a practical attack vector that could affect billions of devices. The study outlines the technical method, real‑world implications, and steps the industry is taking to mitigate the risk.

Hidden Signals Can Hijack AI Voice Systems

Artificial‑intelligence‑driven voice assistants have become ubiquitous, from smart speakers on kitchen counters to in‑car infotainment hubs. Their convenience rests on a simple premise: a microphone listens for a spoken wake word, then a cloud‑based model parses the command. A recent paper from the University of California, Berkeley, and the University of Michigan shows that this premise can be subverted with a signal humans cannot hear.

The problem: ultrasonic injection attacks

Most voice assistants are designed to respond to frequencies in the 300 Hz–8 kHz range, matching typical human speech. However, the acoustic front‑ends of many devices (microphones, analog‑to‑digital converters, and digital signal processing pipelines) are not perfectly band‑limited. They can inadvertently capture energy well above 20 kHz, the upper limit of human hearing. By modulating commands onto an ultrasonic carrier (usually 20–25 kHz) and playing them through a speaker or transducer that can reproduce ultrasound, an attacker can embed a full‑length instruction that the device interprets while a bystander hears nothing.
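The idea of moving a command above the audible band can be sketched with plain double-sideband amplitude modulation. This is a minimal illustration under stated assumptions, not the study's code: a 1 kHz tone stands in for a recorded command, and the 96 kHz sample rate, the 25 kHz carrier, and the helper names `am_modulate` and `fraction_below` are all hypothetical choices. AM places copies of the command at the carrier frequency plus and minus each voice frequency, so the lower sideband sits at f_c − f_m. Staying inaudible therefore requires the carrier to exceed 20 kHz by at least the command's bandwidth; a full 8 kHz voice band on a 22 kHz carrier, for example, would put energy down to 14 kHz.

```python
import numpy as np

FS = 96_000          # sample rate in Hz (assumed); must exceed twice the highest frequency
CARRIER_HZ = 25_000  # carrier at the top of the 20-25 kHz band mentioned above
TONE_HZ = 1_000      # hypothetical 1 kHz tone standing in for a voice command


def am_modulate(baseband: np.ndarray, carrier_hz: float, fs: int, depth: float = 1.0) -> np.ndarray:
    """Shift a voice-band signal up around an ultrasonic carrier (double-sideband AM)."""
    t = np.arange(len(baseband)) / fs
    # Normalise so the envelope 1 + depth * m(t) never goes negative.
    m = baseband / np.max(np.abs(baseband))
    return (1.0 + depth * m) * np.cos(2 * np.pi * carrier_hz * t)


def fraction_below(signal: np.ndarray, fs: int, cutoff_hz: float) -> float:
    """Share of the signal's energy that lies below cutoff_hz."""
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1 / fs)
    return float(spectrum[freqs < cutoff_hz].sum() / spectrum.sum())


t = np.arange(int(0.1 * FS)) / FS           # 100 ms of signal
voice = np.sin(2 * np.pi * TONE_HZ * t)
tx = am_modulate(voice, CARRIER_HZ, FS)      # components at 24, 25 and 26 kHz
print(f"energy below 20 kHz: {fraction_below(tx, FS, 20_000):.2e}")
```

With a 1 kHz tone and a 25 kHz carrier, the sidebands land at 24 and 26 kHz, so essentially none of the transmitted energy falls in the audible band.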

The researchers built a proof‑of‑concept system using off‑the‑shelf hardware: a laptop, a cheap ultrasonic transducer, and a custom Python script that modulates text‑to‑speech output onto a 22 kHz carrier using amplitude shift keying. When placed a few meters from a Google Home Mini, the device executed commands such as “turn off the lights” and “order a pizza” without any audible cue.
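The amplitude shift keying described for the proof of concept can be sketched as on-off keying of a 22 kHz carrier. This is a hypothetical illustration, not the researchers' script: the 96 kHz sample rate, the UTF-8 most-significant-bit-first framing, the helper names, and the 0.25 detection threshold are assumptions; only the 22 kHz carrier and the roughly 100 µs bit duration (quoted in the technical walk-through) come from the article. At 22 kHz a 100 µs slot holds only about 2.2 carrier cycles, and at 96 kHz it spans about 10 samples.

```python
import numpy as np

FS = 96_000          # playback sample rate in Hz (assumed)
CARRIER_HZ = 22_000  # carrier frequency quoted for the proof of concept
BIT_S = 100e-6       # roughly 100 microseconds per bit


def text_to_bits(text: str) -> list[int]:
    """Hypothetical framing: UTF-8 bytes, most significant bit first."""
    return [(byte >> shift) & 1 for byte in text.encode("utf-8") for shift in range(7, -1, -1)]


def ask_modulate(bits: list[int], fs: int = FS, carrier_hz: float = CARRIER_HZ,
                 bit_s: float = BIT_S) -> np.ndarray:
    """On-off keying: carrier present for a 1 bit, silence for a 0 bit."""
    samples_per_bit = max(1, round(bit_s * fs))  # 10 samples at 96 kHz
    gate = np.repeat(np.asarray(bits, dtype=float), samples_per_bit)
    t = np.arange(len(gate)) / fs
    return gate * np.sin(2 * np.pi * carrier_hz * t)


def ask_demodulate(wave: np.ndarray, n_bits: int, fs: int = FS, bit_s: float = BIT_S,
                   threshold: float = 0.25) -> list[int]:
    """Recover bits by thresholding the mean absolute amplitude of each bit slot."""
    samples_per_bit = max(1, round(bit_s * fs))
    slots = np.abs(wave[: n_bits * samples_per_bit]).reshape(n_bits, samples_per_bit)
    return [int(level > threshold) for level in slots.mean(axis=1)]


bits = text_to_bits("lights off")
burst = ask_modulate(bits)
assert ask_demodulate(burst, len(bits)) == bits  # software round trip
```

The demodulator here is only a round-trip check on the generated waveform; it says nothing about how an assistant's own audio pipeline reacts to the bursts.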

Why it matters now

Voice assistants are increasingly integrated with critical services—smart locks, payment systems, and home automation hubs. An undetectable command could, for example, unlock a front door, approve a purchase, or trigger a car’s emergency brake. The attack does not require physical access; a malicious advertisement playing on a TV, a compromised Bluetooth speaker, or even a hidden ultrasonic beacon in a public space could serve as the launchpad.

Industry analysts estimate that over 1.5 billion voice‑enabled devices will be shipped in 2026. Even a low‑probability vulnerability translates to a massive attack surface. The study therefore raises a red flag for manufacturers, regulators, and end users alike.

Technical walk‑through

Signal generation – The attacker encodes a command using binary amplitude shift keying (ASK). Each bit is represented by the presence or absence of the ultrasonic carrier for a short interval (≈ 100 µs). The resulting waveform is a series of high‑frequency bursts.
Playback – An ultrasonic transducer of the kind used in dog‑training devices or distance sensors reproduces the signal. Because the carrier is above the audible range, most listeners perceive silence.
Capture – The target device’s microphone, despite a nominal low‑pass filter, picks up the ultrasonic energy due to non‑ideal filter roll‑off. The analog front‑end may also introduce intermodulation that down‑converts part of the signal into the audible band.
Demodulation – The device’s voice‑activation pipeline performs a short‑time Fourier transform (STFT) to extract spectral features. The ultrasonic bursts appear as a distinct pattern that the downstream neural network mistakenly classifies as speech.
Execution – Once the hidden command passes the wake‑word detector, the usual natural‑language understanding (NLU) stack processes it, leading to the intended action.
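The Capture and Demodulation steps above can be illustrated with a toy front-end model. This sketch makes several assumptions that are not from the study: the microphone and preamplifier are modelled as a linear term plus a small square-law term with a hypothetical coefficient `A2`, the anti-aliasing filter is an ideal FFT brick-wall at 8 kHz, and an amplitude-modulated test tone is used instead of the ASK bursts because a tone makes the down-converted component easy to measure. A 192 kHz simulation rate keeps the squared carrier (at 50 kHz) from aliasing.

```python
import numpy as np

FS = 192_000        # simulation rate, high enough that 2 x 25 kHz does not alias
CARRIER_HZ = 25_000
TONE_HZ = 1_000
A2 = 0.1            # hypothetical second-order coefficient of the front end


def front_end(x: np.ndarray, a2: float = A2) -> np.ndarray:
    """Toy microphone model: linear response plus a small square-law term."""
    return x + a2 * x ** 2


def lowpass_fft(x: np.ndarray, fs: int, cutoff_hz: float) -> np.ndarray:
    """Idealised low-pass filter: zero every FFT bin above cutoff_hz."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1 / fs)
    spectrum[freqs > cutoff_hz] = 0
    return np.fft.irfft(spectrum, n=len(x))


def tone_amplitude(x: np.ndarray, fs: int, freq_hz: float) -> float:
    """Amplitude of the FFT bin nearest freq_hz."""
    spectrum = np.abs(np.fft.rfft(x)) * 2 / len(x)
    freqs = np.fft.rfftfreq(len(x), d=1 / fs)
    return float(spectrum[np.argmin(np.abs(freqs - freq_hz))])


def stft_magnitude(x: np.ndarray, frame: int = 960, hop: int = 480) -> np.ndarray:
    """Short-time Fourier transform magnitude (frames x bins) with a Hann window."""
    window = np.hanning(frame)
    starts = range(0, len(x) - frame + 1, hop)
    return np.array([np.abs(np.fft.rfft(window * x[s:s + frame])) for s in starts])


t = np.arange(int(0.1 * FS)) / FS
tx = (1 + 0.5 * np.sin(2 * np.pi * TONE_HZ * t)) * np.cos(2 * np.pi * CARRIER_HZ * t)

linear = lowpass_fft(tx, FS, 8_000)               # ideal linear mic: nothing audible survives
captured = lowpass_fft(front_end(tx), FS, 8_000)  # square-law term drops a copy to 1 kHz

print(tone_amplitude(linear, FS, TONE_HZ), tone_amplitude(captured, FS, TONE_HZ))
features = stft_magnitude(captured)  # 5 ms frames at 192 kHz: 200 Hz bins, 1 kHz in bin 5
```

The order of operations matters in this model: the nonlinearity acts before the filter, so the low-pass stage removes the ultrasonic components but keeps the down-converted 1 kHz copy, which then shows up in the STFT features like ordinary speech energy.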

The researchers measured a success rate of 87% across three commercial assistants (Google Assistant, Amazon Alexa, Apple Siri) when the ultrasonic source was within 2 m. Signal strength dropped sharply beyond 4 m, suggesting that proximity remains a practical limitation.
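A back-of-the-envelope model helps explain why range is limited. It rests on assumptions, not on the study's own analysis: free-field spherical spreading of the ultrasound, and the square-law (second-order) down-conversion described in the Capture step. Here p is the ultrasonic pressure amplitude at distance r, L a level in decibels, and a_2 the hypothetical second-order coefficient of the front end. Because the recovered audio scales with the square of the incident pressure, its level falls twice as fast in decibels as the ultrasound itself; atmospheric absorption, which is stronger at ultrasonic than at voice frequencies, would steepen the drop further.

```latex
\begin{aligned}
p(r) &= p(r_0)\,\frac{r_0}{r}
  &&\Rightarrow\; L_{\mathrm{us}}(r) = L_{\mathrm{us}}(r_0) - 20\log_{10}\frac{r}{r_0} \\
s_{\mathrm{demod}}(r) &\propto a_2\,p(r)^2
  &&\Rightarrow\; L_{\mathrm{demod}}(r) = L_{\mathrm{demod}}(r_0) - 40\log_{10}\frac{r}{r_0} \\
r_0 = 2\,\mathrm{m},\ r = 4\,\mathrm{m}: &\quad \Delta L_{\mathrm{us}} \approx -6.0\,\mathrm{dB}
  &&\quad \Delta L_{\mathrm{demod}} \approx -12.0\,\mathrm{dB}
\end{aligned}
```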

Mitigation strategies being explored

Hardware filtering – Adding a sharper analog low‑pass filter before the ADC can attenuate frequencies above 20 kHz. Some newer microphones already incorporate such filters, but cost constraints keep many budget devices on the older designs.
Software detection – A secondary classifier can monitor the spectral envelope for anomalous high‑frequency energy. If detected, the system either discards the input or prompts the user for confirmation.
Challenge‑response – Requiring a secondary, user‑visible cue (e.g., a light flash) before executing high‑risk commands adds a human verification step without sacrificing usability for routine tasks.
Regulatory guidance – The IEEE Standards Association is drafting a recommendation (IEEE P2801) that outlines best practices for acoustic security in consumer IoT devices. The draft suggests a minimum attenuation of 40 dB at 20 kHz for all production‑grade microphones.
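Two of the mitigations above lend themselves to a quick sketch: the software check for anomalous high-frequency energy, and the filter order needed to meet an attenuation target such as the draft's 40 dB at 20 kHz. Everything here is an assumption for illustration: the helper names, the 20 kHz split, the 0.1 energy-share threshold, the 8 kHz passband edge, and the choice of a Butterworth response. The detector can only see ultrasonic energy if the ADC samples above 40 kHz; at the 16 kHz rates often used for speech, that band is simply absent from the digital signal.

```python
import math

import numpy as np


def high_band_ratio(frame: np.ndarray, fs: int, split_hz: float = 20_000) -> float:
    """Fraction of a frame's energy at or above split_hz (requires fs > 2 * split_hz)."""
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1 / fs)
    total = spectrum.sum()
    return float(spectrum[freqs >= split_hz].sum() / total) if total > 0 else 0.0


def looks_injected(frame: np.ndarray, fs: int, threshold: float = 0.1) -> bool:
    """Hypothetical policy: flag frames whose ultrasonic share exceeds threshold."""
    return high_band_ratio(frame, fs) > threshold


def butterworth_order(cutoff_hz: float, stop_hz: float, atten_db: float) -> int:
    """Smallest Butterworth order giving at least atten_db of attenuation at stop_hz."""
    # |H(f)|^2 = 1 / (1 + (f / fc)^(2n))
    #   =>  n >= log10(10^(A / 10) - 1) / (2 * log10(f / fc))
    ratio = stop_hz / cutoff_hz
    return math.ceil(math.log10(10 ** (atten_db / 10) - 1) / (2 * math.log10(ratio)))


fs = 96_000
t = np.arange(960) / fs  # one 10 ms frame
voice = np.sin(2 * np.pi * 1_000 * t)
ultra = (1 + 0.5 * np.sin(2 * np.pi * 1_000 * t)) * np.cos(2 * np.pi * 25_000 * t)
print(looks_injected(voice, fs), looks_injected(ultra, fs))
print(butterworth_order(8_000, 20_000, 40))
```

Under these assumptions, a Butterworth filter with its corner at 8 kHz needs order 6 to reach 40 dB at 20 kHz (order 5 gives about 39.8 dB), which illustrates why "sharper" analog filtering adds components and cost.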

Industry response

Amazon, Google, and Apple have each issued brief statements acknowledging the research and pledging to review their acoustic pipelines. In a recent blog post, Google noted that “future firmware updates will incorporate additional signal‑validation checks to reduce the likelihood of ultrasonic injection.” Apple’s security team, meanwhile, is reportedly testing a hardware‑level filter on the next generation of HomePod.

Venture capitalists have taken note. Two startups—AcoustiGuard and SilentShield AI—raised $12 million combined in seed rounds to commercialize ultrasonic detection modules for OEMs. Their approach blends a tiny DSP chip with a machine‑learning model trained on millions of benign and malicious audio samples.

What users can do today

Keep devices away from unknown speakers – Place voice assistants away from speakers you do not control, and if a Bluetooth speaker or TV is playing unexpected content, mute or disconnect it.
Update firmware regularly – Manufacturers often roll out patches that tighten audio preprocessing.
Consider physical mute switches – Many smart speakers now include a hardware button that disables the microphone entirely.

Looking ahead

The hidden‑signal attack illustrates a broader truth: as AI interfaces become more seamless, the attack surface expands in directions that traditional security models rarely consider. Acoustic channels, once thought to be harmless, now demand the same scrutiny as network ports.

Future research will likely explore cross‑modal attacks—combining ultrasonic commands with visual cues to bypass multimodal authentication—and the development of standardized acoustic security benchmarks.

For a deeper dive, see the full paper “Ultrasonic Command Injection Attacks on Voice Assistants” on the arXiv preprint server.
