Overview

Adversarial attacks make tiny, often imperceptible changes to an input (such as an image or a piece of text) that cause a model to misclassify it, frequently with high confidence.

Examples

  • Image Perturbation: Adding 'noise' to a photo of a stop sign so an autonomous car sees it as a speed limit sign.
  • Textual Adversaries: Changing a few words in an email to bypass a spam filter.
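The image-perturbation idea above can be sketched with the fast gradient sign method (FGSM): nudge each input feature by a small step in the direction that increases the model's loss. The snippet below is a minimal, illustrative sketch using a hypothetical linear (logistic) classifier with made-up weights, not a real stop-sign detector; the gradient is written out analytically for this simple model.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical linear classifier: weights and input are illustrative only.
w = np.array([1.0, -2.0, 0.5])   # model weights (assumed, for demonstration)
x = np.array([0.5, -0.2, 0.3])   # clean input, classified as class 1
y = 1                             # true label

p = sigmoid(w @ x)                # clean-input confidence for class 1

# Gradient of the cross-entropy loss w.r.t. the INPUT (analytic for this model):
# dL/dx = (p - y) * w
grad = (p - y) * w

# FGSM: take a small step of size eps in the sign of the gradient,
# i.e. the direction that increases the loss the fastest per-feature.
eps = 0.4
x_adv = x + eps * np.sign(grad)

p_adv = sigmoid(w @ x_adv)
print(f"clean confidence:       {p:.3f}")   # above 0.5 -> class 1
print(f"adversarial confidence: {p_adv:.3f}")  # pushed below 0.5 -> flipped
```

Note that each feature moves by at most `eps`, so the perturbed input stays close to the original, yet the predicted class flips. Against real image classifiers the same idea is applied to pixel values, with the gradient obtained by backpropagation rather than a closed form.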

Why it Matters

These attacks expose the fragility of current AI systems and are a major concern for safety-critical applications such as self-driving cars and medical diagnosis.

Related Terms