Overview
Adversarial attacks make tiny, often imperceptible changes to an input (such as an image or a piece of text) that cause a model to misclassify it with high confidence.
Examples
- Image Perturbation: Adding carefully crafted noise to a photo of a stop sign so an autonomous vehicle reads it as a speed-limit sign.
- Textual Adversaries: Changing a few words in an email to bypass a spam filter.
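A minimal sketch of how such a perturbation is constructed, in the style of the fast gradient sign method (FGSM): step the input in the direction of the sign of the loss gradient. The toy logistic-regression weights and input below are illustrative assumptions, not a trained model.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(x, w, b):
    # Probability of class 1 under a toy logistic-regression "model".
    return sigmoid(w @ x + b)

# Hypothetical model parameters and an input correctly classified as class 1.
w = np.array([2.0, -1.0, 0.5])
b = 0.1
x = np.array([0.6, 0.2, 0.4])
y = 1.0

p = predict(x, w, b)            # confident class-1 prediction (> 0.5)

# For cross-entropy loss, the gradient w.r.t. the input is (p - y) * w.
grad_x = (p - y) * w

# FGSM step: perturb each coordinate by at most eps, in the sign of the gradient.
eps = 0.5
x_adv = x + eps * np.sign(grad_x)

p_adv = predict(x_adv, w, b)    # prediction flips to class 0 (< 0.5)
```

Each coordinate moves by exactly eps, so the perturbation is small and bounded, yet it is enough to push the toy model's prediction across the decision boundary; on real image classifiers the same idea works with a much smaller eps, leaving the change invisible to humans.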
Why it Matters
These attacks expose the fragility of current AI systems and are a major concern for safety-critical applications like self-driving cars and medical diagnosis.