Buried Report Exposes Gaps in U.S. AI Safety Testing Framework
At a cybersecurity conference last October, AI researchers found 139 novel ways to make leading artificial intelligence systems misbehave, from generating dangerous misinformation to leaking sensitive data. Their red-teaming exercise, organized through the National Institute of Standards and Technology (NIST), also exposed shortcomings in a newly issued U.S. government standard for assessing AI risks. Yet the resulting report was never published; it was buried during the presidential transition amid concerns it might clash with the incoming administration's agenda.
The Groundbreaking Red-Teaming Experiment
Conducted at the Conference on Applied Machine Learning in Information Security (CAMLIS), the exercise was organized by NIST's Assessing Risks and Impacts of AI (ARIA) program in partnership with Humane Intelligence, and it tasked participating researchers with stress-testing cutting-edge AI tools:
- Meta's Llama open-source language model
- Anote's model-fine-tuning platform
- Robust Intelligence's (now Cisco-owned) AI security system
- Synthesia's avatar-generation platform
Participants employed the NIST AI 600-1 framework, which is designed to assess risks such as misinformation propagation, data leakage, and cybersecurity vulnerabilities. Their findings revealed alarming gaps: attackers bypassed guardrails with multilingual prompts (including Russian, Gujarati, and Telugu) to extract terror-group recruitment guidance from Llama, and manipulated systems into exposing private user data. Crucially, the exercise also exposed ambiguities in NIST's own risk categories, with participants noting that some were "insufficiently defined to be useful."
The Political Silencing
Despite being finalized during the Biden administration, the report joined multiple NIST AI documents withheld from publication. Sources compare the atmosphere to the suppression of "climate change research or cigarette research." The decision also aligns with the Trump administration's July 2025 AI Action Plan, which mandated removing references to "misinformation, Diversity, Equity, and Inclusion, and climate change" from NIST's AI Risk Management Framework, even as it paradoxically called for more AI hackathons.
"If published, others could have learned how the [NIST] framework applies to red teaming," laments Alice Qian Zhang, a Carnegie Mellon researcher who participated. "We engaged directly with tool makers—it was invaluable."
Implications for AI's Future
The suppression carries profound consequences:
1. Security Deficits: Critical attack vectors remain undocumented, leaving developers unaware of emerging threats
2. Framework Fragility: NIST's role as a standards-bearer weakens when risk assessments become politically negotiable
3. Innovation Barriers: Suppressing research on DEI- and climate-related AI risks stifles holistic approaches to safety
Ironically, while the administration cited existential dangers such as AI-enabled development of weapons of mass destruction to justify refocusing federal AI safety work, it actively suppressed empirical research on tangible, near-term threats. As one anonymous participant noted: "Politics must have been involved. We uncovered scientific insights the community needs."
The buried report sets a troubling precedent: when AI safety becomes entangled in political ideology, the entire ecosystem of developers, enterprises, and citizens pays the price in unmitigated risk.
Source: WIRED