Buried Report Exposes Critical Gaps in U.S. AI Safety Framework

At a pivotal cybersecurity conference last October, AI researchers uncovered 139 novel ways to compromise leading artificial intelligence systems, from generating dangerous misinformation to leaking sensitive data. The red-teaming exercise, organized by the National Institute of Standards and Technology (NIST), also exposed critical gaps in the U.S. government's freshly drafted AI Risk Management Framework. Yet the resulting report was never published; it was buried during the presidential transition amid concerns it might clash with the incoming administration's agenda.

The Groundbreaking Red-Teaming Experiment

Conducted at the Conference on Applied Machine Learning in Information Security (CAMLIS), the event brought together researchers from NIST's ARIA program and Humane Intelligence to stress-test cutting-edge AI tools:

  • Meta's Llama open-source language model
  • Anote's model-fine-tuning platform
  • Robust Intelligence's (now Cisco-owned) AI security system
  • Synthesia's avatar-generation platform

Participants applied NIST's draft AI 600-1 framework, which is designed to assess risks such as misinformation propagation, data leakage, and cybersecurity vulnerabilities. Their findings revealed alarming gaps: attackers bypassed Llama's guardrails with multilingual prompts (including Russian, Gujarati, and Telugu) to extract terror-group recruitment guides, and manipulated systems into exposing private user data. Crucially, the exercise also exposed ambiguities in NIST's own risk categories, with participants noting that some were "insufficiently defined to be useful."

The Political Silencing

Despite being finalized during the Biden administration, the report joined multiple NIST AI documents withheld from publication. Sources describe an environment resembling the suppression of "climate change research or cigarette research." The withholding aligns with the Trump administration's July 2025 AI Action Plan, which mandated removing references to "misinformation, Diversity, Equity, and Inclusion, and climate change" from NIST's framework, even as it paradoxically called for more AI hackathons.

"If published, others could have learned how the [NIST] framework applies to red teaming," laments Alice Qian Zhang, a Carnegie Mellon researcher who participated. "We engaged directly with tool makers—it was invaluable."

Implications for AI's Future

The suppression carries profound consequences:
1. Security Deficits: Critical attack vectors remain undocumented, leaving developers unaware of emerging threats
2. Framework Fragility: NIST's role as a standards-bearer weakens when risk assessments become politically negotiable
3. Innovation Barriers: Silencing DEI and climate-related AI risks stifles holistic safety approaches

Ironically, while the administration cited "existential risks" such as AI-enabled development of weapons of mass destruction to justify its shift in focus, it actively suppressed empirical research on tangible, near-term threats. As one anonymous participant noted: "Politics must have been involved. We uncovered scientific insights the community needs."

The buried report symbolizes a troubling precedent: when AI safety becomes entangled in political ideology, the entire ecosystem—developers, enterprises, and citizens—pays the price in unmitigated risks.

Source: WIRED