DHS, Dirty Data, and the Quiet Erosion of Technical Guardrails in US Watchlisting Systems
When a Misconfiguration Becomes a Doctrine
On paper, this was a narrow technical exercise.
In 2021, a field officer in the Department of Homeland Security’s Office of Intelligence & Analysis (I&A) sought a bulk extract of Chicago Police Department gang records—a dataset already publicly infamous for being racist, error-ridden, and operationally unsound. The goal: test how local “street-level” intelligence might feed into federal watchlists and border screening systems, including parallel efforts to flag transnational gang and cartel actors.
By late 2023, the experiment had collapsed. Nearly 800 files on roughly 900 Chicagoland residents sat for months on a DHS server in violation of an intelligence oversight body’s deletion order. Signatures were missing, reporting deadlines ignored, and domestic-person data retention limits breached. I&A eventually purged the data only after internal scrutiny, while offering little public transparency.
For most people, it reads like another opaque government scandal. For anyone building large-scale data, AI, or security infrastructure, it’s something else: a real-world, high-stakes postmortem on what happens when:
- unverified upstream data is ingested into critical decision systems,
- governance and deletion policies exist only on paper,
- and system architecture makes it cheap to fuse sensitive datasets but expensive to prove compliance.
This isn’t a story about one dataset that should’ve been deleted faster. It’s about how the emerging technical stack for US watchlisting—and, increasingly, AI-assisted enforcement—leans on brittle guardrails that engineers in the private sector would recognize as design flaws.
Source: All factual information in this article is drawn from WIRED’s reporting: “DHS Kept Chicago Police Records for Months in Violation of Domestic Espionage Rules,” September 2024.
A Dirty Dataset Looking for an Algorithm
Before DHS ever touched it, Chicago’s gang data was a warning label.
Audits had found:
- Individuals listed with impossible ages (born before 1901 or appearing to be infants).
- People tagged as gang members with no specified gang.
- Occupations recorded as “SCUM BAG,” “TURD,” or simply “BLACK.”
- No consistent process for notification, appeal, or removal.
- Roughly 95% of labeled individuals were Black or Latino.
- Data sprawled across at least 18 systems; Chicago PD could not definitively account for all gang-related records.
Despite this, the data had real operational impact. It informed prosecutions, bail decisions, sentencing, and immigration actions. Immigration authorities accessed it more than 32,000 times over a decade, exploiting a carve-out for "known gang members" that undercut Chicago’s sanctuary policies.
From a technical standpoint, this is the nightmare input to any risk-scoring, watchlisting, or AI classification pipeline:
- biased labeling,
- uncontrolled duplication,
- no authoritative source of truth,
- no lifecycle governance.
Yet in 2021, following an FBI move to expand its Transnational Organized Crime Actor Detection Program (TADP) list to include the Chicago-born Latin Kings, I&A requested a bulk extract to "fully exploit the list." The point was to see if Chicago PD records could help flag suspected gang members at borders, airports, and during law enforcement encounters.
In other words: route a deeply contaminated municipal dataset into a federal, semi-opaque targeting infrastructure.
Architecture Without Ownership
The Chicago data transfer triggered DHS’s Data Access Review Council (DARC), which set two core conditions:
- Delete all US-person data within one year.
- File a six-month usage report for oversight.
Neither happened.
Key failure points, as documented in internal reviews and reported by WIRED:
- The initiating field officer left in early 2022; their replacement arrived eight months later—an unmanaged gap for a sensitive program.
- The required approval was signed not by the under secretary for I&A (or authorized surrogate), but by the CIO—contrary to policy, with no clear rationale.
- No senior leaders appeared aware that formal terms and conditions were active.
- Required reporting and audits were never completed.
- When the April 2023 deletion deadline passed, no extension was requested.
- At least 797 documents remained in violation of rules meant to prevent domestic intelligence collection on US citizens and lawful permanent residents.
This is a systems anti-pattern most engineers will recognize:
- Governance coupled to individuals, not infrastructure.
- Critical flows approved via ad hoc signatures instead of verifiable workflows.
- No runtime observability over where sensitive data lives, how it’s used, or whether it’s expired.
In February 2024—only after the breach was documented—I&A leadership imposed new training and process requirements for bulk transfers. Yet a July GAO report concluded that I&A still lacks basic mechanisms to track intelligence collection and use. It took the office 12 years to produce a consolidated intelligence budget mandated by law. A separate WIRED story revealed that misconfigurations in DHS’s Homeland Security Information Network exposed I&A reports to thousands of unauthorized users.
These aren’t cosmetic problems. They’re fundamental implementation gaps in:
- data lineage;
- role-based and attribute-based access control;
- configuration management;
- policy-as-code enforcement.
If this were a commercial cloud service handling regulated data, it would be a case study in how to fail a compliance audit.
Watchlists, AI Ambitions, and Sanctuary Workarounds
Zoom out from the Chicago dataset, and the picture gets more structural.
The federal watchlisting ecosystem hinges on systems like:
- the Terrorist Screening Dataset (TSD), a consolidated watchlist used in travel and border vetting;
- the Transnational Organized Crime Actor Detection Program (TADP), which runs alongside TSD to flag cartel and gang actors.
Both are increasingly entwined with broader data fusion efforts, as DHS and other agencies pursue:
- cross-system linkages previously kept separate by policy design,
- machine learning models over public, commercial, and government data,
- and automated risk scoring at the edges: airports, ports of entry, local police encounters.
A March 2025 executive order encouraged agencies to “eliminate information silos across the government,” while DHS’s own AI roadmaps champion integrated data for enforcement and “enhanced screening.”
On a whiteboard, this looks efficient: fewer silos, richer features, better detection.
In practice, as this case shows, it also:
- creates a technical path for federal actors to route around local sanctuary rules (federal intelligence can ingest and repackage local data, then share outputs with immigration enforcement);
- amplifies the risks of ingesting biased, unvalidated, or rights-violating datasets into durable, hard-to-contest designations;
- depends on precise legal and technical boundaries (e.g., no US citizens or LPRs in TADP) that current controls have repeatedly failed to enforce.
As Brennan Center counsel Spencer Reynolds notes, once “transnational crime” and “terrorism” are used elastically, watchlisting and AI-assisted targeting can ripple into:
- family networks,
- religious institutions,
- mutual aid groups,
- and immigrant-serving organizations.
These are the real stakes for technologists: the systems we design or integrate with do not merely reflect policy; they concretize it. Weak constraints in code become aggressive powers in practice.
Lessons for Engineers Building the Next Generation of Government Tech
Developers and architects working on data platforms, AI systems, and security tooling—inside or adjacent to government—should read the Chicago/DHS episode as an implementation guide for what not to build.
Key takeaways, translated into technical requirements:
Policy must compile into code.
- Retention rules like “delete US-person data within one year” cannot live as PDF text and training slides.
- They must be:
- encoded as machine-enforceable policies,
- evaluated continuously,
- logged immutably when violated.
- If a system cannot prove that all data past its retention period has been deleted or anonymized, it is not compliant—no matter how many memos say otherwise. A minimal sketch of such a check follows this list.
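Assuming a hypothetical `Record` shape and a 365-day window (both invented for illustration, not any DHS system's actual schema), the check itself is small; what matters is that it runs continuously and that violations are logged, not memoed:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=365)  # "delete US-person data within one year"

@dataclass
class Record:
    record_id: str
    ingested_at: datetime
    is_us_person: bool
    deleted: bool = False

def retention_violations(records: list[Record], now: datetime) -> list[str]:
    """Return IDs of US-person records still held past the retention window."""
    return [
        r.record_id
        for r in records
        if r.is_us_person and not r.deleted and now - r.ingested_at > RETENTION
    ]

# Illustrative run: any non-empty result is a compliance failure, not a warning.
now = datetime.now(timezone.utc)
inventory = [
    Record("rec-0001", ingested_at=now - timedelta(days=400), is_us_person=True),
    Record("rec-0002", ingested_at=now - timedelta(days=30), is_us_person=True),
]
print(retention_violations(inventory, now))  # ['rec-0001']
```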
Data lineage is not optional at this scale.
- Every ingest—from local PD feeds to commercial brokers—should carry provenance metadata:
- source system and collection context,
- legal authority and constraints,
- sensitivity and reliability scores.
- Downstream systems (watchlists, risk models, case management tools) must be able to answer: “Why is this person here?” and “Can this record legally be used for this purpose?” One possible shape for that metadata is sketched after this list.
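The field names below are invented, not drawn from any real DHS or Chicago PD schema; the point is that provenance travels with the record and downstream use is gated on it:

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum

class Reliability(Enum):
    UNVERIFIED = 0
    CORROBORATED = 1
    VERIFIED = 2

@dataclass(frozen=True)
class Provenance:
    source_system: str              # originating system and collection context
    legal_authority: str            # agreement or authority governing the transfer
    permitted_uses: frozenset[str]  # purposes the record may lawfully support
    sensitivity: str                # handling level
    reliability: Reliability
    ingested_at: datetime

@dataclass
class SubjectRecord:
    subject_id: str
    attributes: dict
    provenance: Provenance

def usable_for(record: SubjectRecord, purpose: str) -> bool:
    """Answer 'can this record legally be used for this purpose?' at query time."""
    return purpose in record.provenance.permitted_uses
```

A watchlist or risk model that refuses to read `attributes` unless `usable_for` returns true gives auditors a single place to answer both questions above.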
Untrusted inputs demand robust validation.
- Chicago’s gang records were known-bad. Any ingestion pipeline should:
- run schema checks and sanity constraints (ages, identifiers, missing fields),
- detect hate speech or slurs in categorical fields,
- reject or quarantine records that fail reliability thresholds,
- assign low-confidence weights that down-rank their influence on automated decisions.
- In ML-driven systems, biased source data must be guarded against at both labeling and inference stages; a quarantine-style ingest check is sketched below.
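The sketch below uses invented field names and thresholds; the junk occupation values are the ones the Chicago audits actually surfaced. Records that fail go to quarantine with zero weight rather than silently flowing downstream:

```python
from datetime import date

# Thresholds and field names are illustrative, not Chicago PD's actual schema.
JUNK_VALUES = {"SCUM BAG", "TURD", "BLACK"}   # values the audits actually found
MIN_BIRTH_YEAR, MIN_PLAUSIBLE_AGE = 1901, 10

def validate(record: dict) -> tuple[bool, list[str], float]:
    """Return (accept, rejection_reasons, confidence_weight) for one inbound record."""
    reasons: list[str] = []
    this_year = date.today().year

    birth_year = record.get("birth_year")
    if birth_year is None or birth_year < MIN_BIRTH_YEAR or this_year - birth_year < MIN_PLAUSIBLE_AGE:
        reasons.append("implausible or missing birth year")

    if not record.get("gang_affiliation"):
        reasons.append("gang label with no specified gang")

    if str(record.get("occupation", "")).strip().upper() in JUNK_VALUES:
        reasons.append("slur or junk value in a categorical field")

    accept = not reasons
    # Quarantined records carry zero weight so they cannot influence automated decisions.
    return accept, reasons, 1.0 if accept else 0.0
```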
Approvals need cryptographic, not bureaucratic, assurances.
- Critical operations—like bulk imports into watchlist-adjacent systems—should:
- require multi-party, role-verified approvals,
- be codified in version-controlled policy repositories,
- and leave audit trails verifiable independently of any one office.
- A single CIO’s unexplained signature should be a system error, not a system state (see the sketch after this list).
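One possible shape for that gate, with hypothetical role names and a placeholder boolean standing in for real cryptographic signature verification (a production system would verify hardware-backed keys against a version-controlled policy repository):

```python
from dataclasses import dataclass

# Hypothetical policy: a bulk import needs all of these distinct roles to sign off.
REQUIRED_ROLES = {"under_secretary_or_surrogate", "privacy_office", "oversight_counsel"}

@dataclass(frozen=True)
class Approval:
    signer_id: str
    role: str
    signature_verified: bool   # stand-in for actual cryptographic verification

def transfer_authorized(approvals: list[Approval]) -> bool:
    """True only when every required role has a verified signature and signers are distinct."""
    valid = [a for a in approvals if a.signature_verified]
    roles = {a.role for a in valid}
    signers = {a.signer_id for a in valid}
    return REQUIRED_ROLES <= roles and len(signers) >= len(REQUIRED_ROLES)

# A lone CIO signature evaluates to False instead of quietly becoming system state.
print(transfer_authorized([Approval("cio-01", "cio", True)]))  # False
```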
Observability is civil liberties infrastructure.
- Access logs, configuration histories, and data movement events should be:
- complete by default,
- tamper-evident,
- routinely analyzed for anomalies,
- and subject to both automated checks and independent review.
- The GAO’s finding that I&A couldn’t fully track its own intelligence activities is, in engineering terms, a monitoring and telemetry failure; one building block, a tamper-evident event log, is sketched below.
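Tamper evidence is not exotic. A minimal hash-chained event log, sketched here without the key management and external anchoring a real deployment would need, makes silent edits or gaps detectable:

```python
import hashlib
import json
from datetime import datetime, timezone

def _digest(prev_hash: str, entry: dict) -> str:
    payload = prev_hash + json.dumps(entry, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

class AuditLog:
    """Append-only, hash-chained event log: each entry commits to its predecessor."""

    def __init__(self) -> None:
        self.entries: list[dict] = []
        self.head = "0" * 64

    def record(self, actor: str, action: str, target: str) -> None:
        entry = {
            "ts": datetime.now(timezone.utc).isoformat(),
            "actor": actor, "action": action, "target": target,
            "prev": self.head,
        }
        self.head = _digest(self.head, entry)
        entry["hash"] = self.head
        self.entries.append(entry)

    def verify(self) -> bool:
        """Recompute the chain; any edited or missing entry breaks it."""
        prev = "0" * 64
        for e in self.entries:
            body = {k: v for k, v in e.items() if k != "hash"}
            if e["prev"] != prev or _digest(prev, body) != e["hash"]:
                return False
            prev = e["hash"]
        return True
```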
Design for contestability and remediation.
- Especially when integrating external data into systems that can impact travel, immigration, or criminal exposure, architectures should anticipate:
- error correction,
- origin tracing,
- flag removal and propagation,
- and user-accessible explanations when law permits.
- From a systems design perspective, “there is no way to fix a wrong label” is a bug, not a policy quirk. A retraction-and-propagation sketch follows.
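As a sketch only, with hypothetical names rather than any existing watchlist API, a flag that remembers its origin and its consumers can at least make retraction a first-class operation:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable

@dataclass
class Flag:
    subject_id: str
    source_record: str                                    # provenance: where the label came from
    shared_with: set[str] = field(default_factory=set)    # downstream systems that received it
    active: bool = True
    history: list[str] = field(default_factory=list)

    def share(self, system: str) -> None:
        self.shared_with.add(system)

    def retract(self, reason: str, notify: Callable[[str, str, str], None]) -> None:
        """Deactivate the flag and push the correction to every system it reached."""
        self.active = False
        self.history.append(f"{datetime.now(timezone.utc).isoformat()} retracted: {reason}")
        for system in self.shared_with:
            notify(system, self.subject_id, reason)   # downstream copies must be cleared too

# Illustrative retraction: the correction travels to every consumer, not just the source.
flag = Flag("subject-42", source_record="cpd-export-0001")
flag.share("border_screening")
flag.share("case_management")
flag.retract("source record failed audit", notify=lambda sys, subj, why: print(sys, subj, why))
```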
These requirements are not idealistic. They are table stakes for any high-risk data system that aims to be defensible—to courts, to regulators, to the public, and to its own engineers.
When the Stack Outgrows the Safeguards
DHS is on track to operate with a budget north of $191 billion, pushing aggressively into AI-driven screening, cross-database fusion, and real-time risk analytics. At the same time, congressional auditors report that the department’s intelligence arm:
- lacks a current, accurate map of all offices performing intelligence work;
- publishes reports with factual errors and insufficient review;
- has only recently attempted to centralize basic oversight.
Chicago’s gang data is now gone from I&A’s servers. But Illinois maintains its own statewide gang file (LEADS), with a fresh data-sharing agreement signed by DHS’s Enforcement and Removal Operations—a quiet reminder that the pipelines adapt faster than the controls.
For technologists, this episode should not be filed under “government being bad at paperwork.” It’s a preview of what happens when:
- we scale surveillance-adjacent systems faster than we scale verifiable guardrails;
- we treat civil liberties protections as external policy, not as first-class system constraints;
- and we assume someone else—some oversight body, some counsel’s office—will encode the ethics our architectures leave undefined.
The more intelligent and interconnected our enforcement infrastructure becomes, the less room there is for hand-waving. Either the constraints are in the code, or they are, functionally, gone.