Public Voter Records Create Privacy Risks Through Data Linking

Privacy Reporter

Research reveals how publicly available voter data can be combined with other datasets to re-identify individuals, exposing personal information and creating potential for discrimination, identity fraud, and targeting of vulnerable populations.

Public voter records, published in the name of transparency, can become powerful tools for re-identification when linked with other data sources, according to new research that highlights significant privacy risks in the current data ecosystem.

The Research Findings

Noah M. Kenney, founder of consultancy Digital 520, analyzed publicly available voter records from Travis County, Texas, and Robeson County, North Carolina, to demonstrate how seemingly limited information can be combined with other datasets to identify individuals with alarming accuracy.

"I picked two different counties that kind of represented opposite ends of the spectrum," Kenney explained. "In Texas, they hide a lot of information and then North Carolina makes a lot of it public in terms of the specific records. And what I was looking at specifically is if you go and merge this data set or link this data set with other data sets, how likely are you to be able to re-identify a person?"

The research, detailed in a paper titled "Public Voting Records: A Record, or an Attack Surface?", reveals several concerning findings:

  • Name and ZIP code uniquely identify 95.81% of Texas voters and 87.79% of North Carolina voters
  • 88.53% of North Carolina voters with phone numbers have numbers unique within the county
  • Among Travis County voters who have participated in 20 or more elections, 98.4% have unique turnout patterns that serve as digital fingerprints
  • Texas' redaction of date of birth as a privacy measure is undermined by other data, with 28% of voters uniquely identifiable when combining ZIP code and gender
  • The Travis County voter file exposes 320 deployed military families through the publication of APO/FPO codes for military mailings
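
Uniqueness figures like these can be understood as a simple count: group records by a chosen combination of fields and ask what share fall in a group of one. The following is a minimal sketch of that idea, not the method used in the paper. The function name, field names, and toy records are all hypothetical.

```python
from collections import Counter

def unique_fraction(records, fields):
    """Share of records whose combination of `fields` appears exactly once."""
    # Build one normalized key per record from the chosen fields.
    keys = [tuple(str(r[f]).strip().lower() for f in fields) for r in records]
    if not keys:
        return 0.0
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(keys)

# Hypothetical toy records; a real voter file has many more columns.
voters = [
    {"name": "Ana Ruiz", "zip": "78704", "gender": "F"},
    {"name": "Ana Ruiz", "zip": "78745", "gender": "F"},
    {"name": "Ben Cole", "zip": "78704", "gender": "M"},
    {"name": "Ben Cole", "zip": "78704", "gender": "M"},
]

print(unique_fraction(voters, ["name", "zip"]))  # 0.5
```

The same function could be pointed at any field combination, such as a voter's sequence of elections participated in, to see how quickly a few columns narrow a record down to one person.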

The Science of Re-identification

More than 25 years ago, research by Harvard professor Latanya Sweeney demonstrated that most of the US population (87%) could be uniquely identified with just three seemingly anonymous data points: a five-digit ZIP code, gender, and date of birth. This foundational research has become even more relevant with the advent of advanced AI tools that can analyze and correlate data with unprecedented speed and accuracy.

In one practical example, Kenney used a Python script to link Texas voter records with Federal Election Commission contribution data. When analyzing 500 contribution records from ZIP code 78704 (covering Austin neighborhoods), he found that 52.49% of contributors could be uniquely matched to voter records using just name and ZIP code. With the sophisticated tools used by commercial data brokers, this match rate could reach 90-95%.
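
Kenney's script is not reproduced in the article, but an exact-match linkage of this kind can be sketched in a few lines. The version below is an illustration under stated assumptions: the field names (`name`, `zip`, `voter_id`) and the sample records are hypothetical, names are assumed to appear in the same word order in both files, and ZIP codes are truncated to five digits on the assumption that some contribution records carry ZIP+4 values.

```python
from collections import defaultdict

def normalize(name):
    """Lowercase, drop commas, and collapse whitespace in a name."""
    return " ".join(name.lower().replace(",", " ").split())

def link_unique(contributions, voters):
    """Map contribution index -> voter record when exactly one voter matches."""
    index = defaultdict(list)
    for v in voters:
        index[(normalize(v["name"]), v["zip"][:5])].append(v)
    matches = {}
    for i, c in enumerate(contributions):
        candidates = index.get((normalize(c["name"]), c["zip"][:5]), [])
        # Only an unambiguous, single-candidate match counts as a link.
        if len(candidates) == 1:
            matches[i] = candidates[0]
    return matches

# Hypothetical toy data, not real voter or contribution records.
voters = [
    {"name": "Ana Ruiz", "zip": "78704", "voter_id": "V1"},
    {"name": "Ben Cole", "zip": "78704", "voter_id": "V2"},
    {"name": "Ben Cole", "zip": "78704", "voter_id": "V3"},
]
contributions = [
    {"name": "ANA RUIZ", "zip": "787041234"},
    {"name": "Ben Cole", "zip": "78704"},
    {"name": "Cy Dunn", "zip": "78704"},
]

matches = link_unique(contributions, voters)
match_rate = len(matches) / len(contributions)
print(matches, match_rate)
```

Here only the first contribution links to a voter: the second has two candidates and the third has none. Exact matching misses nicknames, typos, and reordered names, which is one plausible reason fuzzy-matching tools would push the match rate higher.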

Potential Harmful Scenarios

The ability to re-identify individuals from voter data creates numerous potential harms:

  • A foreign intelligence service could identify family members of deployed military personnel by cross-referencing voter records and social media
  • Employers could screen job applicants based on political affiliation by analyzing primary ballot history
  • Identity fraud rings could target voters whose mail has been returned (indicated by voter file suspense codes) to take over addresses using bogus change-of-address requests
  • Political campaigns could create detailed voter profiles for targeted messaging and potential manipulation

Currently, there is no comprehensive federal privacy law protecting voter data in the United States. While many states have privacy rules, implementation varies significantly across jurisdictions.

"Even within a specific state, most of the counties are individually handling these public records requests, so they all handle them differently across the country," Kenney noted. "Some of them, you can't get them. Some of them, you need an ID to get them. Some of them you have to go through a request process for public records or you have to pay for them."

The European Union's General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA) offer partial frameworks, but voter data falls into a gray area where specific protections are limited. Under GDPR, the "right to be forgotten" (formally, the right to erasure) and data minimization principles could theoretically apply, but enforcement is challenging when data is already public.

Kenney argues that access controls represent a better solution than redacting certain data fields, pointing to his findings that show redaction doesn't necessarily protect against privacy harms. His recommendations include:

  • Implementing rate limits on bulk file requests
  • Requiring identity verification for data access
  • Maintaining audit logs of all requests
  • Prohibiting commercial resale of these records
  • Generalizing voter registration dates to a year rather than a specific day
  • Excluding armed forces mailing codes from voter rolls
  • Allowing individuals to opt out of inclusion in public datasets
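
Two of these recommendations, generalizing registration dates and excluding armed forces mailing codes, amount to a transformation applied to each record before release. The sketch below shows one way that could look. It is an assumption-laden illustration rather than anything proposed in the paper: the field names are hypothetical, and it treats a mailing city of APO or FPO as the marker of a military address.

```python
from datetime import date

# Assumption: military mail is flagged by "APO" or "FPO" in the mailing city field.
MILITARY_CITY_CODES = {"APO", "FPO"}

def generalize_for_release(record):
    """Return a copy of `record` with coarser, less identifying fields."""
    released = dict(record)
    # Keep only the year of registration, not the exact day.
    registered = released.pop("registration_date")
    released["registration_year"] = registered.year
    # Blank the mailing fields for armed forces addresses.
    city = (released.get("mailing_city") or "").strip().upper()
    if city in MILITARY_CITY_CODES:
        for field in ("mailing_city", "mailing_state", "mailing_zip"):
            if field in released:
                released[field] = None
    return released

# Hypothetical record for illustration only.
sample = {
    "name": "Ana Ruiz",
    "registration_date": date(2012, 3, 14),
    "mailing_city": "APO",
    "mailing_state": "AE",
    "mailing_zip": "09012",
}
print(generalize_for_release(sample))
```

The original record is left untouched, so an election office could keep full detail internally while publishing only the generalized copy.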

Legislative Efforts

Last week, House Republicans introduced the Secure Data Act in an effort to create federal privacy rules. However, the bill is considered significantly weaker than many state regulations and has little chance of passing in its current form.

"The industry consensus is that the likelihood of it passing is extremely low, at least in its current form," Kenney said. "This represents the third attempt to pass comprehensive data privacy in recent years, most recent being the American Data Privacy and Protection Act, which failed to pass."

Broader Implications

The research highlights a fundamental tension in democratic societies: the need for transparency in electoral processes versus the right to privacy for individuals. As data collection and analysis capabilities continue to advance, this tension will become increasingly difficult to manage without comprehensive privacy frameworks that address the unique challenges of public records.

For individuals concerned about their privacy, the findings underscore the importance of understanding what information is publicly available and how it might be used. While complete anonymity may no longer be possible in our interconnected data ecosystem, stronger safeguards and transparency measures could help prevent the most harmful forms of re-identification.
