UK's National Data Library plan faces major hurdles, study warns
#Regulation

Regulation Reporter

The UK's ambitious National Data Library project to fuel AI development with public data is struggling with poor data quality, outdated information, and inadequate metadata, potentially forcing AI systems to rely on less reliable sources.

The UK's ambitious plan to create a National Data Library (NDL) that would fuel cutting-edge AI development with public data is facing significant challenges, according to a new study from the Open Data Institute (ODI). The government's vision of providing researchers and businesses with powerful insights to drive growth and transform public services through better data access appears to be hampered by fundamental issues with the quality and accessibility of existing public datasets.

Data quality problems threaten AI ambitions

The ODI's findings paint a concerning picture of the current state of public data in the UK. Datasets available through data.gov.uk suffer from misleading titles, missing metadata, and outdated information that make them effectively unusable for modern AI systems. Even searches on basic terms such as "crime" proved difficult to analyze properly, because local authority statistical releases could not be combined in the absence of shared standards.

Professor Elena Simperl, director of research at the ODI, explained the practical implications: "If you don't update your data, if your metadata is not good quality and has lots of missing values, we could see from our experiments with the AI agent we built that they would just circumvent the available data. It would go elsewhere on social media and other places to try to find that information in a report somewhere, because it's much easier for them."

This creates a fundamental problem for the NDL's stated goals. When authoritative government data is hard to access or poorly structured, AI systems naturally turn to alternative sources like news reports or commercial data that may be less accurate or reliable. The study's prototype, which processed over 100,000 files from six public sector sources, demonstrated that while the NDL could be built at relatively low cost, significant work is needed to make the data AI-ready.

Government investment faces practical challenges

The government has committed £100 million to the NDL as part of £1.9 billion being provided to the Department for Science, Innovation and Technology (DSIT) through 2028/29. DSIT claims to have completed an extensive discovery phase to map out "the biggest opportunities and priorities" and "test approaches to systemic reform" across the public sector.

However, the ODI's prototype revealed that even major datasets face serious accessibility issues. One Home Office crime dataset, for instance, hasn't been updated since 2018, and while an updated version exists, it cannot be accessed via the API provided by the Office for National Statistics (ONS). This kind of fragmentation and inaccessibility undermines the entire premise of a unified national data library.

Historical context of UK data sharing initiatives

The NDL isn't the UK's first attempt at creating a comprehensive national data sharing system. The Secure Research Service (SRS), launched in 2004, already offers curated, research-ready datasets to accredited researchers. In 2020, the government planned to replace this system with the Integrated Data Service (IDS) from the ONS, but this initiative faced significant setbacks.

The IDS project, which had a budget of £240.8 million, effectively had its funding cut in March 2025, although existing services continue to be available within the ONS. Some of the budget was diverted to fund more general tech and data costs as the ONS struggled with legacy IT systems. This history of failed or scaled-back initiatives adds pressure on the NDL to succeed where previous efforts have fallen short.

Government response and future plans

A government spokesperson acknowledged the findings, stating that the government wants to "maximise the benefits of public sector data" to make services "more efficient and grow the economy." The spokesperson pointed to the Roadmap for Modern Digital Government, which includes building new infrastructure like the NDL in a way that ensures public sector data is shared and used more easily.

The roadmap also promises upgrades to outdated systems and new guidance for the safe and ethical use of public data. However, the ODI's study suggests that these measures may not be sufficient without addressing the fundamental issues of data quality, standardization, and accessibility that currently plague the UK's public data ecosystem.

The path forward

The ODI's study serves as both a warning and a roadmap for the NDL project. While the prototype demonstrated that the library could be built at relatively low cost, it also highlighted the substantial work needed to make public data truly usable by modern AI systems. This includes improving metadata quality, establishing shared standards for data categorization, ensuring datasets are regularly updated, and creating APIs that actually work as intended.

Without these improvements, the NDL risks becoming another missed opportunity in the UK's efforts to leverage public data for economic growth and improved public services. The government's AI ambitions depend heavily on having access to reliable, well-structured data, and the current state of public datasets suggests there's still a long way to go before that vision can be realized.

As Professor Simperl noted, "The government's National Data Library has huge potential, but much of the data it would rely on is not yet usable by modern AI systems. If that doesn't change, there is a risk that AI tools will increasingly rely on sources that are easier to access, rather than those that are most reliable."

The success of the NDL will ultimately depend on whether the government can address these fundamental data quality issues before the project moves too far down its development path. The stakes are high, as the ability to effectively leverage public data could be a key differentiator in the global AI race.
