The UK government is committed to advancing artificial intelligence through the National Data Library (NDL). However, a recent study by the Open Data Institute (ODI) suggests that the plan may face significant challenges unless the availability of public datasets improves. The study found that the available data suffers from problems such as misleading titles and missing metadata, making it difficult to use effectively in practical analysis.

In the 2024 Autumn Statement, the government confirmed the NDL plan and pledged to provide important data insights for researchers and businesses, promoting economic growth and improving quality of life. The government also announced that the project will receive £100 million in funding, part of the £1.9 billion budget allocated to the Department for Science, Innovation and Technology (DSIT) by the 2028/29 financial year.

The ODI recently launched a prototype system called "NDL-Lite," which provides access to over 100,000 public datasets. The research found that some datasets suffer from inconsistent labeling and outdated information, and are difficult for AI tools to access effectively. The ODI warned that when authoritative data is lacking, AI systems may turn to other sources, such as news reports or commercial data, whose accuracy is not always guaranteed.

Although the ODI study found that building the NDL would cost relatively little, it also emphasized the significant amount of work required to prepare the data for AI processing. The research found that even data on broad topics such as "crime" is difficult to analyze effectively, and that some datasets cannot be integrated because they lack shared standards.

Professor Elena Simperl from the Open Data Institute said that the gap between the quantity of public data and its actual availability is growing. She pointed out that if the government does not update the data and improve the quality of metadata in a timely manner, AI systems may seek out other more accessible information sources.

A government spokesperson said that the government hopes to "maximize the benefits of public sector data" to improve service efficiency and promote economic growth. To this end, the government is making data easier to share and use through its program to modernize digital public infrastructure.

The National Data Library is the latest initiative to help researchers and data scientists access public data. However, the ODI's research is a reminder that the plan must not become a missed opportunity.

Key Points:

🔍 The NDL initiative aims to drive AI development by providing public data, but faces challenges with data availability.

💡 The ODI study shows that existing public datasets have issues such as non-standardized labels and outdated data.

📉 If data quality is not improved, AI systems may turn to unreliable information sources.