Article 10 of the EU AI Act requires providers of high-risk AI systems to implement data governance and management practices for training, validation, and testing datasets. These practices must address data quality criteria including relevance, representativeness, accuracy, and completeness, with specific obligations around bias examination and privacy protection.
EU AI Act Article 10: Data and Data Governance for AI Systems
Scope and Purpose of Article 10
Article 10 of Regulation (EU) 2024/1689 addresses one of the foundational challenges in AI system development: the quality and governance of data used to train, validate, and test high-risk AI systems. The provision recognises that the performance, safety, and fundamental rights impact of an AI system are directly shaped by the data on which it is built. Poor data governance can lead to biased outputs, inaccurate predictions, and discriminatory outcomes that undermine the objectives of the Regulation.
The requirements apply to high-risk AI systems that use techniques involving the training of AI models with data. This covers most modern machine learning systems, including supervised, unsupervised, and reinforcement learning approaches. For high-risk AI systems developed without such training techniques, Article 10(6) provides that the data governance requirements in paragraphs 2 to 5 apply only to the testing datasets.
Data Quality Criteria
Article 10(3) sets out the quality criteria that training, validation, and testing datasets must meet: they must be relevant, sufficiently representative, and, to the best extent possible, free of errors and complete in view of the intended purpose. These four criteria, referred to below as relevance, representativeness, accuracy, and completeness, are central to compliance, and each serves a distinct function in ensuring that the AI system performs as intended without causing undue harm.
Relevance requires that the data used bears a meaningful connection to the intended purpose of the AI system and the conditions under which it will operate. Data that is outdated, collected from unrelated contexts, or that does not reflect the deployment environment may fail this criterion. Representativeness demands that datasets adequately reflect the characteristics of the persons, groups, or settings on which the AI system will be used. This is particularly important for systems that affect diverse populations, where underrepresentation of certain groups in training data can lead to disparate performance.
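One way to make representativeness measurable is to compare each group's share of a dataset with its share in a reference population for the deployment setting. The sketch below assumes that each record has been reduced to a single group label and that a reference distribution is available; the function name `representation_gaps` and the 0.05 tolerance are illustrative assumptions, not values taken from the Regulation.

```python
from collections import Counter


def representation_gaps(labels, reference_shares, tolerance=0.05):
    """Flag groups whose share of the dataset deviates from a reference share.

    labels: one group label per record (hypothetical input format).
    reference_shares: group -> expected share in the deployment population.
    tolerance: illustrative maximum absolute deviation, not a legal threshold.
    Groups present in the data but absent from the reference are ignored here.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("dataset is empty")
    flagged = {}
    for group, expected in reference_shares.items():
        observed = counts.get(group, 0) / total
        if abs(observed - expected) > tolerance:
            flagged[group] = (round(observed, 3), expected)
    return flagged


# Usage with made-up figures: 60 / 33 / 7 records versus a 50 / 35 / 15 reference.
labels = ["north"] * 60 + ["south"] * 33 + ["islands"] * 7
print(representation_gaps(labels, {"north": 0.50, "south": 0.35, "islands": 0.15}))
# {'north': (0.6, 0.5), 'islands': (0.07, 0.15)}
```

A flagged group is a prompt for investigation rather than proof of non-compliance. Because an absolute tolerance treats small and large groups alike, a relative tolerance may suit some settings better.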
Accuracy requires that data be, to the best extent possible, free from errors that could compromise the AI system's performance or lead to incorrect outputs. Completeness means that datasets must be sufficiently comprehensive to support the AI system's intended functionality without significant gaps that could undermine reliability. These criteria are assessed in light of the intended purpose of the specific high-risk AI system. In addition, Article 10(4) requires datasets to take into account, to the extent required by the intended purpose, the characteristics or elements particular to the specific geographical, contextual, behavioural, or functional setting within which the system is intended to be used.
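Accuracy and completeness can be partly operationalised as plausibility rules and missing-value rates. The sketch below assumes tabular records held as Python dictionaries; `quality_report`, the field names, and the validation rules are hypothetical and would in practice be derived from the system's intended purpose and setting.

```python
def quality_report(records, required_fields, validators):
    """Per-field missing-value rates and plausibility-rule failures.

    records: list of dicts (hypothetical record format).
    required_fields: fields expected to be present and non-empty in every record.
    validators: field -> predicate that returns True for a plausible value.
    Returns (missing_rates, failures), where failures lists (row_index, field).
    """
    if not records:
        raise ValueError("no records to check")
    missing = {field: 0 for field in required_fields}
    failures = []
    for index, record in enumerate(records):
        for field in required_fields:
            value = record.get(field)
            if value is None or value == "":
                missing[field] += 1  # completeness gap
            elif field in validators and not validators[field](value):
                failures.append((index, field))  # likely data error
    missing_rates = {field: count / len(records) for field, count in missing.items()}
    return missing_rates, failures


# Usage with made-up records; the 0-120 age range is an illustrative rule.
records = [
    {"age": 34, "income": 42000},
    {"age": None, "income": 38000},
    {"age": 210, "income": 51000},
    {"age": 45, "income": ""},
]
rates, failures = quality_report(
    records,
    required_fields=["age", "income"],
    validators={"age": lambda a: 0 <= a <= 120, "income": lambda x: x >= 0},
)
print(rates)     # {'age': 0.25, 'income': 0.25}
print(failures)  # [(2, 'age')]
```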
Bias Examination and Mitigation
Article 10(2)(f) requires training, validation, and testing datasets to be examined in view of possible biases that are likely to affect the health and safety of persons, have a negative impact on fundamental rights, or lead to discrimination prohibited under Union law, especially where data outputs influence inputs for future operations. This requirement goes beyond general data quality and targets one of the most widely discussed risks of AI systems.
The Regulation does not list specific types of bias, but a thorough examination will typically consider several sources. These include historical bias embedded in datasets that reflect past discriminatory practices, representation bias where certain groups are over- or under-represented, measurement bias where data collection instruments or processes systematically distort information, and aggregation bias where data that should be disaggregated is treated as homogeneous.
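Aggregation and representation bias often surface only when performance is measured separately for each group. The sketch below uses accuracy as an assumed metric and hypothetical group labels to show how an aggregate figure of 90% can hide a group for which the model is right only half the time. Which metric and which groups to compare are judgements the provider would need to justify; accuracy is used here only for simplicity.

```python
from collections import defaultdict


def disaggregated_accuracy(y_true, y_pred, groups):
    """Return overall accuracy, per-group accuracy, and the largest gap between groups."""
    if not (len(y_true) == len(y_pred) == len(groups)) or not groups:
        raise ValueError("inputs must be non-empty and of equal length")
    totals = defaultdict(int)
    correct = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        totals[group] += 1
        correct[group] += int(truth == pred)
    per_group = {g: correct[g] / totals[g] for g in totals}
    overall = sum(correct.values()) / len(groups)
    gap = max(per_group.values()) - min(per_group.values())
    return overall, per_group, gap


# Made-up labels: group A has 8 records, group B only 2.
y_true = [1, 0, 1, 0, 1, 0, 1, 0, 1, 1]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
groups = ["A"] * 8 + ["B"] * 2
overall, per_group, gap = disaggregated_accuracy(y_true, y_pred, groups)
print(overall, per_group, gap)  # 0.9 {'A': 1.0, 'B': 0.5} 0.5
```

With only two records, group B's figure is statistically fragile, which is itself a signal of representation bias worth documenting.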
Article 10(2)(g) then requires appropriate measures to detect, prevent, and mitigate the possible biases identified under point (f). To support this, Article 10(5) exceptionally permits the processing of special categories of personal data referred to in Article 9(1) of Regulation (EU) 2016/679 (GDPR) and Article 10 of Directive (EU) 2016/680 to the extent that it is strictly necessary for the purposes of ensuring bias detection and correction. This represents an important intersection between the AI Act and existing data protection law, providing a legal basis for processing sensitive data where it is needed to prevent discriminatory AI outcomes.
Privacy Considerations and GDPR Intersection
Article 10 operates within the broader framework of EU data protection law, and its requirements must be fulfilled in compliance with Regulation (EU) 2016/679 (GDPR), Directive (EU) 2016/680, and Regulation (EU) 2018/1725. This means that data governance practices for AI training data must respect the principles of data minimisation, purpose limitation, storage limitation, and the rights of data subjects.
The interaction between AI training data requirements and GDPR obligations creates specific compliance challenges. For example, the Article 10 requirement for representative and complete datasets may be in tension with the GDPR principle of data minimisation under Article 5(1)(c). Providers must navigate this balance carefully, collecting sufficient data to meet AI Act quality requirements while not processing more personal data than is necessary.
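Article 10(5) provides a specific legal framework for processing special categories of personal data for bias detection and correction purposes. However, this processing is subject to strict conditions that apply in addition to the GDPR, Directive (EU) 2016/680, and Regulation (EU) 2018/1725, and all of them must be met: bias detection and correction cannot be effectively achieved by processing other data, including synthetic or anonymised data; the data must be subject to technical limitations on re-use and state-of-the-art security and privacy-preserving measures, including pseudonymisation; access must be strictly controlled, documented, and limited to authorised persons with appropriate confidentiality obligations; the data must not be transmitted, transferred, or otherwise accessed by other parties; the data must be deleted once the bias has been corrected or the data has reached the end of its retention period, whichever comes first; and the records of processing activities must explain why the processing was strictly necessary and why the objective could not be achieved by processing other data.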
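Two of these conditions lend themselves to a concrete illustration. The sketch below shows keyed hashing (HMAC-SHA256) as one common pseudonymisation technique and computes the deletion date under the "whichever comes first" rule. Both functions are hypothetical helpers: the Regulation does not prescribe a pseudonymisation method, and the assumption that the key is held separately from the data is part of the illustration.

```python
import hashlib
import hmac
from datetime import date
from typing import Optional


def pseudonymise(identifier: str, secret_key: bytes) -> str:
    """Replace a direct identifier with a keyed HMAC-SHA256 digest.

    The key must be stored separately from the pseudonymised data: anyone holding
    it can re-link records, so this is pseudonymisation, not anonymisation.
    """
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def deletion_due(retention_end: date, bias_corrected_on: Optional[date] = None) -> date:
    """Deletion date: bias corrected or retention period ended, whichever comes first."""
    if bias_corrected_on is None:
        return retention_end
    return min(retention_end, bias_corrected_on)


# Usage with hypothetical values.
key = b"example-key-held-in-a-separate-vault"
token = pseudonymise("applicant-00123", key)
print(len(token))  # 64
print(deletion_due(date(2026, 12, 31), bias_corrected_on=date(2026, 3, 1)))  # 2026-03-01
```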
Documentation and Statistical Properties
Article 10(2) requires providers to implement data governance and management practices that address a detailed list of elements. These include the relevant design choices; data collection processes and the origin of data, and, in the case of personal data, the original purpose of the data collection; relevant data preparation operations such as annotation, labelling, cleaning, updating, enrichment, and aggregation; the formulation of assumptions, notably regarding the information that the data are supposed to measure and represent; an assessment of the availability, quantity, and suitability of the datasets needed; the bias examination and mitigation measures discussed above; and the identification of relevant data gaps or shortcomings that prevent compliance, together with how they can be addressed.
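Article 10(3) further requires datasets to have appropriate statistical properties, including, where applicable, as regards the persons or groups of persons in relation to whom the system is intended to be used. Providers need to document these properties, including the characteristics and distribution of data points, the presence of any gaps or shortcomings, and the measures taken to address them. This documentation forms part of the technical documentation required under Article 11 and Annex IV, which must be kept up to date throughout the lifecycle of the AI system.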
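A short script can generate much of the statistical description this documentation calls for. The sketch below assumes records held as dictionaries and uses only the Python standard library; `describe_dataset` and its output layout are hypothetical, since the Regulation does not prescribe a format.

```python
import statistics
from collections import Counter


def describe_dataset(records, numeric_fields, categorical_fields):
    """Summarise distributions and gaps for inclusion in dataset documentation."""
    summary = {"n_records": len(records), "numeric": {}, "categorical": {}}
    for field in numeric_fields:
        values = [r[field] for r in records if r.get(field) is not None]
        summary["numeric"][field] = {
            "count": len(values),
            "missing": len(records) - len(values),
            "mean": statistics.fmean(values) if values else None,
            "stdev": statistics.stdev(values) if len(values) > 1 else None,
            "min": min(values, default=None),
            "max": max(values, default=None),
        }
    for field in categorical_fields:
        # Missing values are counted under the key None so gaps stay visible.
        summary["categorical"][field] = dict(Counter(r.get(field) for r in records))
    return summary


records = [
    {"age": 30, "region": "north"},
    {"age": 40, "region": "south"},
    {"age": None, "region": "north"},
]
summary = describe_dataset(records, ["age"], ["region"])
print(summary["numeric"]["age"])
print(summary["categorical"]["region"])  # {'north': 2, 'south': 1}
```

Output like this is a starting point for a datasheet; it does not replace the narrative description of design choices, assumptions, and known shortcomings.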
The documentation requirements serve multiple purposes. They enable market surveillance authorities to assess compliance, support internal quality management processes, and inform the instructions for use that providers must supply to deployers under Article 13, which may include relevant information about the training, validation, and testing datasets. Thorough documentation also facilitates the post-market monitoring obligations under Article 72, enabling providers to track data-related issues that emerge during real-world deployment.
Practical Implementation Considerations
Implementing Article 10 requirements in practice demands structured data management processes that many organisations will need to develop or enhance. Providers should establish clear data governance frameworks that assign responsibilities for data quality, define processes for data collection and preparation, and create mechanisms for ongoing monitoring of data quality throughout the AI system lifecycle.
For organisations that rely on third-party datasets, compliance with Article 10 requires due diligence on the quality, provenance, and governance practices applied to those datasets. The provider of the high-risk AI system retains responsibility for ensuring that all datasets used meet the requirements of Article 10, regardless of whether those datasets were created internally or obtained from external sources.
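Due diligence on a third-party dataset is easier to track when the open questions are recorded in a structured form. The sketch below is a hypothetical record whose fields loosely mirror Article 10(2)(b) and (c); the class name, field names, and the convention of writing "n/a" where no personal data is involved are all assumptions, and a real checklist would also cover quality, bias, and contractual points.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class DatasetProvenance:
    """Hypothetical due-diligence record for an externally sourced dataset.

    Field names are illustrative and loosely follow Article 10(2)(b) and (c).
    """
    name: str
    supplier: Optional[str] = None
    collection_process: Optional[str] = None
    origin: Optional[str] = None
    personal_data_original_purpose: Optional[str] = None  # "n/a" if no personal data
    preparation_operations: Optional[str] = None

    def open_items(self) -> List[str]:
        """Names of fields that have not yet been filled in."""
        return [f.name for f in fields(self) if getattr(self, f.name) in (None, "")]


record = DatasetProvenance(
    name="vendor-claims-2023",      # hypothetical dataset
    supplier="Example Data Ltd",    # hypothetical supplier
    origin="insurer claim files",
)
print(record.open_items())
# ['collection_process', 'personal_data_original_purpose', 'preparation_operations']
```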
Providers should also consider implementing automated data quality monitoring tools that can flag potential issues in training, validation, and testing datasets. Regular audits of datasets against the Article 10 criteria can help ensure ongoing compliance and identify emerging data quality issues before they affect the AI system's performance in deployment.
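Ongoing monitoring often compares the distribution of incoming data with that of the training data. The sketch below uses the population stability index (PSI), a drift measure common in credit-risk modelling, chosen here as one example rather than a method the Regulation requires. Values above roughly 0.25 are conventionally read as a significant shift, but that threshold is an industry rule of thumb and an assumption here.

```python
import math
from collections import Counter


def population_stability_index(expected, actual, epsilon=1e-6):
    """PSI between two samples of a categorical feature.

    PSI = sum over categories of (a - e) * ln(a / e), where e and a are the
    category shares in the reference (e.g. training) and new samples.
    epsilon replaces zero shares so the logarithm stays defined.
    """
    if not expected or not actual:
        raise ValueError("both samples must be non-empty")
    e_counts, a_counts = Counter(expected), Counter(actual)
    psi = 0.0
    for category in set(e_counts) | set(a_counts):
        e = max(e_counts[category] / len(expected), epsilon)
        a = max(a_counts[category] / len(actual), epsilon)
        psi += (a - e) * math.log(a / e)
    return psi


# Made-up samples: the urban share rises from 50% in training to 80% in live data.
training = ["urban"] * 50 + ["rural"] * 50
live = ["urban"] * 80 + ["rural"] * 20
psi = population_stability_index(training, live)
print(round(psi, 3))  # 0.416
```

A high PSI does not by itself show that the system has become non-compliant, but it is a reasonable trigger for re-examining relevance and representativeness.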
This article is for informational purposes only and does not constitute legal advice. Regulatory requirements change frequently; verify current rules with official sources. Built by Sawai Gyoseishoshi Office, Hiroshima, Japan.