What is real-world testing under Article 60 of the EU AI Act?

Article 60 allows providers and prospective providers of Annex III high-risk AI systems to test them in real-world conditions outside regulatory sandboxes before market placement. It requires an approved real-world testing plan, registration in the EU database with a unique identification number, informed consent of subjects under Article 61, and compliance with strict protective conditions.

How long can real-world testing of high-risk AI last?

Testing in real-world conditions may last as long as needed to achieve its objectives, but no longer than six months. It may be extended once, for an additional period of up to six months, subject to prior notification to the market surveillance authority accompanied by an explanation of the need for the extension.

Do test subjects have to consent to AI real-world testing?

Yes. Under Article 61, subjects must give freely given informed consent before participating, after receiving clear information about the nature, objectives and conditions of the testing, their rights, and the arrangements for reversing or disregarding the system's outputs. Consent must be dated and documented, and a copy given to the subject. A narrow exception exists for law enforcement, where seeking consent would prevent the system from being tested; there, the testing and its outcome must have no negative effect on the subjects, and their personal data must be deleted after the test.

Can the AI system's decisions during testing affect real people?

Not irreversibly. The predictions, recommendations and decisions of the system under test must be capable of being effectively reversed and disregarded. Subjects may withdraw at any time without detriment and may request the immediate and permanent deletion of their personal data. The provider remains liable under applicable Union and national liability law for any harm caused during testing, and market surveillance authorities can require modification, suspension or termination.

Is real-world testing the same as a regulatory sandbox?

No. A regulatory sandbox under Article 57 is a supervised development environment with regulatory guidance, operated by competent authorities, suited to resolving design and legal uncertainties. Article 60 real-world testing is provider-led testing of a mature system in genuine operational conditions, generating field evidence for conformity assessment. The instruments can be used in sequence.

Quick answer

Article 60 of the EU AI Act allows providers and prospective providers of Annex III high-risk AI systems to test them in real-world conditions outside regulatory sandboxes, before placing them on the market. Testing requires an approved real-world testing plan, registration, informed consent of participants under Article 61, a maximum duration of six months extendable once by six months, and full reversibility of outcomes.

Updated June 2026 · MmowW AI Compliance

EU AI Act Article 60: Real-World Testing of High-Risk AI Outside Sandboxes

Overview: Field Evidence Before Market Placement

Laboratory metrics rarely survive first contact with operational reality, and the EU AI Act acknowledges this. Article 60 creates a legal pathway for providers — and prospective providers — of high-risk AI systems listed in Annex III to conduct testing in real-world conditions outside AI regulatory sandboxes, before the system is placed on the market or put into service. The purpose is evidence: data on how the system performs with real users, real workloads and real edge cases, gathered under safeguards strict enough that the testing itself does not become the harm the regulation exists to prevent. For providers preparing conformity assessment ahead of August 2, 2026, Article 60 is one of the most practically useful provisions in the entire regulation — and one of the most procedurally demanding.

The Core Mechanics

Testing in real-world conditions runs on five structural elements:

A real-world testing plan: the provider drafts a plan covering the objectives, methodology, scope, duration, subjects involved and protective measures; the Commission specifies the plan's elements through implementing acts
Authority approval: the plan is submitted to the market surveillance authority in the Member State where testing will take place. Approval may be explicit; where the authority does not answer within 30 days, the testing and the plan are treated as approved, unless national law does not provide for tacit approval, in which case the testing remains subject to authorisation
Registration: the testing must be registered in the EU database under Article 71 with a Union-wide unique identification number before it begins. Most entries are public; testing in the areas of law enforcement, migration, asylum and border control management is registered in a secure non-public section of the database
Time limits: testing may last no longer than six months, extendable once by an additional six months upon prior notification to the authority with an explanation of the need
Accountability anchors: the provider must be established in the Union or have appointed a legal representative in the Union, and remains fully liable under applicable Union and national liability law for any harm caused during the testing
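
The time limits in the list above reduce to simple date arithmetic. The following is a minimal sketch, not a legal calculator: the function names are hypothetical, and the counting convention (calendar months, clamped to the last day of shorter months) is an assumption that should be confirmed against the approved plan and the authority's practice.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Return the date `months` calendar months after `start`.

    Assumption: if the target month is shorter, clamp to its last day.
    """
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def latest_end_date(start: date, extended: bool = False) -> date:
    """Latest permitted end of testing: six months, or twelve months
    if the single six-month extension has been notified."""
    return add_months(start, 12 if extended else 6)


start = date(2026, 8, 31)
print(latest_end_date(start))                 # 2027-02-28
print(latest_end_date(start, extended=True))  # 2027-08-31
```

The six-month figure is a ceiling, not a target: testing must also not last longer than needed to achieve its objectives, so a plan may well set an earlier end date than the one computed here.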

Informed Consent: Article 61

The human safeguards are where Article 60 shows its debt to clinical research ethics. Subjects of the testing must give informed consent under Article 61 before participating: they must receive clear, concise information about the nature and objectives of the testing, the conditions of participation, their rights, the possible effects on them, and the arrangements for requesting the reversal or disregarding of the system's outputs. Consent must be documented, dated, and a copy given to the subject. A narrow carve-out applies to law enforcement, where seeking consent would prevent the system from being tested: there, the testing and its outcome must have no negative effect on the subjects, and their personal data must be deleted after the test.

Beyond consent, the regulation builds in protective conditions: subjects who are vulnerable due to age or disability receive appropriate additional protection; participation can be withdrawn at any time without justification and without detriment; subjects may request the immediate and permanent deletion of their personal data; and — a provision with real operational bite — the predictions, recommendations or decisions of the AI system under test must be capable of being effectively reversed and disregarded. A recruitment system under test cannot quietly reject real candidates; its outputs must remain advisory and reversible throughout the test.
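
One way to make these conditions checkable is to hold consent as a structured record rather than a scanned form. The sketch below is illustrative only: the class and field names are hypothetical and do not come from the regulation or from any published template.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ConsentRecord:
    """Hypothetical record of one subject's informed consent."""
    subject_id: str
    info_sheet_version: str      # which information sheet the subject received
    consented_at: datetime       # consent must be dated
    copy_provided: bool          # a copy must be given to the subject
    withdrawn_at: Optional[datetime] = None
    deletion_requested: bool = False

    def is_active(self) -> bool:
        """Consent counts only if a copy was provided and it was not withdrawn."""
        return self.copy_provided and self.withdrawn_at is None

    def withdraw(self, request_deletion: bool = False) -> None:
        """Record a withdrawal; no justification is asked for or stored."""
        self.withdrawn_at = datetime.now(timezone.utc)
        self.deletion_requested = request_deletion


record = ConsentRecord(
    subject_id="operator-017",
    info_sheet_version="v1.2",
    consented_at=datetime(2026, 9, 1, tzinfo=timezone.utc),
    copy_provided=True,
)
print(record.is_active())   # True
record.withdraw(request_deletion=True)
print(record.is_active())   # False
```

Keeping `deletion_requested` on the record gives the deletion workflow an explicit trigger, instead of relying on someone noticing a withdrawal email.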

Oversight, Incidents and Stopping Rules

The provider must designate and train those overseeing the testing, monitor it effectively, and remain ready to suspend or terminate. Market surveillance authorities hold inspection powers: they may request information, conduct unannounced checks, and require modification, suspension or termination of testing where conditions are breached or where risks emerge. Any serious incident identified during testing must be reported to the national market surveillance authority in accordance with Article 73, and the provider must adopt immediate mitigation measures or, failing that, suspend the testing until mitigation is in place, or otherwise terminate it. Testing may not begin before approval and registration, and subjects must not be selected in ways that undermine the protective purpose: testing on people who are unaware, or recruiting only those least likely to complain, defeats the design and exposes the provider to enforcement.

Who Should Use Article 60 — and Who Should Not

Article 60 fits providers with a near-final Annex III system that needs operational evidence: accuracy under realistic load, human-AI interaction patterns, failure modes invisible in curated datasets. It suits employment screening tools tested with consenting applicant cohorts, triage systems shadowing real emergency workflows with reversible outputs, and educational assessment tools piloted in consenting institutions. It is the wrong instrument for early-stage development — that belongs in a regulatory sandbox under Article 57, with its guidance and data-processing basis — and unnecessary for systems that can be fully validated on historical or synthetic data. It is also distinct from deployer-led piloting after market placement: Article 60 governs the pre-market window, which is exactly what makes it valuable for conformity evidence.

Practical Steps

Decide the instrument: sandbox for unresolved design and legal questions, real-world testing for operational evidence on a mature system — or sequence the two
Draft the testing plan early against the plan elements the Commission specifies by implementing act, with explicit stopping rules, reversal mechanisms and subject protection measures
Build the consent pipeline: information sheets in plain language, documented and dated consent, withdrawal and deletion workflows that actually function
Engineer reversibility before the test starts: every output of the system under test must be tagged, traceable and capable of being disregarded without residue in downstream systems
Register the testing, calendar the six-month clock, and pre-draft the extension notification in case it is needed
Capture results in the structure of Annex IV technical documentation so the evidence flows directly into conformity assessment
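
The reversibility step in the list above is the most engineering-heavy of the six. A minimal sketch of the idea follows, assuming an in-memory ledger; the names `TaggedOutput` and `OutputLedger` are hypothetical, and a real system would need the same guarantees across every downstream store that receives the outputs.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TaggedOutput:
    """One output of the system under test, tagged so it can be traced."""
    output_id: str
    test_id: str            # assumed to carry the test's registration number
    subject_id: str
    recommendation: str
    advisory_only: bool = True
    disregarded: bool = False


class OutputLedger:
    """Hypothetical registry of every output produced during the test."""

    def __init__(self) -> None:
        self._outputs: Dict[str, TaggedOutput] = {}

    def record(self, output: TaggedOutput) -> None:
        # Refuse anything that is not advisory: a binding output cannot
        # be cleanly reversed once a downstream system has acted on it.
        if not output.advisory_only:
            raise ValueError("outputs of a system under test must stay advisory")
        self._outputs[output.output_id] = output

    def disregard_subject(self, subject_id: str) -> List[str]:
        """Mark all of a subject's outputs as disregarded; return their IDs."""
        affected = []
        for out in self._outputs.values():
            if out.subject_id == subject_id and not out.disregarded:
                out.disregarded = True
                affected.append(out.output_id)
        return affected

    def live_outputs(self) -> List[TaggedOutput]:
        """Outputs that may still be shown or analysed."""
        return [o for o in self._outputs.values() if not o.disregarded]


ledger = OutputLedger()
ledger.record(TaggedOutput("o1", "test-001", "subject-A", "priority: high"))
ledger.record(TaggedOutput("o2", "test-001", "subject-B", "priority: low"))
print(ledger.disregard_subject("subject-A"))   # ['o1']
print(len(ledger.live_outputs()))              # 1
```

The list returned by `disregard_subject` is the useful part: it is the set of IDs that downstream systems must also purge or flag, which is where residue usually hides.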

Concrete Example

A provider has built an AI system that prioritises emergency calls for a regional dispatch centre — Annex III point 5(d). Bench testing on historical call data is complete, but the conformity case needs evidence of live performance and operator interaction. Under Article 60, the provider files a testing plan with the national market surveillance authority: six months of shadow operation in two dispatch centres, with the AI's prioritisation displayed to operators as advisory only, every recommendation reversible, dispatch decisions remaining fully human, consent obtained from the participating operators, and arrangements approved for the handling of caller data. The system's recommendations are logged against actual outcomes, producing exactly the accuracy and human-oversight evidence Articles 14 and 15 demand. Two serious mismatches between AI prioritisation and clinical outcome are detected, reported, analysed and fixed — before the system ever holds real authority.

Action Before August 2, 2026

Providers intending to place Annex III systems on the market in late 2026 or 2027 should count backwards: six months of testing, preceded by plan approval cycles and consent infrastructure, preceded by reversibility engineering. That arithmetic puts plan drafting in the immediate present for many roadmaps. Watch for the Commission implementing acts specifying the testing plan elements, monitor how your national market surveillance authority handles approval in practice, and treat the registration and consent records as permanent compliance assets — they will be examined whenever the system's market history is reviewed. Real-world testing done properly is slower than informal piloting; it is also the difference between evidence a regulator accepts and anecdotes a regulator investigates.
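
The backward count can be made explicit. This is a rough sketch under stated assumptions: the function name and the `analysis_days` allowance for post-test analysis are hypothetical, `approval_days` assumes a 30-day approval window, and the six months of testing are approximated as 183 days. None of these figures is a substitute for the timings in an actual plan.

```python
from datetime import date, timedelta


def latest_plan_submission(target_market_date: date,
                           testing_days: int = 183,    # about six months
                           approval_days: int = 30,    # assumed approval window
                           analysis_days: int = 60) -> date:  # assumed write-up time
    """Latest date to submit the testing plan and still reach the target."""
    total = testing_days + approval_days + analysis_days
    return target_market_date - timedelta(days=total)


print(latest_plan_submission(date(2027, 3, 1)))   # 2026-06-01
```

Consent infrastructure and reversibility engineering sit before the submission date computed here, so the real start of work is earlier still.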

Common Pitfalls Observed in Early Testing Programmes

Early adopters of structured pre-market testing report recurring failure patterns worth designing against:

Consent decay: participants consent at the start of a six-month test, but staff turnover, shift changes and organisational drift mean that by month four, people are interacting with the system who never signed anything. Consent management must be continuous, not ceremonial
Reversibility theatre: outputs are formally advisory, but interface design nudges operators into treating recommendations as decisions, which both contaminates the evidence and undermines the protective premise. Testing plans should include measurement of actual reliance, not just a policy statement
Scope creep: a system under test acquires new features mid-test, quietly invalidating the approved plan. Change control during the testing window must be as strict as in any clinical protocol, with material changes notified to the authority
Data residue: subjects exercise their deletion rights, but copies persist in analytics pipelines and backups. Deletion workflows need to be engineered and tested before the first subject enrols

None of these pitfalls is exotic, and all of them are visible to an inspecting authority reviewing logs. Providers who instrument their testing programme to detect these patterns internally, before the regulator does, convert Article 60 from a procedural hurdle into what it was designed to be: the cheapest available source of truth about how a high-risk system actually behaves among real people.
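
Two of these patterns, consent decay and reversibility theatre, can be watched for with very little code. The sketch below assumes interaction logs that expose user IDs and pairs of AI recommendation and human decision; both function names are hypothetical. Agreement between operator and system is only a crude proxy for reliance, since an operator may reach the same conclusion independently.

```python
from typing import Iterable, Set, Tuple


def users_without_consent(interaction_user_ids: Iterable[str],
                          consented_user_ids: Iterable[str]) -> Set[str]:
    """Consent decay check: users seen in the logs with no active consent."""
    return set(interaction_user_ids) - set(consented_user_ids)


def agreement_rate(decisions: Iterable[Tuple[str, str]]) -> float:
    """Share of cases where the human decision matched the AI recommendation.

    A rate close to 1.0 does not prove over-reliance, but it is a signal
    that 'advisory' outputs may be functioning as decisions.
    """
    pairs = list(decisions)
    if not pairs:
        raise ValueError("no decisions to evaluate")
    matched = sum(1 for ai, human in pairs if ai == human)
    return matched / len(pairs)


logged_users = ["op-1", "op-2", "op-9", "op-2"]
consented = ["op-1", "op-2"]
print(users_without_consent(logged_users, consented))   # {'op-9'}

log = [("high", "high"), ("low", "low"), ("high", "medium"), ("low", "low")]
print(agreement_rate(log))                               # 0.75
```

Run on a schedule, the first check turns consent from a one-off ceremony into a continuous control; the second gives the testing plan a number to report instead of a policy statement.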

This article is for informational purposes only and does not constitute legal advice. Regulatory requirements change frequently — verify current rules with official sources. Built by Sawai Gyoseishoshi Office, Hiroshima, Japan.