When is a GPAI model presumed to have systemic risk under the EU AI Act's FLOPs threshold?

Article 51(2) presumes a GPAI model has high-impact capabilities, and therefore systemic risk, when the cumulative compute used for its training exceeds 10^25 floating point operations. The Commission can also designate models below that threshold based on the criteria in Annex XIII, such as parameters, dataset size, benchmarks and user reach.

What must a provider do when its model crosses 10^25 FLOPs?

Under Article 52, the provider must notify the Commission without delay, and at the latest within two weeks after the threshold is met or after it becomes known that it will be met. The notification may include arguments that the model nevertheless does not present systemic risk.

What extra obligations apply to GPAI models with systemic risk?

Article 55 adds duties on top of Article 53: state-of-the-art model evaluation including adversarial testing, assessment and mitigation of systemic risks at Union level, tracking and reporting of serious incidents to the AI Office, and adequate cybersecurity protection of the model and its physical infrastructure. The open-source exemptions also cease to apply.

Can the Commission change the 10^25 FLOPs threshold?

Yes. Article 51(3) empowers the Commission to adopt delegated acts to amend the threshold and to supplement benchmarks and indicators in light of technological developments, such as algorithmic improvements or increased hardware efficiency.

Quick answer

Under Article 51(2) of the EU AI Act, a general-purpose AI model is presumed to have high-impact capabilities, and therefore systemic risk, when the cumulative compute used for its training exceeds 10^25 floating point operations. Providers must notify the Commission within two weeks of meeting the threshold or of it becoming known that the threshold will be met.

Updated June 2026 · MmowW AI Compliance

EU AI Act 10^25 FLOPs Threshold: When a GPAI Model Has Systemic Risk

Can a provider rebut the systemic risk presumption?

Yes. The provider may present sufficiently substantiated arguments that, despite meeting the compute threshold, the model does not present systemic risk due to its specific characteristics. The Commission may accept or reject those arguments, and a reassessment can be requested no earlier than six months after the classification decision.

Why Compute Is the Trigger

Chapter V of Regulation (EU) 2024/1689 creates a two-tier regime for general-purpose AI models. All providers carry the baseline duties of Article 53; providers of models classified as presenting systemic risk additionally carry the safety-oriented duties of Article 55. The classification mechanism in Article 51 therefore decides which tier a model falls into, and the most operationally important element of that mechanism is a number: 10 to the power of 25 floating point operations (FLOPs) of cumulative training compute.

The legislator chose training compute as the trigger because it is measurable before release, correlates broadly with capability, and does not depend on subjective judgement. Recital 111 explains that the threshold should capture the most advanced models at the time of adoption while remaining adjustable: the Commission can amend the threshold by delegated act under Article 51(3) to track technological developments such as algorithmic efficiency improvements.

The Legal Mechanics of Article 51

Article 51(1) provides two routes to classification as a GPAI model with systemic risk:

Route one — high-impact capabilities: the model has high-impact capabilities evaluated on the basis of appropriate technical tools and methodologies, including indicators and benchmarks. Article 51(2) presumes such capabilities when cumulative training compute exceeds 10^25 FLOPs.
Route two — Commission designation: the Commission decides, ex officio or following a qualified alert from the scientific panel, that a model has capabilities or an impact equivalent to route one, taking into account the criteria in Annex XIII.

Annex XIII lists the designation criteria: the number of parameters; the quality or size of the dataset; the amount of compute used for training; input and output modalities; benchmark and capability evaluations; reach, including the number of registered EU business users; and the number of registered end-users. Compute is thus a presumption, not the only path: a model below 10^25 FLOPs can still be designated, and a model above it can argue its way out.
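As a rough illustration of how the two routes interact, the Python sketch below encodes the compute presumption and the designation route in one function. The names `Status`, `classify` and `commission_designated` are hypothetical, the rebuttal procedure is deliberately left out, and the sketch is no substitute for legal analysis.

```python
from enum import Enum

# Article 51(2) presumption threshold, in floating point operations.
PRESUMPTION_FLOPS = 10**25


class Status(Enum):
    PRESUMED = "presumed systemic risk (rebuttable)"
    DESIGNATED = "designated by Commission decision"
    NOT_CLASSIFIED = "not classified; designation remains possible"


def classify(cumulative_flops: float, commission_designated: bool = False) -> Status:
    """Return the classification route that applies to a model (illustrative only)."""
    # Route one: the presumption applies only when compute exceeds the threshold.
    if cumulative_flops > PRESUMPTION_FLOPS:
        return Status.PRESUMED
    # Route two: a Commission decision based on the Annex XIII criteria.
    if commission_designated:
        return Status.DESIGNATED
    return Status.NOT_CLASSIFIED


if __name__ == "__main__":
    print(classify(3e25).value)
    print(classify(8e24).value)
    print(classify(8e24, commission_designated=True).value)
```

A model sitting exactly at 10^25 FLOPs is not caught by the first branch, on the reading that the presumption applies only when compute exceeds the threshold.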

Counting FLOPs: What Goes into the Cumulative Figure

The presumption refers to the cumulative amount of computation used for training, measured in floating point operations. Commission guidance published in July 2025 clarified that this includes compute spent across the activities that materially contribute to the model's capabilities — pre-training and capability-enhancing post-training such as fine-tuning and reinforcement learning carried out by or for the provider — estimated using documented methodologies. Providers are expected to track and document their estimates as part of Annex XI technical documentation, which expressly requires the amount of compute used for training. For most providers the practical step is simple: instrument training runs and retain the arithmetic, because the burden of knowing whether the threshold is met sits with the provider.
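One way to retain the arithmetic is a small ledger that sums estimated compute over every capability-relevant phase. The sketch below is illustrative only: it assumes the common rule of thumb of roughly 6 FLOPs per parameter per training token for dense transformer models, which the source does not prescribe, and the class name, phase names and figures are all hypothetical.

```python
from dataclasses import dataclass

# Article 51(2) presumption threshold, in floating point operations.
PRESUMPTION_FLOPS = 10**25


@dataclass(frozen=True)
class TrainingPhase:
    name: str
    parameters: int  # trainable parameters updated in this phase
    tokens: int      # training tokens processed in this phase

    def estimated_flops(self) -> int:
        # Assumed approximation: ~6 FLOPs per parameter per token.
        return 6 * self.parameters * self.tokens


def cumulative_flops(phases: list[TrainingPhase]) -> int:
    """Sum estimated compute across all phases that build one model."""
    return sum(phase.estimated_flops() for phase in phases)


# Hypothetical figures for a 70-billion-parameter model.
phases = [
    TrainingPhase("pre-training", 70_000_000_000, 15_000_000_000_000),
    TrainingPhase("fine-tuning", 70_000_000_000, 50_000_000_000),
    TrainingPhase("reinforcement learning", 70_000_000_000, 20_000_000_000),
]

total = cumulative_flops(phases)

if __name__ == "__main__":
    for phase in phases:
        print(f"{phase.name}: {phase.estimated_flops():.3e} FLOPs")
    print(f"cumulative: {total:.4e} FLOPs")
    print(f"share of threshold: {total / PRESUMPTION_FLOPS:.1%}")
```

Keeping the per-phase inputs alongside the total means the estimate can be reconstructed later from records, whichever estimation methodology the provider actually adopts.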

The Two-Week Notification Duty (Article 52)

A provider must notify the Commission without delay, and in any event within two weeks, after the threshold condition is met or after it becomes known that it will be met. That moment can come before training finishes, since compute budgets are typically planned in advance. The notification must include the information needed to demonstrate that the threshold has been met, and the provider may add arguments that, despite meeting the threshold, the model does not present systemic risk due to its specific characteristics.
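For planning purposes, the outer limit of the notification window can be tracked with simple date arithmetic. The sketch below assumes the two weeks are counted as 14 calendar days from the date the threshold is met or becomes foreseeable; the formal EU rules on computing time limits are not covered here and may differ, and the function names are hypothetical.

```python
from datetime import date, timedelta

# Assumption: "two weeks" is treated as 14 calendar days.
NOTIFICATION_WINDOW = timedelta(weeks=2)


def notification_deadline(known_on: date) -> date:
    """Latest filing date, given when meeting the threshold became known."""
    return known_on + NOTIFICATION_WINDOW


def days_remaining(known_on: date, today: date) -> int:
    """Days left before the deadline; negative once it has passed."""
    return (notification_deadline(known_on) - today).days


if __name__ == "__main__":
    approved = date(2026, 3, 2)  # hypothetical budget approval date
    print(notification_deadline(approved))
    print(days_remaining(approved, date(2026, 3, 10)))
```

Because the duty is to notify without delay, the computed date is best read as a backstop rather than a target.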

The Commission examines such rebuttal arguments and may accept them or reject them where they are not sufficiently substantiated. If classification stands, the provider can later request reassessment based on changed circumstances, no earlier than six months after the classification decision. The Commission publishes and maintains a list of GPAI models with systemic risk, with due regard for intellectual property and confidential business information.

What Classification Triggers

Once a model is classified as presenting systemic risk, Article 55 applies on top of Article 53: state-of-the-art model evaluations including adversarial testing; assessment and mitigation of possible systemic risks at Union level; tracking, documenting and reporting serious incidents to the AI Office; and an adequate level of cybersecurity protection for the model and its physical infrastructure. The open-source exemptions in Article 53(2) also fall away — an open-weights model above the threshold carries the full documentation duties.

Who Should Care About the Threshold

In 2026 the threshold concerns a small set of frontier developers directly, since 10^25 FLOPs corresponds to the largest training runs. But three wider groups should track it. First, well-funded scale-ups whose next training run may approach the threshold: the notification duty can crystallise at planning time. Second, downstream modifiers: an entity whose cumulative modification compute is large enough can itself trigger classification questions. Third, enterprise buyers: whether a vendor's model is on the Commission's systemic-risk list affects the assurances and documentation the vendor can be expected to provide.

Practical Steps for Providers

Instrument compute accounting now. Record FLOPs per training run with a documented estimation methodology, and aggregate across capability-relevant phases.
Build the notification trigger into planning. If a planned run will cross 10^25 FLOPs, the two-week clock can start when that becomes known — assign responsibility for the filing.
Prepare the Annex XIII picture. Benchmarks, user numbers and modalities feed both rebuttal arguments and designation risk; keep the evidence current.
If close to the threshold, pre-build Article 55 capability. Evaluations, incident processes and cybersecurity controls take quarters to stand up, not weeks.
Monitor delegated acts. The Commission can lower, raise or supplement the threshold, and benchmark-based criteria may gain weight over raw compute.

Common Pitfalls

The most common error is treating the threshold as a release-time question. The notification duty in Article 52 attaches when it becomes known that the threshold will be met, which for planned frontier runs is typically months before launch; a provider that waits until release day has usually missed the two-week window. A second error is sloppy compute accounting: estimates that exclude capability-relevant post-training, or that cannot be reconstructed from records, will not withstand an AI Office information request under Article 91. A third is assuming that staying at 9.9 times 10^24 FLOPs ends the analysis — the designation route through Annex XIII criteria exists precisely for models that are highly capable or widely deployed despite sitting under the presumption. Finally, groups sometimes overlook that the threshold is cumulative across the activities that build one model, not per training phase: splitting a run into stages does not reset the counter.

It is also worth distinguishing the EU figure from foreign benchmarks. The 2023 United States executive order on AI used 10^26 FLOPs as its reporting trigger for dual-use foundation models — ten times the EU presumption — so a model can sit below the American line while squarely inside the EU systemic-risk regime. Compliance teams tracking both jurisdictions need two counters, not one.
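The two-counters point can be sketched as a lookup that checks one cumulative figure against each line. The dictionary labels and the function name are hypothetical, the thresholds are the two figures quoted above, and both are treated as strict greater-than comparisons, which is an assumption for the US figure.

```python
# Compute lines quoted in the text, in floating point operations.
THRESHOLDS = {
    "EU AI Act Article 51(2) presumption": 10**25,
    "US 2023 executive order reporting trigger": 10**26,
}


def thresholds_exceeded(cumulative_flops: int) -> list[str]:
    """Return the label of every line that the given compute figure exceeds."""
    return [
        label
        for label, limit in THRESHOLDS.items()
        if cumulative_flops > limit
    ]


if __name__ == "__main__":
    for flops in (8 * 10**24, 3 * 10**25, 2 * 10**26):
        print(f"{flops:.0e}: {thresholds_exceeded(flops)}")
```

A run of 3 times 10^25 FLOPs exceeds only the EU line in this sketch, which is the gap the paragraph above describes.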

A Concrete Example

A developer plans a training run budgeted at 3 times 10^25 FLOPs. At budget approval, it becomes known that the threshold will be met, so the provider notifies the Commission within two weeks, well before launch. It includes its compute methodology and chooses not to contest classification. In parallel it adopts the Safety and Security chapter of the GPAI Code of Practice as its compliance framework, schedules external adversarial testing before release, and documents everything in Annex XI format. A smaller competitor training at 8 times 10^24 FLOPs stays below the presumption, but keeps its compute records ready because a future designation based on Annex XIII criteria remains possible.

Action Plan

Know your number. Every GPAI provider should be able to state its cumulative training compute, show how it was estimated, and say how far it sits from 10^25 FLOPs. Above the line, plan for notification within two weeks and Article 55 compliance; below it, maintain the records that prove it.

Looking ahead, expect the classification mechanism to evolve. The Commission has signalled that compute alone is an imperfect proxy as algorithmic efficiency improves, and Article 51(3) gives it the tool to supplement the presumption with benchmark-based indicators. The scientific panel established under Article 68 can trigger designations through qualified alerts, which means published capability evaluations and real-world incidents — not just training budgets — will increasingly shape who sits on the systemic-risk list. Providers who maintain honest, well-documented capability assessments will navigate those changes with far less friction than those who optimised for staying a rounding error under the line.

One closing reference point for planners: the regulation pairs the threshold with public accountability. The Commission's published list of GPAI models with systemic risk is consulted by enterprise buyers, insurers and downstream integrators, so classification carries commercial signalling effects in both directions — presence on the list demonstrates frontier capability under supervision, while a provider's unexplained absence despite visibly frontier-scale claims invites exactly the scrutiny the notification regime was designed to channel.

This article is for informational purposes only and does not constitute legal advice. Regulatory requirements change frequently — verify current rules with official sources. Built by Sawai Gyoseishoshi Office, Hiroshima, Japan.