What does the EU AI Act copyright policy obligation require from GPAI providers?

Article 53(1)(c) requires every GPAI model provider to put in place a documented policy to comply with EU copyright law, and in particular to identify and respect machine-readable text and data mining opt-outs reserved under Article 4(3) of Directive (EU) 2019/790, using state-of-the-art technologies where appropriate.

Does the copyright policy obligation apply if the model was trained outside the EU?

Yes. Recital 106 states that any provider placing a GPAI model on the EU market should comply with the copyright policy obligation regardless of the jurisdiction in which the training took place. Training abroad does not remove the duty.

Are open-source GPAI models exempt from the copyright policy?

No. The open-source exemption in Article 53(2) covers only technical documentation and downstream information duties. The copyright policy under Article 53(1)(c) and the public training data summary under Article 53(1)(d) apply to all GPAI providers, including open-source projects.

What is a machine-readable rights reservation in practice?

The most established example is robots.txt, alongside other adopted opt-out protocols and metadata standards that express a text and data mining reservation in a form crawlers can process automatically. The Code of Practice copyright chapter commits signatories to identify and comply with such protocols.
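
As a minimal sketch of what "machine-readable" means for a crawler, the Python below evaluates a robots.txt file with the standard library's urllib.robotparser. The user agent name ExampleTrainingBot, the robots.txt content and the example.com URLs are hypothetical, and the sketch covers robots.txt only, not the other reservation protocols or metadata standards a provider may need to recognise.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real crawler would fetch it from the site.
ROBOTS_TXT = """\
User-agent: ExampleTrainingBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The training crawler is excluded from the whole site,
# while other agents are only excluded from /private/.
print(is_allowed(ROBOTS_TXT, "ExampleTrainingBot", "https://example.com/articles/1"))
print(is_allowed(ROBOTS_TXT, "OtherBot", "https://example.com/articles/1"))
print(is_allowed(ROBOTS_TXT, "OtherBot", "https://example.com/private/report"))
```

In this illustration the site operator has reserved rights against one named crawler while leaving most of the site open to others, which is the kind of signal a crawler can process automatically.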

Quick answer

Article 53(1)(c) of the EU AI Act requires every general-purpose AI model provider to put in place a policy to comply with EU copyright law, in particular to identify and respect text and data mining opt-outs reserved under Article 4(3) of Directive (EU) 2019/790. The duty has applied since August 2, 2025 and covers open-source models too.

Updated June 2026 · MmowW AI Compliance

EU AI Act Copyright Policy: What Article 53(1)(c) Requires from GPAI Providers

What happens if a GPAI provider ignores text and data mining opt-outs?

The provider risks both regulatory enforcement and copyright litigation. From August 2, 2026 the Commission can fine GPAI providers up to 3 percent of worldwide annual turnover or 15 million euros, whichever is higher, and rightsholders can pursue infringement claims in national courts where the TDM exception does not apply because of a valid reservation.

Why the AI Act Contains a Copyright Obligation

The EU AI Act, Regulation (EU) 2024/1689, is not a copyright law, but it builds a bridge to one. Article 53(1)(c) obliges every provider of a general-purpose AI model placed on the EU market to put in place a policy to comply with Union law on copyright and related rights. The provision singles out one mechanism by name: providers must identify and respect reservations of rights expressed pursuant to Article 4(3) of Directive (EU) 2019/790, the Copyright in the Digital Single Market Directive.

The background is the text and data mining (TDM) regime. Article 4 of the 2019 directive allows reproductions of lawfully accessible works for text and data mining — the legal basis many model developers rely on for training-data collection in the EU — but only where the rightsholder has not expressly reserved that use. For content made available online, the reservation must be machine-readable to be effective. The AI Act turns respect for these reservations from a copyright question into a supervised regulatory duty for GPAI providers, enforceable by the Commission's AI Office.

What the Obligation Requires

The text of Article 53(1)(c) is short, but three components are clearly required:

A policy. Providers need a documented, operational internal policy — not merely de facto practices. The policy should be put in place before training-data collection and kept up to date.
Compliance with Union copyright law generally. This covers the lawfulness of reproductions and extractions during data gathering and training, and extends to addressing the risk that the model memorises and reproduces protected works in its outputs.
Opt-out identification and compliance. Providers must employ means, including state-of-the-art technologies where appropriate, to identify machine-readable reservations such as robots.txt directives and other recognised opt-out protocols, and to exclude reserved content from text and data mining.

Recital 106 adds an important territorial clarification: any provider placing a GPAI model on the EU market should comply with this obligation regardless of the jurisdiction in which the copyright-relevant acts underpinning the training occurred. Training the model entirely outside the EU does not switch the duty off.

The Code of Practice Copyright Chapter

The GPAI Code of Practice published on July 10, 2025 contains a dedicated copyright chapter that operationalises Article 53(1)(c). Signatories commit, among other things, to: draw up and keep up to date a copyright policy and assign responsibility for it; reproduce and extract only lawfully accessible content when crawling, and not circumvent paywalls or other technological protection measures; exclude websites recognised as persistently and repeatedly infringing copyright; identify and comply with machine-readable rights reservations, including robots.txt and other widely adopted protocols; make reasonable efforts to reduce the risk of models generating output that reproduces protected training content, including through appropriate technical safeguards and acceptable-use terms; and designate a contact point and complaint mechanism for rightsholders.

Adhering to the Code is voluntary, but Article 53(4) recognises adherence as a means of demonstrating compliance until a harmonised standard is published. Providers that do not sign remain fully bound by Article 53(1)(c) and should be ready to show an equivalent or better approach.

Who Must Comply

The obligation binds every provider of a GPAI model placed on the EU market, whether the model is commercial or free, proprietary or open. The open-source exemption in Article 53(2) covers only the technical documentation and downstream information duties; it never covers the copyright policy. Providers of GPAI models with systemic risk are bound as well, on top of their Article 55 duties. Downstream entities that fine-tune an existing model and thereby become providers in their own right must also have a policy, scoped to the content they use for the modification.

Companies that merely deploy third-party models through an API are not GPAI providers and do not carry this duty, although they retain ordinary copyright responsibility for how they use model outputs.

Practical Steps to Build the Policy

Inventory acquisition channels. List every way training content enters your pipeline: in-house crawlers, third-party datasets, licensed corpora, user-submitted data, synthetic generation. The policy must address each channel.
Implement opt-out compliance at crawl time. Configure crawlers to honour robots.txt and other adopted reservation protocols, log compliance, and review which emerging standards you will recognise as machine-readable reservations.
Deal with third-party datasets. For purchased or public datasets, document the diligence performed on their provenance and the contractual assurances obtained from suppliers.
Address output-side risk. Describe the technical measures — deduplication, alignment, filtering, refusal behaviours — used to reduce verbatim reproduction of protected works, and prohibit infringing uses in your acceptable-use terms.
Create a rightsholder contact point. Establish a published channel for opt-out notifications and complaints, with internal routing and response timelines.
Publish a meaningful version. The Code of Practice encourages providers to make a summary of the policy public; alignment between the policy, the public training data summary and the Annex XI documentation is essential.
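
The crawl-time step in the list above can be sketched as a gate that evaluates robots.txt and writes a dated record of each decision, so that exclusions can be evidenced later. This is an illustration under stated assumptions: the names CrawlDecision, decide and to_log_line, the user agent ExampleTrainingBot and the record fields are hypothetical, not a format prescribed by the AI Act or the Code of Practice, and only robots.txt is evaluated.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from urllib.robotparser import RobotFileParser


@dataclass
class CrawlDecision:
    """One dated record of an opt-out check (hypothetical log format)."""
    url: str
    user_agent: str
    allowed: bool
    protocol: str    # which reservation protocol was evaluated
    checked_at: str  # ISO 8601 timestamp in UTC


def decide(robots_txt: str, user_agent: str, url: str) -> CrawlDecision:
    """Check robots.txt for this URL and record when the check happened."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return CrawlDecision(
        url=url,
        user_agent=user_agent,
        allowed=parser.can_fetch(user_agent, url),
        protocol="robots.txt",
        checked_at=datetime.now(timezone.utc).isoformat(),
    )


def to_log_line(decision: CrawlDecision) -> str:
    """Serialise a decision as one JSON line for an append-only crawl log."""
    return json.dumps(asdict(decision), sort_keys=True)


# Hypothetical site that reserves rights against every crawler.
decision = decide("User-agent: *\nDisallow: /\n",
                  "ExampleTrainingBot", "https://example.com/articles/1")
print(to_log_line(decision))
```

Recording the protocol and the timestamp alongside the outcome is a design choice aimed at the evidence question: the log can then show which reservations were honoured at the moment of each crawl.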

A Concrete Example

A provider preparing a multilingual language model for the EU market adopts a policy with four pillars. First, acquisition: its crawler honours robots.txt and a named set of machine-readable reservation protocols, and crawl logs retain evidence of exclusions. Second, third-party data: every external corpus requires a documented provenance review before ingestion, and sources widely recognised for infringing content are excluded. Third, outputs: training-data deduplication and refusal tuning reduce memorised reproduction, and the terms of use prohibit prompting designed to extract protected works. Fourth, governance: a named owner reviews the policy quarterly, and a public web form receives rightsholder notices, which are answered within a stated period. When the provider later publishes its training data summary, the opt-out section simply mirrors what the policy already records.

Enforcement and What Is at Stake

The AI Office supervises GPAI obligations centrally. From August 2, 2026, the Commission can fine GPAI providers under Article 101 up to 3 percent of total worldwide annual turnover in the preceding financial year or 15 million euros, whichever is higher, for non-compliance with Chapter V, including the absence of a credible copyright policy. Separately, ordinary copyright litigation continues in national courts, and a provider's published policy and training summary will be read closely in those disputes. The policy is therefore both a regulatory requirement and a litigation posture.
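
The "whichever is higher" ceiling is simple arithmetic, shown here as a short sketch. The turnover figures are hypothetical, the function name max_fine_eur is illustrative, and the result is the upper limit of a possible fine, not a prediction of any actual penalty.

```python
from decimal import Decimal

FIXED_CAP_EUR = Decimal("15000000")  # 15 million euros
TURNOVER_RATE = Decimal("0.03")      # 3 percent of worldwide annual turnover


def max_fine_eur(worldwide_annual_turnover_eur: Decimal) -> Decimal:
    """Return the fine ceiling: the higher of 3% of turnover and EUR 15 million."""
    return max(TURNOVER_RATE * worldwide_annual_turnover_eur, FIXED_CAP_EUR)


# Hypothetical providers with EUR 200 million and EUR 2 billion turnover.
print(max_fine_eur(Decimal("200000000")))
print(max_fine_eur(Decimal("2000000000")))
```

For the smaller provider, 3 percent of turnover is 6 million euros, so the 15 million euro figure sets the ceiling; for the larger one the ceiling is 60 million euros. The two limits meet at a turnover of 500 million euros, above which the percentage is the binding figure.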

Common Pitfalls

Four weaknesses recur in copyright policies reviewed during 2025 and 2026. First, the paper policy: a document drafted by the legal team that no crawler configuration actually implements — supervisors and courts will ask for the technical evidence behind each commitment. Second, the frozen snapshot: opt-out compliance is assessed at crawl time, so a policy that cannot say when its crawls happened and which protocols were honoured at that moment offers little protection. Third, ignoring acquired datasets: providers often control their own crawlers carefully while ingesting third-party corpora with unknown provenance, leaving the largest exposure undocumented. Fourth, silence on outputs: Article 53(1)(c) points to compliance with copyright law generally, and a policy that addresses only training inputs while the model can reproduce protected works on demand tells half the story.

It also pays to keep terminology precise. The TDM exception in Article 4 of Directive (EU) 2019/790 covers reproductions and extractions of lawfully accessible works; it is not a licence for everything a model later does. Where a rightsholder has validly reserved rights, training on that content in the EU requires authorisation — and the AI Act obliges the provider to have the machinery for noticing the reservation in the first place.

Action Plan

Treat the copyright policy as an engineering control system, not a statement of intent. Start from your actual data pipeline, encode opt-out compliance where collection happens, document supplier diligence, mitigate memorisation on the output side, and assign an owner. Providers that signed the Code of Practice copyright chapter have a ready-made checklist; everyone else should be able to demonstrate at least the same level of care.

A sensible ninety-day sequence looks like this: in the first month, inventory acquisition channels and appoint the policy owner; in the second, implement and log opt-out compliance in crawlers and complete provenance reviews of third-party corpora; in the third, stand up the rightsholder contact point, finalise output-side mitigations and publish the policy summary together with the training data summary. Document each step as you go — under Article 91 the AI Office can request the evidence at any time, and a dated implementation trail is worth more than a polished document written after the fact. For downstream modifiers who fine-tune existing models, the same sequence applies in miniature, scoped to the fine-tuning corpus they actually control.

This article is for informational purposes only and does not constitute legal advice. Regulatory requirements change frequently — verify current rules with official sources. Built by Sawai Gyoseishoshi Office, Hiroshima, Japan.