EU AI Act GPAI Model Documentation: A Practical Guide to Annex XI and Annex XII
GPAI model providers must maintain two documentation sets under Article 53 of the EU AI Act: Annex XI technical documentation for the AI Office and national authorities, and Annex XII information for downstream providers integrating the model. The Code of Practice transparency chapter provides a single Model Documentation Form covering both.
Two Documents, Two Audiences
Article 53(1)(a) and (b) of Regulation (EU) 2024/1689 create a documentation architecture with two distinct audiences. Annex XI defines the technical documentation that providers of general-purpose AI models must draw up, keep up to date and supply on request to the AI Office and national competent authorities. Annex XII defines the information and documentation that must be made available to downstream providers — the companies integrating the model into their own AI systems. The two annexes overlap substantially in subject matter but differ in depth and confidentiality: Annex XI goes deeper into training methodology and data, while Annex XII focuses on what an integrator needs to build responsibly and to meet its own obligations.
Both duties have applied since August 2, 2025. Models placed on the EU market before that date must be brought into compliance by August 2, 2027 under Article 111(3). Providers of models classified as presenting systemic risk owe additional documentation under Section 2 of Annex XI; qualifying open-source providers without systemic risk are exempt from both annexes but still owe a copyright policy and a public training data summary.
Annex XI Section 1: What Every Provider Documents
Section 1 of Annex XI asks for a general description of the model and a description of its development process. In practical terms the file should cover:
- Identity and purpose: the tasks the model is intended to perform, the type and nature of AI systems into which it can be integrated, and applicable acceptable use policies.
- Release facts: date of release, methods of distribution — API, downloadable weights, embedded in products — and the licence under which the model is provided.
- Architecture: model architecture and number of parameters, plus the modalities and formats of inputs and outputs.
- Training process: the methodologies and techniques used, the key design choices including the rationale and assumptions made, and what the model is designed to optimise for, with the relevance of different parameters.
- Data: information on the data used for training, testing and validation — type and provenance, curation methodologies such as cleaning and filtering, how data was obtained and selected, measures to detect unsuitable data sources and identifiable biases.
- Compute and energy: the computational resources used to train the model (for example, the number of floating point operations), training time, and known or estimated energy consumption.
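One way to keep this list from turning into a free-text essay is to hold it as a structured record and check it for gaps before each release. The following is a minimal sketch in Python, not a schema taken from the Regulation or the Code of Practice: the class name `AnnexXISection1`, every field name, and the helper `missing_fields` are hypothetical and simply mirror the bullets above.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class AnnexXISection1:
    """Illustrative record of the Section 1 items; all field names are hypothetical."""
    intended_tasks: Optional[str] = None
    acceptable_use_policy: Optional[str] = None
    release_date: Optional[str] = None           # ISO 8601 date string
    distribution_methods: Optional[str] = None   # e.g. API, downloadable weights
    licence: Optional[str] = None
    architecture: Optional[str] = None
    parameter_count: Optional[int] = None
    io_modalities: Optional[str] = None
    training_methodology: Optional[str] = None
    design_rationale: Optional[str] = None
    data_provenance: Optional[str] = None
    data_curation: Optional[str] = None
    training_flops: Optional[float] = None
    training_time_hours: Optional[float] = None
    energy_kwh: Optional[float] = None           # known or estimated


def missing_fields(record: AnnexXISection1) -> list[str]:
    """Return the names of fields that are still unset or blank."""
    missing = []
    for f in fields(record):
        value = getattr(record, f.name)
        if value is None or (isinstance(value, str) and not value.strip()):
            missing.append(f.name)
    return missing
```

Calling `missing_fields(AnnexXISection1(intended_tasks="text generation"))` lists every other field as outstanding, which gives the documentation owner a concrete gap list rather than a judgment call.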
Annex XI Section 2: The Systemic Risk Supplement
Providers of GPAI models with systemic risk must additionally document their evaluation strategies and results, including against public benchmarks; the adversarial testing measures used, such as red-teaming, and model adaptations including alignment and fine-tuning performed for safety; where applicable, the system architecture explaining how software components build on each other and integrate into the overall processing; and detailed descriptions of the measures adopted under Article 55. This section turns the safety programme into an inspectable record — an evaluation that left no documentation effectively did not happen, as far as the AI Office is concerned.
Annex XII: The Downstream Package
Annex XII repeats the general description elements (tasks, acceptable use policies, release and distribution, modalities, licence, architecture and parameters) and adds the elements an integrator needs: a description of the technical means required for the model to be integrated into AI systems, instructions for use, infrastructure requirements, and information on the data used for training, testing and validation where relevant to the downstream provider's own obligations. The legal test in Article 53(1)(b) is functional: the package must enable downstream providers to have a good understanding of the model's capabilities and limitations and to comply with their own duties under the regulation. One example is assembling the technical documentation of a high-risk AI system under Annex IV, which asks, where relevant, for a description of pre-trained systems or tools provided by third parties and how the provider used, integrated or modified them.
The Model Documentation Form
The transparency chapter of the GPAI Code of Practice, published in July 2025, consolidated both annexes into a single Model Documentation Form. Signatories complete one structured document and disclose each field to the appropriate audience — some fields to the AI Office on request, some to downstream providers, some to both. For providers deciding how to organise the work, the form is the obvious template even without signing the Code: it resolves field-by-field ambiguities, marks which audience sees what, and gives supervisors a familiar structure. Article 53(4) recognises adherence to the Code as a means of demonstrating compliance until harmonised standards are published.
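The per-audience disclosure idea can be sketched as a master record in which every field carries an audience tag, with each output derived by filtering. This is an illustrative Python sketch only: the `Audience` flags, the field names, the placeholder values and the audience assigned to each field are assumptions made for the example, and the actual form defines its own fields and markings.

```python
from enum import Flag, auto


class Audience(Flag):
    AI_OFFICE = auto()
    NATIONAL_AUTHORITY = auto()
    DOWNSTREAM = auto()


REGULATORS = Audience.AI_OFFICE | Audience.NATIONAL_AUTHORITY
EVERYONE = REGULATORS | Audience.DOWNSTREAM

# Hypothetical master file: field name -> (placeholder value, audiences that may see it).
MASTER = {
    "licence": ("example licence", EVERYONE),
    "input_output_modalities": ("example modalities", EVERYONE),
    "integration_instructions": ("example instructions", Audience.DOWNSTREAM),
    "design_rationale": ("example rationale", REGULATORS),
    "training_flops": ("example figure", REGULATORS),
}


def derive_view(master: dict, audience: Audience) -> dict:
    """Return only the fields tagged for the given audience."""
    return {
        name: value
        for name, (value, visible_to) in master.items()
        if audience in visible_to
    }
```

With this layout, `derive_view(MASTER, Audience.DOWNSTREAM)` yields the integrator package and `derive_view(MASTER, Audience.AI_OFFICE)` the regulator file, so both outputs come from the same source and cannot drift apart.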
Operating the Documentation: Process Requirements
Three process points are easy to miss. First, currency: both annex duties require documentation to be kept up to date, so a model update that changes capabilities, training data or acceptable use must propagate into the files — versioned, dated, with change logs. Second, availability on request: Annex XI material is not filed proactively but must be deliverable when the AI Office asks under Article 91, which implies the file exists at all times, not that it can be assembled in a scramble after the request. Third, confidentiality: Article 78 obliges authorities to protect intellectual property and trade secrets they receive, which is the counterweight that allows the annexes to demand genuinely sensitive detail; providers can mark material accordingly but cannot refuse categories of information on confidentiality grounds.
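The currency requirement (versioned, dated, with change logs) can be sketched as a small record that refuses to change silently. This is a minimal Python sketch under assumptions: the names `VersionedFile` and `ChangeEntry`, the integer version counter, and the rule that an unchanged value does not create a new version are all illustrative choices, not requirements from the annexes.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class ChangeEntry:
    version: int
    changed_on: date
    field_name: str
    old_value: str
    new_value: str
    reason: str


@dataclass
class VersionedFile:
    values: dict[str, str] = field(default_factory=dict)
    version: int = 0
    log: list[ChangeEntry] = field(default_factory=list)

    def update(self, field_name: str, new_value: str, reason: str, changed_on: date) -> None:
        """Apply a change, bump the version and append a dated log entry."""
        old_value = self.values.get(field_name, "")
        if old_value == new_value:
            return  # nothing changed, so no new version is recorded
        self.version += 1
        self.values[field_name] = new_value
        self.log.append(
            ChangeEntry(self.version, changed_on, field_name, old_value, new_value, reason)
        )
```

Because each entry stores the old value, the new value, the date and the reason, the log can answer the question a supervisor is likely to ask: what did the file say at a given point in time, and why did it change.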
How the Documents Connect to Everything Else
The annexes are not standalone artefacts; they anchor the rest of the compliance system. The compute figure documented under Annex XI is the same number that determines proximity to the 10^25 FLOPs systemic-risk presumption and would support a notification under Article 52. The data provenance sections must tell the same story as the public training data summary published under Article 53(1)(d) and the copyright policy under Article 53(1)(c) — the AI Office can lay the three side by side. The limitations described in the downstream package will reappear in customers' Annex IV technical documentation for high-risk systems, and in their risk-management files under Article 9. And for systemic-risk models, the Section 2 evaluation record is the evidence base for every claim made in safety reports and every argument in a classification rebuttal. Documentation quality is therefore not a box to satisfy but the load-bearing structure of the provider's entire regulatory position.
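The link between the documented compute figure and the systemic-risk presumption is simple arithmetic, sketched below in Python. The function name `threshold_position` and the returned keys are hypothetical, and the comparison is read here as "strictly greater than" the threshold; how the training compute figure itself is measured or estimated is outside this sketch.

```python
# Threshold from the text: the 10^25 FLOPs systemic-risk presumption.
SYSTEMIC_RISK_THRESHOLD_FLOPS = 1e25


def threshold_position(training_flops: float) -> dict:
    """Compare a documented training compute figure with the presumption threshold."""
    if training_flops <= 0:
        raise ValueError("training_flops must be positive")
    return {
        "training_flops": training_flops,
        # Fraction of the threshold already consumed by this training run.
        "ratio_to_threshold": training_flops / SYSTEMIC_RISK_THRESHOLD_FLOPS,
        # Assumption: the presumption applies only above the threshold, not at it.
        "presumption_applies": training_flops > SYSTEMIC_RISK_THRESHOLD_FLOPS,
    }
```

Running the check on the same number that appears in the Annex XI file, rather than on a separately maintained estimate, keeps the classification analysis and the technical documentation from diverging.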
There is also a forward-looking reason to invest: harmonised standards for GPAI documentation are under development, and providers whose files already follow the Model Documentation Form structure will absorb the transition to standards with minimal rework, while those with bespoke formats face a second migration.
A Concrete Example
A provider preparing its first compliant release adopts the Model Documentation Form as its master file. The machine learning team completes architecture, training methodology and compute fields from experiment-tracking records; the data engineering team writes the provenance and curation sections from pipeline manifests; product fills in distribution, licence and acceptable use; and the documentation owner derives two outputs from the master — a regulator-ready Annex XI file stored internally, and a downstream package published in the developer portal behind a customer login. When the model receives a significant update six months later, the change triggers a documented review of every field, and the public training data summary is refreshed in the same cycle so the three artefacts never diverge.
Common Pitfalls
The recurring failures are predictable:
- Model cards mistaken for Annex XI files: marketing model cards almost never contain design rationale, data curation detail or FLOPs figures.
- Unversioned documentation: a file written once and never versioned ends up describing a model two releases old.
- Missing limitations: downstream packages omit limitations because commercial teams resist documenting weaknesses, yet the limitations field is precisely what downstream providers need for their own risk management, and its absence shifts blame upstream when something fails.
- Blank compute and energy fields: nobody instrumented the training run, and these fields need engineering telemetry, not retrospective guesswork.
- Fragmentation: three teams maintain three overlapping documents that disagree with each other and with the public training summary, handing any future investigator a ready-made inconsistency to probe.
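The fragmentation problem lends itself to an automated cross-check. The sketch below, in Python, assumes each document can be reduced to a set of data source labels; the function name `find_inconsistencies`, the document names and the labels are hypothetical, and real documents would need a mapping step before a comparison like this is meaningful.

```python
def find_inconsistencies(documents: dict[str, set[str]]) -> dict[str, set[str]]:
    """For each document, list data source labels that appear elsewhere but not in it."""
    all_sources = set().union(*documents.values())
    gaps = {}
    for name, sources in documents.items():
        absent = all_sources - sources
        if absent:
            gaps[name] = absent
    return gaps


# Hypothetical example: the public summary omits a source the other two mention.
example = {
    "annex_xi_file": {"web_crawl", "licensed_corpus"},
    "public_summary": {"web_crawl"},
    "copyright_policy": {"web_crawl", "licensed_corpus"},
}
```

An empty result means the documents name the same sources. A non-empty result is not necessarily an error, since documents may legitimately differ in depth, but each gap should be explainable before an investigator asks about it.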
Action Plan
Stand up a single documentation pipeline per model family: one master file in Model Documentation Form structure, automated feeds from experiment tracking and data pipelines where possible, named owners per section, and derivation rules for the regulator file, the downstream package and the public summary. Schedule reviews at every significant release. Providers who treat the annexes as a documentation product with engineering discipline — rather than a compliance essay written under deadline — find the marginal cost per release drops quickly, and they enter any AI Office interaction with their story already straight.
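The "named owners" and "reviews at every significant release" points can be enforced as a release gate. The following Python sketch is one possible shape under assumptions: the `Section` record, the convention that an empty owner string means unassigned, and the `release_blockers` function are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Section:
    name: str
    owner: str          # named person or team; empty string means unassigned
    reviewed_for: str   # release tag the section was last reviewed against


def release_blockers(sections: list[Section], release: str) -> list[str]:
    """Return human-readable reasons why the documentation is not ready for a release."""
    blockers = []
    for s in sections:
        if not s.owner.strip():
            blockers.append(f"{s.name}: no owner")
        if s.reviewed_for != release:
            last = s.reviewed_for or "no release"
            blockers.append(f"{s.name}: last reviewed for {last}")
    return blockers
```

Wiring a check like this into the release process makes a stale or unowned section a blocking issue instead of something discovered when a request arrives.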
This article is for informational purposes only and does not constitute legal advice. Regulatory requirements change frequently; verify current rules with official sources. Built by Sawai Gyoseishoshi Office, Hiroshima, Japan.