EU AI Act Article 12: Record-Keeping Requirements — What Your Logs Actually Need to Show by August 2026
The EU AI Act has a clock running on enterprise AI agents, and the question most teams are missing is not which Articles apply — it is whether they can prove compliance when the auditor walks in.
Article 12 is the part of the Act that decides whether you can. It requires automatic record-keeping of every high-risk AI system's behavior over the system's lifetime — not just the logs you happen to be writing today, but a defensible record an EU auditor would accept as evidence.
If you are running AI agents in any of the eight Annex III high-risk categories, the deadline is August 2, 2026. This post is what compliance teams need to know about Article 12 specifically: what it requires, what most observability stacks miss, and what "ready" actually looks like.
What Article 12 actually says
Article 12(1) of Regulation (EU) 2024/1689 — the EU AI Act — sets the core obligation:
"High-risk AI systems shall technically allow for the automatic recording of events ('logs') over the lifetime of the system."
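Article 12(2) defines what those events must enable. To ensure a level of traceability appropriate to the intended purpose of the system, logging capabilities must record events relevant for:
(a) identifying situations that may result in the system presenting a risk within the meaning of Article 79(1) or in a substantial modification;
(b) facilitating the post-market monitoring referred to in Article 72; and
(c) monitoring the operation of high-risk AI systems referred to in Article 26(5).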
The key phrase is appropriate to the intended purpose of the system. Logging is not a fixed checklist — it is calibrated to what the AI system is doing and what risks it presents. A credit-scoring agent and an employment-recommendation agent will produce different kinds of logs because their risk surfaces are different.
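For one specific category, biometric identification systems under Annex III point 1(a), Article 12(3) lists explicit minimum fields:
(a) the period of each use of the system (start and end date and time);
(b) the reference database against which input data was checked;
(c) the input data for which the search led to a match;
(d) the identification of the natural persons involved in verifying the results, as referred to in Article 14(5).
These minimums are illustrative for the broader Article 12 obligation. Article 14(5) itself, which requires separate verification by at least two natural persons, applies only to point 1(a) systems; the general human oversight obligation in Article 14 applies to every high-risk system, including all Annex III categories.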
Who is actually on the hook
Article 12 is supported by two adjacent record-keeping obligations that apply to different roles:
Providers — companies that develop and place AI systems on the EU market. Under Article 19, providers must retain auto-generated logs under their control for at least 6 months, unless EU or Member State law specifies a longer period.
Deployers — companies that use AI systems in operational settings. Under Article 26(6), deployers must keep logs for a period appropriate to the intended purpose, with a minimum of 6 months.
Most enterprises running AI agents are deployers, not providers. That means Article 26 is the operative record-keeping obligation for most readers of this post.
There is a chained-responsibility wrinkle when you are both. If your company uses a third-party LLM (deployer relative to the LLM vendor) but you build your own agent on top and expose it to others inside or outside the company (provider relative to the agent), you inherit both sets of obligations. The 6-month minimum applies; longer retention may apply depending on the intended purpose and sector.
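The role logic above can be sketched in a few lines. This is a minimal illustration, not legal advice: the `Obligation` type, the `record_keeping_obligations` function, and the `sector_minimum_months` parameter are hypothetical names introduced here, and the only figures taken from the text are the Article 19 and Article 26(6) references and the 6-month floor.

```python
from __future__ import annotations

from dataclasses import dataclass

# Retention floor named in Article 19 (providers) and Article 26(6) (deployers).
MIN_RETENTION_MONTHS = 6


@dataclass(frozen=True)
class Obligation:
    article: str
    summary: str


PROVIDER = Obligation("Article 19", "retain auto-generated logs under the provider's control")
DEPLOYER = Obligation("Article 26(6)", "keep logs for a period appropriate to the intended purpose")


def record_keeping_obligations(
    is_provider: bool,
    is_deployer: bool,
    sector_minimum_months: int = 0,  # hypothetical: a longer period set by other EU or national law
) -> tuple[list[Obligation], int]:
    """Return the applicable obligations and the retention floor in months."""
    obligations: list[Obligation] = []
    if is_provider:
        obligations.append(PROVIDER)
    if is_deployer:
        obligations.append(DEPLOYER)
    # The 6-month minimum is a floor; a longer sector rule overrides it.
    floor = max(MIN_RETENTION_MONTHS, sector_minimum_months) if obligations else 0
    return obligations, floor


# A company that deploys a third-party LLM and also ships its own agent.
both, floor_months = record_keeping_obligations(True, True, sector_minimum_months=24)
```

In this example `both` holds the provider and deployer obligations together, and the assumed 24-month sector rule replaces the 6-month floor.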
The observability gap
Standard observability tools capture telemetry. Article 12 requires evidence. These are different data shapes, and most teams discover the gap during their first internal mock audit.
| What standard observability captures | What Article 12 requires |
|---|---|
| Request latency, error rates, throughput | Risk-relevant decision events tagged with semantic meaning |
| HTTP traces and span hierarchies | Decision lineage — inputs, outputs, retrieved context, tool calls |
| Service-level logs (often free-form text) | Per-system lifetime event record, structured |
| Token spend, model usage counters | Identification of inputs that led to a system action |
| Auto-instrumented tool calls | Documented human oversight involvement per Article 14 |
| Sampling-based traces (1%, 10%) | Complete record — sampling is incompatible with audit |
The gap is not that observability tools cannot capture this data. The gap is that they are not configured to. Datadog, New Relic, Honeycomb, Dynatrace, and Splunk all support arbitrary tag attribution, custom event schemas, and extended retention. What they do not ship with by default is the semantic mapping from observability events to regulatory evidence categories.
Standard observability is designed for an engineer at 3 AM trying to debug an incident. Article 12 is designed for an auditor in a conference room six months later trying to reconstruct why an AI system made a specific decision and whether the human oversight required by Article 14 was actually exercised. The data shapes overlap. They are not the same.
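To make the difference in data shape concrete, here is a minimal sketch of an audit-oriented decision event. Every name in it (`DecisionEvidence`, `evidence_gaps`, the field names, the sample values) is an assumption made for illustration; the Act does not prescribe a schema, and whether a missing reviewer counts as a gap depends on how oversight is designed for the system in question.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Optional


@dataclass
class DecisionEvidence:
    """One risk-relevant decision, recorded for a later reader rather than a debugger."""
    system_id: str
    decision_id: str
    occurred_at: str                       # ISO 8601 timestamp, UTC
    inputs: Dict[str, object]              # what the system was asked
    output: str                            # what it decided or produced
    retrieved_context: List[str] = field(default_factory=list)
    tool_calls: List[str] = field(default_factory=list)
    risk_tags: List[str] = field(default_factory=list)
    oversight_reviewer: Optional[str] = None   # None means no human reviewed it


def evidence_gaps(event: DecisionEvidence) -> List[str]:
    """List the fields an auditor would likely ask about that are empty."""
    gaps = []
    if not event.inputs:
        gaps.append("inputs")
    if not event.output:
        gaps.append("output")
    if not event.risk_tags:
        gaps.append("risk_tags")
    if event.oversight_reviewer is None:
        gaps.append("oversight_reviewer")
    return gaps


event = DecisionEvidence(
    system_id="credit-agent",              # hypothetical system name
    decision_id="d-0001",
    occurred_at="2026-08-03T09:15:00+00:00",
    inputs={"application_id": "a-42"},
    output="refer_to_human",
    tool_calls=["fetch_credit_file"],
)
serialized = json.dumps(asdict(event), sort_keys=True)  # structured, not free-form text
gaps = evidence_gaps(event)
```

A latency histogram would never surface the two gaps this event has (no risk tags, no recorded reviewer), which is the kind of finding a mock audit tends to produce.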
What "lifetime of the system" actually means
The phrase "over the lifetime of the system" in Article 12(1) is doing significant work. It is not the same as "during the audit period" or "since this fiscal year."
In practice, lifetime starts when the AI system is placed on the market or put into service and ends when it is decommissioned. For a deployer this typically means:
This matters because most observability retention policies are tuned to debug-recency — keep traces for 7 days, full-fidelity logs for 30 days, downsampled metrics for a year. Article 12 requires a different retention tier specifically for the events that touch the regulatory categories. You do not need to keep every tracing span for 6 months; you need to keep the events relevant to risk identification, post-market monitoring, and high-risk operation.
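A separate retention tier can be sketched as a simple routing rule. The tier names, the category labels, and the `retention_tier` and `earliest_deletion` helpers are hypothetical; the 7-day and 30-day figures mirror the debug-recency example above, and 184 days is used for the regulatory tier because that is the longest span six consecutive calendar months can cover. Treat it as a floor, not a target.

```python
from datetime import date, timedelta

# Hypothetical tiers; only the six-month floor comes from the regulation.
RETENTION_DAYS = {
    "debug_trace": 7,
    "service_log": 30,
    "regulatory_evidence": 184,
}

# Labels for the three purposes the text names; the label strings are assumptions.
REGULATORY_CATEGORIES = {
    "risk_identification",
    "post_market_monitoring",
    "high_risk_operation",
}


def retention_tier(event: dict) -> str:
    """Pick a tier: regulatory categories win over debug-recency defaults."""
    if REGULATORY_CATEGORIES & set(event.get("categories", ())):
        return "regulatory_evidence"
    return "service_log" if event.get("kind") == "log" else "debug_trace"


def earliest_deletion(event: dict, recorded_on: date) -> date:
    """Earliest date on which the event may be deleted under this policy."""
    return recorded_on + timedelta(days=RETENTION_DAYS[retention_tier(event)])


span = {"kind": "span", "categories": []}
decision = {"kind": "log", "categories": ["post_market_monitoring"]}
span_expiry = earliest_deletion(span, date(2026, 8, 2))
decision_expiry = earliest_deletion(decision, date(2026, 8, 2))
```

Under these assumptions, an ordinary tracing span recorded on August 2, 2026 could be dropped a week later, while a decision event tagged for post-market monitoring would be held until at least February 2, 2027.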
Annex III point 1(a) is illustrative
The Article 12(3) minimums apply formally only to biometric identification systems. But the four fields named — period of use, reference database, input data, oversight personnel — translate cleanly to other Annex III categories:
The pattern repeats. The four-field minimum from biometrics is a working template for other categories where Article 12(2)'s general obligation applies.
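One way to carry the four-field template into another category is a small record type. This is a sketch under stated assumptions: `UseRecord`, its field names, and the sample values are hypothetical, and the generalization from "reference database" to any reference source is this post's reading, not wording from Article 12(3).

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Mapping, Tuple


@dataclass(frozen=True)
class UseRecord:
    started_at: datetime                      # period of use: start
    ended_at: datetime                        # period of use: end
    reference_source: str                     # database, policy set, or model version consulted
    input_data: Mapping[str, object]          # inputs that led to the system action
    oversight_personnel: Tuple[str, ...]      # people who verified the result

    def __post_init__(self) -> None:
        if self.ended_at < self.started_at:
            raise ValueError("ended_at precedes started_at")

    def missing_fields(self) -> List[str]:
        """Names of template fields that were left empty."""
        missing = []
        if not self.reference_source:
            missing.append("reference_source")
        if not self.input_data:
            missing.append("input_data")
        if not self.oversight_personnel:
            missing.append("oversight_personnel")
        return missing


# Hypothetical employment-recommendation use, with no reviewer recorded.
record = UseRecord(
    started_at=datetime(2026, 8, 3, 9, 0, tzinfo=timezone.utc),
    ended_at=datetime(2026, 8, 3, 9, 2, tzinfo=timezone.utc),
    reference_source="role-requirements-v3",
    input_data={"candidate_id": "c-17"},
    oversight_personnel=(),
)
missing = record.missing_fields()
```

Making the record immutable and validating the time range at construction keeps an impossible period of use from ever reaching storage.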
A practical compliance checklist
For deployers running AI agents in Annex III scope, the path to Article 12 readiness has nine concrete steps:
Where OpenTelemetry fits
OpenTelemetry is the open standard for observability data — vendor-neutral, framework-neutral, deployable alongside every major observability tool. The OpenTelemetry community is actively developing semantic conventions in the gen_ai namespace specifically for AI agent telemetry: lifecycle events, decision lineage, tool calls, model routing decisions, and oversight involvement.
For Article 12 purposes, this matters because the same instrumentation can satisfy multiple obligations. Once your agent emits OpenTelemetry-native gen_ai spans, that data can flow into your existing observability stack for engineering visibility and into a separate evidence layer for regulatory record-keeping. You instrument once. The evidence layer handles the audit-shaped derivative.
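The "instrument once" idea can be illustrated without any SDK. The sketch below uses plain dictionaries in place of real OpenTelemetry spans; `FanOut` and the `compliance.risk_relevant` attribute are hypothetical names, and the `gen_ai.*` keys follow the still-evolving semantic conventions, so they may change. It also shows why sampling belongs only on the engineering path.

```python
import random
from typing import Dict, List, Optional

Span = Dict[str, object]


class FanOut:
    """Route each emitted span to a sampled observability sink and an unsampled evidence sink."""

    def __init__(self, sample_rate: float, rng: Optional[random.Random] = None) -> None:
        self.sample_rate = sample_rate
        self.rng = rng or random.Random()
        self.observability: List[Span] = []
        self.evidence: List[Span] = []

    def emit(self, span: Span) -> None:
        # Evidence path: every risk-relevant span is kept, never sampled.
        if span.get("compliance.risk_relevant"):
            self.evidence.append(dict(span))
        # Engineering path: sampling is fine here because debugging tolerates gaps.
        if self.rng.random() < self.sample_rate:
            self.observability.append(span)


fan_out = FanOut(sample_rate=0.0)  # extreme case: the engineering path drops everything
for i in range(100):
    fan_out.emit({
        "gen_ai.operation.name": "execute_tool",
        "gen_ai.request.model": "example-model",   # placeholder model name
        "compliance.risk_relevant": True,          # hypothetical attribute
        "decision_id": f"d-{i}",
    })
```

Even with the observability sample rate set to zero, the evidence sink still holds all 100 spans, which is the property an audit record needs.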
This is the architecture pattern we recommend over either building a custom logging pipeline specifically for compliance or hoping your observability vendor adds compliance features. The standard layer is the durable answer.
What happens if you fail
Article 99 sets the penalty structure for non-compliance with Articles 12 and 26 obligations:
Penalty levels take into account the nature, gravity, and duration of the infringement, cooperation with authorities, the financial benefit gained or loss avoided, and whether the same operator has previously been penalized. Enforcement falls to national market surveillance authorities rather than notified bodies, and supervisory activity is expected to begin within months of the August 2026 application date.
The practical risk for most enterprises is not the maximum penalty — it is the cascading effect of being found non-compliant: contractual termination clauses triggering with EU customers, downstream procurement freezes, and the reputational cost of being named in an early enforcement action.
What "ready" looks like
If you are an enterprise team running AI agents in Article 6(2) / Annex III scope, the readiness check is simple. Three questions:
If any of these is "no" or "partially," you have lead time but not unlimited lead time. Article 12 compliance requires instrumentation work, retention infrastructure, and evidence-export tooling. None of these gets built overnight, and none of them gets built well under audit pressure.
MeshAI™ is building the OpenTelemetry-native evidence layer for exactly this gap — a deployer-tier record-keeping system that produces Article 12-shaped evidence bundles from the telemetry your existing observability stack already collects. See the four pillars or apply for the pilot partner program.