What is EU AI Act Article 12?

Article 12 of the EU AI Act requires high-risk AI systems to technically allow for the automatic recording of events ('logs') over the lifetime of the system. The logs must enable identification of risk-relevant situations, support post-market monitoring under Article 72, and enable monitoring of high-risk system operation under Article 26(5). The related record-keeping obligations reach both providers and deployers of high-risk AI systems, including AI agents deployed in regulated sectors.

When does EU AI Act Article 12 take effect?

Enforcement of Article 12 for high-risk AI systems was provisionally postponed from 2 August 2026 to 2 December 2027 under the EU's 2026 Digital Omnibus agreement (formal adoption expected mid-2026). The Act entered into force on 1 August 2024, and some provisions for general-purpose AI models took effect earlier (2 August 2025). Because audit-ready evidence takes 12+ months of production history to build, enterprises deploying high-risk AI systems should put Article 12-compliant logging infrastructure in place well ahead of the December 2027 date rather than treating the postponement as a reason to wait.

How long must Article 12 logs be retained?

Providers must retain auto-generated logs under their control for at least 6 months under Article 19, unless EU or Member State law specifies a longer period. Deployers must retain logs for a period appropriate to the intended purpose of the AI system under Article 26(6), at minimum 6 months. For AI systems in sectors with longer regulatory retention (financial services, healthcare), the longer period governs.

Does Article 12 apply to non-EU companies?

Yes. Under Article 2(1)(c) of the EU AI Act, the regulation applies extraterritorially to providers and deployers established outside the EU if the output produced by the AI system is used in the Union. A US-based company deploying AI agents whose outputs reach EU customers or operations is subject to Article 12 record-keeping obligations regardless of where the company is incorporated.

What is the difference between Article 12 logs and general observability?

Standard observability tools (Datadog, New Relic, Honeycomb, Splunk) are designed for engineers debugging incidents — capturing latency, errors, and trace hierarchies. Article 12 record-keeping is designed for auditors reconstructing decision history months later, capturing risk-relevant events, decision lineage with inputs and outputs, human oversight involvement, and complete records (not sampling-based). The two data shapes overlap but are not equivalent.

Who is responsible for Article 12 compliance — the AI vendor or the company using it?

Both. Providers of high-risk AI systems must build the logging capability into the system under Article 12 and retain the logs under their control under Article 19. Deployers of those systems are subject to Article 26(6) obligations covering operational logs while the system is in use. Enterprises that deploy third-party LLMs to build their own AI agents typically inherit both roles: they are a deployer relative to the LLM provider and a provider relative to the agent they expose downstream.

What are the penalties for failing Article 12 record-keeping?

Under Article 99 of the EU AI Act, non-compliance with provider or deployer obligations including record-keeping is subject to administrative fines of up to €15 million or 3% of total worldwide annual turnover, whichever is higher. The highest tier of penalties (€35 million or 7%) applies to violations involving prohibited AI practices under Article 5. Penalty levels consider the nature, gravity, and duration of the infringement and whether the operator cooperated with authorities.

Can existing observability tools satisfy Article 12?

Existing observability tools provide the raw infrastructure (instrumentation, ingest, retention) but do not produce Article 12 evidence by default. The gap is in semantic tagging of risk-relevant events, capture of decision lineage with inputs and outputs, documentation of human oversight involvement, and evidence-export formats acceptable to auditors. OpenTelemetry-native instrumentation with the gen_ai semantic conventions captures the right data shape; a separate evidence layer produces the audit-ready bundles.

Tags: eu-ai-act, compliance, article-12, record-keeping, audit

EU AI Act Article 12: Record-Keeping Requirements — What Your Logs Actually Need to Show

Henrique Veiga Curi · 2026-05-22 · 12 min read

The EU AI Act has a clock running on enterprise AI agents, and the question most teams are missing is not which Articles apply — it is whether they can prove compliance when the auditor walks in.

Article 12 is the part of the Act that decides whether you can. It requires automatic record-keeping of every high-risk AI system's behavior over the system's lifetime — not just the logs you happen to be writing today, but a defensible record an EU auditor would accept as evidence.

If you are running AI agents in any of the eight Annex III high-risk categories, Article 12 applies to you. The high-risk obligations were provisionally postponed to December 2, 2027 under the 2026 Digital Omnibus agreement, but audit-ready evidence takes 12+ months of production history to build, so the postponement is the window to get ready, not a reason to wait. This post is what compliance teams need to know about Article 12 specifically: what it requires, what most observability stacks miss, and what "ready" actually looks like.

What Article 12 actually says

Article 12(1) of Regulation (EU) 2024/1689 — the EU AI Act — sets the core obligation:

"High-risk AI systems shall technically allow for the automatic recording of events ('logs') over the lifetime of the system."

Article 12(2) defines what those events must enable:

Risk identification — events that may result in the AI system presenting a risk under Article 79(1) or a substantial modification

Post-market monitoring — events that facilitate the post-market monitoring referred to in Article 72

High-risk operation monitoring — events that support monitoring of high-risk system operation referred to in Article 26(5)

The key phrase in Article 12(2) is "appropriate to the intended purpose of the system." Logging is not a fixed checklist; it is calibrated to what the AI system is doing and what risks it presents. A credit-scoring agent and an employment-recommendation agent will produce different kinds of logs because their risk surfaces are different.

For one specific category — biometric identification systems under Annex III point 1(a) — Article 12(3) lists explicit minimum fields:

Period of use — start and end timestamps of each use of the system

Reference database — what the input data was checked against

Input data — the data for which the search produced a match

Oversight personnel — identification of natural persons involved in verification per Article 14(5)

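These minimums apply formally only to biometric identification systems, but they are illustrative for the broader Article 12 obligation. The two-person verification rule in Article 14(5) is likewise specific to those systems; the general human oversight requirement in Article 14 applies to every high-risk system, across all Annex III categories.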
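As a rough sketch of how the four minimum fields could be held in one structured record, consider the Python below. It is illustrative only: the class name UseRecord and its field names are assumptions made for this example, not terms from the Act, and rejecting a record with no verifier is a design choice of the sketch rather than a stated legal rule.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
import json


@dataclass(frozen=True)
class UseRecord:
    """One use of the system, carrying the four Article 12(3) minimum fields."""
    period_start: datetime        # start of this use of the system
    period_end: datetime          # end of this use of the system
    reference_database: str       # what the input data was checked against
    matched_input: str            # reference to the input data that produced a match
    verified_by: tuple[str, ...]  # natural persons who verified the result

    def __post_init__(self) -> None:
        if self.period_end < self.period_start:
            raise ValueError("period_end precedes period_start")
        if not self.verified_by:
            raise ValueError("at least one verifying person must be recorded")

    def to_json(self) -> str:
        """Serialize with ISO 8601 timestamps so the record is portable."""
        data = asdict(self)
        data["period_start"] = self.period_start.isoformat()
        data["period_end"] = self.period_end.isoformat()
        return json.dumps(data, sort_keys=True)


record = UseRecord(
    period_start=datetime(2026, 1, 1, 9, 0, tzinfo=timezone.utc),
    period_end=datetime(2026, 1, 1, 9, 5, tzinfo=timezone.utc),
    reference_database="watchlist-v3",      # hypothetical identifier
    matched_input="input-123",              # hypothetical identifier
    verified_by=("officer-a", "officer-b"),
)
print(record.to_json())
```

Storing references to the input data rather than the raw data itself is one option here; whether that is sufficient depends on your own legal analysis.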

Who is actually on the hook

Article 12 is supported by two adjacent record-keeping obligations that apply to different roles:

Providers — companies that develop and place AI systems on the EU market. Under Article 19, providers must retain auto-generated logs under their control for at least 6 months, unless EU or Member State law specifies a longer period.

Deployers — companies that use AI systems in operational settings. Under Article 26(6), deployers must keep logs for a period appropriate to the intended purpose, with a minimum of 6 months.

Most enterprises running AI agents are deployers, not providers. That means Article 26 is the operative record-keeping obligation for most readers of this post.

There is a chained-responsibility wrinkle when you are both. If your company uses a third-party LLM (deployer relative to the LLM vendor) but you build your own agent on top and expose it to others inside or outside the company (provider relative to the agent), you inherit both sets of obligations. The 6-month minimum applies; longer retention may apply depending on the intended purpose and sector.

The observability gap

Standard observability tools capture telemetry. Article 12 requires evidence. These are different data shapes, and most teams discover the gap during their first internal mock audit.

What standard observability captures	What Article 12 requires
Request latency, error rates, throughput	Risk-relevant decision events tagged with semantic meaning
HTTP traces and span hierarchies	Decision lineage — inputs, outputs, retrieved context, tool calls
Service-level logs (often free-form text)	Per-system lifetime event record, structured
Token spend, model usage counters	Identification of inputs that led to a system action
Auto-instrumented tool calls	Documented human oversight involvement per Article 14
Sampling-based traces (1%, 10%)	Complete record — sampling is incompatible with audit

The gap is not that observability tools cannot capture this data. The gap is that they are not configured to. Datadog, New Relic, Honeycomb, Dynatrace, and Splunk all support arbitrary tag attribution, custom event schemas, and extended retention. What they do not ship with by default is the semantic mapping from observability events to regulatory evidence categories.

Standard observability is designed for an engineer at 3 AM trying to debug an incident. Article 12 is designed for an auditor in a conference room six months later trying to reconstruct why an AI system made a specific decision and whether the human oversight required by Article 14 was actually exercised. The data shapes overlap. They are not the same.
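To make the idea of a semantic mapping concrete, here is a minimal sketch that turns a telemetry span (represented as a plain dict) into an evidence event tagged with the Article 12(2) categories. Everything specific in it is an assumption for illustration: the "audit." attribute names, the rule table, and the shape of the output are invented for this example and do not come from any vendor or standard.

```python
from typing import Optional

# Hypothetical rules: each maps a span's attributes to one Article 12(2) category.
CATEGORY_RULES = [
    ("risk_identification", lambda a: a.get("audit.risk_flag") is True),
    ("high_risk_operation", lambda a: "audit.decision_id" in a),
    ("post_market_monitoring", lambda a: "audit.model_version" in a),
]


def to_evidence_event(span: dict) -> Optional[dict]:
    """Return an audit-shaped event, or None for pure engineering telemetry."""
    attrs = span.get("attributes", {})
    categories = [name for name, rule in CATEGORY_RULES if rule(attrs)]
    if not categories:
        return None
    return {
        "timestamp": span["start_time"],
        "system_id": attrs.get("audit.system_id"),
        "categories": categories,
        "inputs": attrs.get("audit.inputs"),
        "outputs": attrs.get("audit.outputs"),
        "tool_calls": attrs.get("audit.tool_calls", []),
        "overseen_by": attrs.get("audit.reviewer"),
    }


span = {
    "start_time": "2026-05-01T10:00:00Z",
    "attributes": {
        "audit.system_id": "credit-agent",
        "audit.decision_id": "d-1",
        "audit.risk_flag": True,
        "audit.inputs": {"applicant": "a-42"},
        "audit.outputs": {"score": 612},
        "audit.reviewer": "officer-7",
    },
}
print(to_evidence_event(span))
```

The point of the sketch is the separation: latency and error spans fall through as None, while decision spans come out in a shape an auditor could read without knowing your tracing setup.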

What "lifetime of the system" actually means

The phrase "over the lifetime of the system" in Article 12(1) is doing significant work. It is not the same as "during the audit period" or "since this fiscal year."

In practice, lifetime starts when the AI system is placed on the market or put into service and ends when it is decommissioned. For a deployer this typically means:

Beginning of lifetime — when you first deploy the agent into production

Throughout lifetime — continuous logging of relevant events

End of lifetime — retention continues for the minimum 6 months (or longer per intended purpose) after decommissioning

This matters because most observability retention policies are tuned to debug-recency — keep traces for 7 days, full-fidelity logs for 30 days, downsampled metrics for a year. Article 12 requires a different retention tier specifically for the events that touch the regulatory categories. You do not need to keep every tracing span for 6 months; you need to keep the events relevant to risk identification, post-market monitoring, and high-risk operation.
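A separate retention tier can be expressed as a small policy function. The sketch below is one way to do it under stated assumptions: the category names are labels invented for this example, 184 days is used as a conservative stand-in for "six months", and sector_min_days is a hypothetical parameter for any longer sector-specific period you have identified.

```python
from datetime import date, timedelta

SIX_MONTHS_DAYS = 184  # conservative stand-in for "six months" (assumption)
DEBUG_TRACE_DAYS = 7   # debug-recency tier, as in the paragraph above

# Labels for the three Article 12(2) event categories (names are illustrative).
REGULATORY_CATEGORIES = {
    "risk_identification",
    "post_market_monitoring",
    "high_risk_operation",
}


def retention_days(category: str, sector_min_days: int = 0) -> int:
    """Regulatory events get the longer of six months and the sector minimum."""
    if category in REGULATORY_CATEGORIES:
        return max(SIX_MONTHS_DAYS, sector_min_days)
    return DEBUG_TRACE_DAYS


def earliest_deletion(recorded_on: date, category: str, sector_min_days: int = 0) -> date:
    """First date on which an event of this category may be deleted."""
    return recorded_on + timedelta(days=retention_days(category, sector_min_days))


print(retention_days("debug_trace"))
print(earliest_deletion(date(2026, 1, 1), "post_market_monitoring"))
```

If retention is meant to run from decommissioning rather than from the event date, the same function can be called with the decommissioning date as recorded_on.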

Annex III point 1(a) is illustrative

The Article 12(3) minimums apply formally only to biometric identification systems. But the four fields named — period of use, reference database, input data, oversight personnel — translate cleanly to other Annex III categories:

Employment AI (Annex III point 4): period of use = candidate evaluation session; reference database = training data or rubric; input data = candidate profile and inputs; oversight personnel = the HR reviewer who signed off on or overrode the recommendation.

Credit scoring AI (Annex III point 5): period of use = scoring session; reference database = model version and feature store snapshot; input data = applicant data; oversight personnel = the credit officer who acted on the score.

Educational assessment AI (Annex III point 3): period of use = assessment window; reference database = rubric and historical exemplars; input data = student submission; oversight personnel = the educator who reviewed contested scores.

The pattern repeats. The four-field minimum from biometrics is a working template for other categories where Article 12(2)'s general obligation applies.

A practical compliance checklist

For deployers running AI agents in Annex III scope, the path to Article 12 readiness has nine concrete steps:

Inventory. Build a documented list of every AI system in production. Include shadow agents — the ones individual teams deployed without IT review. You cannot satisfy Article 12 for systems you do not know exist.

Classification. For each system, document which Annex III category applies (or document why none does). This becomes the input to Article 9 risk management and to your Article 12 logging scope. See our Article 6 breakdown for classification specifics.

Capability audit. Inventory what each agent currently logs. Be honest. "We have Datadog" is not an inventory; "we log latency and HTTP traces but not decision inputs" is.

Event mapping. Map current logs to Article 12(2)'s three event categories: risk-relevant, post-market monitoring, high-risk operation monitoring. Gaps become tickets.

Retention verification. Confirm retention meets the 6-month minimum under Article 26(6). For sectors with longer regulatory retention — financial services (SOX, GDPR), healthcare (HIPAA), payments (PCI-DSS) — the longer period governs and Article 12 inherits it.

Evidence export format. Define a structured format for evidence bundles your auditor can read. Free-form log dumps will not be accepted. The format should map fields to article references and include cryptographic integrity signals.

Human oversight events. Capture Article 14 oversight events explicitly: who reviewed what, when, and what action they took. This is the single most-missed category in our experience.

Risk-relevant tagging. Add semantic tags to log events so risk-relevant entries can be filtered without scanning full traces. OpenTelemetry semantic conventions in the gen_ai namespace are converging on this — adopting them early reduces rework.

Mock audit. Run an internal audit now, well ahead of enforcement (high-risk obligations were provisionally postponed to December 2, 2027 under the 2026 Digital Omnibus agreement). Pick a sample agent, pull 30 days of evidence, hand it to someone outside the team, and ask them to reconstruct what the agent did and whether human oversight was effective. If they cannot, you are not ready.
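The "cryptographic integrity signals" in the evidence export step can be as simple as a hash chain over the exported events. The following is a minimal sketch using only the Python standard library. The function names and the bundle layout are assumptions made for this example, not a format any auditor has endorsed.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash for the first entry in a chain


def _digest(prev_hash: str, event: dict) -> str:
    """SHA-256 over the previous hash plus a canonical JSON form of the event."""
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def build_bundle(events: list[dict]) -> list[dict]:
    """Wrap each event with the previous entry's hash and its own hash."""
    bundle, prev = [], GENESIS
    for event in events:
        entry_hash = _digest(prev, event)
        bundle.append({"event": event, "prev_hash": prev, "hash": entry_hash})
        prev = entry_hash
    return bundle


def verify_bundle(bundle: list[dict]) -> bool:
    """Recompute the chain; any edited, removed, or reordered entry breaks it."""
    prev = GENESIS
    for entry in bundle:
        if entry["prev_hash"] != prev or entry["hash"] != _digest(prev, entry["event"]):
            return False
        prev = entry["hash"]
    return True


events = [
    {"decision_id": "d-1", "article_refs": ["12(2)", "26(5)"], "action": "score"},
    {"decision_id": "d-1", "article_refs": ["14"], "action": "human_override"},
]
bundle = build_bundle(events)
print(verify_bundle(bundle))
```

A chain like this only shows that the bundle was not altered after it was built. Anyone able to rewrite the whole chain can recompute it, so in practice the final hash would need to be anchored somewhere the log writer cannot change, for example by signing it or storing it with a third party.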

Where OpenTelemetry fits

OpenTelemetry is the open standard for observability data — vendor-neutral, framework-neutral, deployable alongside every major observability tool. The OpenTelemetry community is actively developing semantic conventions in the gen_ai namespace specifically for AI agent telemetry: lifecycle events, decision lineage, tool calls, model routing decisions, and oversight involvement.

For Article 12 purposes, this matters because the same instrumentation can satisfy multiple obligations. Once your agent emits OpenTelemetry-native gen_ai spans, that data can flow into your existing observability stack for engineering visibility and into a separate evidence layer for regulatory record-keeping. You instrument once. The evidence layer handles the audit-shaped derivative.
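The "instrument once" pattern can be sketched as a fan-out: each span goes through one emit function that feeds two sinks. In the example below spans are plain dicts and the sinks are ordinary callables, so this is a model of the routing logic, not OpenTelemetry SDK code. The function name make_fanout and the every-Nth sampling rule are assumptions for illustration; gen_ai.operation.name is used as an example attribute from the gen_ai conventions, which are still evolving.

```python
from typing import Callable

Span = dict
Sink = Callable[[Span], None]


def make_fanout(observability_sink: Sink, evidence_sink: Sink, sample_every: int = 10) -> Sink:
    """Build an emit function that feeds both sinks from one instrumentation point."""
    seen = 0

    def emit(span: Span) -> None:
        nonlocal seen
        seen += 1
        # Evidence path: every gen_ai span is kept, never sampled.
        if any(key.startswith("gen_ai.") for key in span.get("attributes", {})):
            evidence_sink(span)
        # Engineering path: sampling is acceptable here.
        if seen % sample_every == 0:
            observability_sink(span)

    return emit


observability_store: list[Span] = []
evidence_store: list[Span] = []
emit = make_fanout(observability_store.append, evidence_store.append, sample_every=10)

for i in range(20):
    emit({"name": f"call-{i}", "attributes": {"gen_ai.operation.name": "chat"}})
emit({"name": "healthcheck", "attributes": {"http.route": "/health"}})

print(len(observability_store), len(evidence_store))
```

The design choice worth noting is that sampling is decided per sink. The engineering path can stay cheap while the evidence path keeps a complete record.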

This is the architecture pattern we recommend over either building a custom logging pipeline specifically for compliance or hoping your observability vendor adds compliance features. The standard layer is the durable answer.

What happens if you fail

Article 99 sets the penalty structure for non-compliance with Articles 12 and 26 obligations:

Up to €15 million or 3% of total worldwide annual turnover for the preceding financial year, whichever is higher, for most provider and deployer obligation failures

Up to €35 million or 7% for violations involving prohibited practices under Article 5
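The "whichever is higher" rule is simple arithmetic, and a worked example shows where the turnover-based figure takes over from the fixed amount. The function name fine_ceiling and the sample turnover figures are made up for this illustration. The result is the statutory ceiling for a tier, not a prediction of any actual fine.

```python
def fine_ceiling(turnover_eur: int, fixed_cap_eur: int = 15_000_000, percent: int = 3) -> int:
    """Upper bound for a penalty tier: the higher of the fixed cap and a share of turnover."""
    return max(fixed_cap_eur, turnover_eur * percent // 100)


# EUR 200 million turnover: 3% is EUR 6 million, so the fixed EUR 15 million cap applies.
print(fine_ceiling(200_000_000))

# EUR 2 billion turnover: 3% is EUR 60 million, which exceeds the fixed cap.
print(fine_ceiling(2_000_000_000))

# Prohibited-practice tier for the same EUR 2 billion company: 7% is EUR 140 million.
print(fine_ceiling(2_000_000_000, fixed_cap_eur=35_000_000, percent=7))
```

For the 3% tier the crossover sits at EUR 500 million in turnover; above that, the percentage sets the ceiling.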

Penalty levels take into account the nature, gravity, and duration of the infringement, cooperation with authorities, the financial benefit gained or loss avoided, and whether the same operator has previously been penalized. Enforcement activity by national market surveillance authorities is expected to begin within months of the effective date, which was provisionally postponed for high-risk systems from August 2026 to December 2, 2027 under the 2026 Digital Omnibus agreement.

The practical risk for most enterprises is not the maximum penalty — it is the cascading effect of being found non-compliant: contractual termination clauses triggering with EU customers, downstream procurement freezes, and the reputational cost of being named in an early enforcement action.

What "ready" looks like

If you are an enterprise team running AI agents in Article 6(2) / Annex III scope, the readiness check is simple. Three questions:

Evidence on demand. If a market surveillance authority asked tomorrow for evidence of every high-risk decision your AI agents made in the past 30 days, could you produce it in a structured, auditable format?

Human oversight visible. Within that evidence, can you show where humans exercised oversight per Article 14, and what they did?

Retention covered. Can you retain that evidence for at least 6 months after the AI system is decommissioned?
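The three questions can be turned into a rough self-check. The sketch below assumes you can list the decision IDs from your system of record and look up evidence by decision ID. The field names, the 184-day stand-in for six months, and the function name readiness_gaps are all assumptions for this example. It also assumes every record states its oversight outcome explicitly, even when no review was required.

```python
REQUIRED_FIELDS = ("timestamp", "inputs", "outputs")  # illustrative minimum


def readiness_gaps(
    decision_ids: list[str],
    evidence: dict[str, dict],
    retention_days: int,
    min_retention_days: int = 184,
) -> list[str]:
    """Return a list of human-readable gaps; an empty list means no gap was found."""
    gaps = []
    for decision_id in decision_ids:
        record = evidence.get(decision_id)
        if record is None:
            gaps.append(f"{decision_id}: no evidence record")
            continue
        absent = [f for f in REQUIRED_FIELDS if record.get(f) in (None, "", [])]
        if absent:
            gaps.append(f"{decision_id}: missing {', '.join(absent)}")
        if "oversight" not in record:
            gaps.append(f"{decision_id}: no human oversight recorded")
    if retention_days < min_retention_days:
        gaps.append(f"retention is {retention_days} days, below {min_retention_days}")
    return gaps


evidence = {
    "d1": {"timestamp": "2026-05-01T10:00:00Z", "inputs": {"x": 1},
           "outputs": {"y": 2}, "oversight": "reviewed by officer-7"},
    "d2": {"timestamp": "2026-05-01T11:00:00Z", "inputs": {"x": 3}},
}
for gap in readiness_gaps(["d1", "d2", "d3"], evidence, retention_days=30):
    print(gap)
```

A passing check here is necessary, not sufficient: it shows the records exist and are structured, not that a reviewer could actually reconstruct the decision from them.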

If any of these is "no" or "partially," you have lead time but not unlimited lead time. Article 12 compliance requires instrumentation work, retention infrastructure, and evidence-export tooling. None of these gets built overnight, and none of them gets built well under audit pressure.

MeshAI™ is building the OpenTelemetry-native evidence layer for exactly this gap: a deployer-tier record-keeping system that produces Article 12-shaped evidence bundles from the telemetry your existing observability stack already collects. See the four pillars or apply for the pilot partner program.