Tags: eu-ai-act, compliance, article-12, record-keeping, audit

EU AI Act Article 12: Record-Keeping Requirements — What Your Logs Actually Need to Show by August 2026

Henrique Veiga Curi · 2026-05-22 · 12 min read

The EU AI Act has a clock running on enterprise AI agents, and the question most teams are missing is not which Articles apply — it is whether they can prove compliance when the auditor walks in.

Article 12 is the part of the Act that decides whether you can. It requires automatic record-keeping of every high-risk AI system's behavior over the system's lifetime — not just the logs you happen to be writing today, but a defensible record an EU auditor would accept as evidence.

If you are running AI agents in any of the eight Annex III high-risk categories, the deadline is August 2, 2026. This post is what compliance teams need to know about Article 12 specifically: what it requires, what most observability stacks miss, and what "ready" actually looks like.

What Article 12 actually says

Article 12(1) of Regulation (EU) 2024/1689 — the EU AI Act — sets the core obligation:

"High-risk AI systems shall technically allow for the automatic recording of events ('logs') over the lifetime of the system."

Article 12(2) defines what those events must enable:

  • Risk identification — events that may result in the AI system presenting a risk under Article 79(1) or a substantial modification
  • Post-market monitoring — events that facilitate the post-market monitoring referred to in Article 72
  • High-risk operation monitoring — events that support monitoring of high-risk system operation referred to in Article 26(5)

The key phrase in Article 12(2) is "appropriate to the intended purpose of the system." Logging is not a fixed checklist: it is calibrated to what the AI system is doing and what risks it presents. A credit-scoring agent and an employment-recommendation agent will produce different kinds of logs because their risk surfaces are different.

For one specific category, remote biometric identification systems under Annex III point 1(a), Article 12(3) lists explicit minimum fields:

  • Period of use — start and end timestamps of each use of the system
  • Reference database — what the input data was checked against
  • Input data — the data for which the search produced a match
  • Oversight personnel — identification of natural persons involved in verification per Article 14(5)
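
These minimums are illustrative for the broader Article 12 obligation. Article 14(5) itself, which requires verification by at least two natural persons, is specific to point 1(a) systems; the general human oversight obligation in Article 14 applies across all Annex III categories.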
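
As a rough illustration, the four Article 12(3) minimums could be carried in one structured record per use of the system. This is a sketch of a possible shape, not a prescribed schema: the class name `UseRecord`, the field names, and the sample values are all assumptions made for the example.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass(frozen=True)
class UseRecord:
    """One use of the system, carrying the four Article 12(3) minimum fields."""
    use_started_at: datetime       # period of use: start
    use_ended_at: datetime         # period of use: end
    reference_database: str        # what the input data was checked against
    matched_input_ref: str         # pointer to the input data that produced a match
    verifier_ids: tuple[str, ...]  # natural persons involved in verification

    def __post_init__(self) -> None:
        # Reject records an auditor could not rely on.
        if self.use_ended_at < self.use_started_at:
            raise ValueError("use_ended_at precedes use_started_at")
        if not self.verifier_ids:
            raise ValueError("at least one verifier must be identified")


# Hypothetical sample values.
record = UseRecord(
    use_started_at=datetime(2026, 8, 3, 9, 0, tzinfo=timezone.utc),
    use_ended_at=datetime(2026, 8, 3, 9, 2, tzinfo=timezone.utc),
    reference_database="watchlist-2026-07",
    matched_input_ref="sha256:placeholder",
    verifier_ids=("officer-17", "officer-42"),
)
print(asdict(record))
```

Validating at construction time means an incomplete record fails when it is written, not when someone tries to read it during an audit.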

    Who is actually on the hook

    Article 12 is supported by two adjacent record-keeping obligations that apply to different roles:

    Providers — companies that develop and place AI systems on the EU market. Under Article 19, providers must retain auto-generated logs under their control for at least 6 months, unless EU or Member State law specifies a longer period.

    Deployers — companies that use AI systems in operational settings. Under Article 26(6), deployers must keep logs for a period appropriate to the intended purpose, with a minimum of 6 months.

    Most enterprises running AI agents are deployers, not providers. That means Article 26 is the operative record-keeping obligation for most readers of this post.

    There is a chained-responsibility wrinkle when you are both. If your company uses a third-party LLM (deployer relative to the LLM vendor) but you build your own agent on top and expose it to others inside or outside the company (provider relative to the agent), you inherit both sets of obligations. The 6-month minimum applies; longer retention may apply depending on the intended purpose and sector.

    The observability gap

    Standard observability tools capture telemetry. Article 12 requires evidence. These are different data shapes, and most teams discover the gap during their first internal mock audit.

| What standard observability captures | What Article 12 requires |
|---|---|
| Request latency, error rates, throughput | Risk-relevant decision events tagged with semantic meaning |
| HTTP traces and span hierarchies | Decision lineage: inputs, outputs, retrieved context, tool calls |
| Service-level logs (often free-form text) | Per-system lifetime event record, structured |
| Token spend, model usage counters | Identification of inputs that led to a system action |
| Auto-instrumented tool calls | Documented human oversight involvement per Article 14(5) |
| Sampling-based traces (1%, 10%) | Complete record: sampling is incompatible with audit |

    The gap is not that observability tools cannot capture this data. The gap is that they are not configured to. Datadog, New Relic, Honeycomb, Dynatrace, and Splunk all support arbitrary tag attribution, custom event schemas, and extended retention. What they do not ship with by default is the semantic mapping from observability events to regulatory evidence categories.

    Standard observability is designed for an engineer at 3 AM trying to debug an incident. Article 12 is designed for an auditor in a conference room six months later trying to reconstruct why an AI system made a specific decision and whether the human oversight required by Article 14 was actually exercised. The data shapes overlap. They are not the same.
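
One way to picture the missing semantic mapping is a lookup from telemetry event names to the three purposes in Article 12(2). The sketch below is only that: the event names are hypothetical, and which events belong under which purpose is a judgment call for each system, not something the Act enumerates.

```python
from enum import Enum


class EvidenceCategory(Enum):
    RISK_IDENTIFICATION = "article_12_2_a"
    POST_MARKET_MONITORING = "article_12_2_b"
    OPERATION_MONITORING = "article_12_2_c"


# Hypothetical event names; a real mapping depends on the system's intended purpose.
EVENT_TO_CATEGORIES = {
    "agent.decision": {
        EvidenceCategory.POST_MARKET_MONITORING,
        EvidenceCategory.OPERATION_MONITORING,
    },
    "agent.guardrail_triggered": {EvidenceCategory.RISK_IDENTIFICATION},
    "model.version_changed": {EvidenceCategory.RISK_IDENTIFICATION},
    "human.override": {EvidenceCategory.OPERATION_MONITORING},
}


def evidence_categories(event_name: str) -> set[EvidenceCategory]:
    """Return the evidence categories for an event, or an empty set for pure telemetry."""
    return EVENT_TO_CATEGORIES.get(event_name, set())


for name in ("agent.guardrail_triggered", "http.request"):
    labels = sorted(c.value for c in evidence_categories(name))
    print(name, "->", labels or "telemetry only")
```

Events that map to an empty set stay in the ordinary observability pipeline; everything else is a candidate for the evidence record.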

    What "lifetime of the system" actually means

    The phrase "over the lifetime of the system" in Article 12(1) is doing significant work. It is not the same as "during the audit period" or "since this fiscal year."

    In practice, lifetime starts when the AI system is placed on the market or put into service and ends when it is decommissioned. For a deployer this typically means:

  • Beginning of lifetime — when you first deploy the agent into production
  • Throughout lifetime — continuous logging of relevant events
  • End of lifetime — retention continues for the minimum 6 months (or longer per intended purpose) after decommissioning

This matters because most observability retention policies are tuned to debug recency: keep traces for 7 days, full-fidelity logs for 30 days, downsampled metrics for a year. The Article 12 record needs a separate retention tier for the events that touch the regulatory categories, sized to the retention minimums in Articles 19 and 26(6). You do not need to keep every tracing span for 6 months; you need to keep the events relevant to risk identification, post-market monitoring, and high-risk operation.
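
A two-tier retention rule along these lines might be sketched as follows. The function names, the `sector_months` parameter, and the choice of anchor date are assumptions for illustration; the 30-day debug tier simply mirrors the example figure above, and the 6-month floor is the statutory minimum.

```python
import calendar
from datetime import date, timedelta


def add_months(start: date, months: int) -> date:
    """Add calendar months, clamping to the last valid day of the target month."""
    index = start.year * 12 + (start.month - 1) + months
    year, month0 = divmod(index, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(start.day, last_day))


def retain_until(anchor: date, regulatory: bool, sector_months: int = 0) -> date:
    """Earliest date an event may be deleted under this sketch's two-tier policy."""
    if not regulatory:
        return anchor + timedelta(days=30)  # debug tier
    # Evidence tier: 6-month floor, or a longer sector-specific period if one applies.
    return add_months(anchor, max(6, sector_months))


print(retain_until(date(2026, 8, 31), regulatory=True))
print(retain_until(date(2026, 8, 31), regulatory=False))
```

The anchor is left to the caller on purpose: whether the clock runs from the event date or from decommissioning is a policy decision this sketch does not make.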

    Annex III point 1(a) is illustrative

    The Article 12(3) minimums apply formally only to biometric identification systems. But the four fields named — period of use, reference database, input data, oversight personnel — translate cleanly to other Annex III categories:

  • Employment AI (Annex III §4): period of use = candidate evaluation session; reference database = training data or rubric; input data = candidate profile and inputs; oversight personnel = the HR reviewer who signed off on or overrode the recommendation.
  • Credit scoring AI (Annex III §5): period of use = scoring session; reference database = model version and feature store snapshot; input data = applicant data; oversight personnel = the credit officer who acted on the score.
  • Educational assessment AI (Annex III §3): period of use = assessment window; reference database = rubric and historical exemplars; input data = student submission; oversight personnel = the educator who reviewed contested scores.

The pattern repeats. The four-field minimum from biometrics is a working template for other categories where Article 12(2)'s general obligation applies.
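
To make the template concrete, the mapping above can be written down as data and used to flag log records that leave a field empty. The key names and the sample record are hypothetical; the field descriptions are copied from the list above.

```python
REQUIRED_FIELDS = ("period_of_use", "reference_database", "input_data", "oversight_personnel")

# What each field means per Annex III category, as described in the text.
TEMPLATES = {
    "employment": {
        "period_of_use": "candidate evaluation session",
        "reference_database": "training data or rubric",
        "input_data": "candidate profile and inputs",
        "oversight_personnel": "HR reviewer who signed off on or overrode the recommendation",
    },
    "credit_scoring": {
        "period_of_use": "scoring session",
        "reference_database": "model version and feature store snapshot",
        "input_data": "applicant data",
        "oversight_personnel": "credit officer who acted on the score",
    },
    "education": {
        "period_of_use": "assessment window",
        "reference_database": "rubric and historical exemplars",
        "input_data": "student submission",
        "oversight_personnel": "educator who reviewed contested scores",
    },
}


def missing_fields(record: dict) -> list[str]:
    """List the template fields that are absent or empty in a log record."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


# Hypothetical credit-scoring record with no oversight entry.
sample = {
    "period_of_use": "2026-08-03T09:00Z/2026-08-03T09:01Z",
    "reference_database": "model v12, feature snapshot 2026-08-01",
    "input_data": "applicant-ref-001",
}
print(missing_fields(sample))
```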

    A practical compliance checklist

    For deployers running AI agents in Annex III scope, the path to Article 12 readiness has nine concrete steps:

  • Inventory. Build a documented list of every AI system in production. Include shadow agents — the ones individual teams deployed without IT review. You cannot satisfy Article 12 for systems you do not know exist.
  • Classification. For each system, document which Annex III category applies (or document why none does). This becomes the input to Article 9 risk management and to your Article 12 logging scope. See our Article 6 breakdown for classification specifics.
  • Capability audit. Inventory what each agent currently logs. Be honest. "We have Datadog" is not an inventory; "we log latency and HTTP traces but not decision inputs" is.
  • Event mapping. Map current logs to Article 12(2)'s three event categories: risk-relevant, post-market monitoring, high-risk operation monitoring. Gaps become tickets.
  • Retention verification. Confirm retention meets the 6-month minimum under Article 26(6). For sectors with longer regulatory retention, for example financial services (SOX), healthcare (HIPAA), or payments (PCI-DSS), the longer period governs your retention schedule. Logs that contain personal data must also respect GDPR storage limitation.
  • Evidence export format. Define a structured format for evidence bundles your auditor can read. Free-form log dumps will not be accepted. The format should map fields to article references and include cryptographic integrity signals.
  • Human oversight events. Capture Article 14 oversight events explicitly: who reviewed what, when, and what action they took. This is the single most-missed category in our experience.
  • Risk-relevant tagging. Add semantic tags to log events so risk-relevant entries can be filtered without scanning full traces. OpenTelemetry semantic conventions in the gen_ai namespace are converging on this — adopting them early reduces rework.
  • Mock audit. Run an internal audit before August 2026. Pick a sample agent, pull 30 days of evidence, hand it to someone outside the team, and ask them to reconstruct what the agent did and whether human oversight was effective. If they cannot, you are not ready.
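
The evidence-export and human-oversight steps can be combined in a small sketch: each event names the article it supports, and a SHA-256 hash chain serves as the integrity signal. A hash chain is just one possible choice here, and the event fields (`article_ref`, `reviewer`, `action`) are hypothetical names, not a required format.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder predecessor hash for the first entry


def _digest(body: dict) -> str:
    """Hash a canonical JSON encoding of the entry body."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def chain(events: list[dict]) -> list[dict]:
    """Return copies of the events, each linked to the hash of its predecessor."""
    prev, out = GENESIS, []
    for event in events:
        body = {**event, "prev_hash": prev}
        prev = _digest(body)
        out.append({**body, "hash": prev})
    return out


def verify(chained: list[dict]) -> bool:
    """Recompute every hash; any edited, removed, or reordered entry breaks the chain."""
    prev = GENESIS
    for entry in chained:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if entry["prev_hash"] != prev or entry["hash"] != _digest(body):
            return False
        prev = entry["hash"]
    return True


bundle = chain([
    {"article_ref": "12(2)(c)", "type": "decision", "output": "decline"},
    {"article_ref": "14", "type": "oversight", "reviewer": "officer-17", "action": "overrode"},
])
print(verify(bundle))
```

Because each entry commits to the one before it, an auditor can confirm that an oversight event was recorded in sequence and not added afterwards.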

Where OpenTelemetry fits

    OpenTelemetry is the open standard for observability data — vendor-neutral, framework-neutral, deployable alongside every major observability tool. The OpenTelemetry community is actively developing semantic conventions in the gen_ai namespace specifically for AI agent telemetry: lifecycle events, decision lineage, tool calls, model routing decisions, and oversight involvement.

    For Article 12 purposes, this matters because the same instrumentation can satisfy multiple obligations. Once your agent emits OpenTelemetry-native gen_ai spans, that data can flow into your existing observability stack for engineering visibility and into a separate evidence layer for regulatory record-keeping. You instrument once. The evidence layer handles the audit-shaped derivative.
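
The "instrument once" idea can be sketched without any SDK by treating spans as plain dictionaries. The `gen_ai.operation.name`, `gen_ai.request.model`, and `gen_ai.tool.name` keys follow the OpenTelemetry GenAI semantic conventions, which are still under development and may change. The `compliance.evidence_category` attribute, the model and tool names, and the two in-memory sinks are hypothetical stand-ins.

```python
EVIDENCE_ATTR = "compliance.evidence_category"  # hypothetical, not an OTel convention


def fan_out(span: dict, observability_sink: list, evidence_sink: list) -> None:
    """Send every span to observability; copy tagged spans to the evidence layer."""
    observability_sink.append(span)
    if EVIDENCE_ATTR in span["attributes"]:
        evidence_sink.append(span)


spans = [
    {
        "name": "chat",
        "attributes": {
            "gen_ai.operation.name": "chat",
            "gen_ai.request.model": "example-model",
        },
    },
    {
        "name": "execute_tool",
        "attributes": {
            "gen_ai.operation.name": "execute_tool",
            "gen_ai.tool.name": "score_applicant",
            EVIDENCE_ATTR: "operation_monitoring",
        },
    },
]

observability, evidence = [], []
for span in spans:
    fan_out(span, observability, evidence)
print(len(observability), len(evidence))
```

The agent emits each span once; the routing decision lives in the pipeline, so the evidence tier can get its own retention and integrity handling without touching agent code.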

    This is the architecture pattern we recommend over either building a custom logging pipeline specifically for compliance or hoping your observability vendor adds compliance features. The standard layer is the durable answer.

    What happens if you fail

    Article 99 sets the penalty structure for non-compliance with Articles 12 and 26 obligations:

  • Up to €15 million or 3% of total worldwide annual turnover for the preceding financial year, whichever is higher, for most provider and deployer obligation failures
  • Up to €35 million or 7% for violations involving prohibited practices under Article 5

Penalty levels take into account the nature, gravity, and duration of the infringement, cooperation with authorities, the financial benefit gained or loss avoided, and whether the same operator has previously been penalized. Inspections by national market surveillance authorities are expected to begin within months of the August 2026 application date.
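
The "whichever is higher" rule is simple arithmetic, worked through below for two hypothetical turnover figures. The function name and the sample turnovers are illustrative; the caps and percentages are the ones listed above.

```python
def fine_ceiling(turnover_eur: int, fixed_cap_eur: int, percent: int) -> int:
    """Upper bound of a fine: the fixed cap or the turnover share, whichever is higher."""
    return max(fixed_cap_eur, turnover_eur * percent // 100)


# EUR 2 billion turnover: 3% is EUR 60 million, which exceeds the EUR 15 million cap.
print(fine_ceiling(2_000_000_000, 15_000_000, 3))

# EUR 100 million turnover: 3% is EUR 3 million, so the EUR 15 million cap applies.
print(fine_ceiling(100_000_000, 15_000_000, 3))
```

The crossover sits at €500 million turnover for the 3% tier, since 3% of €500 million equals the €15 million fixed cap.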

    The practical risk for most enterprises is not the maximum penalty — it is the cascading effect of being found non-compliant: contractual termination clauses triggering with EU customers, downstream procurement freezes, and the reputational cost of being named in an early enforcement action.

    What "ready" looks like

    If you are an enterprise team running AI agents in Article 6(2) / Annex III scope, the readiness check is simple. Three questions:

  • Evidence on demand. If an inspector from a market surveillance authority walked in tomorrow and asked for evidence of every high-risk decision your AI agents made in the past 30 days, can you produce it in a structured, auditable format?
  • Human oversight visible. Within that evidence, can you show where humans exercised oversight per Article 14, and what they did?
  • Retention covered. Can you retain that evidence for at least 6 months after the AI system is decommissioned?

If any of these is "no" or "partially," you have lead time but not unlimited lead time. Article 12 compliance requires instrumentation work, retention infrastructure, and evidence-export tooling. None of these gets built overnight, and none of them gets built well under audit pressure.
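
The three questions can be turned into a crude self-check, sketched below. The required keys, the `oversight` field, and the rule that every sampled decision must carry an oversight entry are assumptions; a real check would follow the oversight design of the specific system.

```python
REQUIRED_KEYS = {"decision_id", "timestamp", "inputs", "output"}  # hypothetical schema


def readiness(decisions: list[dict], retention_months: int) -> dict[str, bool]:
    """Answer the three readiness questions for a sample of decision records."""
    has_records = bool(decisions)
    return {
        "evidence_on_demand": has_records
        and all(REQUIRED_KEYS.issubset(d) for d in decisions),
        "oversight_visible": has_records
        and all(bool(d.get("oversight")) for d in decisions),
        "retention_covered": retention_months >= 6,
    }


# Hypothetical 30-day sample: the second record has no oversight entry.
sample_decisions = [
    {"decision_id": "d1", "timestamp": "2026-08-03T09:00Z", "inputs": "ref-1",
     "output": "approve", "oversight": {"reviewer": "officer-17", "action": "confirmed"}},
    {"decision_id": "d2", "timestamp": "2026-08-03T09:05Z", "inputs": "ref-2",
     "output": "decline"},
]
print(readiness(sample_decisions, retention_months=3))
```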


    MeshAI is building the OpenTelemetry-native evidence layer for exactly this gap — a deployer-tier record-keeping system that produces Article 12-shaped evidence bundles from the telemetry your existing observability stack already collects. See the four pillars or apply for the pilot partner program.