# The False Choice: Why Agentic Audit Trails Need Assurance Levels

*How draft-nennemann-wimse-ect-01 introduces L1/L2/L3 to bridge the gap between "no overhead" and "full regulatory compliance."*

---

When you're building a system where AI agents act on behalf of users — approving transactions, recommending treatments, executing trades — you face a question that doesn't have a clean answer: how much proof do you need that each step actually happened?

The first version of the Execution Context Token specification (draft-nennemann-wimse-ect-00) answered this with "all of it." Every ECT required a full JWS asymmetric signature, tightly coupled to WIMSE Workload Identity Tokens. That's the right answer for cross-organizational deployments where two companies need cryptographic non-repudiation. But it's wildly excessive for a team running an internal AI pipeline behind a service mesh. And it's not enough for a healthcare system that needs to prove, years later, that every step in a clinical decision was recorded in tamper-evident form.

The -01 revision introduces three assurance levels: L1, L2, and L3. Same payload. Same DAG structure. Same claims. Different envelopes, different verification, different guarantees. Choose the one that fits your threat model.

## What Are ECTs?

Before diving into assurance levels, a quick recap. Execution Context Tokens are JWT-based records of what an agent actually did in a distributed workflow. Each ECT captures a single task — "recommend_treatment", "verify_compliance", "execute_trade" — and links it to its predecessor tasks through a directed acyclic graph (DAG).

ECTs originated as an extension to the IETF WIMSE (Workload Identity in Multi-System Environments) framework, but the -01 revision makes them identity-framework agnostic. Your identity layer handles "who is this agent?" ECTs handle "what did this agent do, and in what order?"

The key properties:

- **Per-task granularity.** One ECT per task, not one per session or per request chain.
- **DAG ordering.** Parent references (`par` claim) create a verifiable execution graph. Fan-out, fan-in, parallel branches — all representable.
- **Data integrity without data exposure.** Input and output hashes (`inp_hash`, `out_hash`) prove what was processed without revealing the data itself.
- **Identity-framework agnostic.** ECTs work with WIMSE WIT/WPT, X.509 certificates, OAuth credentials, or plain JWK sets. The spec defines abstract identity binding requirements and concrete profiles for each framework.
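
To make the claims concrete, here is a minimal Python sketch of building an ECT payload. It assumes that `inp_hash` and `out_hash` are unpadded base64url SHA-256 digests and that `par` is a list of parent `jti` values. The draft's exact value encodings may differ, and the helper names (`b64url_sha256`, `make_ect_payload`) are hypothetical.

```python
import base64
import hashlib
import time
import uuid


def b64url_sha256(data: bytes) -> str:
    """Unpadded base64url SHA-256 digest (assumed hash encoding)."""
    digest = hashlib.sha256(data).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def make_ect_payload(action, parents, input_bytes, output_bytes,
                     iss=None, aud=None, ttl_seconds=300, now=None):
    """Build a level-independent ECT payload (hypothetical helper)."""
    now = int(time.time()) if now is None else now
    payload = {
        "jti": str(uuid.uuid4()),                 # unique ID of this task record
        "iat": now,
        "exp": now + ttl_seconds,
        "exec_act": action,                       # the task the agent performed
        "par": list(parents),                     # jti values of predecessor tasks
        "inp_hash": b64url_sha256(input_bytes),   # what was consumed, not the data
        "out_hash": b64url_sha256(output_bytes),  # what was produced, not the data
    }
    # iss/aud are optional in the payload itself; L2 and L3 require them.
    if iss is not None:
        payload["iss"] = iss
    if aud is not None:
        payload["aud"] = aud
    return payload


# A two-step chain: the child lists the parent's jti in `par`.
check = make_ect_payload("check_interactions", [], b"meds", b"ok", now=1000)
rec = make_ect_payload("recommend_treatment", [check["jti"]], b"ok", b"plan",
                       iss="agent-a", aud="agent-b", now=1001)
```

When data flows unchanged from one task to the next, the child's `inp_hash` equals the parent's `out_hash`. A verifier can therefore check data continuity across the DAG without ever seeing the data.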

## The Problem with draft-00

The -00 draft defined one mode of operation: JWS Compact Serialization with asymmetric signing. Every ECT, always. This created two problems:

**Too heavy for internal deployments.** If all your agents run in the same Kubernetes cluster behind Istio with mTLS, requiring every agent to sign every ECT with ES256 is pure overhead. You already trust the transport. You already trust the agents (you operate them). The signature gives you non-repudiation, but you don't need non-repudiation — you need structured execution records for debugging and observability.

**Not heavy enough for regulated deployments.** A JWS signature proves who created the ECT and that it wasn't tampered with. But it doesn't prove the ECT was *recorded*. A malicious or buggy agent could create a valid signed ECT and then quietly drop it. For environments subject to FDA 21 CFR Part 11, MiFID II, or the EU AI Act, you need more: a tamper-evident ledger with cryptographic commitment, hash chains, and verifiable receipts proving that every ECT was committed to permanent storage at a specific time.

One size doesn't fit all. The -01 revision acknowledges this by introducing three assurance levels.

## Three Assurance Levels

### Level 1: Unsigned JSON

At L1, an ECT is a plain JSON object. No signature. No JWS envelope. Integrity comes from TLS.

**What you get:**
- Structured execution records with the full ECT payload (all the same claims)
- DAG validation (parent references, acyclicity, temporal ordering)
- Transport integrity via TLS/mTLS

**What you don't get:**
- Non-repudiation (no signature binds the ECT to the issuer)
- Tamper detection after delivery (a compromised intermediary could modify the ECT)
- Issuer authentication at the ECT layer

**Where it fits:** Internal microservice meshes, development environments, AI platforms where all agents are operated by the same team. Think of it as "structured tracing" — you get the DAG and the claims without the PKI overhead.

**Hard rule:** L1 MUST NOT cross trust domain boundaries. If agents from different organizations interact, L1 is not appropriate.
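
Even without signatures, an L1 verifier can still enforce the DAG properties listed above. The following is a minimal sketch. It assumes each ECT is a dict with `jti`, `iat`, and `par` claims, and that temporal ordering means a parent's `iat` must not be later than its child's. The draft's exact checks, such as clock-skew tolerance, may differ, and `validate_dag` is a hypothetical name.

```python
def validate_dag(ects):
    """Return a list of problems in a set of ECT payloads (empty if valid)."""
    by_id, problems = {}, []
    for ect in ects:
        if ect["jti"] in by_id:
            problems.append(f"duplicate jti {ect['jti']}")
        by_id[ect["jti"]] = ect

    # Parent references must resolve and must not postdate the child.
    for ect in by_id.values():
        for parent_id in ect.get("par", []):
            parent = by_id.get(parent_id)
            if parent is None:
                problems.append(f"{ect['jti']}: unknown parent {parent_id}")
            elif parent["iat"] > ect["iat"]:
                problems.append(f"{ect['jti']}: parent {parent_id} issued later")

    # Acyclicity: iterative depth-first search with three colours.
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {jti: WHITE for jti in by_id}
    for start in by_id:
        if colour[start] != WHITE:
            continue
        colour[start] = GREY
        stack = [(start, iter(by_id[start].get("par", [])))]
        while stack:
            node, parents = stack[-1]
            nxt = next(parents, None)
            if nxt is None:                  # all parents explored
                colour[node] = BLACK
                stack.pop()
            elif nxt not in by_id:           # already reported as unknown
                continue
            elif colour[nxt] == GREY:        # back edge: a cycle
                problems.append(f"cycle through {nxt}")
            elif colour[nxt] == WHITE:
                colour[nxt] = GREY
                stack.append((nxt, iter(by_id[nxt].get("par", []))))
    return problems
```

A fan-out followed by a fan-in (`a` feeding `b` and `c`, both feeding `d`) passes with no problems, while two ECTs that name each other as parents are reported as a cycle.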

### Level 2: JOSE Asymmetric Signing

L2 is the -00 behavior, generalized. Each ECT is a signed JWT in JWS Compact Serialization format. The agent signs with a private key bound to its identity credential — whether that's a WIMSE WIT, an X.509 certificate, or a key from a JWK set. The verifier resolves the `kid` through the deployment's identity framework.

**What you get (beyond L1):**
- Non-repudiation (the cryptographic signature proves the issuer)
- Tamper detection (any modification invalidates the signature)
- Issuer authentication (the `kid` links to the agent's identity credential)
- Audience restriction (`aud` names the intended recipients; a verifier rejects ECTs not addressed to it)

**Where it fits:** Cross-organization deployments, peer-to-peer agent communication, any environment requiring cryptographic proof of origin. L2 is the recommended default for production.

**Backward compatibility:** Implementations of draft-nennemann-wimse-ect-00 are L2-compatible. No changes needed.
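
An L2 verifier splits the compact token into its three base64url segments, checks the header, verifies the signature with the key its identity framework resolves for `kid`, and then validates claims. The sketch below uses only the standard library. Signature checking is delegated to a caller-supplied `verify_signature(kid, alg, signing_input, signature)` function, a hypothetical hook that in practice would be backed by a JOSE library and the deployment's identity binding. The accepted `typ` values are the -01 `exec+jwt` plus the legacy `wimse-exec+jwt`.

```python
import base64
import json
import time

ACCEPTED_TYP = {"exec+jwt", "wimse-exec+jwt"}  # -01 value plus legacy -00 value


def _b64url_decode(segment: str) -> bytes:
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def verify_l2(token, expected_aud, verify_signature, now=None, leeway=60):
    """Verify an L2 ECT and return its payload, or raise ValueError."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("not a JWS Compact Serialization")
    header_b64, payload_b64, sig_b64 = parts
    header = json.loads(_b64url_decode(header_b64))
    if header.get("typ") not in ACCEPTED_TYP:
        raise ValueError(f"unexpected typ {header.get('typ')!r}")
    if header.get("alg") in (None, "none"):
        raise ValueError("unsigned tokens are not valid at L2")

    # RFC 7515 signing input: ASCII "base64url(header).base64url(payload)".
    signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
    if not verify_signature(header.get("kid"), header["alg"],
                            signing_input, _b64url_decode(sig_b64)):
        raise ValueError("signature verification failed")

    payload = json.loads(_b64url_decode(payload_b64))
    for claim in ("iss", "aud"):  # REQUIRED at L2 and L3
        if claim not in payload:
            raise ValueError(f"missing required claim {claim}")
    audiences = payload["aud"] if isinstance(payload["aud"], list) else [payload["aud"]]
    if expected_aud not in audiences:
        raise ValueError("ECT not addressed to this verifier")
    now = int(time.time()) if now is None else now
    if "exp" in payload and payload["exp"] + leeway < now:
        raise ValueError("ECT expired")
    return payload
```

Checking the signature before trusting any claim keeps the order safe: nothing in the payload influences the decision until the issuer's key has vouched for it.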

### Level 3: JOSE Signing with Audit Ledger

L3 is L2 plus a mandatory audit ledger with cryptographic commitment. The JWS signing is identical to L2. The difference is what happens after signing: the ECT must be recorded in a ledger that provides hash chains, Merkle proofs (or equivalent), and verifiable receipts.

**What you get (beyond L2):**
- Tamper-evident history (hash chains detect modification, insertion, or deletion of ledger entries)
- Cryptographic commitment (Merkle proofs allow independent verification of inclusion)
- Non-repudiation of recording (the receipt proves the ECT was committed at a specific time and position)
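
The draft leaves the commitment scheme open ("Merkle proofs or equivalent"). One well-known option is the Merkle tree from Certificate Transparency (RFC 6962, revised as RFC 9162), where an inclusion proof lets anyone holding only the tree root confirm that an entry is in the log. The sketch below follows that RFC's tree-hash, audit-path, and inclusion-verification algorithms. Using this construction for ECT ledgers is an assumption, not something the ECT draft mandates.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def leaf_hash(entry: bytes) -> bytes:
    return _h(b"\x00" + entry)               # 0x00 prefix: leaf


def _node(left: bytes, right: bytes) -> bytes:
    return _h(b"\x01" + left + right)        # 0x01 prefix: interior node


def _split(n: int) -> int:
    k = 1                                    # largest power of two < n
    while k * 2 < n:
        k *= 2
    return k


def merkle_root(leaves):
    if not leaves:
        return _h(b"")
    if len(leaves) == 1:
        return leaves[0]
    k = _split(len(leaves))
    return _node(merkle_root(leaves[:k]), merkle_root(leaves[k:]))


def inclusion_proof(m, leaves):
    """Audit path for leaf m, ordered from the leaf up to the root."""
    if len(leaves) <= 1:
        return []
    k = _split(len(leaves))
    if m < k:
        return inclusion_proof(m, leaves[:k]) + [merkle_root(leaves[k:])]
    return inclusion_proof(m - k, leaves[k:]) + [merkle_root(leaves[:k])]


def verify_inclusion(index, size, lh, proof, root):
    """Inclusion verification as described in RFC 9162, section 2.1.3.2."""
    if index >= size:
        return False
    fn, sn, r = index, size - 1, lh
    for p in proof:
        if sn == 0:
            return False
        if fn & 1 or fn == sn:
            r = _node(p, r)
            if not fn & 1:
                while not fn & 1 and fn != 0:
                    fn >>= 1
                    sn >>= 1
        else:
            r = _node(r, p)
        fn >>= 1
        sn >>= 1
    return sn == 0 and r == root
```

An auditor needs only the entry, its index, the tree size, the audit path, and a root it trusts. It never needs a copy of the whole ledger.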

**The ledger requirements:**
- Each entry includes a cryptographic hash of the previous entry (hash chain)
- The ledger provides verifiable commitment proofs
- Upon append, the ledger returns a receipt with the sequence number, entry hash, and commitment proof
- The ledger should be operated independently of the workflow agents
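
The hash chain and receipts can be sketched in a few lines. In the hypothetical in-memory `HashChainLedger` below, each entry stores the SHA-256 hash of the previous entry, and `append` returns a receipt with the sequence number and entry hash. `verify_chain` recomputes every link, so modifying, inserting, or deleting an interior entry is detected. Truncating the tail is caught only by checking a receipt held elsewhere, which is one reason the ledger should be independent of the agents. A real L3 ledger would also persist entries, sign its receipts, and attach a commitment proof; this sketch omits all three.

```python
import copy
import hashlib
import json
import time

GENESIS = "0" * 64


class HashChainLedger:
    """Hypothetical in-memory append-only ledger with a SHA-256 hash chain."""

    def __init__(self):
        self.entries = []

    @staticmethod
    def _entry_hash(seq, prev_hash, ect, recorded_at):
        # Sorted, compact JSON; a real ledger would pin an exact canonical form.
        body = json.dumps({"seq": seq, "prev": prev_hash, "ect": ect,
                           "recorded_at": recorded_at},
                          sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(body.encode("utf-8")).hexdigest()

    def append(self, ect):
        ect = copy.deepcopy(ect)  # later caller mutations must not alter the ledger
        seq = len(self.entries)
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS
        recorded_at = int(time.time())
        entry_hash = self._entry_hash(seq, prev_hash, ect, recorded_at)
        self.entries.append({"seq": seq, "prev": prev_hash, "ect": ect,
                             "recorded_at": recorded_at, "hash": entry_hash})
        return {"seq": seq, "entry_hash": entry_hash, "recorded_at": recorded_at}

    def verify_chain(self):
        prev_hash = GENESIS
        for position, entry in enumerate(self.entries):
            expected = self._entry_hash(position, prev_hash, entry["ect"],
                                        entry["recorded_at"])
            if (entry["seq"] != position or entry["prev"] != prev_hash
                    or entry["hash"] != expected):
                return False
            prev_hash = entry["hash"]
        return True

    def verify_receipt(self, receipt):
        seq = receipt["seq"]
        return (seq < len(self.entries)
                and self.entries[seq]["hash"] == receipt["entry_hash"])
```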

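**Recording modes:** L3 supports both synchronous recording (wait for the receipt before sending the ECT downstream) and asynchronous recording (send first, verify the receipt later). Synchronous recording guarantees that no downstream agent acts on an unrecorded ECT; asynchronous recording avoids the ledger round trip but leaves a window in which a failed recording is only noticed when the receipt is checked. Deployments choose based on latency requirements.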
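
The two modes differ only in the order of two calls. In this sketch, `ledger_append` and `send` are hypothetical caller-supplied functions; the former returns a receipt or raises an exception. The asynchronous variant uses a thread pool, so the ECT goes downstream before its receipt exists.

```python
from concurrent.futures import Future, ThreadPoolExecutor


def record_then_send(ect, ledger_append, send):
    """Synchronous mode: downstream only ever sees ECTs that have a receipt."""
    receipt = ledger_append(ect)  # blocks until the ledger commits (or raises)
    send(ect)
    return receipt


def send_then_record(ect, ledger_append, send, pool: ThreadPoolExecutor) -> Future:
    """Asynchronous mode: forward immediately, collect the receipt later."""
    pending = pool.submit(ledger_append, ect)  # recording runs in the background
    send(ect)
    return pending  # caller must check pending.result() and alert on failure
```

In the asynchronous case, the returned future is the obligation to verify the receipt later. If no one ever calls `result()`, a failed recording goes unnoticed.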

**Where it fits:** Regulated environments — healthcare (FDA audit trails), financial services (MiFID II transaction reporting), any system subject to the EU AI Act's requirements for activity logs in high-risk AI systems.

## Decision Framework

Choosing the right level is straightforward. Ask three questions:

**1. Do agents cross trust domain boundaries?**
If yes, you need at least L2. Non-repudiation is essential when different organizations operate different agents. If no, L1 is an option.

**2. Do you need cryptographic non-repudiation?**
If you need to prove, after the fact, that a specific agent produced a specific ECT, you need L2 or L3. If TLS-level trust is sufficient, L1 works.

**3. Do you operate under regulatory requirements for tamper-evident audit trails?**
If yes, you need L3. The hash-chained ledger with cryptographic commitment provides the tamper evidence that regulators require. If no, L2 is sufficient for production deployments with non-repudiation.

The decision tree in practice:

```
Internal only, no non-repudiation needed?        → L1
Cross-org or non-repudiation needed?             → L2
Regulatory tamper-evident audit trail required?  → L3
```

A single deployment can use different levels for different workflows. Your internal data preprocessing pipeline might be L1, your cross-org partner integrations L2, and your regulated clinical decision workflows L3.
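
The three questions reduce to a small function. This is an illustrative sketch only: `choose_level`, `Level`, and the workflow names are hypothetical. The function checks the strongest requirement first, because a regulated workflow needs L3 whether or not it also crosses a trust domain.

```python
from enum import Enum


class Level(Enum):
    L1 = "L1"
    L2 = "L2"
    L3 = "L3"


def choose_level(crosses_trust_domain: bool,
                 needs_non_repudiation: bool,
                 regulated_audit_trail: bool) -> Level:
    """Apply the three questions, strongest requirement first."""
    if regulated_audit_trail:
        return Level.L3
    if crosses_trust_domain or needs_non_repudiation:
        return Level.L2
    return Level.L1  # internal only; TLS-level trust is enough


# One deployment, different levels per workflow.
WORKFLOWS = {
    "data_preprocessing": choose_level(False, False, False),
    "partner_integration": choose_level(True, True, False),
    "clinical_decision": choose_level(False, True, True),
}
```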

## Real-World Applications

**Healthcare — Clinical Decision Support (L3).** A clinical AI system recommends treatments, checks drug interactions, and routes cases for specialist review. Every step is an ECT. The DAG proves that the safety check happened before the treatment recommendation reached the physician. The L3 ledger provides the tamper-evident audit trail that FDA 21 CFR Part 11 requires for electronic records.

**Financial Services — Algorithmic Trading (L3).** An investment bank's agents coordinate with an external credit rating agency. Portfolio risk analysis, credit assessment, compliance verification, and trade execution form a cross-organizational DAG. L3 ensures every step is signed and committed to a tamper-evident ledger for MiFID II compliance.

**Internal AI Platform — Model Serving (L1).** A company's internal AI platform chains preprocessors, model inference, and postprocessors. All services run in the same Kubernetes cluster with mTLS. L1 ECTs provide structured execution tracing for debugging without PKI overhead.

**Multi-Organization API Gateway (L2).** A SaaS platform's agents interact with customer-operated agents. Both sides need to prove what happened, but there's no regulatory requirement for a tamper-evident ledger. L2 gives both organizations signed, verifiable execution records.

## The Upgrade Path

This is arguably the most important design property of the assurance levels: **the payload is the same at every level.** The same `jti`, `iss`, `aud`, `iat`, `exp`, `exec_act`, `par`, `inp_hash`, `out_hash`, and `ext` claims are defined for every ECT, whether it's unsigned JSON or a ledger-committed JWS token. The only exception is that L1 deployments may omit `iss` and `aud`.

What changes is the envelope and the verification procedure. Upgrading from L1 to L2 therefore means wrapping the same payload in a JWS (and populating `iss` and `aud` if the L1 deployment omitted them). Upgrading from L2 to L3 means deploying an audit ledger and adding the ledger recording step after JWS verification.
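
Because only the envelope changes, the L1 and L2 serializations can be produced from the same dict, and a verifier can tell them apart from the serialized form alone. This minimal sketch assumes that L1 travels as a JSON object and L2 as JWS Compact Serialization. Here `sign` is a hypothetical caller-supplied function; in practice it would be an ES256 or similar signer from a JOSE library. Telling L2 from L3 is assumed to depend on deployment policy and ledger receipts rather than on the token, since L3 signing is identical to L2.

```python
import base64
import json


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def wrap_l1(payload: dict) -> str:
    """L1: the payload itself, serialized as a JSON object."""
    return json.dumps(payload, separators=(",", ":"))


def wrap_l2(payload: dict, kid: str, alg: str, sign) -> str:
    """L2/L3: the same payload inside a JWS Compact Serialization."""
    header = {"alg": alg, "typ": "exec+jwt", "kid": kid}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload))
    signature = sign(signing_input.encode("ascii"))
    return f"{signing_input}.{_b64url(signature)}"


def detect_envelope(serialized: str) -> str:
    """Distinguish an unsigned JSON ECT from a compact JWS."""
    text = serialized.strip()
    if text.startswith("{"):
        return "unsigned-json"
    if text.count(".") == 2:
        return "jws-compact"
    raise ValueError("unrecognized ECT serialization")
```

Decoding the middle segment of the L2 token yields exactly the dict that `wrap_l1` serialized, which is the upgrade-path property in miniature.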

Your DAG structure doesn't change. Your claim semantics don't change. Your workflow logic doesn't change. The agents still produce the same payload — they just wrap it differently and verify it more thoroughly.

This is intentional. A startup might begin with L1 during development, move to L2 when they start working with external partners, and add L3 when they enter regulated markets. At each step, the payload format is familiar, the DAG is the same, and only the security envelope changes.

## What Changed from -00 to -01

A summary of the structural changes:

**New Section 1.2: Assurance Levels and Applicability.** Defines L1, L2, L3 with applicability guidance.

**Restructured Section 3: ECT Format.** The payload structure is now level-independent (Section 3.1), with separate subsections for L1 (3.3), L2 (3.4), and L3 (3.5). Each level has its own transport, verification, and security properties subsections.

**iss/aud requirement relaxation.** The `iss` and `aud` claims move from REQUIRED to RECOMMENDED at the payload level. They remain REQUIRED for L2 and L3 (the verification procedure mandates them), but L1 deployments can omit them when identity is handled entirely at the transport layer.

**Updated verification.** Section 6 adds level detection as the first step. L1 gets a simplified procedure (no signature steps). L2 retains the -00 procedure. L3 adds ledger recording and receipt verification.

**Expanded audit ledger.** Section 7 distinguishes baseline ledger requirements (all levels) from L3-specific requirements (hash chains, commitment proofs, receipts). New subsections cover ledger independence and L3 verification procedures.

**Level-specific security analysis.** Section 8 adds a new subsection analyzing each level's security properties, including L1 limitations and L3 ledger properties.

**Identity-framework agnostic design.** The -00 draft hardcoded WIMSE WIT/WPT as the identity layer. The -01 draft introduces an abstract "Identity Binding" section defining what any identity framework must provide (key association, issuer correspondence, algorithm consistency, revocation checking) and then defines concrete profiles for WIMSE, X.509, and JWK sets. The `typ` header moves from `wimse-exec+jwt` to `exec+jwt` (with backward compatibility for the old value). This means you can use ECTs without deploying WIMSE — bring your own identity framework.
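
The four identity binding requirements map naturally onto an interface that each profile implements. The sketch below is hypothetical: the `IdentityBinding` protocol, its method names, `JwkSetBinding`, and `check_binding` are illustrative assumptions, not the draft's own API. Real WIMSE or X.509 profiles would resolve keys from credentials rather than from a static dict.

```python
from dataclasses import dataclass, field
from typing import Protocol


class IdentityBinding(Protocol):
    def resolve_key(self, kid: str) -> dict: ...                  # key association
    def issuer_matches(self, iss: str, kid: str) -> bool: ...     # issuer correspondence
    def algorithm_allowed(self, kid: str, alg: str) -> bool: ...  # algorithm consistency
    def is_revoked(self, kid: str) -> bool: ...                   # revocation checking


@dataclass
class JwkSetBinding:
    """Toy JWK-set profile: keys are looked up by kid in a static set."""
    jwks: dict                     # {"keys": [ {JWK}, ... ]}
    issuer_for_kid: dict           # kid -> expected iss (assumed deployment config)
    revoked: set = field(default_factory=set)

    def _jwk(self, kid):
        for key in self.jwks.get("keys", []):
            if key.get("kid") == kid:
                return key
        raise ValueError(f"unknown kid {kid}")

    def resolve_key(self, kid):
        return self._jwk(kid)

    def issuer_matches(self, iss, kid):
        return self.issuer_for_kid.get(kid) == iss

    def algorithm_allowed(self, kid, alg):
        pinned = self._jwk(kid).get("alg")  # optional JWK "alg" member (RFC 7517)
        return pinned is None or pinned == alg

    def is_revoked(self, kid):
        return kid in self.revoked


def check_binding(binding: IdentityBinding, iss: str, kid: str, alg: str) -> dict:
    """Run all four checks; return the verification key or raise ValueError."""
    if binding.is_revoked(kid):
        raise ValueError("key revoked")
    key = binding.resolve_key(kid)
    if not binding.issuer_matches(iss, kid):
        raise ValueError("iss does not correspond to the key owner")
    if not binding.algorithm_allowed(kid, alg):
        raise ValueError("algorithm mismatch")
    return key
```

Because the ECT verifier only talks to the protocol, swapping WIMSE for X.509 or a JWK set changes which class is passed in, not the verification logic.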

## What's Next

The -01 draft is open for working group feedback. Key questions for the community:

- Is the three-level structure the right granularity, or should there be intermediate levels?
- Should L3 ledger requirements reference SCITT (Supply Chain Integrity, Transparency, and Trust) more directly?
- Are the L1 restrictions (MUST NOT cross trust domain boundaries) sufficient, or should L1 be further constrained?
- Are the identity binding profiles (WIMSE, X.509, JWK set) sufficient, or should additional profiles be defined (e.g., OAuth client credentials, DID-based)?

The reference implementations (Go and Python) will be updated to support all three levels and the pluggable identity binding model. Community contributions and feedback are welcome — the specification is developed in the open, and the goal is a standard that works for real deployments, from internal microservices to regulated clinical AI systems.

---

*Christian Nennemann is an independent researcher focused on workload identity and execution accountability in agentic AI systems. The ECT specification originated within the IETF WIMSE working group and is designed for use with any identity framework.*