\documentclass[11pt,a4paper]{article}
% ── Packages ──────────────────────────────────────────────────────────────
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{lmodern}
\usepackage[margin=1in]{geometry}
\usepackage{graphicx}
\usepackage{booktabs}
\usepackage{hyperref}
\usepackage{xcolor}
\usepackage{amsmath}
\usepackage{float}
\usepackage{caption}
\usepackage{subcaption}
\usepackage{tabularx}
\usepackage{enumitem}
\usepackage{pifont}
\usepackage{listings}
\usepackage{tikz}
\usetikzlibrary{arrows.meta, positioning, shapes.geometric}
\hypersetup{
colorlinks=true,
linkcolor=blue!60!black,
citecolor=blue!60!black,
urlcolor=blue!60!black,
}
\lstset{
basicstyle=\ttfamily\small,
frame=single,
breaklines=true,
columns=flexible,
keepspaces=true,
}
% ── Title ─────────────────────────────────────────────────────────────────
\title{%
\textbf{Execution Context Tokens for Distributed Agentic Workflows:\\
A Data-Driven Design Grounded in 260 IETF Internet-Drafts}%
}
\author{
Christian Nennemann\\
Independent Researcher\\
\texttt{ietf@nennemann.de}
}
\date{February 2026}
\begin{document}
\maketitle
% ── Abstract ──────────────────────────────────────────────────────────────
\begin{abstract}
The Internet Engineering Task Force (IETF) has seen 260 Internet-Drafts addressing AI agents and autonomous systems between June 2025 and February 2026---a 36$\times$ increase in monthly submissions over nine months. Yet a quantitative analysis of this corpus reveals a striking gap: while 98 drafts address agent \emph{identity}, 92 address agent-to-agent \emph{communication}, and 60 address \emph{authorization}, effectively zero proposals provide a standard format for recording what agents \emph{actually did}. We introduce Execution Context Tokens (ECTs), a JWT-based extension to the WIMSE (Workload Identity in Multi-System Environments) architecture that records task execution across distributed agentic workflows. Each ECT is a cryptographically signed record documenting a single task, with predecessor tasks linked through a directed acyclic graph (DAG). Using embedding-based similarity analysis, LLM-assisted multi-dimensional rating, and automated gap detection across the full 260-draft landscape, we demonstrate that ECTs address three identified gaps---agent behavior verification (critical), error recovery (critical), and data provenance (medium)---while maintaining low overlap (2/5) with existing proposals. A head-to-head comparison against eight competing drafts shows that ECTs are the only proposal combining DAG-based workflow modeling, cryptographic input/output integrity, and native WIMSE integration.
\end{abstract}
\noindent\textbf{Keywords:} execution context, agentic workflows, WIMSE, JWT, directed acyclic graph, IETF standardization, landscape analysis
% ══════════════════════════════════════════════════════════════════════════
% 1. INTRODUCTION
% ══════════════════════════════════════════════════════════════════════════
\section{Introduction}
The rapid deployment of autonomous AI agents---software systems that can independently plan, execute tasks, and collaborate with other agents---has created urgent demand for infrastructure standards. The IETF, as the primary venue for Internet protocol standardization, has responded with unprecedented speed: between June 2025 and February 2026, submissions grew from 2 AI-related Internet-Drafts per month to 72, a 36$\times$ increase in nine months.
However, this activity has not been evenly distributed. A quantitative survey of 260 drafts reveals that the community has focused heavily on three questions:
\begin{enumerate}[nosep]
\item \textbf{Who is this agent?} (Identity/authentication: 98 drafts)
\item \textbf{What may this agent do?} (Authorization/policy: 60 drafts)
\item \textbf{How do agents talk?} (Communication protocols: 92 drafts)
\end{enumerate}
\noindent A fourth, equally critical question remains effectively unanswered:
\begin{quote}
\textbf{What did this agent actually do?}
\end{quote}
\noindent Of 1,262 technical ideas extracted from the corpus, only 6 address error recovery in agentic workflows, 52 partially touch behavior verification, and zero provide hash-based data lineage tracking. This paper addresses this gap with two contributions:
\begin{itemize}[nosep]
\item \textbf{Execution Context Tokens (ECTs)}: A JWT-based extension to the WIMSE architecture~\cite{wimse-arch} that records task execution as cryptographically signed, DAG-linked tokens. ECTs answer the ``what did it do?'' question with verifiable execution records.
\item \textbf{Quantitative landscape positioning}: Using a purpose-built analysis pipeline (260 drafts, 33,670 pairwise similarity computations, 12 identified gaps), we demonstrate that ECTs occupy a genuinely novel niche---overlap score 2/5, composite quality 4.0/5---and are the only proposal simultaneously addressing all three execution-layer gaps.
\end{itemize}
\noindent The remainder of this paper is organized as follows. Section~\ref{sec:background} provides background on the WIMSE architecture and related standardization efforts. Section~\ref{sec:landscape} presents the quantitative landscape analysis that motivates ECTs. Section~\ref{sec:ect} describes the ECT specification. Section~\ref{sec:comparison} provides a head-to-head comparison against eight competing proposals. Section~\ref{sec:gaps} maps ECT mechanisms to the 12 identified landscape gaps. Section~\ref{sec:discussion} discusses limitations and adoption considerations. Sections~\ref{sec:related}--\ref{sec:conclusion} cover related work and conclusions.
% ══════════════════════════════════════════════════════════════════════════
% 2. BACKGROUND
% ══════════════════════════════════════════════════════════════════════════
\section{Background}
\label{sec:background}
\subsection{IETF AI Agent Standardization}
The IETF's AI-related work spans multiple working groups and areas. The WIMSE (Workload Identity in Multi-System Environments) working group focuses on cross-system identity for workloads and services. The SPICE (Secure Patterns for Internet CrEdentials) group addresses verifiable credentials. The RATS (Remote ATtestation procedureS) group develops attestation evidence formats. The OAuth working group has produced several agent-specific extensions for delegated authorization.
These groups collectively address the identity and authorization layers but leave execution accountability to individual implementations. No IETF working group currently has a charter covering execution record formats for agentic workflows.
\subsection{WIMSE Architecture}
The WIMSE architecture~\cite{wimse-arch} provides identity infrastructure for workloads operating across trust domains. Its core components are:
\begin{itemize}[nosep]
\item \textbf{Workload Identity Token (WIT)}: A JWT asserting the identity of a workload, typically using SPIFFE IDs~\cite{spiffe} as subject identifiers.
\item \textbf{Workload Proof Token (WPT)}: A proof-of-possession token binding a request to a specific WIT, preventing token theft.
\item \textbf{Trust Domains}: Administrative boundaries within which workload identities are issued and recognized. Cross-domain federation follows established patterns.
\end{itemize}
\noindent WIMSE answers ``who is this workload?'' and ``can it prove its identity?'' but does not answer ``what did it do?'' ECTs extend WIMSE to record execution context, reusing its signing infrastructure and trust model.
\subsection{The Identity--Authorization--Execution Stack}
We identify three layers of agent accountability:
\begin{enumerate}[nosep]
\item \textbf{Identity Layer}: Establishes \emph{who} the agent is (WIMSE WIT, X.509, DID).
\item \textbf{Authorization Layer}: Determines \emph{what} the agent may do (OAuth tokens, capability tokens, policy frameworks).
\item \textbf{Execution Layer}: Records \emph{what} the agent actually did (execution context, audit trails, provenance).
\end{enumerate}
\noindent The IETF landscape heavily invests in layers 1 and 2 (98 + 60 = 158 drafts) while layer 3 remains effectively vacant. This asymmetry means agents can be authenticated and authorized but their actual behavior cannot be independently audited---a significant gap for production deployments.
% ══════════════════════════════════════════════════════════════════════════
% 3. LANDSCAPE ANALYSIS
% ══════════════════════════════════════════════════════════════════════════
\section{Landscape Analysis: Motivating Evidence}
\label{sec:landscape}
To ground the ECT design in empirical evidence, we conducted a systematic analysis of 260 IETF Internet-Drafts related to AI agents, published between June 2025 and February 2026. The methodology, dataset, and analysis toolkit are described in a companion paper~\cite{landscape-survey} and released as open source.
\subsection{Corpus Overview}
Table~\ref{tab:corpus} summarizes the dataset.
\begin{table}[h]
\centering
\caption{Corpus summary statistics.}
\label{tab:corpus}
\begin{tabular}{lr}
\toprule
\textbf{Metric} & \textbf{Value} \\
\midrule
Internet-Drafts analyzed & 260 \\
Unique authors & 403 \\
Author--draft relationships & 742 \\
Technical ideas extracted & 1,262 \\
Semantic categories & 19 \\
Pairwise similarity pairs & 33,670 \\
Identified landscape gaps & 12 \\
Time span & Jun 2025 -- Feb 2026 \\
\bottomrule
\end{tabular}
\end{table}
\noindent Each draft was rated on five dimensions (novelty, maturity, overlap, momentum, relevance; scale 1--5) using LLM-assisted analysis (Anthropic Claude Sonnet~4), embedded using a local model (nomic-embed-text via Ollama), and processed for technical idea extraction.
\subsection{Category Distribution and the Safety Deficit}
LLM-assisted classification assigned each draft to one or more of 19 categories. Table~\ref{tab:tiers} organizes the top categories into three accountability tiers.
\begin{table}[h]
\centering
\caption{Category distribution organized by accountability tier.}
\label{tab:tiers}
\begin{tabular}{llrr}
\toprule
\textbf{Tier} & \textbf{Category} & \textbf{Drafts} & \textbf{Avg Score} \\
\midrule
Infrastructure & Data formats / interop & 102 & 3.3 \\
Infrastructure & A2A protocols & 92 & 3.4 \\
Infrastructure & Agent discovery / reg & 57 & 3.5 \\
\midrule
Authorization & Agent identity / auth & 98 & 3.4 \\
Authorization & Policy / governance & 60 & 3.3 \\
Authorization & Human-agent interaction & 22 & 3.3 \\
\midrule
Accountability & AI safety / alignment & 36 & 3.4 \\
Accountability & Autonomous netops & 60 & 3.3 \\
\bottomrule
\end{tabular}
\end{table}
\noindent The infrastructure and authorization tiers collectively account for 431 category assignments (noting that multi-assignment is possible). The accountability tier, despite including safety-critical concerns, accounts for only 96---a ratio of roughly \textbf{4.5:1}. Within the accountability tier, only 36 drafts address AI safety/alignment, the category most relevant to execution auditing.
\subsection{The Overlap Problem}
Pairwise cosine similarity analysis across 33,670 draft pairs reveals significant redundancy:
\begin{itemize}[nosep]
\item 56 pairs (0.2\%) exceed 0.90 similarity (near-duplicate)
\item 344 pairs (1.0\%) exceed 0.85 (highly similar)
\item 2,668 pairs (7.9\%) exceed 0.80 (significantly overlapping)
\end{itemize}
\noindent The redundancy concentrates in crowded areas. The OAuth-for-agents cluster contains 13 drafts proposing variations of delegated agent authorization. The agent-gateway cluster contains 10 drafts. Meanwhile, \textbf{zero} dedicated drafts exist for execution record formats prior to ECTs. The community is duplicating effort in well-explored spaces while leaving critical gaps unfilled.
\subsection{Gaps Relevant to Execution Tracking}
\label{sec:landscape-gaps}
Automated gap analysis identified 12 under-addressed areas (3 critical, 6 high, 3 medium severity). Three directly concern the execution layer:
\begin{table}[h]
\centering
\caption{Execution-layer gaps in the IETF landscape.}
\label{tab:exec-gaps}
\begin{tabularx}{\textwidth}{llcX}
\toprule
\textbf{Gap} & \textbf{Severity} & \textbf{Ideas\textsuperscript{a}} & \textbf{Problem} \\
\midrule
Behavior Verification & Critical & 52 & No mechanisms to verify agents behave per declared policies. 36 safety drafts vs 260 total. \\
Error Recovery & Critical & 6 & No standards for cascading failure recovery. Only 6 of 1,262 ideas address this. \\
Data Provenance & Medium & 0\textsuperscript{b} & No hash-based data lineage tracking despite 102 data format drafts. \\
\bottomrule
\multicolumn{4}{l}{\footnotesize\textsuperscript{a}Number of extracted ideas (of 1,262 total) that partially address the gap.} \\
\multicolumn{4}{l}{\footnotesize\textsuperscript{b}79 ideas mention ``data'' broadly, but none implement cryptographic lineage.} \\
\end{tabularx}
\end{table}
\noindent The error recovery gap is particularly stark: of 1,262 technical ideas extracted from the entire corpus, only \textbf{6} touch error handling in agentic workflows. The two proposals that do exist address the problem partially but lack a common execution record format for systematic post-mortem analysis.
% ══════════════════════════════════════════════════════════════════════════
% 4. ECT SPECIFICATION
% ══════════════════════════════════════════════════════════════════════════
\section{Execution Context Token Specification}
\label{sec:ect}
\subsection{Design Principles}
ECTs are guided by four principles:
\begin{enumerate}[nosep]
\item \textbf{Records, not permissions.} ECTs document what an agent did, not what it may do. They complement rather than replace authorization tokens.
\item \textbf{DAG, not chain.} Real-world workflows involve parallel execution and convergence (fan-out/fan-in). A directed acyclic graph captures this; a linear chain cannot.
\item \textbf{WIMSE-native.} ECTs reuse WIMSE's signing keys, identity model, and trust domains. No additional key infrastructure is required.
\item \textbf{Minimal mandatory claims.} Only three mandatory claims carry the execution semantics (\texttt{jti} as the DAG node identifier, \texttt{exec\_act}, and \texttt{par}), keeping the base token compact.
\end{enumerate}
\subsection{Token Structure}
An ECT is a JWT~\cite{rfc7519} using JWS Compact Serialization~\cite{rfc7515}. It comprises a JOSE header, standard JWT claims, and execution-specific claims.
\subsubsection{JOSE Header}
\begin{lstlisting}[caption={ECT JOSE Header.}]
{
  "alg": "ES256",
  "typ": "wimse-exec+jwt",
  "kid": "agent-a-key-id-123"
}
\end{lstlisting}
\noindent The \texttt{typ} parameter \texttt{wimse-exec+jwt} distinguishes ECTs from other JWTs. The \texttt{kid} references the agent's WIMSE public key, binding the ECT to the agent's verified identity.
\subsubsection{Claims}
\begin{lstlisting}[caption={Complete ECT payload example.}]
{
  "iss": "spiffe://trust-domain.example/agent/payment-processor",
  "aud": "spiffe://trust-domain.example/agent/compliance-check",
  "iat": 1740600000,
  "exp": 1740600900,
  "jti": "f47ac10b-58cc-4372-a567-0e02b2c3d479",
  "wid": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "exec_act": "process_payment",
  "par": ["c9d2e3f4-a5b6-7890-cdef-123456789012",
          "d8e9f0a1-b2c3-4567-89ab-cdef01234567"],
  "inp_hash": "base64url(SHA-256(input_data))",
  "out_hash": "base64url(SHA-256(output_data))",
  "ext": {
    "com.example.amount_range": "1000-5000"
  }
}
\end{lstlisting}
\noindent The execution-specific claims are:
\begin{itemize}[nosep]
\item \textbf{\texttt{exec\_act}}: A structured identifier for the action performed (e.g., \texttt{process\_payment}, \texttt{validate\_input}).
\item \textbf{\texttt{par}}: An array of parent task identifiers (\texttt{jti} values), forming the DAG edges. An empty array indicates a root task.
\item \textbf{\texttt{wid}}: Optional workflow identifier grouping related ECTs.
\item \textbf{\texttt{inp\_hash}} / \textbf{\texttt{out\_hash}}: SHA-256 hashes of input and output data, providing integrity without exposing content.
\item \textbf{\texttt{ext}}: Optional extension object (max 4,096 bytes, max 5 nesting levels) using reverse-domain notation.
\end{itemize}
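The hash claims can be derived with any standard SHA-256 implementation. The following non-normative Python sketch shows one way a producer might compute them; the choice of a canonical byte serialization for the input and output data is an assumption of this illustration, not specified by ECT.

```python
import base64
import hashlib

def b64url_sha256(data: bytes) -> str:
    """SHA-256 digest, base64url-encoded without padding, as used by the
    inp_hash / out_hash claims (the exact encoding here is an assumption)."""
    digest = hashlib.sha256(data).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")

# Example: hash the canonical input bytes before invoking the task.
inp_hash = b64url_sha256(b'{"amount": 1200, "currency": "EUR"}')
```

Because only the digest travels in the token, recipients who hold the actual data can recompute and compare it, while recipients who do not learn nothing about the content.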
\subsection{DAG Construction and Validation}
ECTs form a directed acyclic graph through the \texttt{par} claim. Each \texttt{par} entry references the \texttt{jti} of a predecessor task. Validation enforces five rules:
\begin{enumerate}[nosep]
\item \textbf{Uniqueness}: Each \texttt{jti} must be unique within the workflow scope (or globally if \texttt{wid} is absent).
\item \textbf{Parent existence}: All \texttt{par} entries must reference existing ECTs in the store.
\item \textbf{Temporal ordering}: Parent \texttt{iat} timestamps must not exceed child \texttt{iat} plus clock skew tolerance (recommended 30 seconds).
\item \textbf{Acyclicity}: Parent chains must not cycle back to the current task.
\item \textbf{Trust domain consistency}: Parents should belong to the same or federated trust domains.
\end{enumerate}
\noindent A maximum ancestor traversal limit of 10,000 nodes prevents denial-of-service via deep DAGs.
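The validation rules and the traversal cap can be sketched against an in-memory token store. This Python sketch checks rules 1--4 plus the node limit; the store layout is an illustrative assumption, and rule 5 (trust domain consistency) is omitted because it depends on deployment-specific federation data.

```python
from collections import deque

MAX_ANCESTORS = 10_000   # ancestor traversal cap from the specification
CLOCK_SKEW_S = 30        # recommended clock-skew tolerance (seconds)

def validate_dag_entry(store: dict, jti: str, par: list, iat: int) -> None:
    """Validate a new ECT against an in-memory store {jti: (par, iat)}.

    Enforces uniqueness (rule 1), parent existence (rule 2), temporal
    ordering (rule 3), acyclicity (rule 4), and the 10,000-node limit.
    """
    if jti in store:                                  # rule 1: uniqueness
        raise ValueError(f"duplicate jti {jti}")
    for p in par:
        if p not in store:                            # rule 2: parent existence
            raise ValueError(f"unknown parent {p}")
        if store[p][1] > iat + CLOCK_SKEW_S:          # rule 3: temporal ordering
            raise ValueError(f"parent {p} issued after child")
    # Breadth-first ancestor walk, bounded by the node limit (rule 4 guard).
    seen, queue = set(), deque(par)
    while queue:
        node = queue.popleft()
        if node == jti:                               # rule 4: acyclicity
            raise ValueError("cycle detected")
        if node in seen:
            continue
        seen.add(node)
        if len(seen) > MAX_ANCESTORS:
            raise ValueError("ancestor limit exceeded")
        queue.extend(store[node][0])
    store[jti] = (par, iat)
```

In an append-only store, cycles cannot actually arise (a new \texttt{jti} cannot already be a parent), but the explicit check guards against stores that permit replacement.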
Figure~\ref{fig:dag} illustrates a five-task workflow with fan-out and fan-in.
\begin{figure}[h]
\centering
\begin{tikzpicture}[
node distance=1.5cm and 2.5cm,
task/.style={rectangle, draw, rounded corners, minimum width=2.2cm, minimum height=0.8cm, align=center, font=\footnotesize},
edge/.style={-{Stealth[length=3mm]}, thick}
]
\node[task, fill=blue!10] (t1) {Task A\\{\tiny\texttt{par:[]}}\\{\tiny fetch\_data}};
\node[task, fill=green!10, below left=of t1] (t2) {Task B\\{\tiny\texttt{par:[A]}}\\{\tiny analyze\_risk}};
\node[task, fill=green!10, below right=of t1] (t3) {Task C\\{\tiny\texttt{par:[A]}}\\{\tiny check\_credit}};
\node[task, fill=orange!10, below right=of t2] (t4) {Task D\\{\tiny\texttt{par:[B,C]}}\\{\tiny verify\_compliance}};
\node[task, fill=red!10, below=of t4] (t5) {Task E\\{\tiny\texttt{par:[D]}}\\{\tiny execute\_trade}};
\draw[edge] (t1) -- (t2);
\draw[edge] (t1) -- (t3);
\draw[edge] (t2) -- (t4);
\draw[edge] (t3) -- (t4);
\draw[edge] (t4) -- (t5);
\end{tikzpicture}
\caption{A five-task DAG representing a cross-organization financial workflow. Task~A (root) fetches data; Tasks~B and~C execute in parallel (fan-out); Task~D merges results (fan-in); Task~E performs the final action. Each task references its predecessors via the \texttt{par} claim.}
\label{fig:dag}
\end{figure}
\subsection{Verification Procedure}
ECT verification proceeds in 13 steps, grouped into four phases:
\begin{table}[h]
\centering
\caption{ECT verification procedure (13 steps in 4 phases).}
\label{tab:verify}
\small
\begin{tabularx}{\textwidth}{clX}
\toprule
\textbf{Step} & \textbf{Phase} & \textbf{Action} \\
\midrule
1 & Serialization & Parse JWS Compact Serialization (header.payload.signature) \\
2 & & Verify \texttt{typ} = \texttt{wimse-exec+jwt} \\
3 & & Verify \texttt{alg} $\in$ allowlist (must include ES256; reject \texttt{none}) \\
\midrule
4 & Identity & Verify \texttt{kid} references valid public key from agent's WIT \\
5 & Binding & Verify JWS signature per RFC~7515 \\
6 & & Confirm signing key not revoked \\
7 & & Verify \texttt{alg} matches WIT algorithm \\
8 & & Verify \texttt{iss} matches WIT \texttt{sub} claim \\
\midrule
9 & Temporal \& & Verify verifier's identity $\in$ \texttt{aud} \\
10 & Audience & Verify \texttt{exp} not passed \\
11 & & Verify \texttt{iat} $\leq$ now + 30s and $\geq$ now $-$ 15min \\
\midrule
12 & Structural & Verify \texttt{jti}, \texttt{exec\_act}, \texttt{par} present and well-formed \\
13 & & Perform DAG validation (Section~4.3) \\
\bottomrule
\end{tabularx}
\end{table}
\noindent Failed verification returns HTTP 403 (invalid ECT with valid WIT) or HTTP 401 (signature failure). The overall complexity is $O(1)$ per token except DAG traversal, which is $O(|V|+|E|)$ bounded by the 10,000-node limit.
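The serialization phase (steps 1--3) requires no key material and can run before any cryptographic work. A minimal Python sketch of those three steps follows; signature, identity, temporal, and structural checks (steps 4--13) are assumed to happen elsewhere.

```python
import base64
import json

ALG_ALLOWLIST = {"ES256"}  # per step 3; "none" is always rejected

def check_ect_header(token: str) -> dict:
    """Steps 1-3 of ECT verification: parse the JWS Compact Serialization
    and validate the typ and alg header parameters."""
    parts = token.split(".")
    if len(parts) != 3:                         # step 1: header.payload.signature
        raise ValueError("not JWS Compact Serialization")
    pad = "=" * (-len(parts[0]) % 4)            # restore base64url padding
    header = json.loads(base64.urlsafe_b64decode(parts[0] + pad))
    if header.get("typ") != "wimse-exec+jwt":   # step 2: token type
        raise ValueError("not an ECT")
    if header.get("alg") not in ALG_ALLOWLIST:  # step 3: algorithm allowlist
        raise ValueError(f"algorithm {header.get('alg')!r} not allowed")
    return header
```

Performing these cheap checks first lets a verifier reject malformed or mistyped tokens before touching key distribution or signature verification.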
\subsection{HTTP Transport}
ECTs travel via a new \texttt{Execution-Context} HTTP header alongside WIMSE identity:
\begin{lstlisting}[caption={HTTP request with WIMSE identity and ECT.}]
GET /api/compliance-check HTTP/1.1
Host: compliance-agent.example.com
Workload-Identity: eyJhbGci...WIT...
Execution-Context: eyJhbGci...ECT...
\end{lstlisting}
\noindent Multiple \texttt{Execution-Context} headers may appear when multiple parent tasks contribute context (fan-in). Receivers must individually verify each ECT and reject the request if any fails.
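The all-or-nothing rule for fan-in requests can be sketched as follows; the \texttt{verify} callback stands in for the full 13-step procedure and is an assumption of this illustration.

```python
def extract_and_verify_ects(headers, verify):
    """Collect every Execution-Context header value (fan-in may carry
    several) and accept the request only if all of them verify.
    `verify` is a caller-supplied predicate implementing full ECT
    verification; raising on failure mirrors the HTTP 403 behavior."""
    tokens = [v for (k, v) in headers if k.lower() == "execution-context"]
    if not all(verify(t) for t in tokens):
        raise PermissionError("403: invalid Execution-Context token")
    return tokens
```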
\subsection{Audit Ledger Interface}
An optional audit ledger provides immutable storage. Implementations must satisfy four requirements:
\begin{enumerate}[nosep]
\item \textbf{Append-only}: No modification or deletion after recording.
\item \textbf{Ordering}: Monotonically increasing sequence numbers.
\item \textbf{Lookup}: Efficient retrieval by \texttt{jti}.
\item \textbf{Integrity}: Hash chains or Merkle trees for tamper detection.
\end{enumerate}
\noindent Ledgers should be maintained independently of workflow agents to reduce collusion risk.
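A minimal in-process sketch satisfying the four requirements follows, assuming SHA-256 hash chaining for requirement 4 (a Merkle tree would serve equally well); the record layout is an illustrative assumption.

```python
import hashlib

class AuditLedger:
    """Hash-chained, append-only ledger sketch: append-only storage,
    monotonic sequence numbers, jti lookup, and tamper-evident chaining."""

    def __init__(self):
        self._entries = []   # (seq, jti, token, chain_hash)
        self._index = {}     # jti -> seq

    def append(self, jti: str, token: str) -> int:
        prev = self._entries[-1][3] if self._entries else "0" * 64
        chain = hashlib.sha256((prev + token).encode()).hexdigest()
        seq = len(self._entries)          # monotonically increasing
        self._entries.append((seq, jti, token, chain))
        self._index[jti] = seq
        return seq

    def lookup(self, jti: str):
        """Efficient retrieval by jti via the index."""
        return self._entries[self._index[jti]]

    def verify_chain(self) -> bool:
        """Recompute the chain; any modified entry breaks every later hash."""
        prev = "0" * 64
        for _, _, token, chain in self._entries:
            if hashlib.sha256((prev + token).encode()).hexdigest() != chain:
                return False
            prev = chain
        return True
```

A production ledger would persist entries durably and expose the chain head for external witnessing, but the chaining logic is the same.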
% ══════════════════════════════════════════════════════════════════════════
% 5. COMPARATIVE ANALYSIS
% ══════════════════════════════════════════════════════════════════════════
\section{Comparative Analysis Against the Landscape}
\label{sec:comparison}
\subsection{Comparison Framework}
We compare ECTs against eight proposals from the landscape along eight dimensions relevant to execution accountability:
\begin{enumerate}[nosep]
\item \textbf{Execution Recording}: Does it record what an agent did?
\item \textbf{DAG Support}: Can it represent parallel/fan-in workflows?
\item \textbf{I/O Integrity}: Does it protect input/output data integrity?
\item \textbf{Audit Trail}: Does it support immutable audit storage?
\item \textbf{WIMSE Integration}: Does it build on the WIMSE trust model?
\item \textbf{Authorization Scope}: Does it handle delegation/permission?
\item \textbf{Token Format}: Is it based on established standards (JWT/CWT)?
\item \textbf{Verification Depth}: How thorough is the verification procedure?
\end{enumerate}
\subsection{Head-to-Head Comparison}
Table~\ref{tab:comparison} presents the feature matrix. Composite scores are from our landscape analysis (1--5 scale, higher is better); overlap is the LLM-assessed redundancy with other drafts (lower is more unique).
\begin{table}[h]
\centering
\caption{Feature comparison: ECT vs.\ eight competing/complementary proposals. \ding{51} = full, $\sim$ = partial, \ding{55} = none.}
\label{tab:comparison}
\footnotesize
\renewcommand{\arraystretch}{1.15}
\begin{tabularx}{\textwidth}{l c c c c c c c c c c}
\toprule
\textbf{Proposal} & \textbf{Score} & \textbf{Ovlp} & \rotatebox{70}{\textbf{Exec Record}} & \rotatebox{70}{\textbf{DAG}} & \rotatebox{70}{\textbf{I/O Hash}} & \rotatebox{70}{\textbf{Audit}} & \rotatebox{70}{\textbf{WIMSE}} & \rotatebox{70}{\textbf{AuthZ}} & \rotatebox{70}{\textbf{JWT/CWT}} & \rotatebox{70}{\textbf{Verif.\ Depth}} \\
\midrule
\textbf{ECT (this work)} & 4.0 & 2 & \ding{51} & \ding{51} & \ding{51} & \ding{51} & \ding{51} & \ding{55} & \ding{51} & 13 steps \\
DAAP v2~\cite{daap-v2} & 4.8 & 1 & $\sim$ & \ding{55} & \ding{55} & \ding{51} & \ding{55} & \ding{51} & \ding{51} & Broad \\
STAMP~\cite{stamp} & 4.6 & 1 & \ding{55} & \ding{55} & $\sim$ & \ding{51} & \ding{55} & \ding{51} & \ding{51} & 8 steps \\
Agentic JWT~\cite{agentic-jwt} & 4.5 & 2 & \ding{55} & $\sim$ & \ding{55} & \ding{55} & \ding{55} & \ding{51} & \ding{51} & 6 steps \\
Verif.\ Conv.~\cite{verif-conv} & 4.5 & 2 & $\sim$ & \ding{55} & $\sim$ & \ding{51} & \ding{55} & \ding{55} & COSE & Schema \\
Trans.\ Attest.~\cite{trans-att} & 4.3 & 2 & \ding{55} & \ding{55} & \ding{55} & \ding{55} & \ding{51} & \ding{55} & \ding{51} & Env.\ only \\
Txn Tokens~\cite{txn-tokens} & 4.2 & 3 & \ding{55} & \ding{55} & \ding{55} & \ding{55} & \ding{55} & \ding{51} & \ding{51} & Linear \\
Actor Chain~\cite{actor-chain} & 4.1 & 2 & \ding{55} & \ding{55} & \ding{55} & $\sim$ & \ding{55} & \ding{51} & \ding{51} & Linear \\
HJS~\cite{hjs} & 3.5 & 1 & \ding{51} & \ding{55} & \ding{55} & \ding{51} & \ding{55} & \ding{55} & Custom & Blockchain \\
\bottomrule
\end{tabularx}
\end{table}
\noindent Key observations:
\textbf{ECT is the only proposal combining execution recording with DAG support.} The Actor Chain~\cite{actor-chain} and Transaction Tokens~\cite{txn-tokens} propagate linear context but cannot represent fan-in workflows where multiple parent tasks converge. The ECT DAG model is strictly more expressive.
\textbf{No other proposal provides cryptographic I/O integrity.} ECTs' \texttt{inp\_hash} and \texttt{out\_hash} claims create a verifiable data lineage chain. STAMP~\cite{stamp} includes message-level proofs but focuses on delegation authorization, not execution data.
\textbf{DAAP v2~\cite{daap-v2} is the most comprehensive competitor.} Scoring 4.8, it covers authentication, behavioral monitoring, and remote shutdown---a broader scope than ECTs. However, it does not define a structured execution record format or support DAG workflow modeling. ECTs are complementary: DAAP could use ECTs as its execution record layer.
\textbf{Transitive Attestation~\cite{trans-att} is a natural WIMSE sibling.} It binds identities to execution environments (TEEs). Combined with ECTs, the WIMSE stack would provide a complete accountability chain: \emph{who} (WIT), \emph{where} (transitive attestation), \emph{what} (ECT).
\textbf{HJS~\cite{hjs} shares ECT's goals but uses a heavier trust model.} Blockchain-anchored timestamps provide decentralized accountability but introduce latency and infrastructure requirements that ECTs avoid through standard JWT infrastructure.
\subsection{Uniqueness Quantification}
To quantify ECT's positioning beyond subjective comparison, we use embedding-based similarity from the landscape analysis:
\begin{itemize}[nosep]
\item \textbf{Maximum pairwise similarity}: 0.836 (with \texttt{draft-nederveld-adl}, an agent definition language---a different problem domain).
\item \textbf{Overlap rating}: 2/5 (low---placing ECT among the least-overlapping quarter of all 260 drafts).
\item \textbf{Cluster membership}: ECT does not belong to any of the high-similarity clusters identified at the 0.85 threshold.
\item \textbf{Composite score}: 4.0/5, placing ECT in the top 20\% of the corpus.
\end{itemize}
\noindent These metrics confirm that ECTs occupy a genuinely novel niche rather than duplicating existing work.
% ══════════════════════════════════════════════════════════════════════════
% 6. GAP COVERAGE ANALYSIS
% ══════════════════════════════════════════════════════════════════════════
\section{Gap Coverage Analysis}
\label{sec:gaps}
We map ECT mechanisms to each of the 12 identified landscape gaps. Table~\ref{tab:all-gaps} provides the full matrix.
\begin{table}[h]
\centering
\caption{ECT coverage of the 12 identified landscape gaps.}
\label{tab:all-gaps}
\small
\begin{tabularx}{\textwidth}{llccX}
\toprule
\textbf{Gap} & \textbf{Sev.} & \textbf{Exist.\textsuperscript{a}} & \textbf{ECT} & \textbf{ECT Mechanism} \\
\midrule
Agent Behavior Verification & Crit. & 52 & \ding{51} & Signed \texttt{exec\_act} records what agent claimed to do \\
Agent Error Recovery & Crit. & 6 & \ding{51} & DAG enables post-mortem tracing; \texttt{inp\_hash}/\texttt{out\_hash} locate data corruption \\
Agent Resource Mgmt & Crit. & 117 & \ding{55} & --- \\
Cross-Protocol Translation & High & 0 & \ding{55} & --- \\
Agent Lifecycle Mgmt & High & 90 & \ding{55} & --- \\
Multi-Agent Consensus & High & 5 & \ding{55} & --- \\
Human Override & High & 4 & \ding{55} & --- \\
Cross-Domain Security & High & 10 & $\sim$ & DAG validation rule~5 (trust domain consistency) \\
Dynamic Trust & High & 5 & $\sim$ & Execution history enables reputation assessment \\
Agent Performance Mon. & Med. & 26 & $\sim$ & Timestamp-based execution timing from \texttt{iat} of parent/child \\
Agent Explainability & Med. & 5 & \ding{55} & --- \\
Agent Data Provenance & Med. & 0 & \ding{51} & \texttt{inp\_hash}/\texttt{out\_hash} chains create verifiable data lineage \\
\bottomrule
\multicolumn{5}{l}{\footnotesize\textsuperscript{a}Number of existing ideas (of 1,262) that partially address the gap.} \\
\end{tabularx}
\end{table}
\noindent ECTs fully address 3 gaps and partially address 3 more. Notably, ECTs are the \textbf{only single proposal in the 260-draft landscape that simultaneously addresses all three execution-layer gaps} (behavior verification, error recovery, data provenance). No other draft combines signed execution records, DAG-based workflow modeling, and cryptographic data lineage.
The error recovery coverage is particularly significant: with only 6 of 1,262 ideas touching this topic, the existing landscape provides essentially no support for debugging failed multi-agent workflows. ECT DAGs directly enable the kind of ``execution replay'' analysis that production deployments require.
% ══════════════════════════════════════════════════════════════════════════
% 7. DISCUSSION
% ══════════════════════════════════════════════════════════════════════════
\section{Discussion and Limitations}
\label{sec:discussion}
\subsection{The Self-Assertion Limitation}
ECTs are fundamentally self-asserted: an agent creates its own execution record. A compromised or malicious agent can claim to have performed actions it did not, or omit actions it did perform. This is the most significant limitation of the design.
Mitigations include:
\begin{itemize}[nosep]
\item \textbf{Independent audit ledgers}: Maintained separately from workflow agents, enabling cross-verification of ECT sequences.
\item \textbf{Multi-replica comparison}: Multiple observers independently record ECTs and flag discrepancies.
\item \textbf{TEE integration}: Combining ECTs with transitive attestation~\cite{trans-att} to verify the execution environment. If the agent runs in a Trusted Execution Environment, the ECT's credibility increases significantly.
\item \textbf{Input/output hash verification}: While agents can falsify \texttt{exec\_act}, the \texttt{inp\_hash} and \texttt{out\_hash} claims can be independently verified by recipients who possess the actual data.
\end{itemize}
\subsection{Scalability Considerations}
Large-scale workflows may generate thousands of ECTs. The 10,000-node traversal limit constrains verification cost but may prove insufficient for long-running industrial workflows. Practical deployments may require:
\begin{itemize}[nosep]
\item \textbf{Checkpoint ECTs}: Periodic summarization of sub-DAGs into single checkpoint tokens, resetting traversal depth.
\item \textbf{Lazy verification}: Verifying only the immediate parents during normal operation, with full DAG traversal reserved for audit and debugging.
\item \textbf{Ledger-assisted verification}: Offloading DAG validation to the audit ledger, which maintains an indexed view of the complete workflow graph.
\end{itemize}
\subsection{Adoption Path}
ECTs are designed for incremental adoption:
\begin{enumerate}[nosep]
\item Agents can emit ECTs even if not all recipients verify them (the header is simply ignored by unaware systems).
\item The JWT format leverages existing libraries and tooling in every major programming language.
\item Integration with WIMSE requires only that the agent already possesses a WIT---no additional key infrastructure.
\item The single new HTTP header (\texttt{Execution-Context}) requires no changes to request routing or load balancing.
\end{enumerate}
\subsection{Limitations of the Landscape Analysis}
The quantitative evidence supporting ECTs inherits limitations from the analysis methodology:
\begin{itemize}[nosep]
\item \textbf{Keyword selection}: Six seed keywords may miss relevant drafts using different terminology.
\item \textbf{Single-LLM assessment}: Claude Sonnet~4 may have systematic biases in its ratings.
\item \textbf{Snapshot}: The analysis captures February 2026; the landscape evolves continuously.
\item \textbf{Disambiguation}: Author affiliations are not normalized (e.g., ``Huawei'' and ``Huawei Technologies'' are counted separately).
\end{itemize}
% ══════════════════════════════════════════════════════════════════════════
% 8. RELATED WORK
% ══════════════════════════════════════════════════════════════════════════
\section{Related Work}
\label{sec:related}
\textbf{Distributed tracing.} OpenTelemetry~\cite{opentelemetry} provides observability for microservice architectures with trace/span hierarchies. However, spans are unsigned and designed for monitoring, not accountability. ECTs provide cryptographic integrity and are designed for audit, not debugging (though they enable both).
\textbf{Provenance standards.} The W3C PROV data model~\cite{w3c-prov} provides a semantic framework for provenance tracking. PROV is comprehensive but heavyweight---ECTs provide a runtime-embeddable JWT format suitable for HTTP request flows rather than offline provenance databases.
\textbf{Supply chain transparency.} SCITT (Supply Chain Integrity, Transparency, and Trust)~\cite{scitt} defines transparent ledgers for signed statements about supply chain artifacts. ECT audit ledgers share design principles but target runtime execution rather than build-time artifacts. ECTs could correlate with SCITT Signed Statements for end-to-end accountability.
\textbf{Blockchain-based accountability.} Various proposals use blockchain for agent accountability (e.g., HJS~\cite{hjs}). ECTs deliberately avoid blockchain dependency to minimize latency and infrastructure requirements, using optional append-only ledgers instead.
% ══════════════════════════════════════════════════════════════════════════
% 9. CONCLUSION
% ══════════════════════════════════════════════════════════════════════════
\section{Conclusion and Future Work}
\label{sec:conclusion}
We have presented Execution Context Tokens (ECTs), a JWT-based extension to the WIMSE architecture for recording task execution in distributed agentic workflows. By analyzing 260 IETF Internet-Drafts using embedding similarity, LLM-assisted rating, and automated gap detection, we demonstrated that the execution accountability layer is the most under-served area in the current standardization landscape.
ECTs address this gap with a design that is both technically sound (DAG-based workflow modeling, cryptographic I/O integrity, 13-step verification) and practically deployable (JWT format, single HTTP header, incremental adoption). A head-to-head comparison against eight competing proposals confirms that ECTs are the only specification combining execution recording, DAG support, and WIMSE-native integration.
\textbf{Future work} includes:
\begin{enumerate}[nosep]
\item \textbf{Reference implementation}: An open-source library for ECT creation, verification, and DAG visualization.
\item \textbf{Formal verification}: Applying ProVerif or Tamarin to the ECT security model.
\item \textbf{Integration testing}: Deploying ECTs alongside WIMSE WIT/WPT and transitive attestation in a multi-agent testbed to validate the ``WIMSE trinity'' concept.
\item \textbf{Privacy-preserving ECTs}: Selective disclosure of execution details using zero-knowledge proofs or redactable signatures.
\item \textbf{Longitudinal tracking}: Monitoring the landscape as it evolves and assessing whether ECTs catalyze additional execution-layer proposals.
\end{enumerate}
\noindent The ECT specification is available as IETF Internet-Draft \texttt{draft-nennemann-wimse-ect-00}~\cite{ect-draft}. The landscape analysis toolkit and dataset are released as open source.\footnote{Repository: \url{https://github.com/TODO/ietf-draft-analyzer}}
% ── Acknowledgments ──────────────────────────────────────────────────────
\section*{Acknowledgments}
The landscape analysis was performed using Anthropic Claude (Sonnet 4) for rating and idea extraction, and Ollama with nomic-embed-text for embedding generation. The author thanks the IETF community for maintaining the open Datatracker API and the WIMSE working group for providing the identity foundation on which ECTs build.
% ── References ───────────────────────────────────────────────────────────
\bibliographystyle{plain}
\bibliography{references}
\end{document}