% Switch to IEEEtran for final arXiv submission:
% \documentclass[conference]{IEEEtran}
\documentclass[11pt,twocolumn]{article}
\usepackage[margin=0.75in]{geometry}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
% \usepackage{cite} % uncomment for IEEEtran
\usepackage{amsmath,amssymb}
\usepackage{graphicx}
\usepackage{url}
\usepackage{hyperref}
\usepackage{booktabs}
% \usepackage{multirow} % uncomment if available
\usepackage{xcolor}
\usepackage{array}
\hypersetup{
colorlinks=true,
linkcolor=blue,
citecolor=blue,
urlcolor=blue,
}
\begin{document}
\title{Gap Analysis of IETF Standards for\\Autonomous AI Agent Protocols}
\author{
Christian Nennemann\\
Independent Researcher\\
\texttt{ietf@nennemann.de}
}
\maketitle
% ==================================================================
\begin{abstract}
Autonomous software agents---powered by large language models and
traditional AI planning systems---are rapidly being deployed for
network management, cloud orchestration, supply-chain logistics, and
multi-step AI-driven workflows. A survey of the IETF document
corpus reveals over 260 Internet-Drafts and RFCs that touch on
aspects of agent communication, identity, safety, and operations.
Despite this breadth, these efforts remain fragmented: no single
reference architecture ties them together, and several critical
capabilities---including behavioral verification, failure cascade
prevention, multi-agent consensus, and standardized human
override---lack any standardization whatsoever.
This paper presents a systematic gap analysis of IETF standards
with respect to autonomous agent protocol requirements. We propose
a four-layer reference architecture for agent ecosystems, identify
eleven specific gaps organized by severity (Critical, High, Medium),
and map these gaps to six companion Internet-Drafts that address the
most pressing deficiencies. We further contextualize these gaps
against industry protocols such as Google's Agent-to-Agent (A2A)
protocol and Anthropic's Model Context Protocol (MCP), as well as
academic literature on multi-agent systems, federated learning, and
fault-tolerant distributed systems. Our analysis provides a
structured roadmap for the standards work needed to enable safe,
interoperable, and auditable autonomous agent ecosystems.
\end{abstract}
\medskip
\noindent\textbf{Keywords:} autonomous agents, multi-agent systems, IETF standards, gap analysis,
agent safety, protocol standardization, AI governance
% ==================================================================
\section{Introduction}
\label{sec:intro}
The emergence of large language model (LLM)-based agents~\cite{openai2023gpt4,wang2024survey-llm-agents} has transformed autonomous software agents from a long-studied academic concept~\cite{wooldridge2009multiagent,jennings1998agent-applications} into a practical engineering reality. Modern agent frameworks such as AutoGen~\cite{autogen} and CrewAI~\cite{crewai} orchestrate multiple agents that collaborate on complex tasks, delegate sub-tasks to one another, invoke external tools, and make decisions with limited or no human supervision. Industry protocols including Google's Agent-to-Agent (A2A) protocol~\cite{a2a-protocol} and Anthropic's Model Context Protocol (MCP)~\cite{mcp-protocol} have emerged to standardize agent communication at the application layer.
Simultaneously, agents are being deployed for critical infrastructure operations---network management under the IETF's Network Management Operations (NMOP) working group, cloud orchestration across trust domains, and supply-chain workflows that span organizational boundaries. These deployments demand protocol-level guarantees for identity, authorization, safety, auditability, and fault tolerance that go far beyond what any single existing standard provides.
A survey of the IETF document corpus reveals over 260 Internet-Drafts and RFCs touching on agent-relevant topics across multiple working groups including WIMSE (Workload Identity in Multi-System Environments), RATS (Remote ATtestation procedureS), OAuth/GNAP, SCITT (Supply Chain Integrity, Transparency, and Trust), and NMOP. Yet these efforts remain fragmented, addressing individual facets of the problem without a unifying architecture.
\subsection{Contributions}
This paper makes three contributions:
\begin{enumerate}
\item \textbf{Reference Architecture.} We propose a reference architecture (Section~\ref{sec:architecture}) of four principal layers (Agent Interaction, Execution, Policy \& Governance, and Infrastructure) plus an overarching Human Control layer.
\item \textbf{Gap Analysis.} We identify eleven specific gaps in the current IETF standards landscape (Section~\ref{sec:gaps}), classified by severity (Critical, High, Medium), with formal problem statements, impact assessments, and analysis of existing partial coverage.
\item \textbf{Solution Roadmap.} We present six companion Internet-Drafts (Section~\ref{sec:solutions}) that address the most critical gaps, together with a dependency analysis and prioritization rationale.
\end{enumerate}
\subsection{Paper Organization}
Section~\ref{sec:background} surveys existing IETF work and industry protocols relevant to autonomous agents. Section~\ref{sec:architecture} presents the reference architecture. Section~\ref{sec:gaps} details the eleven identified gaps. Section~\ref{sec:solutions} describes the companion draft roadmap. Section~\ref{sec:discussion} discusses cross-cutting themes and limitations. Section~\ref{sec:conclusion} concludes.
% ==================================================================
\section{Background and Related Work}
\label{sec:background}
\subsection{IETF Standards Landscape}
\subsubsection{WIMSE --- Workload Identity in Multi-System Environments}
The WIMSE working group addresses workload identity for services that span multiple systems and trust domains. Of particular relevance is the Execution Context Token (ECT) framework~\cite{id-wimse-ect}, which provides cryptographically signed tokens carrying task identity, delegated authority, and constraints across agent boundaries. ECTs build on the JSON Web Token (JWT)~\cite{rfc7519} format and are designed to propagate execution context through chains of delegated actions---a fundamental requirement for multi-agent workflows. However, ECTs address identity and context propagation without defining failure semantics, behavioral verification, or consensus mechanisms.
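As an illustrative sketch of the JWT-based propagation mechanism (not code from the ECT draft; the claim names \texttt{task} and \texttt{constraints} are invented here, and a real ECT would use an asymmetric signature rather than the shared-key HS256 shown):

```python
import base64, hashlib, hmac, json

def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as used in JWT compact serialization."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def mint_ect(claims: dict, key: bytes) -> str:
    """Sign an ECT-like token in JWT (RFC 7519) compact serialization with HS256."""
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url(json.dumps(claims).encode())
    sig = b64url(hmac.new(key, f"{header}.{payload}".encode(), hashlib.sha256).digest())
    return f"{header}.{payload}.{sig}"

def verify_ect(token: str, key: bytes) -> dict:
    """Check the signature and return the claims, or raise ValueError."""
    header, payload, sig = token.split(".")
    expected = b64url(hmac.new(key, f"{header}.{payload}".encode(), hashlib.sha256).digest())
    if not hmac.compare_digest(sig, expected):
        raise ValueError("bad signature")
    return json.loads(base64.urlsafe_b64decode(payload + "=" * (-len(payload) % 4)))

key = b"shared-demo-key"  # illustration only; real deployments would sign asymmetrically
token = mint_ect({"sub": "agent-a", "task": "collect-telemetry",
                  "constraints": {"max_hops": 3}}, key)
claims = verify_ect(token, key)
```

The point of the sketch is the shape of the mechanism: a downstream agent can verify who delegated the task and under which constraints, but nothing in the token speaks to failure handling or behavioral conformance.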
\subsubsection{RATS --- Remote ATtestation procedureS}
The RATS architecture~\cite{rfc9334} defines procedures for remote attestation, enabling a relying party to appraise the trustworthiness of a remote peer based on attestation evidence. RATS provides the conceptual foundation for verifiable claims about system state, but its scope is limited to platform and firmware integrity. It does not address the higher-level question of whether an agent's \emph{behavior}---as opposed to its platform---conforms to a declared policy. Extending RATS-style attestation to behavioral claims is one of the critical gaps identified in this analysis.
\subsubsection{OAuth 2.0 and GNAP}
The OAuth 2.0 authorization framework and the Grant Negotiation and Authorization Protocol (GNAP) provide mechanisms for delegated authorization. Transaction tokens and token exchange mechanisms are relevant to agent-to-agent delegation chains, where Agent~A delegates a subset of its authority to Agent~B. However, OAuth and GNAP are designed for human-initiated authorization flows and do not natively support the fully autonomous, multi-hop delegation patterns characteristic of agent ecosystems.
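The core property that token exchange is meant to preserve along such chains is monotone attenuation: each hop receives at most the delegator's authority. A minimal sketch (scope names invented for illustration):

```python
def delegate(parent_scopes, requested):
    """Attenuate delegated authority: a delegatee may receive at most the
    delegator's scopes, so authority can only narrow along the chain."""
    parent_scopes, requested = set(parent_scopes), set(requested)
    if not requested <= parent_scopes:
        raise PermissionError(f"escalation: {sorted(requested - parent_scopes)}")
    return requested

# Agent A -> Agent B -> Agent C, each hop narrowing the scope set.
scopes_a = {"read:telemetry", "write:config", "read:inventory"}
scopes_b = delegate(scopes_a, {"read:telemetry", "read:inventory"})
scopes_c = delegate(scopes_b, {"read:telemetry"})
```

What OAuth and GNAP lack is not this invariant but a way to enforce it over many fully autonomous hops without a human-initiated grant at the root of each chain.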
\subsubsection{SCITT --- Supply Chain Integrity, Transparency, and Trust}
SCITT defines transparency services based on append-only cryptographic logs. Its model is directly relevant to agent audit trails: each agent action could be recorded as a signed statement in a transparency log, enabling tamper-evident provenance tracking. However, SCITT does not define agent-specific audit semantics, causal ordering across agent actions, or cross-domain audit correlation.
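The append-only property SCITT relies on can be illustrated with a toy hash chain (this is a sketch of the tamper-evidence idea, not the SCITT wire format):

```python
import hashlib, json

class TransparencyLog:
    """Toy append-only log: each entry commits to its predecessor's hash,
    so deleting or reordering past statements is detectable."""
    def __init__(self):
        self.entries = []

    def append(self, statement: dict) -> str:
        prev = self.entries[-1]["hash"] if self.entries else "0" * 64
        body = json.dumps({"prev": prev, "statement": statement}, sort_keys=True)
        digest = hashlib.sha256(body.encode()).hexdigest()
        self.entries.append({"prev": prev, "statement": statement, "hash": digest})
        return digest

    def verify(self) -> bool:
        prev = "0" * 64
        for e in self.entries:
            body = json.dumps({"prev": e["prev"], "statement": e["statement"]},
                              sort_keys=True)
            if e["prev"] != prev or hashlib.sha256(body.encode()).hexdigest() != e["hash"]:
                return False
            prev = e["hash"]
        return True

log = TransparencyLog()
log.append({"agent": "agent-a", "action": "config-change"})
log.append({"agent": "agent-b", "action": "validate"})
```

Recording each agent action this way gives tamper evidence within one log; what remains undefined is how such records are correlated across logs operated by different domains.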
\subsubsection{NMOP --- Network Management Operations}
The NMOP working group focuses on intent-based network management and autonomous network functions. Agent-driven network management is a primary use case for the gaps identified in this analysis: network agents that autonomously configure devices, respond to incidents, and optimize traffic must operate within strict safety boundaries with reliable override and rollback capabilities.
\subsection{Industry Protocols}
\subsubsection{Google A2A Protocol}
The Agent-to-Agent (A2A) protocol~\cite{a2a-protocol} defines a JSON-RPC-based mechanism for agents to discover each other's capabilities via ``Agent Cards'' and exchange tasks through structured messages. A2A provides useful abstractions for agent discovery and task delegation but does not address behavioral verification, cascade prevention, or cross-domain audit trails. Its capability advertisement mechanism is a partial solution to Gap~10 (Capability Negotiation) but lacks the policy-constrained semantics required for autonomous operations.
\subsubsection{Anthropic MCP}
The Model Context Protocol (MCP)~\cite{mcp-protocol} standardizes how LLM-based applications access external tools and data sources. MCP defines a client-server architecture where the LLM agent acts as a client requesting tool invocations, file access, and prompt templates from MCP servers. While MCP addresses the tool integration layer effectively, it operates within a single trust domain and does not define mechanisms for multi-agent coordination, cross-domain operations, or safety controls.
\subsubsection{Multi-Agent Frameworks}
AutoGen~\cite{autogen} and CrewAI~\cite{crewai} are representative of the emerging class of multi-agent orchestration frameworks. AutoGen provides a conversation-based programming model where multiple LLM agents collaborate through structured dialogues. CrewAI organizes agents into ``crews'' with defined roles, goals, and task assignments. Both frameworks demonstrate the practical need for the capabilities identified in our gap analysis but implement them through framework-specific mechanisms that are not interoperable.
\subsection{Academic Foundations}
\subsubsection{Multi-Agent Systems}
The multi-agent systems (MAS) literature~\cite{wooldridge2009multiagent,dorri2018mas-iot,jennings1998agent-applications} provides foundational models for agent communication, coordination, and negotiation. Classical work on contract nets, auction-based allocation, and belief-desire-intention (BDI) architectures informs the design of agent consensus protocols. However, translating these theoretical models into interoperable protocol standards for heterogeneous, cross-domain agent deployments remains an open problem.
\subsubsection{Distributed Consensus}
Consensus protocols such as Raft~\cite{ongaro2014raft}, Paxos~\cite{lamport1998paxos}, and PBFT~\cite{castro1999pbft} solve the problem of agreement in distributed systems. These protocols are designed for replicated state machines with homogeneous participants and well-defined failure models. Agent consensus differs fundamentally: participants are heterogeneous (different capabilities, trust levels, policies), the decision space is richer than choosing a single value, and the failure model includes semantic errors (an agent ``agrees'' but acts differently) in addition to crash and Byzantine failures.
\subsubsection{Federated Learning and Privacy}
Federated learning~\cite{mcmahan2017fedavg,kairouz2021fedlearning-advances} enables distributed model training without centralizing data. Differential privacy~\cite{dwork2006diffprivacy} provides formal privacy guarantees for statistical queries. Both are directly relevant to agents that share operational telemetry or learned models across organizational boundaries. However, no existing standard defines how these privacy-preserving techniques should be applied to agent-specific data types such as execution traces, behavioral profiles, and performance metrics.
\subsubsection{Circuit Breakers and Fault Tolerance}
The circuit breaker pattern, popularized by Nygard~\cite{nygard2018releaseit}, provides a mechanism for preventing cascade failures in distributed systems by detecting repeated failures and temporarily halting requests to a failing service. While widely adopted in microservice architectures, no protocol standard exists for circuit breaker semantics in agent-to-agent interactions, where the failure modes are richer (partial results, semantic errors, policy violations) and the containment boundaries span trust domains.
% ==================================================================
\section{Reference Architecture}
\label{sec:architecture}
We propose a layered reference architecture for autonomous agent ecosystems. The architecture comprises four principal layers plus an overarching human control layer, as depicted in Fig.~\ref{fig:arch}.
\begin{figure}[htbp]
\centering
\small
\begin{verbatim}
+-----------------------------------------------------------+
| HUMAN OPERATORS |
| [Override & HITL Layer -- GAP 7] |
+-----------------------------------------------------------+
| AGENT INTERACTION LAYER |
| +--------+ +--------+ +--------+ +--------+ |
| |Agent A |<>|Agent B |<>|Agent C |<>|Agent D | |
| +---+----+ +---+----+ +---+----+ +---+----+ |
| | GAP 3: | GAP 10: | GAP 1: | |
| | Consens. | Cap.Neg. | Behav.V. | |
+------+----------+----------+----------+-------------------+
| EXECUTION LAYER (ECT) |
| DAG Execution | Checkpoints | Rollback | Circuit Breakers |
| [GAP 2: Cascade Prevention] [GAP 4: Rollback] |
+-----------------------------------------------------------+
| POLICY & GOVERNANCE LAYER |
| ACP-DAG-HITL | Trust Scoring | Assurance Profiles |
| [GAP 5: Federated Privacy] [GAP 6: Cross-Domain Audit] |
+-----------------------------------------------------------+
| INFRASTRUCTURE LAYER |
| Identity | Discovery | Registration | Protocol Bridges |
| [GAP 8: Cross-Protocol] [GAP 9: Resource Accounting] |
| [GAP 11: Performance Benchmarking] |
+-----------------------------------------------------------+
\end{verbatim}
\caption{Agent Ecosystem Reference Architecture. Each layer identifies the gap areas addressed by this analysis.}
\label{fig:arch}
\end{figure}
\subsection{Human Control Layer}
The topmost layer provides human-in-the-loop (HITL) controls and override capabilities. This layer ensures that autonomous agents remain subject to human authority at all times. Gap~7 (Human Override Standardization) resides here. The layer interfaces with all lower layers: an override signal may halt execution (Execution Layer), revoke delegation (Policy Layer), or disconnect an agent from infrastructure services (Infrastructure Layer).
\subsection{Agent Interaction Layer}
This layer is where agents communicate, negotiate capabilities, reach consensus, and undergo behavioral verification. Three gaps reside here:
\begin{itemize}
\item \textbf{Gap~1 (Behavioral Verification):} Runtime verification that an agent's observed behavior conforms to its declared policy.
\item \textbf{Gap~3 (Consensus):} Multi-agent agreement on shared decisions.
\item \textbf{Gap~10 (Capability Negotiation):} Dynamic discovery and negotiation of agent capabilities.
\end{itemize}
Industry protocols A2A~\cite{a2a-protocol} and MCP~\cite{mcp-protocol} partially address this layer but lack the safety and governance semantics required for autonomous operation.
\subsection{Execution Layer}
The Execution Layer manages DAG-structured agent workflows. Execution Context Tokens (ECTs)~\cite{id-wimse-ect} carry delegated authority and task context through the execution graph. Two gaps are critical at this layer:
\begin{itemize}
\item \textbf{Gap~2 (Cascade Prevention):} Circuit breakers, failure isolation, and cascade containment for multi-agent workflows.
\item \textbf{Gap~4 (Rollback):} Standardized checkpointing and undo semantics that work across agent and domain boundaries.
\end{itemize}
\subsection{Policy and Governance Layer}
This layer enforces organizational policies, privacy requirements, and compliance constraints. The Agent Context Policy (ACP) framework~\cite{id-dag-hitl-safety} defines per-agent policy documents specifying permitted behaviors, resource limits, and escalation rules. Gaps at this layer include:
\begin{itemize}
\item \textbf{Gap~5 (Federated Privacy):} Privacy guarantees for agents that share operational data or participate in federated learning across domains.
\item \textbf{Gap~6 (Cross-Domain Audit):} End-to-end tamper-evident audit trails across organizational boundaries.
\end{itemize}
\subsection{Infrastructure Layer}
The bottom layer provides foundational services: identity, discovery, registration, and protocol bridging. Remaining gaps reside here:
\begin{itemize}
\item \textbf{Gap~8 (Cross-Protocol Migration):} Preserving agent context across heterogeneous protocol environments.
\item \textbf{Gap~9 (Resource Accounting):} Tracking and reconciling agent resource consumption across domains.
\item \textbf{Gap~11 (Performance Benchmarking):} Standardized metrics for evaluating agent performance.
\end{itemize}
% ==================================================================
\section{Gap Analysis}
\label{sec:gaps}
We identify eleven gaps in the current standards landscape, classified into three severity levels:
\begin{itemize}
\item \textbf{CRITICAL:} No existing standard addresses the problem; failure to standardize poses immediate safety or interoperability risks.
\item \textbf{HIGH:} Partial coverage exists but is insufficient for production autonomous agent deployments.
\item \textbf{MEDIUM:} The gap affects efficiency or completeness but does not pose immediate safety risks.
\end{itemize}
Table~\ref{tab:gap-summary} provides an overview.
\begin{table}[htbp]
\centering
\caption{Summary of Identified Gaps}
\label{tab:gap-summary}
\small
\begin{tabular}{@{}clll@{}}
\toprule
\textbf{Gap} & \textbf{Name} & \textbf{Severity} & \textbf{Category} \\
\midrule
1 & Behavioral Verification & CRITICAL & AI Safety \\
2 & Cascade Prevention & CRITICAL & Safety/Resilience \\
\midrule
3 & Multi-Agent Consensus & HIGH & A2A Protocols \\
4 & Real-Time Rollback & HIGH & Resilience \\
5 & Federated Privacy & HIGH & Privacy \\
6 & Cross-Domain Audit & HIGH & Compliance \\
7 & Human Override & HIGH & AI Safety \\
\midrule
8 & Cross-Protocol Migration & MEDIUM & Interoperability \\
9 & Resource Accounting & MEDIUM & Operations \\
10 & Capability Negotiation & MEDIUM & A2A Protocols \\
11 & Performance Benchmarking & MEDIUM & Metrics \\
\bottomrule
\end{tabular}
\end{table}
% ------------------------------------------------------------------
\subsection{CRITICAL Gaps}
\subsubsection{Gap 1: Agent Behavioral Verification}
\label{sec:gap1}
\paragraph{Problem Statement}
Autonomous agents operating in production environments lack any standardized mechanism for runtime verification of policy compliance. While RATS~\cite{rfc9334} provides attestation for platform integrity---verifying that firmware and software have not been tampered with---no equivalent exists for verifying that an agent's \emph{observed behavior} conforms to its declared behavioral profile or policy constraints.
\paragraph{Evidence and Examples}
Consider a network management agent authorized to modify BGP route policies within defined parameters. Without behavioral verification, there is no protocol-level mechanism to detect that the agent has begun modifying routes outside its authorized scope, whether due to prompt injection, model drift, or adversarial manipulation. The operator learns of the violation only through its downstream effects---potentially after significant damage.
In multi-agent workflows, the problem compounds: Agent~B trusts the output of Agent~A because Agent~A holds valid credentials, but those credentials attest only to Agent~A's identity, not to the correctness of its behavior. A misbehaving Agent~A can corrupt the entire downstream workflow while remaining ``authenticated.''
\paragraph{Impact Assessment}
Undetected policy violations could cause safety incidents, data breaches, or cascading failures in critical infrastructure. In regulated industries, the inability to verify agent compliance creates an insurmountable barrier to deployment.
\paragraph{Existing Partial Solutions}
RATS~\cite{rfc9334} provides the conceptual model (Attester, Verifier, Relying Party) but scopes it to platform integrity. The ACP-DAG-HITL framework~\cite{id-dag-hitl-safety} defines policies but not verification mechanisms. Runtime monitoring tools exist in practice but use proprietary, non-interoperable formats.
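To make the BGP example concrete, a behavioral verifier would need to check each observed action against the declared policy at runtime. The following sketch assumes a hypothetical policy document (the field names are invented; a standard would have to define the policy schema and the observation format):

```python
import ipaddress

# Hypothetical declared policy for the BGP example above.
POLICY = {
    "allowed_actions": {"announce", "withdraw"},
    "allowed_prefixes": [ipaddress.ip_network("10.0.0.0/8")],
}

def conforms(action: dict, policy: dict) -> bool:
    """Runtime check: does one observed action fall within the declared policy?"""
    if action["type"] not in policy["allowed_actions"]:
        return False
    prefix = ipaddress.ip_network(action["prefix"])
    return any(prefix.subnet_of(net) for net in policy["allowed_prefixes"])

observed = [
    {"type": "announce", "prefix": "10.1.0.0/16"},   # within authorized scope
    {"type": "announce", "prefix": "192.0.2.0/24"},  # outside authorized scope
]
violations = [a for a in observed if not conforms(a, POLICY)]
```

The hard standardization problems lie outside this toy check: attesting that the monitor itself ran, defining interoperable observation formats, and conveying verification results to relying parties in the RATS roles.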
% ------------------------------------------------------------------
\subsubsection{Gap 2: Agent Failure Cascade Prevention}
\label{sec:gap2}
\paragraph{Problem Statement}
Multi-agent workflows create dependency chains where a failure in one agent propagates to downstream agents, causing cascade failures. No standardized mechanism exists for circuit breakers~\cite{nygard2018releaseit}, failure isolation, or cascade containment in agent-to-agent interactions.
\paragraph{Evidence and Examples}
Current practice relies on ad-hoc timeout and retry logic that is neither interoperable nor sufficient for complex DAG-structured workflows. In a network management scenario, an agent responsible for collecting telemetry data may fail due to a device timeout. Without cascade containment, the configuration agent waiting for this telemetry proceeds with stale data, the validation agent rubber-stamps the stale configuration, and the deployment agent pushes an incorrect configuration to production routers.
The microservices community has adopted circuit breaker patterns~\cite{nygard2018releaseit}, but these operate at the HTTP request level and do not capture the richer failure semantics of agent interactions: partial results (an agent completed 3 of 5 sub-tasks), semantic errors (an agent returned syntactically valid but logically incorrect output), and policy violations that should trigger containment.
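A circuit breaker extended to these richer outcomes might look as follows (an illustrative sketch, not a proposed specification; the outcome vocabulary and thresholds are invented):

```python
import time

class AgentCircuitBreaker:
    """Circuit breaker over agent-to-agent calls. Unlike HTTP-level breakers,
    partial results and policy violations also count toward tripping."""
    FAILING_OUTCOMES = {"error", "partial", "policy_violation"}

    def __init__(self, threshold=3, cooldown=30.0):
        self.threshold, self.cooldown = threshold, cooldown
        self.failures, self.opened_at = 0, None

    def allow(self) -> bool:
        """May the next call proceed? Re-closes (half-open) after the cooldown."""
        if self.opened_at is None:
            return True
        if time.monotonic() - self.opened_at >= self.cooldown:
            self.opened_at, self.failures = None, 0
            return True
        return False

    def record(self, outcome: str):
        """Record a call outcome; trip the breaker at the failure threshold."""
        if outcome in self.FAILING_OUTCOMES:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()
        else:
            self.failures = 0

cb = AgentCircuitBreaker(threshold=2)
cb.record("partial")
cb.record("policy_violation")  # second qualifying failure opens the breaker
```

A standard would additionally have to define how the open/closed state is signaled to peers across trust domains, which no amount of per-node logic provides.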
\paragraph{Impact Assessment}
A single agent failure could cascade through an entire multi-agent deployment, causing widespread service disruption with no automated containment. This risk is especially acute in network management, where agent failures can propagate to affect live network operations.
\paragraph{Existing Partial Solutions}
ECTs~\cite{id-wimse-ect} provide execution context but no failure containment semantics. Framework-specific implementations (e.g., AutoGen's~\cite{autogen} error handling) are not interoperable across vendors.
% ------------------------------------------------------------------
\subsection{HIGH Gaps}
\subsubsection{Gap 3: Multi-Agent Consensus Protocols}
\label{sec:gap3}
\paragraph{Problem Statement}
When multiple agents must agree on a shared decision---a network configuration change, a resource allocation plan, or a coordinated incident response---no standardized consensus protocol exists for agent-to-agent agreement.
\paragraph{Evidence and Examples}
Distributed systems consensus protocols (Raft~\cite{ongaro2014raft}, Paxos~\cite{lamport1998paxos}, PBFT~\cite{castro1999pbft}) are designed for replicated state machines with homogeneous participants. Agent consensus differs fundamentally: participants are heterogeneous with different capabilities, trust levels, and policy constraints. Agent consensus requires additional semantics such as weighted voting based on expertise or trust scores, capability-based participation where only qualified agents vote on domain-specific decisions, and policy-constrained proposals where proposed decisions must satisfy all participants' policy constraints.
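These three semantics can be sketched in a single tally function (illustrative only; the capability label, trust scores, and quorum rule are invented, and a real protocol would also need policy-constrained proposal validation and Byzantine-robust aggregation):

```python
def weighted_decision(agents, proposals, required_capability, quorum=0.5):
    """Capability-gated, trust-weighted vote: only qualified agents
    participate, and their trust scores weight the tally."""
    tallies = {p: 0.0 for p in proposals}
    total = 0.0
    for agent in agents:
        if required_capability not in agent["capabilities"]:
            continue  # unqualified agents do not vote on this decision
        total += agent["trust"]
        tallies[agent["choice"]] += agent["trust"]
    winner, score = max(tallies.items(), key=lambda kv: kv[1])
    return winner if total and score / total > quorum else None

agents = [
    {"id": "a", "capabilities": {"bgp"}, "trust": 0.9, "choice": "plan-1"},
    {"id": "b", "capabilities": {"bgp"}, "trust": 0.4, "choice": "plan-2"},
    {"id": "c", "capabilities": {"dns"}, "trust": 1.0, "choice": "plan-2"},
]
decision = weighted_decision(agents, ["plan-1", "plan-2"], "bgp")
```

Note that the high-trust DNS agent is excluded from this BGP decision entirely, so the trusted qualified vote for plan-1 prevails.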
\paragraph{Impact Assessment}
Without standard consensus protocols, multi-vendor agent deployments cannot coordinate decisions, limiting autonomous agents to single-vendor silos or requiring expensive custom integration.
\paragraph{Existing Partial Solutions}
No existing IETF work directly addresses multi-agent consensus. The MAS literature~\cite{wooldridge2009multiagent} provides theoretical models but not interoperable protocol specifications.
% ------------------------------------------------------------------
\subsubsection{Gap 4: Real-Time Agent Rollback Mechanisms}
\label{sec:gap4}
\paragraph{Problem Statement}
When an autonomous agent takes an action with unintended consequences, no standardized mechanism exists for rolling back the action and restoring the previous state. Rollback requires standardized checkpointing, state snapshots, and undo semantics that work across agent boundaries and administrative domains.
\paragraph{Evidence and Examples}
NETCONF~\cite{rfc6241} provides confirmed-commit with automatic rollback for configuration changes, but this is specific to the NETCONF protocol and device configurations. An agent that has invoked an API, sent a notification, allocated cloud resources, and modified a database cannot undo these heterogeneous actions using NETCONF's rollback. In multi-agent workflows, coordinated rollback is even harder: multiple agents may have taken dependent actions that must be reversed as a unit (a distributed saga pattern) without a standard protocol for coordinating the undo sequence.
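The saga pattern referenced above pairs each action with a compensating action and runs compensations in reverse order on failure. A minimal sketch (the step names are invented for illustration):

```python
def run_saga(steps):
    """Saga-style execution: on failure, run compensations for all
    completed steps in reverse order to restore prior state."""
    completed = []
    for name, action, compensate in steps:
        try:
            action()
            completed.append((name, compensate))
        except Exception:
            for _done_name, undo in reversed(completed):
                undo()  # coordinated rollback of dependent actions
            return ("rolled_back", [n for n, _ in completed])
    return ("committed", [n for n, _ in completed])

journal = []
def fail_notify():
    raise RuntimeError("notification API down")

steps = [
    ("allocate_vm", lambda: journal.append("vm+"), lambda: journal.append("vm-")),
    ("update_db",   lambda: journal.append("db+"), lambda: journal.append("db-")),
    ("notify",      fail_notify,                   lambda: None),
]
status, undone = run_saga(steps)
```

The unsolved standardization problem is everything this single-process sketch elides: agreeing on compensation semantics across vendors, and coordinating the undo sequence when the completed steps live in different administrative domains.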
\paragraph{Impact Assessment}
Operators cannot safely deploy autonomous agents for critical operations without maintaining manual intervention capability for every action, negating much of the value of autonomy.
\paragraph{Existing Partial Solutions}
NETCONF confirmed-commit provides rollback for configuration changes only~\cite{rfc6241}. The saga pattern is well-known in distributed systems but lacks a standard protocol binding for agent interactions.
% ------------------------------------------------------------------
\subsubsection{Gap 5: Federated Agent Learning Privacy}
\label{sec:gap5}
\paragraph{Problem Statement}
Agents participating in federated learning~\cite{mcmahan2017fedavg} or sharing operational data across administrative domains require privacy guarantees beyond transport encryption. No IETF specification addresses differential privacy parameters~\cite{dwork2006diffprivacy} for shared agent telemetry, data minimization for cross-domain agent data, or consent management for federated agent learning.
\paragraph{Evidence and Examples}
Network management agents across ISPs could benefit from federated learning of anomaly detection models without sharing raw traffic data. However, even model updates can leak information about network topology and traffic patterns~\cite{kairouz2021fedlearning-advances}. Without standardized privacy controls, organizations must choose between participating in federated ecosystems (accepting privacy risks) or operating in isolation (forgoing collaborative benefits).
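As a sketch of the kind of control a standard would parameterize, the Laplace mechanism of differential privacy adds calibrated noise to a released statistic (the telemetry value and the $\varepsilon$, sensitivity choices below are illustrative only):

```python
import math, random

def dp_release(value: float, sensitivity: float, epsilon: float) -> float:
    """Release a statistic with epsilon-differential privacy by adding
    Laplace(sensitivity / epsilon) noise, sampled via the inverse CDF."""
    u = random.random() - 0.5
    noise = -(sensitivity / epsilon) * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))
    return value + noise

random.seed(7)  # fixed seed for a reproducible example only
true_anomaly_rate = 0.042          # per-ISP statistic that must not leak exactly
reported = dp_release(true_anomaly_rate, sensitivity=0.001, epsilon=0.5)
```

What no specification currently pins down is precisely what a standard would need to: which agent data types require which sensitivity bounds, how $\varepsilon$ budgets are negotiated and accounted across domains, and how consent is expressed.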
\paragraph{Impact Assessment}
Organizations will be unable to participate in federated agent ecosystems without unacceptable privacy risks, limiting the value of multi-domain agent deployments.
\paragraph{Existing Partial Solutions}
General privacy frameworks exist but none address the specific data types and threat models of agent-to-agent federated learning.
% ------------------------------------------------------------------
\subsubsection{Gap 6: Cross-Domain Agent Audit Trails}
\label{sec:gap6}
\paragraph{Problem Statement}
When agents operate across multiple administrative domains, their actions must be auditable end-to-end. No standardized format exists for cross-domain agent audit trails that preserves causal ordering, links related actions across domain boundaries, and provides tamper-evident logging.
\paragraph{Evidence and Examples}
Execution Audit Tokens~\cite{id-exec-audit} provide per-action audit records, and SCITT provides transparency log primitives, but no standard defines how these records are aggregated, correlated, and queried across domains. A compliance auditor investigating an incident involving agents from three organizations currently has no standard way to reconstruct the end-to-end sequence of agent actions, verify that no records are missing, or confirm the causal relationships between actions in different domains.
Regulatory and compliance requirements (e.g., the EU AI Act) increasingly demand end-to-end audit trails for automated decision-making, making this gap urgent for enterprise deployments.
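The missing piece an auditor needs is exactly the causal links between records in different domains' logs. If a standard format carried explicit parent references, reconstructing one consistent end-to-end order reduces to a topological sort (record identifiers and fields below are invented for illustration):

```python
from graphlib import TopologicalSorter

# Hypothetical audit records from three domains; "parents" carries the
# cross-domain causal links a standard format would need to define.
records = {
    "isp-a/collect":  {"domain": "isp-a", "parents": []},
    "isp-b/validate": {"domain": "isp-b", "parents": ["isp-a/collect"]},
    "isp-c/deploy":   {"domain": "isp-c", "parents": ["isp-b/validate"]},
    "isp-a/notify":   {"domain": "isp-a", "parents": ["isp-c/deploy"]},
}

def causal_order(records: dict) -> list:
    """Reconstruct an end-to-end order consistent with the causal links."""
    ts = TopologicalSorter({rid: rec["parents"] for rid, rec in records.items()})
    return list(ts.static_order())

order = causal_order(records)
```

Completeness (no missing records) and tamper evidence would come from anchoring each record in a SCITT-style transparency log; the correlation format sketched here is the part no current draft specifies.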
\paragraph{Impact Assessment}
Organizations cannot demonstrate compliance for cross-domain agent operations, blocking adoption in regulated industries including financial services, healthcare, and telecommunications.
\paragraph{Existing Partial Solutions}
SCITT provides transparency log primitives. Execution Audit Tokens~\cite{id-exec-audit} define per-action audit records. Neither addresses cross-domain correlation or causal ordering.
% ------------------------------------------------------------------
\subsubsection{Gap 7: Human Override Standardization}
\label{sec:gap7}
\paragraph{Problem Statement}
Autonomous agents must always be subject to human override, but no cross-vendor protocol exists for sending override signals to agents. Required override types include emergency stop (immediate halt), graceful pause (complete current step then halt), parameter modification (adjust constraints while running), and forced rollback (undo recent actions).
\paragraph{Evidence and Examples}
The ACP-DAG-HITL framework~\cite{id-dag-hitl-safety} defines \emph{when} human approval is required (policy gates in the DAG execution plan) but does not specify the \emph{protocol} for delivering override signals to a running agent. In a multi-vendor deployment, if Agent~A (from Vendor~X) misbehaves and the management platform (from Vendor~Y) needs to issue an emergency stop, there is no standard message format, delivery mechanism, or acknowledgment protocol.
The absence of standard override creates an asymmetric safety risk: more capable agents that can take more impactful actions are precisely the ones that are hardest to stop if something goes wrong.
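A cross-vendor override protocol would need, at minimum, a typed order message and a mandatory acknowledgment reporting the agent's resulting state. The following sketch is illustrative only; the field names are invented and a real protocol would also sign and authenticate the order:

```python
import json

OVERRIDE_TYPES = {"emergency_stop", "graceful_pause",
                  "modify_parameters", "forced_rollback"}

def make_override(override_type: str, agent_id: str, issuer: str, params=None) -> str:
    """Build a vendor-neutral override order (unsigned, for illustration)."""
    if override_type not in OVERRIDE_TYPES:
        raise ValueError(f"unknown override type: {override_type}")
    return json.dumps({"type": override_type, "agent": agent_id,
                       "issuer": issuer, "params": params or {}})

def acknowledge(message: str, state_after: str) -> dict:
    """The agent's acknowledgment: echo the order and report resulting state."""
    msg = json.loads(message)
    return {"ack_for": msg["type"], "agent": msg["agent"], "state": state_after}

# Vendor Y's platform stops Vendor X's agent using the common format.
order = make_override("emergency_stop", "agent-a", "mgmt-platform-y")
ack = acknowledge(order, state_after="halted")
```

The acknowledgment is not a nicety: without a mandatory, verifiable "I have halted" response, the operator cannot distinguish a stopped agent from one that never received the order.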
\paragraph{Impact Assessment}
Operators lose the ability to control autonomous agents in emergency situations, creating unacceptable safety risks for any deployment beyond sandboxed experimentation.
\paragraph{Existing Partial Solutions}
ACP-DAG-HITL~\cite{id-dag-hitl-safety} defines HITL policies but not override delivery. Vendor-specific kill switches exist but are not interoperable.
% ------------------------------------------------------------------
\subsection{MEDIUM Gaps}
\subsubsection{Gap 8: Cross-Protocol Agent Migration}
\label{sec:gap8}
\paragraph{Problem Statement}
Agents may need to migrate between protocol environments (e.g., from an A2A-based system~\cite{a2a-protocol} to an MCP-based system~\cite{mcp-protocol}) while preserving execution context, identity, and accumulated state. No standard defines how agent context is translated or preserved across protocol boundaries.
\paragraph{Impact Assessment}
Agent deployments become fragmented across protocol silos, reducing interoperability and increasing operational complexity. As the protocol landscape matures and consolidates, lack of migration support will strand early adopters.
\paragraph{Existing Partial Solutions}
ECTs~\cite{id-wimse-ect} provide a protocol-neutral context token but do not define migration procedures.
% ------------------------------------------------------------------
\subsubsection{Gap 9: Agent Resource Accounting and Billing}
\label{sec:gap9}
\paragraph{Problem Statement}
Autonomous agents consume computational, network, and API resources across administrative domains. No standardized mechanism exists for tracking, reporting, and reconciling resource consumption by agents, especially in multi-domain scenarios where an agent's actions incur costs in domains other than its own.
\paragraph{Impact Assessment}
Organizations cannot accurately track or bill for agent resource consumption, hindering the development of sustainable commercial multi-domain agent ecosystems.
\paragraph{Existing Partial Solutions}
No existing IETF work addresses agent-specific resource accounting. Cloud provider billing APIs exist but are domain-specific and do not correlate consumption with agent identity or execution context.
% ------------------------------------------------------------------
\subsubsection{Gap 10: Agent Capability Negotiation}
\label{sec:gap10}
\paragraph{Problem Statement}
When agents interact, they need to discover and negotiate each other's capabilities dynamically. No standardized capability negotiation protocol exists for agents to advertise their functions, agree on interaction protocols, and establish compatible communication parameters.
Well-Known URIs~\cite{rfc8615} and HTTP content negotiation~\cite{rfc9110} provide basic discovery primitives, but agent capability negotiation requires richer semantics: versioned capability declarations, conditional capabilities that depend on policy or context, and mutual negotiation where both parties agree on a compatible interaction profile.
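As a rough illustration of the mutual-negotiation semantics described above (the function and field names are hypothetical, not drawn from any existing draft), two agents could intersect versioned capability declarations and settle on the highest version both sides support:

```python
def negotiate_profile(offered, supported):
    """Pick a mutually compatible interaction profile (illustrative sketch).

    `offered` and `supported` map a capability name to the set of versions
    each party supports; the result keeps each shared capability at the
    highest mutually supported version. These names are placeholders,
    not a proposed wire format.
    """
    profile = {}
    for cap, versions in offered.items():
        common = versions & supported.get(cap, set())
        if common:
            profile[cap] = max(common)
    return profile
```

Real negotiation would additionally need the conditional, policy-dependent capabilities noted above, which a plain set intersection cannot express.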
\paragraph{Impact Assessment}
Agent interactions require pre-configured knowledge of peer capabilities, limiting dynamic composition and ad-hoc agent collaboration.
\paragraph{Existing Partial Solutions}
A2A Agent Cards~\cite{a2a-protocol} provide capability advertisement but without policy-constrained negotiation semantics. HTTP content negotiation~\cite{rfc9110} provides basic media type negotiation but not agent-capability-level negotiation.
% ------------------------------------------------------------------
\subsubsection{Gap 11: Agent Performance Benchmarking}
\label{sec:gap11}
\paragraph{Problem Statement}
No standardized metrics or benchmarking methodology exists for evaluating autonomous agent performance. Agent performance spans multiple dimensions: task completion accuracy, latency, resource efficiency, safety compliance rate, and behavioral consistency. Without common metrics, operators cannot compare agent implementations, set performance baselines, or detect performance degradation over time.
\paragraph{Impact Assessment}
Operators cannot objectively evaluate or compare autonomous agent implementations, hindering procurement and deployment decisions.
\paragraph{Existing Partial Solutions}
No existing IETF work addresses agent performance benchmarking. ML model evaluation benchmarks exist but do not address the operational performance dimensions unique to autonomous agents.
% ==================================================================
\section{Proposed Solution Framework}
\label{sec:solutions}
To address the identified gaps, we have developed six companion Internet-Drafts. Table~\ref{tab:roadmap} maps each draft to the gaps it addresses.
\begin{table}[htbp]
\centering
\caption{Companion Draft Roadmap}
\label{tab:roadmap}
\small
\begin{tabular}{@{}p{4.2cm}cc@{}}
\toprule
\textbf{Companion Draft} & \textbf{Gaps} & \textbf{Priority} \\
\midrule
Behavioral Verification~\cite{id-behavioral-verification} & 1, 11 & CRITICAL \\
Cascade Prevention~\cite{id-cascade-prevention} & 2, 4 & CRITICAL \\
\midrule
Consensus Protocol~\cite{id-consensus} & 3, 10 & HIGH \\
Cross-Domain Audit~\cite{id-cross-domain-audit} & 6, 9 & HIGH \\
Override Protocol~\cite{id-override-protocol} & 7 & HIGH \\
Federation Privacy~\cite{id-federation-privacy} & 5, 8 & HIGH \\
\bottomrule
\end{tabular}
\end{table}
\subsection{Companion A: Behavioral Verification}
The Agent Behavioral Verification draft~\cite{id-behavioral-verification} extends the RATS attestation model to agent behavior. It defines behavioral profiles (machine-readable descriptions of permitted agent actions), verification evidence formats (signed attestations of observed behavior), and appraisal procedures for comparing observed behavior against declared profiles. This draft also addresses Gap~11 by defining standardized performance metrics that can be included in behavioral attestations.
\subsection{Companion B: Cascade Prevention}
The Agent Failure Cascade Prevention draft~\cite{id-cascade-prevention} defines protocol-level circuit breakers, failure isolation boundaries, and coordinated rollback for multi-agent DAG workflows. Circuit breaker states (Closed, Open, Half-Open) are communicated between agents using standardized health signals. The draft also addresses Gap~4 (Rollback) by defining checkpoint and undo semantics for agent actions.
\subsection{Companion C: Consensus Protocol}
The Multi-Agent Consensus draft~\cite{id-consensus} defines a consensus protocol tailored to heterogeneous agents. It supports weighted voting, capability-based participation, and policy-constrained proposals. The protocol builds on classical consensus theory~\cite{ongaro2014raft,lamport1998paxos} while adding agent-specific extensions. This draft also addresses Gap~10 by defining capability negotiation as a precursor to consensus formation.
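The weighted, capability-based voting described above reduces, at its core, to a quorum check over the eligible participants. The following sketch assumes a simple two-thirds-by-weight rule for illustration; the draft specifies message formats and proposal semantics, not this API:

```python
from fractions import Fraction

def weighted_quorum(votes, weights, eligible, threshold=Fraction(2, 3)):
    """Decide a proposal under weighted, capability-gated voting (sketch).

    `votes` maps agent id -> True/False, `weights` maps agent id -> voting
    weight, and `eligible` is the set of agents whose capabilities qualify
    them to participate in this proposal. All names are illustrative.
    """
    total = sum(weights[a] for a in eligible)
    yes = sum(weights[a] for a, v in votes.items() if v and a in eligible)
    return yes >= threshold * total
```

Capability-based participation shows up here only as the `eligible` filter; in the protocol itself that set would be the outcome of the capability negotiation step that precedes consensus formation.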
\subsection{Companion D: Cross-Domain Audit}
The Cross-Domain Agent Audit Trails draft~\cite{id-cross-domain-audit} defines a standard format for cross-domain audit records with causal ordering (Lamport timestamps and vector clocks), domain boundary linkage, and tamper-evident aggregation using Merkle trees. The draft builds on SCITT transparency log primitives and Execution Audit Tokens~\cite{id-exec-audit}. It also addresses Gap~9 by including resource consumption records in the audit trail.
\subsection{Companion E: Override Protocol}
The Standardized Human Override Protocol~\cite{id-override-protocol} defines message formats and delivery mechanisms for four override types: emergency stop, graceful pause, parameter modification, and forced rollback. The protocol supports multi-vendor environments with standard acknowledgment semantics and escalation procedures when an agent fails to respond to an override signal.
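To make the message/acknowledgment pairing concrete, the following sketch builds an override message covering the four types named above and a matching acknowledgment. Every field name here is a hypothetical placeholder, not the draft's wire format, and a real deployment would sign both messages:

```python
import time
import uuid

OVERRIDE_TYPES = {"emergency_stop", "graceful_pause",
                  "parameter_modification", "forced_rollback"}

def make_override(override_type, target_agent, issuer, params=None):
    """Build an override message (field names are illustrative only)."""
    if override_type not in OVERRIDE_TYPES:
        raise ValueError(f"unknown override type: {override_type}")
    return {
        "override_id": str(uuid.uuid4()),  # links acks back to this message
        "type": override_type,
        "target": target_agent,
        "issuer": issuer,                  # would carry a signature in practice
        "issued_at": time.time(),
        "params": params or {},
        "ack_required": True,
    }

def make_ack(override_msg, agent_id, status="complying"):
    """Acknowledgment tying back to the override via its id."""
    return {"override_id": override_msg["override_id"],
            "agent": agent_id,
            "status": status}
```

The escalation procedures mentioned above would trigger when no acknowledgment referencing the `override_id` arrives within a deadline.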
\subsection{Companion F: Federation Privacy}
The Federated Agent Learning Privacy draft~\cite{id-federation-privacy} defines privacy controls for agents sharing data across domains, including differential privacy parameters for agent telemetry, data minimization profiles, and consent management. The draft also addresses Gap~8 (Cross-Protocol Migration) by defining context preservation mechanisms for agents moving between protocol environments.
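As a minimal illustration of the differential-privacy parameters mentioned above, a domain could release agent telemetry counts through the standard Laplace mechanism (sensitivity 1 for a counting query). The function below is a textbook sketch, not the draft's mechanism:

```python
import math
import random

def dp_noisy_count(true_count, epsilon, rng=None):
    """Release a count under epsilon-differential privacy (Laplace mechanism).

    Noise scale is sensitivity/epsilon = 1/epsilon for a counting query;
    smaller epsilon means stronger privacy and noisier output. `rng` is
    injectable for reproducibility in this sketch.
    """
    rng = rng or random.Random()
    u = rng.random() - 0.5                      # uniform on [-0.5, 0.5)
    scale = 1.0 / epsilon
    # Inverse-CDF sampling of the Laplace distribution.
    noise = -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_count + noise
```

The draft's data-minimization profiles would determine which telemetry fields are released at all; the epsilon budget then bounds what any single release can reveal about one agent's activity.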
\subsection{Dependency Analysis}
The companion drafts have the following dependency structure:
\begin{itemize}
\item \textbf{Behavioral Verification} (A) is foundational: its attestation format is used by Cascade Prevention (B) and Cross-Domain Audit (D).
\item \textbf{Cascade Prevention} (B) defines failure containment that Override Protocol (E) builds upon.
\item \textbf{Consensus} (C) extends behavioral verification with multi-agent agreement.
\item \textbf{Cross-Domain Audit} (D) provides the audit infrastructure to which Federation Privacy (F) adds privacy controls.
\end{itemize}
This dependency ordering implies a natural implementation sequence: A $\rightarrow$ B $\rightarrow$ E and A $\rightarrow$ D $\rightarrow$ F, with C depending on A.
% ==================================================================
\section{Discussion}
\label{sec:discussion}
\subsection{Cross-Cutting Themes}
Several themes cut across multiple gaps:
\paragraph{Trust Propagation}
Gaps~1, 3, 5, 6, and 7 all involve trust relationships that must be established, verified, and maintained across agent and domain boundaries. The ECT~\cite{id-wimse-ect} provides a foundation for trust propagation, but the gaps reveal that identity-based trust is necessary but not sufficient---behavioral trust, consensus-based trust, and audit-verified trust are equally important.
\paragraph{Safety vs.\ Autonomy Trade-off}
Gaps~1, 2, 4, and 7 reflect the fundamental tension between agent autonomy and safety. Greater autonomy enables more valuable agent applications but increases the risk and blast radius of failures. The companion drafts collectively define a ``safety envelope'' that enables autonomy within verified boundaries.
\paragraph{Cross-Domain Operations}
Gaps~5, 6, 8, and 9 all involve operations that cross organizational boundaries. Cross-domain agent operations are where the most valuable applications lie (federated network management, multi-cloud orchestration, supply-chain automation) but also where the standards gaps are most acute.
\subsection{Prioritization Rationale}
The severity classification reflects two criteria:
\begin{enumerate}
\item \textbf{Safety Impact:} Gaps that could lead to safety incidents if unaddressed are rated CRITICAL or HIGH.
\item \textbf{Blocking Effect:} Gaps that prevent entire classes of agent deployments are rated higher than those that merely reduce efficiency.
\end{enumerate}
Gaps~1 and 2 are CRITICAL because they represent fundamental safety requirements: without behavioral verification (Gap~1), operators cannot trust agents, and without cascade prevention (Gap~2), a single failure can cause widespread disruption. Gap~7 (Human Override) is rated HIGH rather than CRITICAL because manual, vendor-specific overrides exist as imperfect stopgaps; the gap is in interoperability, not in the complete absence of override capability.
\subsection{Relationship to Industry Protocols}
The A2A~\cite{a2a-protocol} and MCP~\cite{mcp-protocol} protocols address important aspects of agent communication but operate at a different layer than the gaps identified here. A2A focuses on task-level agent interaction; MCP focuses on tool integration. Neither addresses the safety, governance, and cross-domain concerns that constitute the majority of our identified gaps. We view the companion drafts as complementary to industry protocols: A2A and MCP handle the ``what'' of agent communication, while the companion drafts address the ``how safely'' and ``how accountably.''
\subsection{Limitations}
This analysis has several limitations:
\begin{enumerate}
\item \textbf{Evolving Landscape.} The agent protocol landscape is evolving rapidly. New standards and industry protocols may address some identified gaps by the time the companion drafts reach maturity.
\item \textbf{Implementation Validation.} The gap analysis is based on specification review and architectural analysis, not on experimental evaluation of prototype implementations. Some gaps may prove easier or harder to address in practice than our analysis suggests.
\item \textbf{Scope.} We focus on IETF-style protocol standards and do not analyze gaps in other standardization bodies (W3C, IEEE, ISO) that may also be relevant to autonomous agents.
\item \textbf{Single-Author Perspective.} While informed by discussions in multiple IETF working groups, this analysis reflects a single researcher's assessment. Community review may identify additional gaps or disagree with severity classifications.
\end{enumerate}
% ==================================================================
\section{Conclusion and Future Work}
\label{sec:conclusion}
We have presented a systematic gap analysis of IETF standards for autonomous AI agent protocols. Our analysis identified eleven specific gaps across four severity categories, organized within a four-layer reference architecture. The two CRITICAL gaps---behavioral verification and cascade prevention---represent fundamental safety requirements that must be addressed before autonomous agents can be deployed responsibly in production environments.
Six companion Internet-Drafts have been developed to address the most pressing gaps, with a dependency-ordered implementation roadmap. These drafts are designed to be complementary to existing IETF work (WIMSE, RATS, SCITT) and to industry protocols (A2A, MCP).
Future work includes:
\begin{itemize}
\item \textbf{Prototype Implementation:} Building reference implementations of the companion drafts to validate feasibility and identify specification gaps.
\item \textbf{Interoperability Testing:} Developing an interoperability test suite for multi-vendor agent deployments using the proposed standards.
\item \textbf{Formal Verification:} Applying formal methods to verify safety properties of the proposed protocols, particularly the cascade prevention and override mechanisms.
\item \textbf{Performance Evaluation:} Measuring the overhead introduced by behavioral verification, audit logging, and consensus protocols in realistic agent workloads.
\item \textbf{Community Engagement:} Presenting this analysis to relevant IETF working groups (WIMSE, RATS, NMOP) to solicit feedback and build consensus on prioritization.
\end{itemize}
The autonomous agent ecosystem is at an inflection point: capable enough to deliver real value, but lacking the protocol-level safety and governance infrastructure needed for responsible deployment. Closing the gaps identified in this analysis is essential for realizing the potential of autonomous agents while maintaining the safety and accountability that critical infrastructure demands.
% ==================================================================
\section*{Acknowledgments}
The author thanks the participants of the WIMSE, RATS, and NMOP working groups for discussions that informed this analysis.
% ==================================================================
% Switch to IEEEtran for final arXiv submission:
% \bibliographystyle{IEEEtran}
\bibliographystyle{plain}
\bibliography{references}
\end{document}