feat: add IETF landscape paper source (LaTeX + BibTeX + Makefile)

New LaTeX paper analyzing the AI-agent standardization landscape across
IETF Internet-Drafts. Includes bibliography, updated Makefile for
pdflatex+bibtex build, and gitignore entries for build artifacts.
Commit 45cb13fbe8 (parent 56f2ce669c), 2026-04-12 12:43:15 +00:00
4 changed files with 1258 additions and 11 deletions

paper/ietf-landscape.tex (new file, 899 lines)
\documentclass[11pt,a4paper]{article}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{lmodern}
\usepackage[margin=2.5cm]{geometry}
\usepackage{amsmath,amssymb}
\usepackage{graphicx}
\usepackage{booktabs}
\usepackage{tabularx}
\usepackage{hyperref}
\usepackage{xcolor}
\usepackage{natbib}
\usepackage{enumitem}
\usepackage{float}
\usepackage{caption}
\hypersetup{
colorlinks=true,
linkcolor=blue!60!black,
citecolor=green!50!black,
urlcolor=blue!70!black,
}
\setlength{\parskip}{0.4em}
\setlength{\parindent}{0em}
\title{%
Mapping the AI-Agent Standardization Landscape:\\
An LLM-Assisted Analysis of IETF Internet-Drafts%
}
\author{
Christian Nennemann\\
Independent Researcher\\
\texttt{write@nennemann.de}
}
\date{April 2026}
\begin{document}
\maketitle
\begin{abstract}
The Internet Engineering Task Force (IETF) is experiencing an unprecedented
surge in standardization activity around AI agents. Between January~2024 and
March~2026, AI- and agent-related Internet-Drafts grew from 0.5\% to 9.3\%
of all IETF submissions. We present a systematic, LLM-assisted analysis of
this landscape, covering 475 drafts from 713 authors across more than 230
organizations. Our pipeline combines keyword-based corpus construction from
the IETF Datatracker API, multi-dimensional quality rating via Claude
(Anthropic) as an LLM-as-judge, semantic embedding and clustering via a
local embedding model (nomic-embed-text), LLM-based extraction of 501
discrete technical ideas, and gap analysis against the assembled corpus.
Key findings include: (1)~a persistent capability-to-safety deficit, with
roughly four capability-building drafts for every safety-oriented one;
(2)~extreme protocol fragmentation, including 14~competing OAuth-for-agents
proposals and 157~agent-to-agent protocol drafts with no interoperability
layer; (3)~high organizational concentration, with a single vendor
contributing approximately 16\% of all drafts; (4)~132 cross-organization
convergent ideas independently proposed by multiple organizations, signaling
latent consensus beneath the fragmentation; and (5)~11 identified
standardization gaps, three rated critical, centered on agent legal
liability, capability degradation detection, and emergency override
protocols. The total analysis cost approximately \$6.50--9.00\,USD in API fees.
We discuss implications for AI-agent standardization strategy, the
limitations of LLM-as-judge methodologies applied to technical document
corpora, and organizational dynamics shaping the standards landscape.
\end{abstract}
\textbf{Keywords:} IETF, Internet-Drafts, AI agents, standardization,
LLM-as-judge, landscape analysis, multi-agent systems, protocol
fragmentation
% =========================================================================
\section{Introduction}
\label{sec:intro}
% =========================================================================
The deployment of autonomous AI agents---software systems that perceive
their environment, make decisions, and take actions with limited human
supervision---has accelerated dramatically since 2023. Commercial
offerings from Anthropic, Google, OpenAI, and others have moved AI agents
from research prototypes to production systems that browse the web,
execute code, manage cloud infrastructure, and interact with external
services on behalf of users. This proliferation raises fundamental
questions about identity, authentication, delegation, safety, and
interoperability that fall squarely within the purview of Internet
standards bodies.
The IETF, responsible for the core protocols of the Internet, has
responded with an extraordinary burst of activity. In 2024, just 9
AI- or agent-related Internet-Drafts were submitted---0.5\% of all
submissions. By the first quarter of 2026, that figure reached 9.3\%:
nearly one in ten new drafts addressed AI agents in some capacity.
Monthly submissions surged from 5 in June~2025 to 85 in February~2026,
a growth rate without precedent in the IETF's recent history.
This rapid expansion creates an analytical challenge. The volume of
drafts, the diversity of working groups involved, the overlapping scope
of competing proposals, and the speed of new submissions make manual
tracking infeasible. A standards participant seeking to understand the
landscape---which problems are being addressed, which are being
neglected, where proposals converge and where they conflict---faces a
corpus of hundreds of technical documents evolving on a weekly basis.
We address this challenge with an LLM-assisted analysis pipeline that
automates the collection, rating, clustering, idea extraction, and gap
identification for the full corpus of AI-agent-related IETF
Internet-Drafts. The pipeline combines three complementary analytical
approaches: (1)~LLM-as-judge rating of drafts on five quality
dimensions, using Claude (Anthropic) with structured prompts;
(2)~embedding-based semantic similarity and clustering, using a locally
hosted nomic-embed-text model via Ollama; and (3)~LLM-based extraction
of discrete technical ideas and identification of landscape gaps.
Our contributions are:
\begin{itemize}[nosep]
\item A comprehensive, quantitative map of the IETF's AI-agent
standardization landscape as of March~2026, covering 475 drafts,
713 authors, 501 extracted technical ideas, and 11 identified gaps.
\item A replicable, cost-effective methodology for LLM-assisted
standards corpus analysis (\$6.50--9.00 total), with explicit
documentation of limitations and methodological caveats.
\item Empirical findings on organizational concentration,
protocol fragmentation, cross-organization convergence, and
the capability-to-safety imbalance in the current landscape.
\item An open-source tool (the IETF Draft Analyzer) that makes the
pipeline, database, and all derived reports available for
independent verification and extension.
\end{itemize}
The remainder of this paper is organized as follows.
Section~\ref{sec:related} reviews related work on standards landscape
analysis, NLP for technical documents, and technology mapping.
Section~\ref{sec:method} describes the data collection and analysis
pipeline in detail. Section~\ref{sec:results} presents our findings
across five analytical dimensions. Section~\ref{sec:discussion}
discusses implications, limitations, and organizational dynamics.
Section~\ref{sec:conclusion} concludes.
% =========================================================================
\section{Related Work}
\label{sec:related}
% =========================================================================
Our work sits at the intersection of three research areas: standards
ecosystem analysis, NLP applied to technical document corpora, and
technology landscape mapping.
\subsection{Standards Analysis}
The economics and dynamics of technical standardization have been
studied extensively. \citet{simcoe2012} analyzes consensus governance
in standard-setting committees, showing how committee structure
influences the trajectory of shared technology platforms.
\citet{blind2017} examine the impact of standards and regulation on
innovation in uncertain markets, a framing directly applicable to the
nascent AI-agent ecosystem where both the technology and the regulatory
environment are in flux. \citet{lerner2014} study standard-essential
patents, a concern that is beginning to surface in the AI-agent space
as organizations file IPR declarations on agent-related protocols.
Prior quantitative analyses of IETF activity have typically focused on
participation patterns, working group dynamics, or the trajectory of
individual RFCs through the standards process. Our work differs in
scope: rather than analyzing the IETF as an institution, we analyze a
specific cross-cutting topic (AI agents) that spans multiple working
groups and is evolving too rapidly for traditional manual survey methods.
\subsection{NLP for Technical Documents}
The application of natural language processing to technical and legal
document corpora has expanded significantly with the advent of large
language models. \citet{devlin2019} introduced BERT-based approaches
that enabled transfer learning for domain-specific text
classification. More recently, \citet{brown2020} demonstrated that
large language models exhibit strong few-shot and zero-shot performance
on diverse text understanding tasks, opening the possibility of using
LLMs as automated annotators for technical documents.
The ``LLM-as-judge'' paradigm---using language models to evaluate or
rate text artifacts---has been systematically studied by
\citet{zheng2023}, who introduced MT-Bench and Chatbot Arena to
evaluate LLM judges against human preferences. Their work establishes
both the promise (high correlation with human judgment on structured
evaluation tasks) and the limitations (position bias, verbosity bias,
self-enhancement bias) of LLM-based evaluation. Our use of Claude as a
rater for IETF drafts follows this paradigm, with the specific
limitation that no human calibration study has been performed on our
rating outputs (see Section~\ref{sec:limitations}).
Embedding-based document similarity using models such as
Sentence-BERT~\citep{reimers2019} and its successors has become
standard practice for document clustering and retrieval. We use
nomic-embed-text~\citep{nomic2024}, a general-purpose text embedding
model, for computing pairwise cosine similarity across the draft corpus.
The resulting similarity matrix enables both cluster detection and
visualization via t-SNE~\citep{vandermaaten2008}.
\subsection{Technology Landscape Surveys}
Technology landscape mapping---the systematic identification and
organization of technical activities within a domain---has a long
history in foresight and innovation studies.
\citet{porter2005} introduced ``tech mining'' as a methodology for
extracting competitive intelligence from patent and publication
databases. \citet{roper2011} extended these methods to broader
technology management contexts. Our work adapts these approaches to
the standards domain, replacing patent databases with the IETF
Datatracker and augmenting keyword-based search with LLM-driven
semantic analysis.
The AI agent research community has produced several recent surveys.
\citet{wang2024} and \citet{xi2023} survey the rapidly growing
literature on LLM-based autonomous agents, covering architectures,
capabilities, and evaluation. These academic surveys focus on
research contributions; our work complements them by mapping the
parallel standardization effort, where research ideas meet the
engineering constraints of Internet protocol design.
The multi-agent systems (MAS) research tradition, surveyed
comprehensively by \citet{wooldridge2009} and \citet{dorri2018},
provides historical context. The FIPA Agent Communication
Language~\citep{fipa-acl} and Agent Management
Specification~\citep{fipa-ams}, developed between 1996 and 2005,
addressed many of the same problems---agent discovery, communication
protocols, platform interoperability---that the current IETF drafts
tackle. The near-complete absence of FIPA references in the
contemporary IETF corpus suggests limited awareness of this prior art,
a finding we quantify in Section~\ref{sec:results}.
% =========================================================================
\section{Methodology}
\label{sec:method}
% =========================================================================
The analysis pipeline consists of six sequential stages, each building
on the output of the previous. All intermediate results are stored in
a SQLite database (28\,MB) with FTS5 full-text search, enabling both
pipeline idempotency and ad-hoc querying. The complete pipeline is
implemented as a Python CLI tool (approximately 6,100 lines across 12
modules) using Click, httpx, the Anthropic SDK, and Ollama.
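A minimal sketch of this storage layer follows. The table and column
names are illustrative, not the analyzer's actual schema; the sketch
only shows how an FTS5 virtual table supports the ad-hoc full-text
queries alongside the relational store.

```python
import sqlite3

# Illustrative storage-layer sketch (not the tool's real schema).
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE drafts (
    name       TEXT PRIMARY KEY,   -- e.g. draft-cowles-volt
    title      TEXT,
    abstract   TEXT,
    submitted  TEXT                -- ISO date
);
-- FTS5 index, kept in sync manually in this sketch.
CREATE VIRTUAL TABLE drafts_fts USING fts5(name, title, abstract);
""")

def add_draft(name, title, abstract, submitted):
    con.execute("INSERT INTO drafts VALUES (?, ?, ?, ?)",
                (name, title, abstract, submitted))
    con.execute("INSERT INTO drafts_fts VALUES (?, ?, ?)",
                (name, title, abstract))

def search(term):
    """Full-text search across name, title, and abstract."""
    rows = con.execute(
        "SELECT name FROM drafts_fts WHERE drafts_fts MATCH ?", (term,))
    return [r[0] for r in rows]
```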
\subsection{Data Collection}
\label{sec:datacollection}
\subsubsection{Corpus Construction}
Drafts were retrieved from the IETF Datatracker
API\footnote{\url{https://datatracker.ietf.org/api/v1/doc/document/}}
using keyword search across both draft names
(\texttt{name\_\_contains}) and abstracts
(\texttt{abstract\_\_contains}). Twelve search terms were used, among
them: \textit{agent}, \textit{ai-agent}, \textit{agentic},
\textit{autonomous}, \textit{mcp}, \textit{inference},
\textit{generative}, \textit{intelligent}, \textit{large language
model}, \textit{multi-agent}, and \textit{trustworth} (a stem matching
both \textit{trustworthy} and \textit{trustworthiness}).
Only drafts with \texttt{type\_\_slug=draft} and submission date
$\geq$~2024-01-01 were included. Full text was downloaded from the
IETF archive.\footnote{\url{https://www.ietf.org/archive/id/}}
The keyword set was expanded iteratively. An initial set of 6 keywords
yielded 260 drafts; adding 6 further terms captured 174 additional
drafts in categories initially underrepresented, including MCP-related
work, generative AI infrastructure, and the nascent \texttt{aipref}
working group. A polite delay of 0.5\,seconds was applied between API
requests.
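The collection step can be sketched as follows. The
\texttt{time\_\_gte} date-filter parameter name is an assumption (only
\texttt{name\_\_contains}, \texttt{abstract\_\_contains}, and
\texttt{type\_\_slug} are confirmed above), and pagination follows the
Datatracker API's standard \texttt{meta.next} convention.

```python
import time

DATATRACKER = "https://datatracker.ietf.org/api/v1/doc/document/"

def build_query(term, field="name__contains", offset=0):
    """Query parameters for one keyword search.
    NOTE: the date filter name (time__gte) is an assumption."""
    return {
        field: term,
        "type__slug": "draft",
        "time__gte": "2024-01-01",
        "format": "json",
        "limit": 100,
        "offset": offset,
    }

def collect(terms, delay=0.5):
    """Union of results over all (term, field) queries, keyed by draft
    name so that multi-keyword hits are stored only once."""
    import httpx  # imported lazily so the pure helpers work without it
    seen = {}
    with httpx.Client(timeout=30.0) as client:
        for term in terms:
            for field in ("name__contains", "abstract__contains"):
                offset = 0
                while True:
                    resp = client.get(
                        DATATRACKER, params=build_query(term, field, offset))
                    data = resp.json()
                    for doc in data["objects"]:
                        seen[doc["name"]] = doc
                    if not data["meta"]["next"]:
                        break
                    offset += 100
                    time.sleep(delay)  # polite inter-request delay
            time.sleep(delay)
    return seen
```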
The resulting corpus contains 475 drafts. After false-positive
filtering (removing drafts about ``user agents,'' ``autonomous
systems'' in routing, and other non-AI uses of matched keywords), 361
drafts were retained as AI/agent-relevant based on a relevance
rating threshold.
\subsubsection{Supplementary Standards Bodies}
To contextualize the IETF landscape, we ingested a supplementary
corpus of standards and specifications from five additional bodies:
ISO/IEC (including ISO~22989~\citep{iso22989} and
ISO~42001~\citep{iso42001}), ITU-T (including
Y.3172~\citep{itu-y3172}), ETSI (ENI, ZSM), W3C (Web of Things,
Verifiable Credentials, WebNN), and NIST (AI RMF~\citep{nist-ai-rmf}).
These documents were included in the gap analysis (Section~\ref{sec:gaps})
to identify areas where non-IETF bodies provide coverage that the IETF
corpus lacks, and vice versa.
\subsubsection{Author and Affiliation Data}
Author records were fetched from the Datatracker's
\texttt{documentauthor} and \texttt{person} endpoints. Organizational
affiliations were normalized using a hand-curated alias table of 40+
mappings (e.g., ``Huawei Technologies Co., Ltd.''
$\rightarrow$~``Huawei'') supplemented by automatic suffix stripping
for common corporate suffixes.
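The normalization logic can be sketched as below; the alias entries
shown are an illustrative subset of the 40+ curated mappings, and the
suffix list is an assumption about which corporate forms are stripped.

```python
import re

# Illustrative subset of the hand-curated alias table.
ALIASES = {
    "huawei technologies co., ltd.": "Huawei",
    "huawei technologies": "Huawei",
    "china mobile communications corporation": "China Mobile",
}

# Common corporate suffixes (assumed list), stripped repeatedly.
SUFFIX = re.compile(r",?\s+(co|ltd|inc|corp|gmbh|llc)\.?$", re.IGNORECASE)

def normalize_org(raw):
    """Alias lookup first; otherwise strip trailing corporate suffixes."""
    key = raw.strip().lower()
    if key in ALIASES:
        return ALIASES[key]
    name = raw.strip()
    while True:
        stripped = SUFFIX.sub("", name).rstrip(" ,.")
        if stripped == name:
            break
        name = stripped
    return name
```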
\subsection{LLM-Based Analysis}
\label{sec:llm-analysis}
\subsubsection{Multi-Dimensional Rating}
Each draft was rated by Claude (Anthropic; Sonnet model) on five
dimensions using a structured prompt containing the draft's name,
title, submission date, page count, and abstract (truncated to 2,000
characters). The five rating dimensions are:
\begin{itemize}[nosep]
\item \textbf{Novelty} (1--5): Originality relative to existing
standards and proposals.
\item \textbf{Maturity} (1--5): Completeness of the technical
specification.
\item \textbf{Overlap} (1--5): Redundancy with other known drafts
(5 indicates near-duplication).
\item \textbf{Momentum} (1--5): Community engagement, revisions,
and working group adoption signals.
\item \textbf{Relevance} (1--5): Importance to the AI/agent
ecosystem specifically.
\end{itemize}
The prompt instructs Claude to return structured JSON with integer
scores and brief justification notes for each dimension, plus a 2--3
sentence summary and one or more category labels drawn from a
predefined taxonomy of 11 categories (Table~\ref{tab:categories}).
A composite quality score is computed as the arithmetic mean of
novelty, maturity, momentum, and relevance (excluding overlap, which
measures redundancy rather than quality).
To reduce API costs, drafts were rated in batches of five using a
batch prompt variant. Each draft's abstract was truncated to 1,500
characters in batch mode. All API responses were cached in an
\texttt{llm\_cache} table keyed by SHA-256 hash of the full prompt,
making the pipeline idempotent on re-runs.
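The composite score and cache key reduce to a few lines; this is a
faithful restatement of the definitions above rather than the tool's
code. A draft rated 5/5/4/5 on the four quality dimensions yields the
4.75 composite reported later for VOLT.

```python
import hashlib

def composite_score(r):
    """Arithmetic mean of the four quality dimensions; overlap is
    excluded because it measures redundancy, not quality."""
    return (r["novelty"] + r["maturity"] + r["momentum"] + r["relevance"]) / 4.0

def cache_key(prompt):
    """Key for the llm_cache table: SHA-256 of the full prompt, so an
    identical prompt on a re-run hits the cache instead of the API."""
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()
```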
\subsubsection{Idea Extraction}
Discrete technical ideas---mechanisms, protocols, architectural
patterns, extensions, and requirements---were extracted from each
draft using Claude. For individual extraction, the prompt included
the abstract and the first 3,000 characters of full text (Sonnet
model). For batch extraction, groups of five drafts were processed
per API call using the cheaper Haiku model with abstracts truncated
to 800 characters. The prompt requested 1--4 top-level novel
contributions per draft, with explicit instructions to merge
sub-features into parent ideas and to return an empty array for
drafts lacking substantive technical content.
Extracted ideas were deduplicated within each draft using
embedding-based cosine similarity (threshold~0.85), removing ideas
that were restatements of the same concept. Cross-draft idea overlap
was analyzed using Python's \texttt{SequenceMatcher} with a fuzzy
matching threshold of~0.75 on idea titles, enabling detection of
convergent ideas across organizational boundaries.
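The cross-draft overlap test can be sketched as follows; the greedy
grouping strategy is an assumption (the text specifies only the
\texttt{SequenceMatcher} threshold of 0.75 on idea titles).

```python
from difflib import SequenceMatcher

def titles_match(a, b, threshold=0.75):
    """Fuzzy title comparison at the stated 0.75 ratio threshold."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

def convergent(ideas, threshold=0.75):
    """ideas: list of (organization, idea_title) pairs. Returns groups
    of fuzzy-matching titles proposed by two or more organizations."""
    groups = []
    for org, title in ideas:
        for g in groups:
            if titles_match(title, g["title"], threshold):
                g["orgs"].add(org)
                break
        else:
            groups.append({"title": title, "orgs": {org}})
    return [g for g in groups if len(g["orgs"]) >= 2]
```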
\subsubsection{Gap Analysis}
A single Claude Sonnet call received a compressed landscape summary
containing category distribution counts, the 20 most frequently
occurring idea titles, overlap cluster statistics, and summaries of
relevant non-IETF standards. The prompt instructed the model to
identify 8--15 standardization gaps---areas, problems, or technical
challenges not adequately addressed by the existing corpus---with
structured output including topic, description, severity rating
(critical/high/medium/low), evidence, and partial coverage from
existing standards.
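Assembling the compressed landscape summary is a pure SQL-and-counting
step; a sketch follows, with illustrative table names (the real schema
is not reproduced here).

```python
import sqlite3
from collections import Counter

def landscape_summary(con):
    """Build the gap-analysis prompt payload: per-category draft counts
    plus the 20 most frequent extracted idea titles."""
    cats = con.execute(
        "SELECT category, COUNT(*) FROM draft_categories "
        "GROUP BY category ORDER BY COUNT(*) DESC").fetchall()
    titles = [r[0] for r in con.execute("SELECT title FROM ideas")]
    top = Counter(t.lower() for t in titles).most_common(20)
    return {"categories": dict(cats), "top_ideas": top}
```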
\subsection{Embedding and Clustering}
\label{sec:embedding}
Vector embeddings were generated locally using Ollama with the
nomic-embed-text model~\citep{nomic2024}. For each draft, the input
combined the title, abstract, and first 4,000 characters of full text
(when available), producing a 768-dimensional vector stored as a
binary blob in SQLite.
Pairwise cosine similarity was computed across all embedded drafts,
producing an $n \times n$ similarity matrix (cached to disk as a
NumPy array). Clustering used a greedy single-linkage algorithm: for
each unvisited draft, all unvisited drafts with cosine similarity
$\geq \tau$ to the seed were added to its cluster. Three empirically
determined thresholds were applied:
\begin{itemize}[nosep]
\item $\tau = 0.85$: Topically overlapping drafts (42 clusters).
\item $\tau = 0.90$: Near-duplicates or same-author variants (34
clusters).
\item $\tau = 0.98$: Functionally identical drafts (25+ pairs).
\end{itemize}
These thresholds were selected by manual inspection of draft pairs at
each level; no systematic sensitivity analysis was performed (see
Section~\ref{sec:limitations}).
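The greedy algorithm described above can be restated directly; this is
a minimal re-implementation for illustration, not the tool's code, with
NumPy standing in for the cached similarity matrix.

```python
import numpy as np

def similarity_matrix(embeddings):
    """Pairwise cosine similarity for row-vector embeddings."""
    X = np.asarray(embeddings, dtype=float)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    return X @ X.T

def greedy_single_linkage(S, tau):
    """Each unvisited draft seeds a cluster that absorbs every
    unvisited draft with cosine similarity >= tau to the seed."""
    n = S.shape[0]
    visited = [False] * n
    clusters = []
    for seed in range(n):
        if visited[seed]:
            continue
        visited[seed] = True
        members = [seed]
        for j in range(n):
            if not visited[j] and S[seed, j] >= tau:
                visited[j] = True
                members.append(j)
        clusters.append(members)
    return clusters
```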
\subsection{Supplementary Analyses}
Three additional analysis passes operate on the stored data with zero
API cost:
\begin{enumerate}[nosep]
\item \textbf{RFC cross-references}: Regex-based extraction of
RFC, BCP, and draft citations from full text, yielding 4,231
cross-references across 360 drafts.
\item \textbf{Category trends}: SQL-based monthly breakdown of new
drafts per category with growth rates.
\item \textbf{Co-authorship network}: Team bloc detection via
pairwise author overlap ($\geq$70\% shared drafts, $\geq$2 shared
drafts), with connected components forming blocs.
\end{enumerate}
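The regex-based citation extraction (item~1 above) can be sketched as
follows; the exact patterns are assumptions illustrating the approach,
not the tool's actual expressions.

```python
import re
from collections import Counter

# Assumed patterns: RFC/BCP numbers and draft names.
RFC_RE   = re.compile(r"\b(RFC|BCP)\s*0*(\d{1,5})\b", re.IGNORECASE)
DRAFT_RE = re.compile(r"\bdraft-[a-z0-9][a-z0-9-]*")

def extract_refs(text):
    """Counter of normalized citation tokens in one draft's full text."""
    refs = Counter()
    for kind, num in RFC_RE.findall(text):
        refs[f"{kind.upper()}{int(num)}"] += 1   # 'rfc 8446' -> 'RFC8446'
    for d in DRAFT_RE.findall(text):
        refs[d.rstrip("-")] += 1
    return refs
```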
\subsection{Cost}
Table~\ref{tab:cost} summarizes the total pipeline cost for 475 drafts.
\begin{table}[H]
\centering
\caption{Pipeline cost breakdown.}
\label{tab:cost}
\begin{tabular}{llrr}
\toprule
\textbf{Stage} & \textbf{Model} & \textbf{Items} & \textbf{Cost (USD)} \\
\midrule
Rating & Claude Sonnet & 475 drafts & \$5.50--8.00 \\
Idea extract. & Claude Haiku & 475 drafts & \$0.80 \\
Gap analysis & Claude Sonnet & 1 call & \$0.20 \\
Embeddings & Ollama (local) & 475 drafts & \$0.00 \\
RFC refs & Regex (local) & 475 drafts & \$0.00 \\
Trends & SQL (local) & 475 drafts & \$0.00 \\
Idea overlap & SequenceMatcher & 501 ideas & \$0.00 \\
\midrule
\textbf{Total} & & & \textbf{\$6.50--9.00} \\
\bottomrule
\end{tabular}
\end{table}
% =========================================================================
\section{Results}
\label{sec:results}
% =========================================================================
\subsection{Corpus Overview and Growth Trajectory}
The final corpus comprises 475 Internet-Drafts submitted between
January~2024 and March~2026. After false-positive filtering (drafts
with relevance score $\leq$~2 or manually flagged), 361 drafts were
retained as substantively related to AI agents.
The growth trajectory is striking. In 2024, 9 AI/agent drafts were
submitted (0.5\% of 1,651 total IETF drafts). In 2025, 190 were
submitted (7.0\% of 2,696). In Q1~2026 alone, 162 were submitted
(9.3\% of 1,748). Monthly submissions followed a step function:
5~drafts in June~2025, 61 in October~2025, 85 in February~2026.
The acceleration has not plateaued as of March~2026.
\begin{table}[H]
\centering
\caption{Growth of AI/agent-related IETF Internet-Drafts.}
\label{tab:growth}
\begin{tabular}{rrrr}
\toprule
\textbf{Year} & \textbf{Total IETF} & \textbf{AI/Agent} & \textbf{Share (\%)} \\
\midrule
2024 & 1,651 & 9 & 0.5 \\
2025 & 2,696 & 190 & 7.0 \\
2026 (Q1) & 1,748 & 162 & 9.3 \\
\bottomrule
\end{tabular}
\end{table}
\subsection{Thematic Distribution}
\label{sec:categories}
Drafts were classified into 11 non-exclusive categories
(Table~\ref{tab:categories}). A single draft may belong to multiple
categories; percentages therefore exceed 100\%.
\begin{table}[H]
\centering
\caption{Category distribution across 475 drafts. Drafts may appear in
multiple categories.}
\label{tab:categories}
\begin{tabular}{lrr}
\toprule
\textbf{Category} & \textbf{Drafts} & \textbf{Share (\%)} \\
\midrule
Data formats / interoperability & 214 & 45 \\
Policy / governance & 214 & 45 \\
Agent identity / authentication & 160 & 34 \\
A2A protocols & 157 & 33 \\
Autonomous network operations & 124 & 26 \\
AI safety / alignment & 112 & 24 \\
Agent discovery / registration & 89 & 19 \\
ML traffic management & 79 & 17 \\
Human--agent interaction & 57 & 12 \\
Model serving / inference & 42 & 9 \\
Other AI/agent & -- & -- \\
\bottomrule
\end{tabular}
\end{table}
The dominance of infrastructure categories---data formats, identity,
communication protocols---is expected for an early-stage standards
effort. The comparatively low representation of safety/alignment and
human--agent interaction categories is a structural finding we examine
in Section~\ref{sec:safety-deficit}.
\subsection{The Capability-to-Safety Deficit}
\label{sec:safety-deficit}
The ratio of capability-building drafts (A2A protocols, autonomous
network operations, agent discovery, model serving) to safety-oriented
drafts (AI safety/alignment, human--agent interaction) is
approximately 4:1 on aggregate. This ratio varies significantly by
month, ranging from 1.5:1 in months with concentrated safety
submissions to over 20:1 in months dominated by protocol proposals.
The drafts that do address safety are among the highest-rated in the
corpus. The Verifiable Observation Logging for Transparency
(VOLT)~\citep{draft-cowles-volt} protocol scored 4.75/5.0 on the
four-dimension composite (excluding overlap), as did the Distributed
AI Accountability Protocol (DAAP)~\citep{draft-aylward-daap}. The
STAMP protocol~\citep{draft-guy-bary-stamp} for cryptographic
delegation and proof scored 4.5. The quality of safety-focused work
is high; the quantity is not.
An analysis of RFC cross-references reinforces this finding. Across
4,231 parsed citations, the most-referenced standards after the
boilerplate RFC~2119/8174 conventions are TLS~1.3~\citep{rfc8446}
(42 citations), OAuth~2.0~\citep{rfc6749} (36), HTTP
Semantics~\citep{rfc9110} (34), and JWT~\citep{rfc7519} (22). The
agent standards ecosystem is being constructed on the web's existing
security infrastructure---OAuth, TLS, HTTP, JWT---yet the safety
layer that should accompany this security foundation remains
underdeveloped.
\subsection{Protocol Fragmentation}
\label{sec:fragmentation}
Embedding-based similarity analysis reveals extensive duplication and
fragmentation across the corpus.
\subsubsection{Near-Duplicates}
At the 0.98 cosine similarity threshold, 25+ draft pairs are
functionally identical---the same proposal submitted under different
names, to different working groups, or as renamed revisions. A
taxonomy of near-duplicates includes: same draft submitted to
different working groups (14 pairs), renamed drafts (5), evolutionary
versions (3), and genuinely competing proposals from different
organizations (2+).
\subsubsection{Competing Clusters}
At the 0.85 threshold, 42 topical clusters emerge. The most crowded
is OAuth for AI agents, with 14 distinct proposals all addressing
how AI agents authenticate and receive authorization via the OAuth
framework. These range from broad profile proposals to narrow scope
extensions to comprehensive accountability systems. None are
interoperable.
The A2A protocol space encompasses 157 drafts with no
interoperability layer. The most common technical idea in the entire
extracted corpus---``Multi-Agent Communication Protocol''---appears
independently in 8 drafts from different teams. A 10-draft cluster
addresses agent gateway and multi-agent collaboration, with
approaches ranging from semantic routing gateways to cross-domain
interoperability frameworks.
\subsubsection{Causes of Fragmentation}
The data distinguishes three causes: (1)~working group shopping, where
authors submit the same draft to multiple working groups seeking
adoption; (2)~parallel invention, where isolated teams independently
solve the same problem; and (3)~strategic surface-area expansion,
where organizations submit multiple related drafts to maximize
presence in the standards landscape.
\subsection{Organizational Dynamics}
\label{sec:orgs}
\subsubsection{Concentration}
Authorship is heavily concentrated. Huawei leads with 53 authors
contributing to 69 drafts---approximately 16\% of the entire corpus
across all Huawei entities. China Mobile (24~authors, 35~drafts),
Cisco (24~authors, 26~drafts), and China Telecom (24~authors,
24~drafts) follow. Chinese-linked institutions (Huawei, China
Mobile, China Telecom, China Unicom, Tsinghua University, ZTE, BUPT,
and associated laboratories) collectively account for over 160
authors.
Western technology companies are dramatically underrepresented
relative to their market positions. Google is present with 5 authors
on 9 drafts. Microsoft, Apple, and Meta have minimal direct
participation. Amazon's 6 authors focus on post-quantum cryptography
rather than agent-specific work.
\subsubsection{Team Blocs}
Co-authorship analysis identifies 18 team blocs among the 713 authors,
covering approximately 25\% of all authors. The largest bloc is a
13-person Huawei team sharing 22 drafts with 94\% average cohesion
(measured as pairwise overlap of draft portfolios). The team's core
of 7 members each appear on 13--23 drafts.
Cross-organizational collaboration is sparse. The most productive
cross-team pair shares only 3 drafts. Chinese organizations form a
tightly linked ecosystem: Huawei--China Unicom shares 6 drafts,
Tsinghua--Zhongguancun Lab shares 5, China Mobile--ZTE shares 4.
European telecoms (Deutsche Telekom, Telef\'onica, Orange) act as
bridges between Chinese and Western institutions.
\subsection{Cross-Organization Convergence}
\label{sec:convergence}
Despite the fragmentation, significant latent consensus exists. Using
fuzzy title matching (\texttt{SequenceMatcher} at 0.75 threshold) on
the 501 extracted ideas, 132 ideas (approximately 33\% of unique idea
clusters) have been independently proposed by two or more organizations.
The strongest convergence signals include ``A2A Communication
Paradigm'' (proposed by 8 organizations from 5 countries),
``AI Agent Network Architecture'' (8 organizations), and
``Multi-Agent Communication Protocol'' (7 organizations). An
examination of organizational pairs reveals 180 pairings in which a
convergent idea crosses the boundary between Chinese-linked and
Western organizations,
indicating genuine cross-cultural consensus on technical directions
despite the sparse direct collaboration noted in
Section~\ref{sec:orgs}.
The coexistence of convergence and fragmentation has a specific
structure: organizations agree on \textit{what} needs building (the
convergent ideas) but disagree on \textit{how} to build it (the
competing protocol proposals). This gap between problem consensus and
solution divergence is where architectural coordination is most needed.
\subsection{Gap Analysis}
\label{sec:gaps}
The gap analysis identified 11 standardization gaps, distributed across
severity levels as shown in Table~\ref{tab:gaps}.
\begin{table}[H]
\centering
\caption{Identified standardization gaps by severity.}
\label{tab:gaps}
\begin{tabularx}{\textwidth}{llX}
\toprule
\textbf{Severity} & \textbf{Topic} & \textbf{Description} \\
\midrule
Critical & Agent legal liability &
No standard addresses liability assignment when autonomous agents
cause harm or make binding commitments across creators, operators,
and users. \\
Critical & Capability degradation detection &
No standard defines detection mechanisms for gradual capability
degradation due to concept drift, adversarial inputs, or model
corruption. \\
Critical & Emergency override protocols &
No standard defines distributed emergency-stop mechanisms for
autonomous agents exhibiting dangerous behavior across
multi-system deployments. \\
\midrule
High & Cross-domain identity portability &
Agents cannot maintain consistent identity across organizational
domains with different identity systems. \\
High & Real-time behavior explanation &
No standard for interactive, real-time explanations of agent
decision-making during operation. \\
High & Multi-agent conflict resolution &
No protocol for resolving conflicts when multiple agents have
competing objectives or contend for shared resources. \\
High & Inter-standards-body bridging &
Protocols from IETF, ITU-T, and ISO cannot interoperate, creating
silos across network, internet, and industrial domains. \\
High & Behavioral audit trails &
Missing standards for immutable, decision-level audit logs
supporting forensic analysis and regulatory compliance. \\
\midrule
Medium & Resource consumption limits &
No self-regulation standards for agent computational, network, and
energy resource usage. \\
Medium & Training data provenance &
Missing standards for tracking data lineage as it flows between
agents in federated learning scenarios. \\
Medium & Content attribution &
No cryptographic attribution standards for agent-generated content.\\
\bottomrule
\end{tabularx}
\end{table}
The three critical gaps share a common theme: they address what happens
when autonomous agents fail or misbehave. The capability-building
majority of the corpus assumes cooperative, well-functioning agent
systems; the critical gaps expose the absence of standards for the
adversarial, degraded, and emergency cases that inevitably arise in
production deployment.
Cross-referencing gaps with extracted ideas quantifies the coverage
deficit. The ``emergency override'' gap is matched by only 15 ideas
that partially address it across the corpus. The ``multi-agent conflict
resolution'' and ``inter-standards-body bridging'' gaps have zero
directly related extracted ideas---they are entirely unaddressed.
% =========================================================================
\section{Discussion}
\label{sec:discussion}
% =========================================================================
\subsection{Implications for Standardization Strategy}
The landscape reveals a standards ecosystem in a characteristic
early-stage pattern: rapid expansion, parallel invention, and
insufficient coordination. The IETF has navigated such patterns
before---the early web, IoT, DNS security---and in each case resolution
came through convergence of competing proposals, working-group
consolidation, and the emergence of a small number of lasting
standards from a large initial field.
Three strategic priorities emerge from the data:
\textbf{Safety-first coordination.} The 4:1 capability-to-safety
ratio is a structural risk. The critical gaps---behavioral verification,
capability degradation detection, emergency override---are precisely
the areas where standardization failure has the highest real-world
consequence. Unlike protocol fragmentation, which causes confusion and
implementation cost, safety gaps create liability and harm. The
EU AI Act~\citep{eu-ai-act}, which mandates real-time explainability
and human oversight for high-risk AI systems, will make several of
these gaps regulatory obligations rather than optional best practices.
\textbf{Architectural connective tissue.} The landscape does not need
more protocols; it needs a shared execution model. The convergence data shows that
organizations agree on the components; they disagree on the
integration. Proposals like VOLT~\citep{draft-cowles-volt} (execution
traces), DAAP~\citep{draft-aylward-daap} (accountability),
STAMP~\citep{draft-guy-bary-stamp} (cryptographic delegation), and
Verifiable Agent Conversations~\citep{draft-birkholz-vac} (signed
conversation records) address complementary parts of the same
architectural problem. An overarching agent execution architecture
that composes these components would accelerate convergence more
effectively than continued parallel invention.
\textbf{Cross-organization coordination.} The team bloc structure
produces drafts that are internally consistent but externally
incompatible. The 18 detected blocs function as islands; the bridges
between them are thin. Mechanisms that encourage cross-bloc
collaboration---joint design teams, interop testing events,
shared reference implementations---are more likely to produce lasting
standards than the current pattern of parallel submission.
\subsection{Relationship to Prior Agent Standards}
A notable finding is the near-complete absence of references to FIPA
(Foundation for Intelligent Physical Agents) in the contemporary IETF
corpus. FIPA's Agent Communication Language~\citep{fipa-acl} and Agent
Management Specification~\citep{fipa-ams}, developed between 1996 and
2005, addressed agent discovery, communication, platform
interoperability, and interaction protocols---the same problem space
that the current wave of IETF drafts tackles.
The absence of FIPA references does not necessarily indicate ignorance;
the web-native technical context of 2025 differs substantially from the
Java/CORBA context of 2002. However, the recurrence of problems
FIPA addressed (agent naming, message semantics, directory services,
interaction protocols) suggests that explicit engagement with the
FIPA legacy could help the IETF community avoid re-learning lessons
from two decades ago.
\subsection{Limitations}
\label{sec:limitations}
The methodology has several limitations that affect the confidence and
generalizability of the findings.
\textbf{LLM-as-judge validity.} All quality ratings are generated by a
single LLM (Claude Sonnet) from draft abstracts truncated to 2,000
characters. No human calibration study has been performed; no
inter-rater reliability is established. The ratings should be treated
as relative rankings within this corpus, not absolute quality measures.
Maturity scores are particularly affected by abstract-only input, as
abstracts may not convey the full technical depth of a specification.
The overlap dimension is limited because Claude rates each draft
independently without access to the full corpus, meaning it reflects
the model's general knowledge rather than corpus-specific similarity.
A validation study using domain expert ratings on a sample of 25--30
drafts would substantially strengthen confidence.
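To make the proposed validation study concrete, the following sketch computes the Spearman rank correlation one would report between expert and LLM ratings. The function is a standard tie-aware implementation, and the sample scores are hypothetical, not drawn from the pipeline.

```python
from typing import List

def rank(values: List[float]) -> List[float]:
    """1-based ranks, averaging ranks within tie groups."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x: List[float], y: List[float]) -> float:
    """Spearman rho = Pearson correlation of the rank vectors."""
    rx, ry = rank(x), rank(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical scores for 5 drafts (1--5 scale); illustrative only.
human = [4, 2, 5, 3, 1]
llm = [5, 2, 4, 3, 1]
print(spearman(human, llm))  # → 0.9
```

A correlation on 25--30 such pairs, alongside a tie-aware agreement statistic such as weighted kappa, would turn the relative-ranking caveat above into a quantified claim.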
\textbf{Corpus selection bias.} Keyword-based selection introduces both
false positives (``agent'' matching ``user agent,'' ``autonomous''
matching ``autonomous systems'' in routing) and false negatives
(relevant drafts using terminology outside the keyword set). We
estimate 30--50 false positives remain despite relevance filtering.
The temporal cutoff of January~2024 excludes earlier foundational work.
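One simple mitigation for such keyword false positives is masking known noise phrases before matching. A minimal sketch follows; the keyword and exclusion lists are illustrative assumptions, not the pipeline's actual term set.

```python
import re

# Illustrative lists only; the real corpus-construction terms live in
# the pipeline configuration and are not reproduced here.
KEYWORDS = [r"\bagent\b", r"\bautonomous\b", r"\bllm\b"]
EXCLUDE_PHRASES = ["user agent", "user-agent", "autonomous system"]

def is_candidate(abstract: str) -> bool:
    text = abstract.lower()
    # Remove known false-positive phrases before keyword matching,
    # so "user agent" no longer triggers the "agent" keyword.
    for phrase in EXCLUDE_PHRASES:
        text = text.replace(phrase, " ")
    return any(re.search(kw, text) for kw in KEYWORDS)
```

Phrase masking cannot reach the false negatives noted above, which is why the 30--50 residual false positives and the unmeasured false-negative rate remain distinct problems.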
\textbf{Clustering thresholds.} The similarity thresholds (0.85, 0.90,
0.98) are empirically chosen by manual inspection, not derived from
principled analysis. The embedding model (nomic-embed-text) is a
general-purpose model not fine-tuned for standards document similarity.
Sensitivity analysis across thresholds and comparison with alternative
clustering methods (DBSCAN, hierarchical agglomerative) would
strengthen the clustering results.
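The suggested sensitivity analysis amounts to re-clustering the corpus at a sweep of thresholds and tracking how the cluster count changes. A minimal sketch, treating clusters as connected components over cosine-similarity edges (the thresholds are reused from the paper purely for illustration):

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv)

def cluster_counts(vectors: List[Sequence[float]],
                   thresholds: Sequence[float]) -> Dict[float, int]:
    """Number of connected components when edges join pairs whose
    cosine similarity meets each threshold (union-find per threshold)."""
    n = len(vectors)
    sims: Dict[Tuple[int, int], float] = {
        (i, j): cosine(vectors[i], vectors[j])
        for i, j in combinations(range(n), 2)
    }
    counts = {}
    for t in thresholds:
        parent = list(range(n))
        def find(x: int) -> int:
            while parent[x] != x:
                parent[x] = parent[parent[x]]  # path halving
                x = parent[x]
            return x
        for (i, j), s in sims.items():
            if s >= t:
                parent[find(i)] = find(j)
        counts[t] = len({find(i) for i in range(n)})
    return counts
```

Plotting `cluster_counts` over a fine threshold grid would show whether the reported cluster structure sits on a plateau (robust) or a cliff (threshold-sensitive), which is exactly the question the empirically chosen 0.85/0.90/0.98 values leave open.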
\textbf{Gap analysis methodology.} Gap identification relies on a
single-shot LLM analysis of compressed landscape statistics, not
systematic comparison against a reference taxonomy. A rigorous
approach would compare the corpus against an explicit reference
architecture such as NIST AI RMF~\citep{nist-ai-rmf}, the FIPA agent
platform model, or a purpose-built agent ecosystem reference model.
Gap severity is assigned by Claude without defined quantitative
thresholds.
\textbf{Idea extraction consistency.} Batch extraction using Haiku
with abstract-only input produces different results from individual
extraction using Sonnet with full text. No precision/recall measurement
has been performed. The extraction prompt limits output to 1--4 ideas
per draft, potentially under-counting contributions from comprehensive
specifications.
\textbf{Organizational normalization.} Cross-organization analysis
depends on the accuracy of a hand-curated alias table. Boundary cases
(e.g., joint ventures, university--industry affiliations, subsidiary
relationships) introduce judgment calls that affect concentration
statistics.
Despite these limitations, the findings are robust in their broad
contours: the growth trajectory, the safety deficit, the protocol
fragmentation, and the organizational concentration are visible
across multiple analytical methods and are not sensitive to the
specific threshold or model choices within reasonable ranges.
\subsection{Reproducibility and Openness}
The complete pipeline, database, and derived reports are released as
open-source software (the IETF Draft Analyzer). The SQLite database
contains all raw data, ratings, embeddings, ideas, gaps, author
records, and cached LLM responses, enabling independent verification
of every finding reported in this paper. The caching mechanism ensures
that re-running the pipeline produces identical results without
additional API cost.
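The caching pattern behind that guarantee can be sketched as follows; the table and column names are assumptions for illustration, not the analyzer's actual schema.

```python
import hashlib
import sqlite3

# Illustrative schema: the cache is keyed on a hash of (model, prompt),
# so an identical request is answered from the database, not the API.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE llm_cache (key TEXT PRIMARY KEY, response TEXT)")

def cache_key(model: str, prompt: str) -> str:
    return hashlib.sha256(f"{model}\x00{prompt}".encode()).hexdigest()

def cached_call(model: str, prompt: str, call_fn):
    key = cache_key(model, prompt)
    row = conn.execute(
        "SELECT response FROM llm_cache WHERE key = ?", (key,)
    ).fetchone()
    if row is not None:
        return row[0]           # cache hit: deterministic, zero API cost
    response = call_fn(prompt)  # cache miss: call the API exactly once
    conn.execute("INSERT INTO llm_cache VALUES (?, ?)", (key, response))
    conn.commit()
    return response
```

Because every LLM response is replayed from the database, a re-run reproduces the reported findings bit-for-bit, and a reviewer can audit any individual rating by querying the cached prompt-response pair.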
% =========================================================================
\section{Conclusion}
\label{sec:conclusion}
% =========================================================================
We have presented a systematic, LLM-assisted analysis of the IETF's
AI-agent standardization landscape, covering 475 Internet-Drafts from
713 authors across more than 230 organizations. The analysis reveals a
standards ecosystem experiencing unprecedented growth---from 0.5\% to
9.3\% of all IETF submissions in fifteen months---accompanied by
significant structural challenges.
The capability-to-safety ratio of approximately 4:1, the extreme
protocol fragmentation (14 competing OAuth proposals, 155 A2A drafts
with no interoperability layer), and the concentration of authorship
(one vendor contributing $\sim$16\% of all drafts) are findings that
have direct implications for the trajectory of AI-agent
standardization. The 11 identified gaps, with three critical gaps
centered on what happens when agents fail, highlight the areas where
standardization effort is most urgently needed.
At the same time, the 132 cross-organization convergent ideas
demonstrate that latent consensus exists beneath the fragmentation.
Organizations agree on the problems; they disagree on the solutions.
This gap between problem consensus and solution divergence defines the
current phase of the standards race and points toward the needed
intervention: not more protocol proposals, but architectural
connective tissue that composes the existing high-quality components
into a coherent ecosystem.
The methodology itself contributes a replicable, cost-effective
approach to standards landscape analysis. At \$9--15 total, the
pipeline demonstrates that LLM-assisted document analysis at scale is
practical for research and policy applications. The explicit
documentation of limitations---no human calibration, empirical
thresholds, single-judge ratings---provides a template for the
responsible use of LLM-as-judge methodologies in technical document
analysis.
The IETF has navigated standardization sprints before, and the lasting
standards have consistently emerged from efforts that prioritized
interoperability and safety alongside capability. Whether the current
AI-agent wave follows this historical pattern depends on whether the
community can shift from parallel invention to coordinated
architecture before the capability work ships without the safety work
that should accompany it.
% =========================================================================
% References
% =========================================================================
\bibliographystyle{plainnat}
\bibliography{ietf-refs}
\end{document}