feat: add IETF landscape paper source (LaTeX + BibTeX + Makefile)
New LaTeX paper analyzing the AI-agent standardization landscape across IETF Internet-Drafts. Includes bibliography, updated Makefile for pdflatex+bibtex build, and gitignore entries for build artifacts.
@@ -1,16 +1,26 @@
 # Paper build targets
+TEX = pdflatex
+BIB = bibtex
+MAIN = ietf-landscape
+SOURCES = $(MAIN).tex ietf-refs.bib
+
-.PHONY: all figures pdf clean
+.PHONY: all clean watch
 
-all: figures pdf
+all: $(MAIN).pdf
 
-figures:
-	python3 export_figures.py
-
-pdf: figures
-	pdflatex -interaction=nonstopmode main.tex
-	pdflatex -interaction=nonstopmode main.tex # second pass for references
+$(MAIN).pdf: $(SOURCES)
+	$(TEX) $(MAIN)
+	$(BIB) $(MAIN)
+	$(TEX) $(MAIN)
+	$(TEX) $(MAIN)
 
 clean:
-	rm -f main.aux main.log main.out main.bbl main.blg main.pdf
-	rm -rf figures/
+	rm -f $(MAIN).aux $(MAIN).bbl $(MAIN).blg $(MAIN).log \
+		$(MAIN).out $(MAIN).pdf $(MAIN).toc $(MAIN).fls \
+		$(MAIN).fdb_latexmk $(MAIN).synctex.gz
+
+watch:
+	@echo "Rebuilding on change..."
+	@while true; do \
+		inotifywait -q -e modify $(SOURCES) 2>/dev/null || sleep 2; \
+		$(MAKE) all; \
+	done
paper/ietf-landscape.tex (new file, 899 lines)
@@ -0,0 +1,899 @@
\documentclass[11pt,a4paper]{article}

\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{lmodern}
\usepackage[margin=2.5cm]{geometry}
\usepackage{amsmath,amssymb}
\usepackage{graphicx}
\usepackage{booktabs}
\usepackage{tabularx}
\usepackage{hyperref}
\usepackage{xcolor}
\usepackage{natbib}
\usepackage{enumitem}
\usepackage{float}
\usepackage{caption}

\hypersetup{
    colorlinks=true,
    linkcolor=blue!60!black,
    citecolor=green!50!black,
    urlcolor=blue!70!black,
}

\setlength{\parskip}{0.4em}
\setlength{\parindent}{0em}

\title{%
    Mapping the AI-Agent Standardization Landscape:\\
    An LLM-Assisted Analysis of IETF Internet-Drafts%
}
\author{
    Christian Nennemann\\
    Independent Researcher\\
    \texttt{write@nennemann.de}
}
\date{April 2026}

\begin{document}
\maketitle

\begin{abstract}
The Internet Engineering Task Force (IETF) is experiencing an unprecedented
surge in standardization activity around AI agents. Between January~2024 and
March~2026, AI- and agent-related Internet-Drafts grew from 0.5\% to 9.3\%
of all IETF submissions. We present a systematic, LLM-assisted analysis of
this landscape, covering 475 drafts from 713 authors across more than 230
organizations. Our pipeline combines keyword-based corpus construction from
the IETF Datatracker API, multi-dimensional quality rating via Claude
(Anthropic) as an LLM-as-judge, semantic embedding and clustering via a
local embedding model (nomic-embed-text), LLM-based extraction of 501
discrete technical ideas, and gap analysis against the assembled corpus.
Key findings include: (1)~a persistent capability-to-safety deficit, with
roughly four capability-building drafts for every safety-oriented one;
(2)~extreme protocol fragmentation, including 14~competing OAuth-for-agents
proposals and 157~agent-to-agent protocol drafts with no interoperability
layer; (3)~high organizational concentration, with a single vendor
contributing approximately 16\% of all drafts; (4)~132 cross-organization
convergent ideas independently proposed by multiple organizations, signaling
latent consensus beneath the fragmentation; and (5)~11 identified
standardization gaps, three rated critical, centered on behavioral
verification, capability degradation detection, and emergency override
protocols. The total analysis cost approximately \$9--15\,USD in API fees.
We discuss implications for AI-agent standardization strategy, the
limitations of LLM-as-judge methodologies applied to technical document
corpora, and organizational dynamics shaping the standards landscape.
\end{abstract}

\textbf{Keywords:} IETF, Internet-Drafts, AI agents, standardization,
LLM-as-judge, landscape analysis, multi-agent systems, protocol
fragmentation


% =========================================================================
\section{Introduction}
\label{sec:intro}
% =========================================================================

The deployment of autonomous AI agents---software systems that perceive
their environment, make decisions, and take actions with limited human
supervision---has accelerated dramatically since 2023. Commercial
offerings from Anthropic, Google, OpenAI, and others have moved AI agents
from research prototypes to production systems that browse the web,
execute code, manage cloud infrastructure, and interact with external
services on behalf of users. This proliferation raises fundamental
questions about identity, authentication, delegation, safety, and
interoperability that fall squarely within the purview of Internet
standards bodies.

The IETF, responsible for the core protocols of the Internet, has
responded with an extraordinary burst of activity. In 2024, just 9
AI- or agent-related Internet-Drafts were submitted---0.5\% of all
submissions. By the first quarter of 2026, that figure reached 9.3\%:
nearly one in ten new drafts addressed AI agents in some capacity.
Monthly submissions surged from 5 in June~2025 to 85 in February~2026,
a growth rate without precedent in the IETF's recent history.

This rapid expansion creates an analytical challenge. The volume of
drafts, the diversity of working groups involved, the overlapping scope
of competing proposals, and the speed of new submissions make manual
tracking infeasible. A standards participant seeking to understand the
landscape---which problems are being addressed, which are being
neglected, where proposals converge and where they conflict---faces a
corpus of hundreds of technical documents evolving on a weekly basis.

We address this challenge with an LLM-assisted analysis pipeline that
automates the collection, rating, clustering, idea extraction, and gap
identification for the full corpus of AI-agent-related IETF
Internet-Drafts. The pipeline combines three complementary analytical
approaches: (1)~LLM-as-judge rating of drafts on five quality
dimensions, using Claude (Anthropic) with structured prompts;
(2)~embedding-based semantic similarity and clustering, using a locally
hosted nomic-embed-text model via Ollama; and (3)~LLM-based extraction
of discrete technical ideas and identification of landscape gaps.

Our contributions are:

\begin{itemize}[nosep]
    \item A comprehensive, quantitative map of the IETF's AI-agent
      standardization landscape as of March~2026, covering 475 drafts,
      713 authors, 501 extracted technical ideas, and 11 identified gaps.
    \item A replicable, cost-effective methodology for LLM-assisted
      standards corpus analysis (\$9--15 total), with explicit
      documentation of limitations and methodological caveats.
    \item Empirical findings on organizational concentration,
      protocol fragmentation, cross-organization convergence, and
      the capability-to-safety imbalance in the current landscape.
    \item An open-source tool (the IETF Draft Analyzer) that makes the
      pipeline, database, and all derived reports available for
      independent verification and extension.
\end{itemize}

The remainder of this paper is organized as follows.
Section~\ref{sec:related} reviews related work on standards landscape
analysis, NLP for technical documents, and technology mapping.
Section~\ref{sec:method} describes the data collection and analysis
pipeline in detail. Section~\ref{sec:results} presents our findings
across five analytical dimensions. Section~\ref{sec:discussion}
discusses implications, limitations, and organizational dynamics.
Section~\ref{sec:conclusion} concludes.


% =========================================================================
\section{Related Work}
\label{sec:related}
% =========================================================================

Our work sits at the intersection of three research areas: standards
ecosystem analysis, NLP applied to technical document corpora, and
technology landscape mapping.

\subsection{Standards Analysis}

The economics and dynamics of technical standardization have been
studied extensively. \citet{simcoe2012} analyzes consensus governance
in standard-setting committees, showing how committee structure
influences the trajectory of shared technology platforms.
\citet{blind2017} examine the impact of standards and regulation on
innovation in uncertain markets, a framing directly applicable to the
nascent AI-agent ecosystem where both the technology and the regulatory
environment are in flux. \citet{lerner2014} study standard-essential
patents, a concern that is beginning to surface in the AI-agent space
as organizations file IPR declarations on agent-related protocols.

Prior quantitative analyses of IETF activity have typically focused on
participation patterns, working group dynamics, or the trajectory of
individual RFCs through the standards process. Our work differs in
scope: rather than analyzing the IETF as an institution, we analyze a
specific cross-cutting topic (AI agents) that spans multiple working
groups and is evolving too rapidly for traditional manual survey methods.

\subsection{NLP for Technical Documents}

The application of natural language processing to technical and legal
document corpora has expanded significantly with the advent of large
language models. \citet{devlin2019} introduced BERT-based approaches
that enabled transfer learning for domain-specific text
classification. More recently, \citet{brown2020} demonstrated that
large language models exhibit strong few-shot and zero-shot performance
on diverse text understanding tasks, opening the possibility of using
LLMs as automated annotators for technical documents.

The ``LLM-as-judge'' paradigm---using language models to evaluate or
rate text artifacts---has been systematically studied by
\citet{zheng2023}, who introduced MT-Bench and Chatbot Arena to
evaluate LLM judges against human preferences. Their work establishes
both the promise (high correlation with human judgment on structured
evaluation tasks) and the limitations (position bias, verbosity bias,
self-enhancement bias) of LLM-based evaluation. Our use of Claude as a
rater for IETF drafts follows this paradigm, with the specific
limitation that no human calibration study has been performed on our
rating outputs (see Section~\ref{sec:limitations}).

Embedding-based document similarity using models such as
Sentence-BERT~\citep{nussbaumer2024} and its successors has become
standard practice for document clustering and retrieval. We use
nomic-embed-text~\citep{nomic2024}, a general-purpose text embedding
model, for computing pairwise cosine similarity across the draft corpus.
The resulting similarity matrix enables both cluster detection and
visualization via t-SNE~\citep{vandermaaten2008}.

\subsection{Technology Landscape Surveys}

Technology landscape mapping---the systematic identification and
organization of technical activities within a domain---has a long
history in foresight and innovation studies.
\citet{porter2005} introduced ``tech mining'' as a methodology for
extracting competitive intelligence from patent and publication
databases. \citet{roper2011} extended these methods to broader
technology management contexts. Our work adapts these approaches to
the standards domain, replacing patent databases with the IETF
Datatracker and augmenting keyword-based search with LLM-driven
semantic analysis.

The AI agent research community has produced several recent surveys.
\citet{wang2024} and \citet{xi2023} survey the rapidly growing
literature on LLM-based autonomous agents, covering architectures,
capabilities, and evaluation. These academic surveys focus on
research contributions; our work complements them by mapping the
parallel standardization effort, where research ideas meet the
engineering constraints of Internet protocol design.

The multi-agent systems (MAS) research tradition, surveyed
comprehensively by \citet{wooldridge2009} and \citet{dorri2018},
provides historical context. The FIPA Agent Communication
Language~\citep{fipa-acl} and Agent Management
Specification~\citep{fipa-ams}, developed between 1996 and 2005,
addressed many of the same problems---agent discovery, communication
protocols, platform interoperability---that the current IETF drafts
tackle. The near-complete absence of FIPA references in the
contemporary IETF corpus suggests limited awareness of this prior art,
a finding we quantify in Section~\ref{sec:results}.


% =========================================================================
\section{Methodology}
\label{sec:method}
% =========================================================================

The analysis pipeline consists of six sequential stages, each building
on the output of the previous. All intermediate results are stored in
a SQLite database (28\,MB) with FTS5 full-text search, enabling both
pipeline idempotency and ad-hoc querying. The complete pipeline is
implemented as a Python CLI tool (approximately 6,100 lines across 12
modules) using Click, httpx, the Anthropic SDK, and Ollama.

\subsection{Data Collection}
\label{sec:datacollection}

\subsubsection{Corpus Construction}

Drafts were retrieved from the IETF Datatracker
API\footnote{\url{https://datatracker.ietf.org/api/v1/doc/document/}}
using keyword search across both draft names
(\texttt{name\_\_contains}) and abstracts
(\texttt{abstract\_\_contains}). Twelve search terms were used,
including \textit{agent}, \textit{ai-agent}, \textit{agentic},
\textit{autonomous}, \textit{mcp}, \textit{inference},
\textit{generative}, \textit{intelligent}, \textit{large language
model}, \textit{multi-agent}, and \textit{trustworth}.
Only drafts with \texttt{type\_\_slug=draft} and submission date
$\geq$~2024-01-01 were included. Full text was downloaded from the
IETF archive.\footnote{\url{https://www.ietf.org/archive/id/}}

The keyword set was expanded iteratively. An initial set of 6 keywords
yielded 260 drafts; adding 6 further terms captured 174 additional
drafts in categories initially underrepresented, including MCP-related
work, generative AI infrastructure, and the nascent \texttt{aipref}
working group. A polite delay of 0.5\,seconds was applied between API
requests.
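To make the retrieval step concrete, the following sketch shows how such keyword queries can be assembled against the Datatracker API. It is illustrative rather than the tool's actual code: the four-keyword subset and the \texttt{time\_\_gte} filter name are assumptions for illustration.

```python
from urllib.parse import urlencode

BASE = "https://datatracker.ietf.org/api/v1/doc/document/"

# Hypothetical subset of the paper's twelve search terms.
KEYWORDS = ["agent", "agentic", "mcp", "multi-agent"]

def build_queries(keywords, since="2024-01-01"):
    """Build one Datatracker query URL per (keyword, field) pair.

    Each keyword is searched in both draft names (name__contains)
    and abstracts (abstract__contains), restricted to documents of
    type 'draft' submitted on or after `since`.
    """
    urls = []
    for kw in keywords:
        for field in ("name__contains", "abstract__contains"):
            params = {
                field: kw,
                "type__slug": "draft",
                "time__gte": since,  # assumed filter name; check the live API
                "format": "json",
            }
            urls.append(BASE + "?" + urlencode(params))
    return urls

queries = build_queries(KEYWORDS)
```

A real client would page through each result set (with the polite 0.5\,s delay noted above) and merge the per-keyword hits into a deduplicated corpus.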

The resulting corpus contains 475 drafts. After false-positive
filtering (removing drafts about ``user agents,'' ``autonomous
systems'' in routing, and other non-AI uses of matched keywords), 361
drafts were retained as AI/agent-relevant based on a relevance
rating threshold.

\subsubsection{Supplementary Standards Bodies}

To contextualize the IETF landscape, we ingested a supplementary
corpus of standards and specifications from five additional bodies:
ISO/IEC (including ISO~22989~\citep{iso22989} and
ISO~42001~\citep{iso42001}), ITU-T (including
Y.3172~\citep{itu-y3172}), ETSI (ENI, ZSM), W3C (Web of Things,
Verifiable Credentials, WebNN), and NIST (AI RMF~\citep{nist-ai-rmf}).
These documents were included in the gap analysis (Section~\ref{sec:gaps})
to identify areas where non-IETF bodies provide coverage that the IETF
corpus lacks, and vice versa.

\subsubsection{Author and Affiliation Data}

Author records were fetched from the Datatracker's
\texttt{documentauthor} and \texttt{person} endpoints. Organizational
affiliations were normalized using a hand-curated alias table of 40+
mappings (e.g., ``Huawei Technologies Co., Ltd.''
$\rightarrow$~``Huawei'') supplemented by automatic stripping of
common corporate suffixes.
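A minimal sketch of this normalization step, assuming a two-entry alias table and a small suffix list (the real table has 40+ mappings; the entries shown are illustrative):

```python
import re

# Illustrative subset of the hand-curated alias table.
ALIASES = {
    "huawei technologies co., ltd.": "Huawei",
    "china mobile communications corporation": "China Mobile",
}

# Common corporate suffixes stripped automatically when no alias matches.
SUFFIX_RE = re.compile(
    r"[,\s]+(inc\.?|ltd\.?|llc|gmbh|corp\.?|co\.?)\s*$", re.IGNORECASE
)

def normalize_affiliation(raw: str) -> str:
    """Map a raw affiliation string to a canonical organization name."""
    key = raw.strip().lower()
    if key in ALIASES:
        return ALIASES[key]
    name = raw.strip()
    # Strip trailing suffixes repeatedly ("Example Co., Ltd." -> "Example").
    while True:
        stripped = SUFFIX_RE.sub("", name)
        if stripped == name:
            return name
        name = stripped

norm1 = normalize_affiliation("Huawei Technologies Co., Ltd.")
norm2 = normalize_affiliation("Example Networks Inc.")
```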

\subsection{LLM-Based Analysis}
\label{sec:llm-analysis}

\subsubsection{Multi-Dimensional Rating}

Each draft was rated by Claude (Anthropic; Sonnet model) on five
dimensions using a structured prompt containing the draft's name,
title, submission date, page count, and abstract (truncated to 2,000
characters). The five rating dimensions are:

\begin{itemize}[nosep]
    \item \textbf{Novelty} (1--5): Originality relative to existing
      standards and proposals.
    \item \textbf{Maturity} (1--5): Completeness of the technical
      specification.
    \item \textbf{Overlap} (1--5): Redundancy with other known drafts
      (5 indicates near-duplication).
    \item \textbf{Momentum} (1--5): Community engagement, revisions,
      and working group adoption signals.
    \item \textbf{Relevance} (1--5): Importance to the AI/agent
      ecosystem specifically.
\end{itemize}

The prompt instructs Claude to return structured JSON with integer
scores and brief justification notes for each dimension, plus a 2--3
sentence summary and one or more category labels drawn from a
predefined taxonomy of 11 categories (Table~\ref{tab:categories}).
A composite quality score is computed as the arithmetic mean of
novelty, maturity, momentum, and relevance (excluding overlap, which
measures redundancy rather than quality).
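The composite is a plain four-way mean; a sketch with a hypothetical rating record:

```python
def composite_score(ratings: dict) -> float:
    """Arithmetic mean of novelty, maturity, momentum, and relevance;
    overlap is excluded because it measures redundancy, not quality."""
    dims = ("novelty", "maturity", "momentum", "relevance")
    return sum(ratings[d] for d in dims) / len(dims)

# Hypothetical rating for a single draft (not a real corpus entry).
score = composite_score(
    {"novelty": 5, "maturity": 4, "overlap": 2, "momentum": 5, "relevance": 5}
)  # -> 4.75
```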

To reduce API costs, drafts were rated in batches of five using a
batch prompt variant. Each draft's abstract was truncated to 1,500
characters in batch mode. All API responses were cached in an
\texttt{llm\_cache} table keyed by SHA-256 hash of the full prompt,
making the pipeline idempotent on re-runs.
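The caching scheme can be sketched as follows, with an in-memory SQLite table and a stub in place of the Anthropic API call; the two-column \texttt{llm\_cache} schema shown is a simplified stand-in for the real table:

```python
import hashlib
import sqlite3

# In-memory stand-in for the pipeline's SQLite database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE llm_cache (prompt_hash TEXT PRIMARY KEY, response TEXT)")

def cache_key(prompt: str) -> str:
    """Key cache entries by the SHA-256 hash of the full prompt text."""
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()

def rate_draft(prompt: str, call_llm) -> str:
    """Return a cached response for an identical prompt; otherwise
    call the model once and store the result."""
    key = cache_key(prompt)
    row = db.execute(
        "SELECT response FROM llm_cache WHERE prompt_hash = ?", (key,)
    ).fetchone()
    if row:
        return row[0]
    response = call_llm(prompt)
    db.execute("INSERT INTO llm_cache VALUES (?, ?)", (key, response))
    return response

calls = []
fake_llm = lambda p: calls.append(p) or '{"novelty": 3}'
first = rate_draft("rate draft-example-00", fake_llm)
second = rate_draft("rate draft-example-00", fake_llm)  # served from cache
```

Because the key covers the full prompt, any change to the prompt template or draft metadata automatically invalidates the cached entry on the next run.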

\subsubsection{Idea Extraction}

Discrete technical ideas---mechanisms, protocols, architectural
patterns, extensions, and requirements---were extracted from each
draft using Claude. For individual extraction, the prompt included
the abstract and the first 3,000 characters of full text (Sonnet
model). For batch extraction, groups of five drafts were processed
per API call using the cheaper Haiku model with abstracts truncated
to 800 characters. The prompt requested 1--4 top-level novel
contributions per draft, with explicit instructions to merge
sub-features into parent ideas and to return an empty array for
drafts lacking substantive technical content.

Extracted ideas were deduplicated within each draft using
embedding-based cosine similarity (threshold~0.85), removing ideas
that were restatements of the same concept. Cross-draft idea overlap
was analyzed using Python's \texttt{SequenceMatcher} with a fuzzy
matching threshold of~0.75 on idea titles, enabling detection of
convergent ideas across organizational boundaries.
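The fuzzy title matching step can be illustrated directly with the standard library; the two idea titles below are invented examples, not corpus entries:

```python
from difflib import SequenceMatcher

def titles_match(a: str, b: str, threshold: float = 0.75) -> bool:
    """Fuzzy idea-title match used for cross-draft overlap detection."""
    ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
    return ratio >= threshold

# Near-identical phrasings from two hypothetical drafts match...
match = titles_match("Multi-Agent Communication Protocol",
                     "Multi Agent Communication Protocols")
# ...while unrelated titles fall well below the 0.75 threshold.
no_match = titles_match("Multi-Agent Communication Protocol",
                        "Post-Quantum Key Exchange")
```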

\subsubsection{Gap Analysis}

A single Claude Sonnet call received a compressed landscape summary
containing category distribution counts, the 20 most frequently
occurring idea titles, overlap cluster statistics, and summaries of
relevant non-IETF standards. The prompt instructed the model to
identify 8--15 standardization gaps---areas, problems, or technical
challenges not adequately addressed by the existing corpus---with
structured output including topic, description, severity rating
(critical/high/medium/low), evidence, and partial coverage from
existing standards.

\subsection{Embedding and Clustering}
\label{sec:embedding}

Vector embeddings were generated locally using Ollama with the
nomic-embed-text model~\citep{nomic2024}. For each draft, the input
combined the title, abstract, and first 4,000 characters of full text
(when available), producing a 768-dimensional vector stored as a
binary blob in SQLite.

Pairwise cosine similarity was computed across all embedded drafts,
producing an $n \times n$ similarity matrix (cached to disk as a
NumPy array). Clustering used a greedy single-linkage algorithm: for
each unvisited draft, all unvisited drafts with cosine similarity
$\geq \tau$ to the seed were added to its cluster. Three empirically
determined thresholds were applied:

\begin{itemize}[nosep]
    \item $\tau = 0.85$: Topically overlapping drafts (42 clusters).
    \item $\tau = 0.90$: Near-duplicates or same-author variants (34
      clusters).
    \item $\tau = 0.98$: Functionally identical drafts (25+ pairs).
\end{itemize}

These thresholds were selected by manual inspection of draft pairs at
each level; no systematic sensitivity analysis was performed (see
Section~\ref{sec:limitations}).
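The greedy single-linkage scheme described above can be sketched as follows, using toy 2-D vectors in place of the 768-dimensional embeddings:

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (sqrt(sum(x * x for x in u)) * sqrt(sum(x * x for x in v)))

def greedy_clusters(vectors, tau):
    """Greedy single-linkage clustering: each unvisited draft seeds a
    cluster that absorbs every later unvisited draft whose cosine
    similarity to the seed is >= tau."""
    visited = set()
    clusters = []
    for i, seed in enumerate(vectors):
        if i in visited:
            continue
        cluster = [i]
        visited.add(i)
        for j in range(i + 1, len(vectors)):
            if j not in visited and cosine(seed, vectors[j]) >= tau:
                cluster.append(j)
                visited.add(j)
        clusters.append(cluster)
    return clusters

# Toy 2-D stand-ins: the first two vectors point in nearly the same
# direction, the third is orthogonal to both.
vecs = [(1.0, 0.0), (0.99, 0.05), (0.0, 1.0)]
clusters = greedy_clusters(vecs, tau=0.85)  # -> [[0, 1], [2]]
```

Note that the result depends on seed order: a draft similar to a cluster member but not to the seed is not absorbed, which is one reason the thresholds warrant the sensitivity caveat above.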

\subsection{Supplementary Analyses}

Three additional analysis passes operate on the stored data with zero
API cost:

\begin{enumerate}[nosep]
    \item \textbf{RFC cross-references}: Regex-based extraction of
      RFC, BCP, and draft citations from full text, yielding 4,231
      cross-references across 360 drafts.
    \item \textbf{Category trends}: SQL-based monthly breakdown of new
      drafts per category with growth rates.
    \item \textbf{Co-authorship network}: Team bloc detection via
      pairwise author overlap ($\geq$70\% shared drafts, $\geq$2 shared
      drafts), with connected components forming blocs.
\end{enumerate}
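The first pass can be sketched as a single regular expression; the pattern below is an assumed simplification of the tool's actual regex, and the sample text and draft name are invented:

```python
import re

# Matches "RFC 8446", "RFC8446", "BCP 14", and draft names such as
# "draft-example-agent-auth-02" (simplified sketch of the citation pass).
CITATION_RE = re.compile(
    r"\b(RFC\s?\d{1,5}|BCP\s?\d{1,4}|draft-[a-z0-9-]+)\b", re.IGNORECASE
)

sample = (
    "This profile builds on OAuth 2.0 [RFC 6749] and TLS 1.3 [RFC8446], "
    "and updates draft-example-agent-auth-02. See also BCP 14."
)

citations = [m.group(1) for m in CITATION_RE.finditer(sample)]
```

A production pass would additionally normalize the matches (e.g., "RFC8446" and "RFC 8446" to one key) before aggregating citation counts.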

\subsection{Cost}

Table~\ref{tab:cost} summarizes the total pipeline cost for 475 drafts.

\begin{table}[H]
\centering
\caption{Pipeline cost breakdown.}
\label{tab:cost}
\begin{tabular}{llrr}
\toprule
\textbf{Stage} & \textbf{Model} & \textbf{Items} & \textbf{Cost (USD)} \\
\midrule
Rating        & Claude Sonnet   & 475 drafts & \$5.50--8.00 \\
Idea extract. & Claude Haiku    & 475 drafts & \$0.80 \\
Gap analysis  & Claude Sonnet   & 1 call     & \$0.20 \\
Embeddings    & Ollama (local)  & 475 drafts & \$0.00 \\
RFC refs      & Regex (local)   & 475 drafts & \$0.00 \\
Trends        & SQL (local)     & 475 drafts & \$0.00 \\
Idea overlap  & SequenceMatcher & 501 ideas  & \$0.00 \\
\midrule
\textbf{Total} & & & \textbf{\$6.50--9.00} \\
\bottomrule
\end{tabular}
\end{table}


% =========================================================================
\section{Results}
\label{sec:results}
% =========================================================================

\subsection{Corpus Overview and Growth Trajectory}

The final corpus comprises 475 Internet-Drafts submitted between
January~2024 and March~2026. After false-positive filtering (drafts
with relevance score $\leq$~2 or manually flagged), 361 drafts were
retained as substantively related to AI agents.

The growth trajectory is striking. In 2024, 9 AI/agent drafts were
submitted (0.5\% of 1,651 total IETF drafts). In 2025, 190 were
submitted (7.0\% of 2,696). In Q1~2026 alone, 162 were submitted
(9.3\% of 1,748). Monthly submissions followed a step function:
5~drafts in June~2025, 61 in October~2025, 85 in February~2026.
The acceleration has not plateaued as of March~2026.

\begin{table}[H]
\centering
\caption{Growth of AI/agent-related IETF Internet-Drafts.}
\label{tab:growth}
\begin{tabular}{rrrr}
\toprule
\textbf{Year} & \textbf{Total IETF} & \textbf{AI/Agent} & \textbf{Share (\%)} \\
\midrule
2024      & 1,651 & 9   & 0.5 \\
2025      & 2,696 & 190 & 7.0 \\
2026 (Q1) & 1,748 & 162 & 9.3 \\
\bottomrule
\end{tabular}
\end{table}

\subsection{Thematic Distribution}
\label{sec:categories}

Drafts were classified into 11 non-exclusive categories
(Table~\ref{tab:categories}). A single draft may belong to multiple
categories; percentages therefore exceed 100\%.

\begin{table}[H]
\centering
\caption{Category distribution across 475 drafts. Drafts may appear in
multiple categories.}
\label{tab:categories}
\begin{tabular}{lrr}
\toprule
\textbf{Category} & \textbf{Drafts} & \textbf{Share (\%)} \\
\midrule
Data formats / interoperability & 214 & 45 \\
Policy / governance             & 214 & 45 \\
Agent identity / authentication & 160 & 34 \\
A2A protocols                   & 157 & 33 \\
Autonomous network operations   & 124 & 26 \\
AI safety / alignment           & 112 & 24 \\
Agent discovery / registration  & 89  & 19 \\
ML traffic management           & 79  & 17 \\
Human--agent interaction        & 57  & 12 \\
Model serving / inference       & 42  & 9 \\
Other AI/agent                  & --  & -- \\
\bottomrule
\end{tabular}
\end{table}

The dominance of infrastructure categories---data formats, identity,
communication protocols---is expected for an early-stage standards
effort. The comparatively low representation of the safety/alignment
and human--agent interaction categories is a structural finding we
examine in Section~\ref{sec:safety-deficit}.

\subsection{The Capability-to-Safety Deficit}
\label{sec:safety-deficit}

The ratio of capability-building drafts (A2A protocols, autonomous
network operations, agent discovery, model serving) to safety-oriented
drafts (AI safety/alignment, human--agent interaction) is
approximately 4:1 in aggregate. This ratio varies significantly by
month, ranging from 1.5:1 in months with concentrated safety
submissions to over 20:1 in months dominated by protocol proposals.

The drafts that do address safety are among the highest-rated in the
corpus. The Verifiable Observation Logging for Transparency
(VOLT)~\citep{draft-cowles-volt} protocol scored 4.75/5.0 on the
four-dimension composite (excluding overlap), as did the Distributed
AI Accountability Protocol (DAAP)~\citep{draft-aylward-daap}. The
STAMP protocol~\citep{draft-guy-bary-stamp} for cryptographic
delegation and proof scored 4.5. The quality of safety-focused work
is high; the quantity is not.

An analysis of RFC cross-references reinforces this finding. Across
4,231 parsed citations, the most-referenced standards after the
boilerplate RFC~2119/8174 conventions are TLS~1.3~\citep{rfc8446}
(42 citations), OAuth~2.0~\citep{rfc6749} (36), HTTP
Semantics~\citep{rfc9110} (34), and JWT~\citep{rfc7519} (22). The
agent standards ecosystem is being constructed on the web's existing
security infrastructure---OAuth, TLS, HTTP, JWT---yet the safety
layer that should accompany this security foundation remains
underdeveloped.

\subsection{Protocol Fragmentation}
\label{sec:fragmentation}

Embedding-based similarity analysis reveals extensive duplication and
fragmentation across the corpus.

\subsubsection{Near-Duplicates}

At the 0.98 cosine similarity threshold, 25+ draft pairs are
functionally identical---the same proposal submitted under different
names, to different working groups, or as renamed revisions. A
taxonomy of near-duplicates includes: same draft submitted to
different working groups (14 pairs), renamed drafts (5), evolutionary
versions (3), and genuinely competing proposals from different
organizations (2+).

\subsubsection{Competing Clusters}

At the 0.85 threshold, 42 topical clusters emerge. The most crowded
is OAuth for AI agents, with 14 distinct proposals all addressing
how AI agents authenticate and receive authorization via the OAuth
framework. These range from broad profile proposals to narrow scope
extensions to comprehensive accountability systems. None are
interoperable.

The A2A protocol space encompasses 157 drafts with no
interoperability layer. The most common technical idea in the entire
extracted corpus---``Multi-Agent Communication Protocol''---appears
independently in 8 drafts from different teams. A 10-draft cluster
addresses agent gateway and multi-agent collaboration, with
approaches ranging from semantic routing gateways to cross-domain
interoperability frameworks.

\subsubsection{Causes of Fragmentation}

The data distinguishes three causes: (1)~working group shopping, where
authors submit the same draft to multiple working groups seeking
adoption; (2)~parallel invention, where isolated teams independently
solve the same problem; and (3)~strategic surface-area expansion,
where organizations submit multiple related drafts to maximize
presence in the standards landscape.
|
||||
|
||||
\subsection{Organizational Dynamics}
\label{sec:orgs}

\subsubsection{Concentration}

Authorship is heavily concentrated. Huawei leads with 53 authors
contributing to 69 drafts---approximately 16\% of the entire corpus
across all Huawei entities. China Mobile (24~authors, 35~drafts),
Cisco (24~authors, 26~drafts), and China Telecom (24~authors,
24~drafts) follow. Chinese-linked institutions (Huawei, China
Mobile, China Telecom, China Unicom, Tsinghua University, ZTE, BUPT,
and associated laboratories) collectively account for over 160
authors.

Western technology companies are dramatically underrepresented
relative to their market positions. Google is present with 5 authors
on 9 drafts. Microsoft, Apple, and Meta have minimal direct
participation. Amazon's 6 authors focus on post-quantum cryptography
rather than agent-specific work.

\subsubsection{Team Blocs}

Co-authorship analysis identifies 18 team blocs among the 713 authors,
covering approximately 25\% of all authors. The largest bloc is a
13-person Huawei team sharing 22 drafts with 94\% average cohesion
(measured as pairwise overlap of draft portfolios). The team's core
of 7 members each appear on 13--23 drafts.
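The cohesion metric admits several formalizations; the paper states only ``pairwise overlap of draft portfolios.'' As an illustration (the exact formula and the function below are our own assumption, not the pipeline's code), one plausible reading averages the overlap coefficient across author pairs:

```python
from itertools import combinations

def bloc_cohesion(portfolios):
    """Average pairwise overlap of draft portfolios within one bloc.

    portfolios: dict mapping author -> set of draft names.
    Pair overlap = |A & B| / min(|A|, |B|) (overlap coefficient).
    """
    pairs = list(combinations(portfolios.values(), 2))
    if not pairs:
        return 0.0
    return sum(len(a & b) / min(len(a), len(b)) for a, b in pairs) / len(pairs)

# Toy bloc: two authors co-sign everything, a third overlaps partially.
bloc = {
    "author1": {"d1", "d2", "d3"},
    "author2": {"d1", "d2", "d3"},
    "author3": {"d1", "d4"},
}
print(round(bloc_cohesion(bloc), 2))  # 0.67
```

Under this reading, a bloc whose members all sign the same drafts scores 1.0, so 94\% indicates near-complete portfolio sharing.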

Cross-organizational collaboration is sparse. The most productive
cross-team pair shares only 3 drafts. Chinese organizations form a
tightly linked ecosystem: Huawei--China Unicom shares 6 drafts,
Tsinghua--Zhongguancun Lab shares 5, and China Mobile--ZTE shares 4.
European telecoms (Deutsche Telekom, Telef\'onica, Orange) act as
bridges between Chinese and Western institutions.

\subsection{Cross-Organization Convergence}
\label{sec:convergence}

Despite the fragmentation, significant latent consensus exists. Using
fuzzy title matching (\texttt{SequenceMatcher} at a 0.75 threshold) on
the 501 extracted ideas, 132 ideas (approximately 33\% of unique idea
clusters) have been independently proposed by two or more organizations.

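The matching step can be sketched with Python's standard library. The helper function and the whitespace normalization are our illustration; only \texttt{SequenceMatcher} and the 0.75 threshold come from the pipeline description:

```python
from difflib import SequenceMatcher

def same_idea(title_a, title_b, threshold=0.75):
    """True when two extracted idea titles are similar enough to be
    counted as independent proposals of the same idea."""
    a = " ".join(title_a.lower().split())
    b = " ".join(title_b.lower().split())
    return SequenceMatcher(None, a, b).ratio() >= threshold

print(same_idea("Multi-Agent Communication Protocol",
                "Multi Agent Communication Protocol"))  # True
print(same_idea("Multi-Agent Communication Protocol",
                "Training Data Provenance Tracking"))   # False
```

The ratio is twice the number of matched characters divided by the total length of both strings, so near-identical titles with minor punctuation differences clear the 0.75 bar easily.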
The strongest convergence signals include ``A2A Communication
Paradigm'' (proposed by 8 organizations from 5 countries),
``AI Agent Network Architecture'' (8 organizations), and
``Multi-Agent Communication Protocol'' (7 organizations). An
examination of organizational pairs reveals that 180 convergent ideas
cross the boundary between Chinese-linked and Western organizations,
indicating genuine cross-cultural consensus on technical directions
despite the sparse direct collaboration noted in
Section~\ref{sec:orgs}.

The coexistence of convergence and fragmentation has a specific
structure: organizations agree on \textit{what} needs building (the
convergent ideas) but disagree on \textit{how} to build it (the
competing protocol proposals). This gap between problem consensus and
solution divergence is where architectural coordination is most needed.

\subsection{Gap Analysis}
\label{sec:gaps}

The gap analysis identified 11 standardization gaps, distributed across
severity levels as shown in Table~\ref{tab:gaps}.

\begin{table}[H]
\centering
\caption{Identified standardization gaps by severity.}
\label{tab:gaps}
\begin{tabularx}{\textwidth}{llX}
\toprule
\textbf{Severity} & \textbf{Topic} & \textbf{Description} \\
\midrule
Critical & Agent legal liability &
No standard addresses liability assignment when autonomous agents
cause harm or make binding commitments across creators, operators,
and users. \\
Critical & Capability degradation detection &
No standard defines detection mechanisms for gradual capability
degradation due to concept drift, adversarial inputs, or model
corruption. \\
Critical & Emergency override protocols &
No standard defines distributed emergency-stop mechanisms for
autonomous agents exhibiting dangerous behavior across
multi-system deployments. \\
\midrule
High & Cross-domain identity portability &
Agents cannot maintain consistent identity across organizational
domains with different identity systems. \\
High & Real-time behavior explanation &
No standard for interactive, real-time explanations of agent
decision-making during operation. \\
High & Multi-agent conflict resolution &
No protocol for resolving conflicts when multiple agents have
competing objectives or contend for shared resources. \\
High & Inter-standards-body bridging &
Protocols from IETF, ITU-T, and ISO cannot interoperate, creating
silos across network, internet, and industrial domains. \\
High & Behavioral audit trails &
Missing standards for immutable, decision-level audit logs
supporting forensic analysis and regulatory compliance. \\
\midrule
Medium & Resource consumption limits &
No self-regulation standards for agent computational, network, and
energy resource usage. \\
Medium & Training data provenance &
Missing standards for tracking data lineage as it flows between
agents in federated learning scenarios. \\
Medium & Content attribution &
No cryptographic attribution standards for agent-generated content.\\
\bottomrule
\end{tabularx}
\end{table}

The three critical gaps share a common theme: they address what happens
when autonomous agents fail or misbehave. The capability-building
majority of the corpus assumes cooperative, well-functioning agent
systems; the critical gaps expose the absence of standards for the
adversarial, degraded, and emergency cases that inevitably arise in
production deployment.

Cross-referencing gaps with extracted ideas quantifies the coverage
deficit. The ``emergency override'' gap has only 15 partially
addressing ideas across the corpus. The ``multi-agent conflict
resolution'' and ``inter-standards-body bridging'' gaps have zero
directly related extracted ideas---they are entirely unaddressed.


% =========================================================================
\section{Discussion}
\label{sec:discussion}
% =========================================================================

\subsection{Implications for Standardization Strategy}

The landscape reveals a standards ecosystem in a characteristic
early-stage pattern: rapid expansion, parallel invention, and
insufficient coordination. The IETF has navigated such patterns
before---the early web, IoT, DNS security---and the historical
resolution involves convergence of competing proposals, working group
consolidation, and the emergence of a small number of lasting
standards from a large initial field.

Three strategic priorities emerge from the data:

\textbf{Safety-first coordination.} The 4:1 capability-to-safety
ratio is a structural risk. The critical gaps---legal liability,
capability degradation detection, emergency override---are precisely
the areas where standardization failure has the highest real-world
consequence. Unlike protocol fragmentation, which causes confusion and
implementation cost, safety gaps create liability and harm. The
EU AI Act~\citep{eu-ai-act}, which mandates real-time explainability
and human oversight for high-risk AI systems, will make several of
these gaps regulatory obligations rather than optional best practices.

\textbf{Architectural connective tissue.} The landscape needs not more
protocols but a shared execution model. The convergence data shows that
organizations agree on the components; they disagree on the
integration. Proposals like VOLT~\citep{draft-cowles-volt} (execution
traces), DAAP~\citep{draft-aylward-daap} (accountability),
STAMP~\citep{draft-guy-bary-stamp} (cryptographic delegation), and
Verifiable Agent Conversations~\citep{draft-birkholz-vac} (signed
conversation records) address complementary parts of the same
architectural problem. An overarching agent execution architecture
that composes these components would accelerate convergence more
effectively than continued parallel invention.

\textbf{Cross-organization coordination.} The team bloc structure
produces drafts that are internally consistent but externally
incompatible. The 18 detected blocs function as islands; the bridges
between them are thin. Mechanisms that encourage cross-bloc
collaboration---joint design teams, interop testing events,
shared reference implementations---are more likely to produce lasting
standards than the current pattern of parallel submission.

\subsection{Relationship to Prior Agent Standards}

A notable finding is the near-complete absence of references to FIPA
(Foundation for Intelligent Physical Agents) in the contemporary IETF
corpus. FIPA's Agent Communication Language~\citep{fipa-acl} and Agent
Management Specification~\citep{fipa-ams}, developed between 1996 and
2005, addressed agent discovery, communication, platform
interoperability, and interaction protocols---the same problem space
that the current wave of IETF drafts tackles.

The absence of FIPA references does not necessarily indicate ignorance;
the web-native technical context of 2025 differs substantially from the
Java/CORBA context of 2002. However, the recurrence of problems
FIPA addressed (agent naming, message semantics, directory services,
interaction protocols) suggests that explicit engagement with the
FIPA legacy could help the IETF community avoid re-learning lessons
from two decades ago.

\subsection{Limitations}
\label{sec:limitations}

The methodology has several limitations that affect the confidence and
generalizability of the findings.

\textbf{LLM-as-judge validity.} All quality ratings are generated by a
single LLM (Claude Sonnet) from draft abstracts truncated to 2,000
characters. No human calibration study has been performed; no
inter-rater reliability is established. The ratings should be treated
as relative rankings within this corpus, not absolute quality measures.
Maturity scores are particularly affected by abstract-only input, as
abstracts may not convey the full technical depth of a specification.
The overlap dimension is limited because Claude rates each draft
independently without access to the full corpus, meaning it reflects
the model's general knowledge rather than corpus-specific similarity.
A validation study using domain expert ratings on a sample of 25--30
drafts would substantially strengthen confidence.

\textbf{Corpus selection bias.} Keyword-based selection introduces both
false positives (``agent'' matching ``user agent,'' ``autonomous''
matching ``autonomous systems'' in routing) and false negatives
(relevant drafts using terminology outside the keyword set). We
estimate 30--50 false positives remain despite relevance filtering.
The temporal cutoff of January~2024 excludes earlier foundational work.

\textbf{Clustering thresholds.} The similarity thresholds (0.85, 0.90,
0.98) are empirically chosen by manual inspection, not derived from
principled analysis. The embedding model (nomic-embed-text) is a
general-purpose model not fine-tuned for standards document similarity.
Sensitivity analysis across thresholds and comparison with alternative
clustering methods (DBSCAN, hierarchical agglomerative) would
strengthen the clustering results.

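To illustrate why threshold choice matters, the following sketch (our own construction; the pipeline's actual clustering procedure is not specified here) counts the connected-component clusters induced by each cosine-similarity threshold over a set of embedding vectors:

```python
import numpy as np

def cluster_counts(vectors, thresholds=(0.85, 0.90, 0.98)):
    """Number of connected-component clusters at each cosine threshold:
    two items join a cluster when their similarity is >= t."""
    normed = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = normed @ normed.T
    n = len(vectors)
    counts = {}
    for t in thresholds:
        parent = list(range(n))

        def find(i):
            # union-find with path compression
            while parent[i] != i:
                parent[i] = parent[parent[i]]
                i = parent[i]
            return i

        for i in range(n):
            for j in range(i + 1, n):
                if sims[i, j] >= t:
                    parent[find(i)] = find(j)
        counts[t] = len({find(i) for i in range(n)})
    return counts

# Two nearly parallel vectors (cosine ~0.95) and one orthogonal one:
vecs = np.array([[1.0, 0.0], [0.95, 0.312], [0.0, 1.0]])
print(cluster_counts(vecs))  # {0.85: 2, 0.9: 2, 0.98: 3}
```

The near-parallel pair merges at 0.85 and 0.90 but splits at 0.98, which is exactly the kind of sensitivity a systematic sweep would quantify.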
\textbf{Gap analysis methodology.} Gap identification relies on a
single-shot LLM analysis of compressed landscape statistics, not
systematic comparison against a reference taxonomy. A rigorous
approach would compare the corpus against an explicit reference
architecture such as NIST AI RMF~\citep{nist-ai-rmf}, the FIPA agent
platform model, or a purpose-built agent ecosystem reference model.
Gap severity is assigned by Claude without defined quantitative
thresholds.

\textbf{Idea extraction consistency.} Batch extraction using Haiku
with abstract-only input produces different results from individual
extraction using Sonnet with full text. No precision/recall measurement
has been performed. The extraction prompt limits output to 1--4 ideas
per draft, potentially under-counting contributions from comprehensive
specifications.

\textbf{Organizational normalization.} Cross-organization analysis
depends on the accuracy of a hand-curated alias table. Boundary cases
(e.g., joint ventures, university--industry affiliations, subsidiary
relationships) introduce judgment calls that affect concentration
statistics.

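The normalization step can be illustrated with a toy alias table; the entries and function below are invented for illustration and are not the curated table used in the study:

```python
# Hypothetical alias table mapping cleaned affiliation strings to a
# canonical organization name (illustrative entries only).
ALIASES = {
    "huawei technologies": "Huawei",
    "huawei technologies co., ltd": "Huawei",
    "china mobile research institute": "China Mobile",
}

def normalize_org(raw, aliases=ALIASES):
    """Collapse whitespace and case before the alias lookup; fall back
    to the cleaned raw string when no alias is known."""
    key = " ".join(raw.lower().split()).rstrip(".")
    return aliases.get(key, raw.strip())

print(normalize_org("  Huawei   Technologies "))  # Huawei
print(normalize_org("Unknown Labs"))              # Unknown Labs
```

The boundary cases mentioned above (subsidiaries, joint ventures) show up here as contested table entries rather than code: whichever canonical name a subsidiary is mapped to directly shifts the concentration statistics.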
Despite these limitations, the findings are robust in their broad
contours: the growth trajectory, the safety deficit, the protocol
fragmentation, and the organizational concentration are visible
across multiple analytical methods and are not sensitive to the
specific threshold or model choices within reasonable ranges.

\subsection{Reproducibility and Openness}

The complete pipeline, database, and derived reports are released as
open-source software (the IETF Draft Analyzer). The SQLite database
contains all raw data, ratings, embeddings, ideas, gaps, author
records, and cached LLM responses, enabling independent verification
of every finding reported in this paper. The caching mechanism ensures
that re-running the pipeline produces identical results without
additional API cost.


% =========================================================================
\section{Conclusion}
\label{sec:conclusion}
% =========================================================================

We have presented a systematic, LLM-assisted analysis of the IETF's
AI-agent standardization landscape, covering 475 Internet-Drafts from
713 authors across more than 230 organizations. The analysis reveals a
standards ecosystem experiencing unprecedented growth---from 0.5\% to
9.3\% of all IETF submissions in fifteen months---accompanied by
significant structural challenges.

The capability-to-safety ratio of approximately 4:1, the extreme
protocol fragmentation (14 competing OAuth proposals, 157 A2A drafts
with no interoperability layer), and the concentration of authorship
(one vendor contributing $\sim$16\% of all drafts) are findings that
have direct implications for the trajectory of AI-agent
standardization. The 11 identified gaps, with three critical gaps
centered on what happens when agents fail, highlight the areas where
standardization effort is most urgently needed.

At the same time, the 132 cross-organization convergent ideas
demonstrate that latent consensus exists beneath the fragmentation.
Organizations agree on the problems; they disagree on the solutions.
This gap between problem consensus and solution divergence defines the
current phase of the standards race and points toward the needed
intervention: not more protocol proposals, but architectural
connective tissue that composes the existing high-quality components
into a coherent ecosystem.

The methodology itself contributes a replicable, cost-effective
approach to standards landscape analysis. At \$9--15 total, the
pipeline demonstrates that LLM-assisted document analysis at scale is
practical for research and policy applications. The explicit
documentation of limitations---no human calibration, empirical
thresholds, single-judge ratings---provides a template for the
responsible use of LLM-as-judge methodologies in technical document
analysis.

The IETF has navigated standardization sprints before, and the lasting
standards have consistently emerged from efforts that prioritized
interoperability and safety alongside capability. Whether the current
AI-agent wave follows this historical pattern depends on whether the
community can shift from parallel invention to coordinated
architecture before the capability work ships without the safety work
that should accompany it.


% =========================================================================
% References
% =========================================================================
\bibliographystyle{plainnat}
\bibliography{ietf-refs}

\end{document}
334
paper/ietf-refs.bib
Normal file
@@ -0,0 +1,334 @@
% =========================================================================
% Bibliography — IETF AI-Agent Landscape Paper
% =========================================================================

% --- IETF RFCs and Internet-Drafts ---

@techreport{rfc6749,
author = {Dick Hardt},
title = {{The OAuth 2.0 Authorization Framework}},
institution = {IETF},
type = {RFC},
number = {6749},
year = {2012},
doi = {10.17487/RFC6749},
}

@techreport{rfc7519,
author = {Michael Jones and John Bradley and Nat Sakimura},
title = {{JSON Web Token (JWT)}},
institution = {IETF},
type = {RFC},
number = {7519},
year = {2015},
doi = {10.17487/RFC7519},
}

@techreport{rfc8446,
author = {Eric Rescorla},
title = {{The Transport Layer Security (TLS) Protocol Version 1.3}},
institution = {IETF},
type = {RFC},
number = {8446},
year = {2018},
doi = {10.17487/RFC8446},
}

@techreport{rfc9110,
author = {Roy T. Fielding and Mark Nottingham and Julian Reschke},
title = {{HTTP Semantics}},
institution = {IETF},
type = {RFC},
number = {9110},
year = {2022},
doi = {10.17487/RFC9110},
}

@misc{draft-cowles-volt,
author = {Colin Cowles},
title = {{Verifiable Observation Logging for Transparency (VOLT)}},
howpublished = {Internet-Draft},
year = {2026},
note = {Work in progress},
}

@misc{draft-aylward-daap,
author = {Ryan Aylward},
title = {{Distributed AI Accountability Protocol (DAAP) Version 2.0}},
howpublished = {Internet-Draft},
year = {2026},
note = {Work in progress},
}

@misc{draft-guy-bary-stamp,
author = {Guy Bary},
title = {{Secure Task Authentication and Monitoring Protocol (STAMP)}},
howpublished = {Internet-Draft},
year = {2026},
note = {Work in progress},
}

@misc{draft-birkholz-vac,
author = {Henk Birkholz},
title = {{Verifiable Agent Conversations}},
howpublished = {Internet-Draft},
year = {2026},
note = {Work in progress},
}

@misc{draft-rosenberg-cheq,
author = {Jonathan Rosenberg},
title = {{CHEQ: Constrained Human-Engaged Queries for AI Agents}},
howpublished = {Internet-Draft},
year = {2025},
note = {Work in progress},
}

@misc{draft-williams-lm-hierarchy,
author = {Brandon Williams},
title = {{YANG Data Model for Hierarchical Language Model Coordination}},
howpublished = {Internet-Draft},
year = {2026},
note = {Work in progress},
}

@misc{draft-ietf-lake-edhoc,
author = {G{\"o}ran Selander and John Preu{\ss} Mattsson and Francesca Palombini},
title = {{Ephemeral Diffie-Hellman Over COSE (EDHOC)}},
howpublished = {Internet-Draft (IETF LAKE WG)},
year = {2025},
note = {Work in progress},
}

% --- Standards bodies ---

@techreport{iso22989,
author = {{ISO/IEC}},
title = {{Information technology --- Artificial intelligence --- Artificial intelligence concepts and terminology}},
institution = {ISO/IEC},
number = {22989:2022},
year = {2022},
}

@techreport{iso42001,
author = {{ISO/IEC}},
title = {{Information technology --- Artificial intelligence --- Management system}},
institution = {ISO/IEC},
number = {42001:2023},
year = {2023},
}

@techreport{itu-y3172,
author = {{ITU-T}},
title = {{Architectural framework for machine learning in future networks including IMT-2020}},
institution = {ITU-T},
number = {Y.3172},
year = {2019},
}

@techreport{nist-ai-rmf,
author = {{National Institute of Standards and Technology}},
title = {{Artificial Intelligence Risk Management Framework (AI RMF 1.0)}},
institution = {NIST},
number = {AI 100-1},
year = {2023},
doi = {10.6028/NIST.AI.100-1},
}

@misc{eu-ai-act,
author = {{European Parliament and Council of the European Union}},
title = {{Regulation (EU) 2024/1689 --- Artificial Intelligence Act}},
howpublished = {Official Journal of the European Union},
year = {2024},
}

% --- FIPA ---

@techreport{fipa-acl,
author = {{Foundation for Intelligent Physical Agents}},
title = {{FIPA ACL Message Structure Specification}},
institution = {FIPA},
number = {SC00061G},
year = {2002},
}

@techreport{fipa-ams,
author = {{Foundation for Intelligent Physical Agents}},
title = {{FIPA Agent Management Specification}},
institution = {FIPA},
number = {SC00023K},
year = {2004},
}

% --- Multi-agent systems ---

@book{wooldridge2009,
author = {Michael Wooldridge},
title = {{An Introduction to MultiAgent Systems}},
publisher = {John Wiley \& Sons},
edition = {2nd},
year = {2009},
}

@article{dorri2018,
author = {Ali Dorri and Salil S. Kanhere and Raja Jurdak},
title = {{Multi-Agent Systems: A Survey}},
journal = {IEEE Access},
volume = {6},
pages = {28573--28593},
year = {2018},
doi = {10.1109/ACCESS.2018.2831228},
}

@book{shoham2008,
author = {Yoav Shoham and Kevin Leyton-Brown},
title = {{Multiagent Systems: Algorithmic, Game-Theoretic, and Logical Foundations}},
publisher = {Cambridge University Press},
year = {2008},
}

% --- NLP and text analysis ---

@inproceedings{devlin2019,
author = {Jacob Devlin and Ming-Wei Chang and Kenton Lee and Kristina Toutanova},
title = {{BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding}},
booktitle = {Proceedings of NAACL-HLT},
pages = {4171--4186},
year = {2019},
}

@article{zheng2023,
author = {Lianmin Zheng and Wei-Lin Chiang and Ying Sheng and Siyuan Zhuang and Zhanghao Wu and Yonghao Zhuang and Zi Lin and Zhuohan Li and Dacheng Li and Eric P. Xing and Hao Zhang and Joseph E. Gonzalez and Ion Stoica},
title = {{Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena}},
journal = {Advances in Neural Information Processing Systems},
volume = {36},
year = {2023},
}

@article{brown2020,
author = {Tom Brown and Benjamin Mann and Nick Ryder and Melanie Subbiah and Jared Kaplan and Prafulla Dhariwal and Arvind Neelakantan and Pranav Shyam and Girish Sastry and Amanda Askell and others},
title = {{Language Models are Few-Shot Learners}},
journal = {Advances in Neural Information Processing Systems},
volume = {33},
pages = {1877--1901},
year = {2020},
}

% --- Embeddings and clustering ---

@inproceedings{nussbaumer2024,
author = {Nils Reimers and Iryna Gurevych},
title = {{Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks}},
booktitle = {Proceedings of EMNLP-IJCNLP},
pages = {3982--3992},
year = {2019},
}

@article{nomic2024,
author = {Zach Nussbaum and John X. Morris and Brandon Duderstadt and Andriy Mulyar},
title = {{Nomic Embed: Training a Reproducible Long Context Text Embedder}},
journal = {arXiv preprint arXiv:2402.01613},
year = {2024},
}

@article{vandermaaten2008,
author = {Laurens van der Maaten and Geoffrey Hinton},
title = {{Visualizing Data using t-SNE}},
journal = {Journal of Machine Learning Research},
volume = {9},
pages = {2579--2605},
year = {2008},
}

% --- Technology landscape analysis ---

@article{martin2016,
author = {Ben R. Martin},
title = {{Technology Foresight in a Rapidly Globalizing Economy}},
journal = {International Journal of Foresight and Innovation Policy},
volume = {4},
number = {1/2},
year = {2016},
}

@book{porter2005,
author = {Alan L. Porter and Scott W. Cunningham},
title = {{Tech Mining: Exploiting New Technologies for Competitive Advantage}},
publisher = {John Wiley \& Sons},
year = {2005},
}

@book{roper2011,
author = {A. Thomas Roper and Scott W. Cunningham and Alan L. Porter and Thomas W. Mason and Frederick A. Rossini and Jerry Banks},
title = {{Forecasting and Management of Technology}},
publisher = {John Wiley \& Sons},
edition = {2nd},
year = {2011},
}

% --- Standards analysis ---

@article{blind2017,
author = {Knut Blind and Sören S. Petersen and Cesare A.F. Riillo},
title = {{The Impact of Standards and Regulation on Innovation in Uncertain Markets}},
journal = {Research Policy},
volume = {46},
number = {1},
pages = {249--264},
year = {2017},
doi = {10.1016/j.respol.2016.11.003},
}

@article{simcoe2012,
author = {Timothy Simcoe},
title = {{Standard Setting Committees: Consensus Governance for Shared Technology Platforms}},
journal = {American Economic Review},
volume = {102},
number = {1},
pages = {305--336},
year = {2012},
doi = {10.1257/aer.102.1.305},
}

@article{lerner2014,
author = {Josh Lerner and Jean Tirole},
title = {{Standard-Essential Patents}},
journal = {Journal of Political Economy},
volume = {123},
number = {3},
pages = {547--586},
year = {2015},
doi = {10.1086/680995},
}

% --- Agent protocols ---

@misc{anthropic-mcp,
author = {{Anthropic}},
title = {{Model Context Protocol (MCP) Specification}},
year = {2024},
howpublished = {\url{https://modelcontextprotocol.io}},
}

@misc{google-a2a,
author = {{Google}},
title = {{Agent-to-Agent (A2A) Protocol}},
year = {2025},
howpublished = {\url{https://github.com/google/A2A}},
}

@article{wang2024,
author = {Lei Wang and Chen Ma and Xueyang Feng and Zeyu Zhang and Hao Yang and Jingsen Zhang and Zhiyuan Chen and Jiakai Tang and Xu Chen and Yankai Lin and Wayne Xin Zhao and Zhewei Wei and Ji-Rong Wen},
title = {{A Survey on Large Language Model Based Autonomous Agents}},
journal = {Frontiers of Computer Science},
volume = {18},
number = {6},
year = {2024},
doi = {10.1007/s11704-024-40231-1},
}

@article{xi2023,
author = {Zhiheng Xi and Wenxiang Chen and Xin Guo and Wei He and Yiwen Ding and Boyang Hong and Ming Zhang and Junzhe Wang and Senjie Jin and Enyu Zhou and others},
title = {{The Rise and Potential of Large Language Model Based Agents: A Survey}},
journal = {arXiv preprint arXiv:2309.07864},
year = {2023},
}