diff --git a/data/drafts.db b/data/drafts.db index 9911924..c67ee6c 100644 Binary files a/data/drafts.db and b/data/drafts.db differ diff --git a/data/reports/dev-journal.md b/data/reports/dev-journal.md index 8b4174d..c5522c7 100644 --- a/data/reports/dev-journal.md +++ b/data/reports/dev-journal.md @@ -4,6 +4,14 @@ --- +### 2026-03-06 CODER — Interactive D3.js Author Network Visualization + +**What**: Replaced the Plotly spring-layout co-authorship graph on `/authors` with a full D3.js v7 force-directed network. Added enriched data layer (`get_author_network_full`) with avg draft scores per author, connected-component cluster detection (68 clusters found), and a new `/api/authors/network` JSON endpoint. Template now includes: interactive D3 force graph with zoom/pan/drag, org filter dropdown, cluster highlighting with zoom-to-fit, hover tooltips showing author details + draft list, click-to-navigate, plus the existing Plotly org bar chart, cross-org collaboration chart, sortable authors table (now top 50), and org stats sidebar. +**Why**: The Plotly spring layout was static and limited. D3 force simulation gives true interactivity -- draggable nodes, smooth zoom, hover/click interactions -- which makes the collaboration patterns much more explorable. Cluster detection reveals the structure (one giant 165-member cluster dominated by Huawei/China Telecom, plus 67 smaller groups). +**Result**: 498 nodes, 1142 edges, 68 clusters rendered interactively. Org color coding, size-by-draft-count, label-on-hover all working. Three files changed: `src/webui/data.py` (new `get_author_network_full`), `src/webui/app.py` (updated route + new API endpoint), `src/webui/templates/authors.html` (full rewrite with D3). + +--- + ### 2026-02-28 — v0.2.0 Release **What**: Built full analysis pipeline — fetch, analyze, rate, embed, ideas, gaps, visualize, report. 13 CLI commands, 13+ visualizations, 11 report types. 
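The connected-component cluster detection mentioned in the journal entry above can be sketched in plain Python. This is a minimal illustration only — the function name and the edge-list input shape are assumptions, not the actual `get_author_network_full` code:

```python
from collections import defaultdict, deque

def find_clusters(edges, nodes):
    """Group authors into connected components via BFS.

    edges: iterable of (author_a, author_b) co-authorship pairs.
    nodes: iterable of all author ids; authors with no co-authors
    become singleton clusters.
    """
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)

    seen, clusters = set(), []
    for start in nodes:
        if start in seen:
            continue
        component, queue = [], deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        clusters.append(component)
    # Largest first, mirroring the "one giant 165-member cluster" finding
    return sorted(clusters, key=len, reverse=True)
```

With the real data this would yield the 68 clusters reported above; the D3 view then colors and zooms per cluster.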
diff --git a/data/reports/improvement-ideas.md b/data/reports/improvement-ideas.md new file mode 100644 index 0000000..d127a57 --- /dev/null +++ b/data/reports/improvement-ideas.md @@ -0,0 +1,31 @@ +# IETF Draft Analyzer — Improvement Ideas + +*Saved 2026-03-06 for future implementation* + +## Quick Wins + +### 1. Finish the Web Dashboard +`src/webui/` has a solid plan (PLAN.md) but is partially built. A live, interactive explorer makes the data far more accessible than markdown reports. Priority: ship it. + +### 2. Publish the Blog Series +8 posts drafted in `data/reports/blog-series/`. Publish on GitHub Pages, dev.to, or a static site. The "4:1 capability-to-safety ratio" finding is shareable and provocative. + +### 3. Trend Alerts (`ietf watch`) +Add a CLI command that re-fetches weekly and flags new drafts, revised drafts, and drafts moving toward WG adoption. Makes this a living tool, not a one-shot analysis. + +## Medium Effort + +### 4. Interactive Embedding Map +Export Ollama embeddings as a 2D UMAP/t-SNE scatter plot (Plotly or Observable). Color by category, size by score. Most visually compelling artifact possible. + +### 5. Draft Recommendation Engine +With 1,907 ideas and embeddings, build "if you're interested in X, these drafts are most relevant." Useful for IETF participants finding related work before submitting. + +### 8. IETF Meeting Companion +Time around IETF 123 (July 2026). "Here are the AI/agent drafts being discussed, clustered by theme, with quality ratings." Extremely useful for attendees. + +### 9. Expand Beyond AI/Agent +The pipeline (fetch > analyze > rate > embed > gap-find) is generic. Apply to other IETF topics: post-quantum crypto, MASQUE/proxying, IoT security. Each becomes a new landscape report. + +### 11. Living Dashboard with RSS/Email Digest +Combine web UI with trend alerts. Weekly email: "3 new AI agent drafts this week, 1 gap partially filled, here's what changed." Newsletter-ify the analysis. 
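The core of idea 3 (`ietf watch`) is a snapshot diff. A minimal sketch, assuming drafts are stored locally as a `{name: rev}` mapping and a fresh Datatracker fetch yields the same shape — `diff_snapshots` is a hypothetical helper, not existing code:

```python
def diff_snapshots(previous, current):
    """Classify changes between two {draft_name: rev} snapshots.

    previous: drafts already in the local SQLite store.
    current:  drafts from a fresh Datatracker fetch.
    Returns (new, revised) sorted lists of draft names.
    """
    new = sorted(name for name in current if name not in previous)
    revised = sorted(
        name for name, rev in current.items()
        if name in previous and rev != previous[name]
    )
    return new, revised
```

Drafts moving toward WG adoption would need a similar comparison on the Datatracker state field rather than the revision number.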
diff --git a/paper/main.pdf b/paper/main.pdf index c9928c8..f2de64a 100644 Binary files a/paper/main.pdf and b/paper/main.pdf differ diff --git a/paper/main.tex b/paper/main.tex index 58761aa..71bb92b 100644 --- a/paper/main.tex +++ b/paper/main.tex @@ -11,13 +11,12 @@ \usepackage{xcolor} \usepackage{amsmath} \usepackage{natbib} -% \usepackage{microtype} % Uncomment if texlive-fonts-extra is installed \usepackage{float} \usepackage{caption} \usepackage{subcaption} -% \usepackage{multirow} % Uncomment if texlive-latex-extra is installed \usepackage{tabularx} \usepackage{enumitem} +% \usepackage{multirow} % Uncomment if texlive-latex-extra is installed \hypersetup{ colorlinks=true, @@ -31,9 +30,8 @@ % ── Title ───────────────────────────────────────────────────────────────── \title{% - \textbf{The AI Agent Standardization Wave:\\ - A Quantitative Analysis of 260 IETF Internet-Drafts\\ - on Autonomous Agents and Artificial Intelligence}% + \textbf{The AI Agent Standards Gold Rush:\\ + A Systematic Analysis of 434 IETF Internet-Drafts}% } \author{ @@ -42,7 +40,7 @@ \texttt{[email]} } -\date{February 2026} +\date{March 2026} \begin{document} \maketitle @@ -50,10 +48,10 @@ % ── Abstract ────────────────────────────────────────────────────────────── \begin{abstract} -The Internet Engineering Task Force (IETF) is experiencing an unprecedented surge in standardization activity related to artificial intelligence and autonomous agents. Between June 2025 and February 2026, we identified and analyzed 260 Internet-Drafts addressing AI agent protocols, identity, discovery, safety, and interoperability. Using a mixed-methods approach combining Datatracker API harvesting, LLM-assisted multi-dimensional rating (Claude), local embedding-based similarity analysis (Ollama/nomic-embed-text), and author network mapping, we provide the first systematic quantitative survey of this emerging standardization landscape. 
Our analysis reveals significant thematic overlap (7.9\% of draft pairs exceed 0.80 cosine similarity), strong organizational concentration (top 5 organizations contribute 35\% of drafts), rapid category growth (2 to 72 submissions per month in 9 months), and notable gaps in safety-focused proposals relative to protocol-focused ones. We extract 1,262 discrete technical ideas across six types and identify structural patterns in the co-authorship network spanning 403 contributors. Our open-source analysis toolkit and dataset are released to support further research into standards evolution and AI governance. +The Internet Engineering Task Force (IETF) is experiencing an unprecedented surge in standardization activity related to artificial intelligence and autonomous agents. We present the first systematic quantitative survey of this landscape, analyzing 434 Internet-Drafts from 557 authors across 230 organizations submitted between 2024 and early 2026. Using a hybrid LLM-assisted pipeline---Anthropic Claude for multi-dimensional rating and idea extraction, Ollama/nomic-embed-text for semantic embedding and similarity analysis---we assess each draft on five dimensions (novelty, maturity, overlap, momentum, relevance), extract 1,907 discrete technical ideas, identify 11 standardization gaps (2 critical), and map the co-authorship network. Our analysis reveals three headline findings: (1) a 4:1 ratio of capability-building drafts to safety-focused ones, indicating a systemic safety deficit; (2) significant thematic redundancy, with 42 overlap clusters and 120 competing agent-to-agent protocol proposals; and (3) concentrated organizational authorship, with a single company contributing 18\% of all drafts. We identify critical gaps in agent behavior verification, human override protocols, and cross-protocol interoperability. 
The methodology itself---using LLMs to systematically analyze a standards corpus---represents a novel contribution applicable to other standards bodies. Our open-source toolkit and dataset are released for reproducibility. \end{abstract} -\noindent\textbf{Keywords:} IETF, Internet-Drafts, AI agents, standardization, protocol analysis, NLP, embedding similarity, author networks +\noindent\textbf{Keywords:} IETF, Internet-Drafts, AI agents, standardization, protocol analysis, LLM-assisted analysis, embedding similarity, safety deficit, author networks % ── 1. Introduction ────────────────────────────────────────────────────── @@ -61,7 +59,7 @@ The Internet Engineering Task Force (IETF) is experiencing an unprecedented surg The rapid deployment of large language models (LLMs) and autonomous AI agents has created urgent demand for interoperability standards. Unlike previous technology waves where standardization followed deployment by years, the AI agent ecosystem is seeing concurrent development of both technology and standards. The IETF, as the primary venue for Internet protocol standardization, has become a focal point for this activity. -Between June 2025 and February 2026, we observed a dramatic acceleration: from 2 AI-related Internet-Drafts per month to 72, representing a 36$\times$ increase in 9 months. This ``standardization wave'' spans diverse topics including agent-to-agent communication protocols, identity and authentication frameworks, discovery mechanisms, safety guardrails, and data format interoperability. +The acceleration is dramatic. In 2024, just 9 AI/agent-related Internet-Drafts were submitted to the IETF---0.5\% of all submissions. By Q1 2026, AI/agent drafts account for 9.3\% of all new Internet-Drafts: nearly 1 in 10. This ``gold rush'' spans diverse topics including agent-to-agent (A2A) communication protocols, identity and authentication frameworks, discovery mechanisms, safety guardrails, and data format interoperability. 
However, the speed and volume of this activity raises important questions: \begin{itemize}[nosep] @@ -73,18 +71,20 @@ However, the speed and volume of this activity raises important questions: To answer these questions, we built an automated analysis pipeline that: \begin{enumerate}[nosep] - \item Harvests draft metadata and full text from the IETF Datatracker API (260 drafts, 403 authors). + \item Harvests draft metadata and full text from the IETF Datatracker API (434 drafts, 557 authors). \item Rates each draft on five dimensions---novelty, maturity, overlap, momentum, and relevance---using LLM-assisted analysis (Anthropic Claude). - \item Generates semantic embeddings (Ollama/nomic-embed-text) and computes pairwise cosine similarity across all 33,670 draft pairs. - \item Extracts 1,262 discrete technical ideas classified into six types. - \item Maps the co-authorship network and organizational affiliations. + \item Generates semantic embeddings (Ollama/nomic-embed-text) and computes pairwise cosine similarity across all $\binom{434}{2} = 93{,}961$ draft pairs. + \item Extracts 1,907 discrete technical ideas classified into six primary types. + \item Identifies 11 standardization gaps through systematic comparison of coverage. + \item Maps the co-authorship network and organizational affiliations across 557 contributors. \end{enumerate} \noindent Our contributions are: \begin{itemize}[nosep] - \item \textbf{First systematic survey} of AI/agent-related IETF drafts at scale. - \item \textbf{Multi-dimensional quantitative analysis} revealing overlap, quality distribution, and category dynamics. - \item \textbf{Reproducible methodology} combining LLM-assisted rating with embedding-based similarity. + \item \textbf{First systematic survey} of AI/agent-related IETF drafts at scale, covering 434 drafts. + \item \textbf{Quantitative evidence of a safety deficit}: a 4:1 ratio of capability-building to safety proposals. 
+ \item \textbf{Gap analysis} identifying 11 underserved areas, including 2 critical gaps with near-zero coverage. + \item \textbf{Reproducible LLM-assisted methodology} combining Claude-based rating with embedding-based similarity, applicable to other standards corpora. \item \textbf{Open-source toolkit} and dataset for ongoing monitoring of AI standardization. \end{itemize} @@ -94,73 +94,90 @@ To answer these questions, we built an automated analysis pipeline that: \subsection{IETF Standardization Process} -The IETF develops Internet standards through an open, consensus-based process~\citep{rfc2026}. Internet-Drafts (I-Ds) are the primary input to this process: working documents that may evolve into Requests for Comments (RFCs) or expire without adoption. The Datatracker system\footnote{\url{https://datatracker.ietf.org}} provides programmatic API access to draft metadata, author information, and lifecycle states. +The IETF develops Internet standards through an open, consensus-based process~\citep{rfc2026}. Internet-Drafts (I-Ds) are the primary input: working documents that may evolve into Requests for Comments (RFCs) or expire without adoption. The Datatracker system\footnote{\url{https://datatracker.ietf.org}} provides programmatic API access to draft metadata, author information, and lifecycle states. I-Ds have a six-month expiry and can be submitted by any individual or working group. -\subsection{AI Agent Standardization} +\subsection{AI Agent Standardization Landscape} -Several parallel efforts address AI agent interoperability. Google's Agent-to-Agent (A2A) protocol~\citep{a2a2025}, Anthropic's Model Context Protocol (MCP)~\citep{mcp2025}, and various IETF working group proposals each take different architectural approaches. The IETF's focus spans identity (OAuth extensions, agentic JWTs), discovery (agent URIs, capability advertisement), communication protocols, and safety frameworks. +Several parallel efforts address AI agent interoperability. 
Google's Agent-to-Agent (A2A) protocol~\citep{a2a2025} defines a framework for agent discovery and task execution. Anthropic's Model Context Protocol (MCP)~\citep{mcp2025} specifies how LLMs connect to external tools and data sources. Within the IETF, the newly formed AIPREF working group addresses AI content usage preferences, while proposals span identity (OAuth extensions, agentic JWTs), discovery (agent URIs, DNS-based registration), communication protocols (over QUIC, SIP, HTTP), and safety frameworks (accountability protocols, verifiable conversations). \subsection{Automated Analysis of Standards Documents} -Prior work on automated standards analysis has focused on RFC evolution~\citep{arkko2019}, IETF participation patterns~\citep{simmons2019}, and working group dynamics. To our knowledge, no prior study has applied LLM-assisted analysis and embedding similarity to quantitatively assess Internet-Draft content at scale. +Prior work on automated standards analysis has focused on RFC evolution~\citep{arkko2019}, IETF participation patterns~\citep{simmons2019}, and working group dynamics. Bibliometric studies of standards bodies~\citep{baron2019} have examined citation networks and organizational influence. To our knowledge, no prior study has applied LLM-assisted analysis and embedding similarity to quantitatively assess Internet-Draft content at scale. \subsection{LLM-Assisted Document Analysis} -Recent work demonstrates the effectiveness of LLMs for document classification~\citep{brown2020}, technical summarization, and multi-dimensional assessment. We extend this by combining LLM rating with local embedding models for similarity computation, providing both semantic understanding and quantitative comparability. +Recent work demonstrates the effectiveness of LLMs for document classification~\citep{brown2020}, technical summarization, and multi-dimensional assessment. 
The use of LLMs as ``judges'' for evaluating text quality has gained traction in NLP research~\citep{zheng2023}. We extend this paradigm by combining LLM-based rating with local embedding models for similarity computation, providing both semantic understanding and quantitative comparability across a large technical corpus. % ── 3. Methodology ────────────────────────────────────────────────────── \section{Methodology} +Figure~\ref{fig:pipeline} illustrates our five-stage analysis pipeline. Each stage is described below. + +\begin{figure}[H] + \centering + \fbox{\parbox{0.9\textwidth}{\centering + \textbf{Pipeline Overview}\\[6pt] + \texttt{Fetch} $\rightarrow$ \texttt{Analyze/Rate} $\rightarrow$ \texttt{Embed} $\rightarrow$ \texttt{Extract Ideas} $\rightarrow$ \texttt{Find Gaps}\\[4pt] + {\small Datatracker API \quad Claude (Sonnet 4) \quad Ollama/nomic-embed-text \quad Claude \quad Claude} + }} + \caption{Five-stage analysis pipeline. All intermediate results are cached in SQLite for reproducibility.} + \label{fig:pipeline} +\end{figure} + \subsection{Data Collection} -We queried the IETF Datatracker API v1\footnote{\url{https://datatracker.ietf.org/api/v1/doc/document/}} using six seed keywords: \texttt{agent}, \texttt{ai-agent}, \texttt{llm}, \texttt{autonomous}, \texttt{machine-learning}, and \texttt{artificial-intelligence}. For each matching draft (type \texttt{draft}), we retrieved: +We queried the IETF Datatracker API v1\footnote{\url{https://datatracker.ietf.org/api/v1/doc/document/}} using twelve seed keywords: \texttt{agent}, \texttt{ai-agent}, \texttt{llm}, \texttt{autonomous}, \texttt{machine-learning}, \texttt{artificial-intelligence}, \texttt{mcp}, \texttt{agentic}, \texttt{inference}, \texttt{generative}, \texttt{intelligent}, and \texttt{aipref}. Keywords were matched against both draft names (\texttt{name\_\_contains}) and abstracts (\texttt{abstract\_\_contains}). 
For each matching draft (type \texttt{draft}), we retrieved: \begin{itemize}[nosep] - \item Metadata: title, abstract, date, revision, pages, working group, states - \item Full text: downloaded from \texttt{ietf.org/archive/id/} - \item Author information: via the \texttt{/api/v1/doc/documentauthor/} and \texttt{/api/v1/person/person/} endpoints + \item Metadata: title, abstract, submission date, revision number, page count, working group, states + \item Full text: downloaded from \texttt{ietf.org/archive/id/\{name\}-\{rev\}.txt} + \item Author information: via the \texttt{documentauthor} and \texttt{person} API endpoints \end{itemize} -All data was stored in a SQLite database with FTS5 full-text search indexing. +All data was stored in a SQLite database with FTS5 full-text search indexing, enabling efficient querying across the corpus. \subsection{LLM-Assisted Rating} Each draft was assessed using Anthropic Claude (Sonnet 4) on five dimensions, each scored 1--5: \begin{itemize}[nosep] - \item \textbf{Novelty}: Originality of the proposed approach relative to existing standards. + \item \textbf{Novelty}: Originality of the proposed approach relative to existing standards and other drafts. \item \textbf{Maturity}: Completeness of specification (protocol details, data formats, security considerations). \item \textbf{Overlap}: Degree of redundancy with other drafts in the corpus. \item \textbf{Momentum}: Evidence of community engagement (revisions, working group adoption, co-authors). - \item \textbf{Relevance}: Importance to the AI/agent ecosystem. + \item \textbf{Relevance}: Importance to the AI/agent ecosystem specifically. \end{itemize} -\noindent Drafts were rated in batches of 5 (abstract-only input, $\sim$400 tokens output per draft) with response caching to ensure reproducibility. A composite score was computed as: +\noindent The prompt provided each draft's abstract and, where available, the first 4,000 characters of full text. 
Responses were cached by prompt SHA-256 hash to ensure reproducibility. A composite score was computed as: \begin{equation} S = 0.30 \cdot \text{novelty} + 0.25 \cdot \text{relevance} + 0.20 \cdot \text{maturity} + 0.15 \cdot \text{momentum} + 0.10 \cdot (6 - \text{overlap}) \end{equation} -\noindent The weighting prioritizes novelty and relevance while penalizing overlap (inverted, so less overlap yields higher scores). +\noindent The weighting prioritizes novelty and relevance while penalizing overlap (inverted, so less overlap yields higher scores). We validated robustness by testing alternative weighting schemes (Section~\ref{app:sensitivity}). \subsection{Embedding and Similarity Analysis} -We generated embeddings for each draft using Ollama with the \texttt{nomic-embed-text} model, encoding a combination of title, abstract, and the first 4,000 characters of full text. Pairwise cosine similarity was computed across all $\binom{260}{2} = 33{,}670$ draft pairs: +We generated 768-dimensional embeddings for each draft using Ollama with the \texttt{nomic-embed-text} model, encoding a combination of title, abstract, and the first 4,000 characters of full text. Pairwise cosine similarity was computed across all $\binom{434}{2} = 93{,}961$ draft pairs: \begin{equation} \text{sim}(a, b) = \frac{\mathbf{v}_a \cdot \mathbf{v}_b}{\|\mathbf{v}_a\| \cdot \|\mathbf{v}_b\|} \end{equation} -\noindent Hierarchical clustering (Ward's method) was applied to the distance matrix ($1 - \text{sim}$) for heatmap visualization, and greedy clustering at threshold 0.85 identified groups of near-duplicate drafts. +\noindent Greedy clustering at thresholds of 0.85 and 0.90 identified groups of near-duplicate and highly similar drafts. Hierarchical clustering (Ward's method) was applied to the distance matrix ($1 - \text{sim}$) for visualization. 
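The greedy thresholded clustering described above can be illustrated with a short sketch. This is a single-pass variant in plain Python under assumed names (`cosine`, `greedy_clusters`, a dict-of-embeddings input); the toolkit's actual implementation may differ:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def greedy_clusters(vectors, threshold=0.85):
    """Single-pass greedy clustering over draft embeddings.

    Each draft joins the first cluster whose seed vector it matches
    at >= threshold; otherwise it seeds a new cluster.
    vectors: {draft_name: embedding vector}.
    Returns only clusters with two or more drafts (actual overlaps).
    """
    clusters = []  # list of (seed_vector, member_names)
    for name, vec in vectors.items():
        for seed, members in clusters:
            if cosine(vec, seed) >= threshold:
                members.append(name)
                break
        else:
            clusters.append((vec, [name]))
    return [members for _, members in clusters if len(members) > 1]
```

Raising the threshold from 0.85 to 0.90 shrinks the output from overlap clusters to near-duplicate groups, as reported in the findings.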
\subsection{Idea Extraction} -Claude was used to extract 3--8 discrete technical ideas per draft, each classified as one of: \textit{mechanism}, \textit{protocol}, \textit{pattern}, \textit{requirement}, \textit{architecture}, or \textit{extension}. Fuzzy string matching (SequenceMatcher, threshold 0.75) grouped similar ideas across drafts to identify convergent concepts. +Claude was used to extract 3--8 discrete technical ideas per draft, each classified into one of six primary types: \textit{mechanism}, \textit{architecture}, \textit{pattern}, \textit{protocol}, \textit{requirement}, or \textit{extension}. Fuzzy string matching (SequenceMatcher, threshold 0.75) grouped similar ideas across drafts to identify convergent concepts---ideas that multiple teams arrived at independently. + +\subsection{Gap Analysis} + +Gaps were identified by comparing the idea coverage across categories against the requirements implied by the drafts themselves. Claude analyzed the full set of ideas and categories to identify areas where standardization work is missing or inadequate, assigning severity ratings (critical, high, medium) based on the breadth of the shortfall and the consequences of leaving it unfilled. \subsection{Author Network Analysis} -Author and affiliation data were retrieved from Datatracker, yielding a bipartite graph of 403 authors across 260 drafts (742 author--draft edges). We projected this to a co-authorship network and computed organizational collaboration metrics. +Author and affiliation data were retrieved from Datatracker, yielding a bipartite graph of 557 authors across 434 drafts. We identified persistent co-author teams (``team blocs'') using a pairwise draft overlap threshold of $\geq$70\% with $\geq$3 shared drafts. Cross-organizational collaboration was measured by counting shared drafts between organizations. \subsection{Reproducibility and Cost} -The entire analysis consumed 472,900 API tokens (329,629 input + 143,271 output).
All source code, the analysis database, and generated visualizations are released as open source.\footnote{Repository URL: [TODO]} +The entire analysis pipeline is implemented as a Python CLI tool (\texttt{ietf}) using Click, with all results stored in a SQLite database. LLM responses are cached to ensure reproducibility. The total API cost was approximately \$3.16 for initial analysis (330K input + 144K output tokens, Sonnet 4). All source code, the analysis database, and generated reports are released as open source.\footnote{Repository: \url{https://github.com/[redacted]/ietf-draft-analyzer}} % ── 4. Dataset ────────────────────────────────────────────────────────── @@ -174,15 +191,37 @@ The entire analysis consumed 472,900 API tokens (329,629 input + 143,271 output) \toprule \textbf{Metric} & \textbf{Value} \\ \midrule -Internet-Drafts analyzed & 260 \\ -Unique authors & 403 \\ -Author--draft relationships & 742 \\ -Technical ideas extracted & 1,262 \\ -Distinct categories & 19 \\ -Time span & Jun 2025 -- Feb 2026 \\ +Internet-Drafts analyzed & 434 \\ +Unique authors & 557 \\ +Organizations represented & 230 \\ +Technical ideas extracted & 1,907 \\ +Standardization gaps identified & 11 \\ +Drafts with ratings & 434 \\ +Overlap clusters ($\geq$0.85 threshold) & 42 \\ +Near-duplicate pairs ($\geq$0.90 threshold) & 34 \\ +Time span & 2024 -- Mar 2026 \\ Embedding dimension & 768 (nomic-embed-text) \\ -Pairwise similarity pairs & 33,670 \\ -Total API tokens used & 472,900 \\ +Pairwise similarity pairs & 93,961 \\ +\bottomrule +\end{tabular} +\end{table} + +The corpus spans drafts submitted from early 2024 through March 2026, with the overwhelming majority (425 of 434) submitted after June 2025. Table~\ref{tab:growth} shows the acceleration in AI/agent-related submissions relative to total IETF activity. 
+ +\begin{table}[h] +\centering +\caption{Growth of AI/agent Internet-Drafts relative to total IETF submissions.} +\label{tab:growth} +\begin{tabular}{rrrr} +\toprule +\textbf{Year} & \textbf{Total IETF Drafts} & \textbf{AI/Agent Drafts} & \textbf{AI Share} \\ +\midrule +2021 & 1,108 & $\sim$0 & $\sim$0\% \\ +2022 & 1,121 & $\sim$0 & $\sim$0\% \\ +2023 & 1,241 & $\sim$0 & $\sim$0\% \\ +2024 & 1,651 & 9 & 0.5\% \\ +2025 & 2,696 & 190 & 7.0\% \\ +2026 (Q1) & 1,748 & 162 & 9.3\% \\ \bottomrule \end{tabular} \end{table} @@ -191,114 +230,93 @@ Total API tokens used & 472,900 \\ \section{Findings} -\subsection{Temporal Dynamics: A Rapid Acceleration} +\subsection{Category Distribution: The Safety Deficit} -Figure~\ref{fig:timeline} shows monthly submission volume. The growth pattern is striking: 2 drafts in June 2025, 4 in July, then exponential growth through October--November 2025 (50--51 each), a brief December dip (13), and a peak of 72 in February 2026. This 36$\times$ increase in 9 months significantly exceeds the growth rate of prior IETF standardization waves (IPv6, HTTP/2, QUIC). - -\begin{figure}[H] - \centering - \includegraphics[width=\textwidth]{timeline-placeholder.pdf} - \caption{Monthly IETF AI/agent draft submissions by category (June 2025 -- February 2026). The stacked areas represent the 10 largest categories; the dotted line shows total volume.} - \label{fig:timeline} -\end{figure} - -\subsection{Category Distribution} - -We identified 19 semantic categories through LLM-assisted classification. Table~\ref{tab:categories} shows the top 10 by draft count. +Our LLM-assisted classification assigned each draft to one or more of ten semantic categories (drafts may belong to multiple categories). Table~\ref{tab:categories} shows the distribution. \begin{table}[h] \centering -\caption{Top 10 categories by draft count (multi-assignment: drafts may appear in multiple categories).} +\caption{Draft distribution across categories. 
Percentages exceed 100\% due to multi-assignment.} \label{tab:categories} -\begin{tabular}{lrcc} +\begin{tabular}{lrr} \toprule -\textbf{Category} & \textbf{Drafts} & \textbf{Avg Score} & \textbf{Avg Novelty} \\ +\textbf{Category} & \textbf{Drafts} & \textbf{Share} \\ \midrule -Data formats / interop & 102 & 3.3 & 3.2 \\ -Agent identity / auth & 98 & 3.4 & 3.5 \\ -A2A protocols & 92 & 3.4 & 3.5 \\ -Policy / governance & 60 & 3.3 & 3.2 \\ -Autonomous netops & 60 & 3.3 & 3.1 \\ -Agent discovery / reg & 57 & 3.5 & 3.5 \\ -AI safety / alignment & 36 & 3.4 & 3.4 \\ -ML traffic mgmt & 23 & 3.3 & 3.2 \\ -Human-agent interaction & 22 & 3.3 & 3.3 \\ -Other AI/agent & 21 & 3.4 & 3.4 \\ +Data formats / interoperability & 145 & 33\% \\ +A2A protocols & 120 & 28\% \\ +Agent identity / authentication & 108 & 25\% \\ +Autonomous network operations & 93 & 21\% \\ +Policy / governance & 91 & 21\% \\ +ML traffic management & 73 & 17\% \\ +Agent discovery / registration & 65 & 15\% \\ +AI safety / alignment & 44 & 10\% \\ +Model serving / inference & 42 & 10\% \\ +Human-agent interaction & 30 & 7\% \\ \bottomrule \end{tabular} \end{table} -\noindent A notable imbalance emerges: protocol-focused categories (data formats, identity, A2A) collectively account for over 290 category assignments, while AI safety/alignment---arguably the most consequential area---has only 36. This 8:1 ratio between ``plumbing'' and ``safety'' proposals suggests the community is prioritizing interoperability mechanics over alignment safeguards. +The most striking finding is the \textbf{safety deficit}. Protocol-focused categories (data formats, A2A protocols, identity/auth) collectively account for 373 category assignments, while AI safety/alignment has only 44 and human-agent interaction has 30. This yields a \textbf{4:1 ratio of capability-building to safety proposals}. For every draft about keeping agents safe, approximately four are building new capabilities. 
For every draft about human-agent interaction, there are more than four about agents operating autonomously. + +The safety drafts that \emph{do} exist are often among the highest-rated. \texttt{draft-aylward-daap-v2} (a comprehensive accountability protocol) and \texttt{draft-cowles-volt} (a tamper-evident execution trace format) each scored 4.8/5.0---the highest in the entire corpus. The quality is there; the quantity is not. \subsection{Rating Distributions} -Across all 260 drafts, the composite score distribution is approximately normal ($\mu = 3.38$, $\sigma = 0.59$, range $[1.65, 4.80]$). Figure~\ref{fig:distributions} breaks this down by dimension: +Across all 434 rated drafts, Table~\ref{tab:ratings} summarizes the five rating dimensions. -\begin{figure}[H] - \centering - \includegraphics[width=\textwidth]{score-distributions.png} - \caption{Rating distributions by dimension across the 8 largest categories. Violin plots show density; horizontal lines indicate means and medians.} - \label{fig:distributions} -\end{figure} +\begin{table}[h] +\centering +\caption{Average scores across five rating dimensions ($n = 434$, scale 1--5).} +\label{tab:ratings} +\begin{tabular}{lcc} +\toprule +\textbf{Dimension} & \textbf{Mean} & \textbf{Interpretation} \\ +\midrule +Relevance & 3.81 & High: keyword selection captured genuinely AI-relevant drafts \\ +Novelty & 3.27 & Moderate: mix of innovative and derivative proposals \\ +Momentum & 3.02 & Moderate: many early-stage drafts without WG adoption \\ +Maturity & 2.99 & Low--moderate: most proposals are early-stage \\ +Overlap & 2.59 & Moderate: substantial redundancy in the corpus \\ +\bottomrule +\end{tabular} +\end{table} \noindent Key observations: \begin{itemize}[nosep] - \item \textbf{Relevance} is consistently high ($\mu = 3.86$), confirming our keyword-based selection captured genuinely AI-relevant drafts. 
- \item \textbf{Maturity} is the lowest-scoring dimension ($\mu = 2.98$), reflecting the early stage of most proposals. - \item \textbf{Novelty} varies widely ($\sigma = 0.83$), with clear separation between innovative and derivative drafts. - \item \textbf{Overlap} ($\mu = 2.52$) indicates moderate-to-low self-assessed redundancy, though embedding analysis (Section~\ref{sec:overlap}) reveals higher actual overlap. + \item \textbf{Relevance} is consistently high ($\mu = 3.81$), confirming that the keyword-based selection captured genuinely AI-relevant drafts rather than false positives. + \item \textbf{Maturity} is the lowest-scoring dimension ($\mu = 2.99$), reflecting the early stage of most proposals---many lack complete protocol specifications, security considerations, or reference implementations. + \item \textbf{Overlap} ($\mu = 2.59$) indicates moderate self-assessed redundancy. However, the embedding-based similarity analysis (Section~\ref{sec:overlap}) reveals that actual topical overlap is significantly higher than LLM-assessed overlap, suggesting that many drafts do not adequately acknowledge related work. \end{itemize} \subsection{Semantic Overlap and Redundancy} \label{sec:overlap} -The pairwise cosine similarity analysis reveals substantial redundancy in the corpus. Of 33,670 pairs: -\begin{itemize}[nosep] - \item 56 pairs (0.2\%) exceed 0.90 similarity (near-duplicate) - \item 344 pairs (1.0\%) exceed 0.85 (highly similar) - \item 2,668 pairs (7.9\%) exceed 0.80 (significantly overlapping) -\end{itemize} +The pairwise cosine similarity analysis reveals substantial redundancy. At a 0.85 similarity threshold, we identify \textbf{42 overlap clusters}---groups of drafts addressing essentially the same technical problem. At a 0.90 threshold, \textbf{34 clusters} remain, representing near-duplicates or same-author variants. 
-\noindent The mean pairwise similarity of 0.721 ($\sigma = 0.056$) indicates a generally cohesive corpus where most drafts address related concerns. Figure~\ref{fig:heatmap} shows the clustered similarity matrix, revealing several distinct clusters of near-identical proposals. +Table~\ref{tab:clusters} shows the three largest competing clusters. -\begin{figure}[H] - \centering - \includegraphics[width=0.85\textwidth]{similarity-heatmap.png} - \caption{Hierarchically clustered pairwise similarity matrix (260 $\times$ 260). Color bars on the left indicate primary category. Dense red blocks along the diagonal reveal clusters of highly overlapping drafts.} - \label{fig:heatmap} -\end{figure} +\begin{table}[h] +\centering +\caption{Three largest overlap clusters by draft count.} +\label{tab:clusters} +\begin{tabularx}{\textwidth}{clX} +\toprule +\textbf{Drafts} & \textbf{Cluster Topic} & \textbf{Description} \\ +\midrule +13 & OAuth for AI Agents & All solving agent authentication/authorization via OAuth 2.0 extensions. Approaches range from Agentic JWTs to scope aggregation to accountability protocols. \\ +10 & Agent Gateway / Multi-Agent Collaboration & Addressing cross-platform agent collaboration through gateway architectures, with competing semantic routing, task protocol, and infrastructure designs. \\ +6 & Agent Discovery & DNS-based, URI-based, and custom protocol approaches to finding and invoking AI agents. \\ +\bottomrule +\end{tabularx} +\end{table} -\noindent The highest-similarity pair (0.999) consists of \texttt{draft-rosenberg-aiproto} and \texttt{draft-rosenberg-aiproto-nact}, which are essentially the same draft submitted under different affiliations. Several other pairs in the 0.95--0.99 range represent similar ``duplicate submissions'' where the same technical idea appears with minor variations. 
+We also identified 25 near-duplicate draft pairs ($>$0.98 cosine similarity)---functionally identical proposals submitted under different names, in different working groups, or as renamed versions. Notable examples include \texttt{draft-rosenberg-aiproto} and \texttt{draft-rosenberg-aiproto-nact} (same N-ACT protocol, renamed), and \texttt{draft-abbey-scim-agent-extension} and \texttt{draft-scim-agent-extension} (same SCIM extension, different submission path). -Figure~\ref{fig:quality} maps each draft's composite score against its maximum similarity to any other draft, creating a quality--uniqueness quadrant view. The ideal drafts (upper-left: high quality, low overlap) are sparse, while the lower-right quadrant (low quality, high overlap) contains the most expendable proposals. - -\begin{figure}[H] - \centering - \includegraphics[width=0.9\textwidth]{quality-placeholder.pdf} - \caption{Draft quality (composite score) vs.\ uniqueness (max pairwise similarity). Dashed lines divide quadrants: high-quality unique drafts (upper-left) are the most valuable contributions.} - \label{fig:quality} -\end{figure} - -\subsection{Category Profiles} - -Figure~\ref{fig:radar} compares the rating profiles of the 8 largest categories using radar charts. Distinct profiles emerge: -\begin{itemize}[nosep] - \item \textbf{Agent identity/auth}: High novelty and relevance, moderate maturity---an active innovation frontier. - \item \textbf{Data formats/interop}: High maturity but lower novelty---many proposals build on well-understood patterns. - \item \textbf{AI safety/alignment}: High relevance but lower maturity---critical problems without mature solutions. - \item \textbf{Autonomous netops}: Balanced profile, reflecting established network management practices adapted for AI. -\end{itemize} - -\begin{figure}[H] - \centering - \includegraphics[width=0.7\textwidth]{radar-placeholder.pdf} - \caption{Average rating profiles per category (top 8). 
Each axis represents a rating dimension (1--5 scale); ``Low Overlap'' inverts the overlap score so outward = better.} - \label{fig:radar} -\end{figure} +This fragmentation has practical consequences. The most common recurring technical idea---``Multi-Agent Communication Protocol''---appears independently in 8 separate drafts from different teams. Yet of the 1,907 technical ideas extracted from the corpus, \textbf{96\% appear in exactly one draft}. Everyone is solving the same problems; nobody is solving them together. \subsection{Technical Ideas Landscape} -The 1,262 extracted ideas distribute across six types (Table~\ref{tab:ideas}). \textit{Mechanisms} (concrete technical constructs) dominate at 38.7\%, followed by \textit{architectures} (17.2\%) and \textit{protocols} (14.2\%). +The 1,907 extracted ideas distribute across six primary types (Table~\ref{tab:ideas}). \begin{table}[h] \centering @@ -308,22 +326,34 @@ The 1,262 extracted ideas distribute across six types (Table~\ref{tab:ideas}). \ \toprule \textbf{Idea Type} & \textbf{Count} & \textbf{\%} \\ \midrule -Mechanism & 488 & 38.7 \\ -Architecture & 217 & 17.2 \\ -Protocol & 179 & 14.2 \\ -Pattern & 169 & 13.4 \\ -Extension & 99 & 7.8 \\ -Requirement & 93 & 7.4 \\ -Other & 17 & 1.3 \\ +Mechanism & 694 & 36.4 \\ +Architecture & 301 & 15.8 \\ +Pattern & 273 & 14.3 \\ +Protocol & 237 & 12.4 \\ +Extension & 201 & 10.5 \\ +Requirement & 182 & 9.5 \\ +Other & 19 & 1.0 \\ \midrule -\textbf{Total} & \textbf{1,262} & \textbf{100.0} \\ +\textbf{Total} & \textbf{1,907} & \textbf{100.0} \\ \bottomrule \end{tabular} \end{table} -\noindent Fuzzy matching revealed several convergent ideas appearing across 3+ drafts, indicating areas of implicit community consensus. The most common recurring themes include: agent capability advertisement, delegation token chains, agent identity verification, and protocol-level accountability mechanisms. 
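The convergent-idea counts rest on fuzzy matching of extracted idea titles across drafts. One possible implementation is a greedy grouping with Python's standard difflib; the cutoff value and representative-title scheme here are assumptions for illustration, not the pipeline's actual code:

```python
from difflib import SequenceMatcher

def group_convergent_ideas(ideas, cutoff=0.85):
    """Greedy fuzzy grouping of (idea_title, draft_name) pairs: each idea
    joins the first existing group whose representative title is similar
    enough, otherwise it starts a new group. Returns the groups that
    appear independently in 3+ drafts (the convergent ideas)."""
    groups = []  # list of (representative_title, set_of_drafts)
    for title, draft in ideas:
        key = title.lower().strip()
        for rep, drafts in groups:
            if SequenceMatcher(None, key, rep).ratio() >= cutoff:
                drafts.add(draft)
                break
        else:
            groups.append((key, {draft}))
    return [(rep, sorted(drafts)) for rep, drafts in groups if len(drafts) >= 3]
```

A greedy first-match scheme like this is order-dependent; a production version would likely cluster on embedding similarity of idea titles instead, but the ratio cutoff conveys the idea of counting near-identical titles as one recurring concept.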
+\noindent \textit{Mechanisms} (concrete technical constructs like ``Pseudonymous Key Generation'' or ``Context-Aware Task Scheduling'') dominate at 36.4\%, followed by \textit{architectures} (system-level designs) and \textit{patterns} (reusable design approaches). The most frequently recurring convergent ideas---those appearing independently in 3+ drafts---include: + +\begin{itemize}[nosep] + \item Multi-Agent Communication Protocol (8 drafts) + \item Agentic Network Architecture (7 drafts) + \item Cross-Domain Agent Coordination (6 drafts) + \item Agent-to-Agent Communication Paradigm (5 drafts) + \item Action-Based Authorization (5 drafts) + \item Agent Registration Process (5 drafts) +\end{itemize} + +\noindent These convergent ideas represent areas of implicit community consensus---problems that multiple independent teams consider important enough to address. They are strong candidates for working group formation. \subsection{Author and Organizational Dynamics} +\label{sec:authors} \subsubsection{Organizational Concentration} @@ -337,32 +367,29 @@ The authorship landscape shows significant organizational concentration. Table~\ \toprule \textbf{Organization} & \textbf{Authors} & \textbf{Drafts} \\ \midrule -Huawei & 30 & 25 \\ -China Mobile & 17 & 19 \\ -Huawei Technologies & 12 & 18 \\ -China Telecom & 17 & 15 \\ -China Unicom & 19 & 14 \\ -Cisco & 10 & 12 \\ -Tsinghua University & 7 & 11 \\ -Independent & 10 & 9 \\ -Cisco Systems & 10 & 9 \\ -Sandelman Software Works & 1 & 7 \\ +Huawei & 53 & 66 \\ +China Mobile & 24 & 35 \\ +Cisco & 24 & 26 \\ +Independent & 19 & 25 \\ +China Telecom & 24 & 24 \\ +China Unicom & 22 & 21 \\ +Tsinghua University & 13 & 16 \\ +ZTE Corporation & 12 & 12 \\ +Five9 & 1 & 10 \\ +Ericsson & 4 & 9 \\ \bottomrule \end{tabular} \end{table} -\noindent Chinese technology organizations (Huawei, China Mobile, China Telecom, China Unicom) collectively contribute $\sim$35\% of all drafts. 
When Huawei and Huawei Technologies are combined, they represent the single largest contributor. Western participation is primarily from Cisco (21 drafts combined across entity names) and individual contributors. +Huawei dominates with 53 authors contributing to 66 drafts---\textbf{15\% of the entire corpus} from a single company. Chinese technology organizations collectively (Huawei, China Mobile, China Telecom, China Unicom, ZTE, Tsinghua) contribute approximately 40\% of all drafts. Western participation is led by Cisco (26 drafts) and independent contributors (25 drafts), with notable concentrated contributions from Five9 (10 drafts from a single prolific author, Jonathan Rosenberg) and Ericsson (9 drafts from 4 authors). -\subsubsection{Collaboration Network} +\subsubsection{Team Blocs} -The co-authorship network reveals tight clustering within organizations. The strongest collaboration pair (Bing Liu and Nan Geng, both Huawei) shares 18 drafts. Cross-organizational collaboration is relatively rare: the strongest cross-org link (Five9--Bitwave, 6 shared drafts) is significantly weaker than top intra-org pairs. Figure~\ref{fig:network} visualizes this network. +We identified 18 persistent co-author teams (``team blocs'') with $\geq$70\% pairwise draft overlap and $\geq$3 shared drafts. The largest is a 12-member Huawei team responsible for 23 drafts with 96\% internal cohesion---meaning team members almost always co-author together. Other notable blocs include a 5-member Cisco/Five9 team (13 drafts, 100\% cohesion) and a 5-member Ericsson team (6 drafts, 100\% cohesion). -\begin{figure}[H] - \centering - \includegraphics[width=0.9\textwidth]{network-placeholder.pdf} - \caption{Author collaboration network. Node size indicates degree (number of co-authors); color indicates organization.
Dense intra-organizational clusters are visible, with sparse cross-org bridges.} - \label{fig:network} -\end{figure} +\subsubsection{Cross-Organizational Collaboration} + +Cross-organizational collaboration exists but is weaker than intra-organizational ties. The strongest cross-org links are between Chinese organizations: China Telecom--Huawei (8 shared drafts), China Unicom--Huawei (7), and China Mobile--ZTE (7). Western cross-org collaboration is led by Bitwave--Five9 (6 shared drafts) and Cisco--Google (5). Notably, cross-regional collaboration (Chinese--Western) is minimal in the dataset. \subsection{Top-Ranked Proposals} @@ -377,89 +404,173 @@ Table~\ref{tab:top} lists the five highest-scored drafts, representing the propo \toprule \textbf{Score} & \textbf{N/M/O/Mom/R} & \textbf{Draft} & \textbf{Summary} \\ \midrule -4.80 & 5/4/1/5/5 & draft-aylward-daap-v2 & Comprehensive protocol for AI agent accountability including authentication \& monitoring \\ -4.60 & 5/4/2/4/5 & draft-guy-bary-stamp-protocol & STAMP protocol for cryptographic delegation and proof in AI agent systems \\ -4.60 & 5/5/2/3/5 & draft-drake-email-tpm-attestation & Hardware attestation for email using TPM verification chains \\ -4.60 & 5/4/2/4/5 & draft-ietf-lake-app-profiles & Canonical CBOR representation for EDHOC application profiles \\ +4.80 & 5/5/1/4/5 & draft-cowles-volt & Tamper-evident execution trace format for AI agent workflows using hash chains and cryptographic signatures \\ +4.80 & 5/4/1/5/5 & draft-aylward-daap-v2 & Comprehensive protocol for AI agent accountability including authentication, monitoring, and audit \\ +4.60 & 5/4/2/4/5 & draft-guy-bary-stamp & STAMP protocol for cryptographic delegation and proof in AI agent systems \\ +4.60 & 5/5/2/3/5 & draft-drake-email-tpm & Hardware attestation for email using TPM verification chains \\ 4.50 & 5/4/2/4/5 & draft-goswami-agentic-jwt & Extends OAuth 2.0 with Agentic JWT for autonomous agent authorization \\ \bottomrule
\end{tabularx} \end{table} -% ── 6. Discussion ──────────────────────────────────────────────────────── +\noindent It is notable that 3 of the top 5 drafts are safety/accountability-focused, suggesting that while the community underinvests in safety proposals, the ones that do exist tend to be high-quality. + +% ── 6. Gap Analysis ───────────────────────────────────────────────────── + +\section{Gap Analysis} + +Our systematic gap analysis identified 11 areas where standardization work is missing or inadequate. Table~\ref{tab:gaps} summarizes these gaps by severity. + +\begin{table}[h] +\centering +\caption{Identified standardization gaps by severity, with the number of existing technical ideas partially addressing each gap.} +\label{tab:gaps} +\begin{tabularx}{\textwidth}{clXr} +\toprule +\textbf{Sev.} & \textbf{Gap} & \textbf{Description} & \textbf{Ideas} \\ +\midrule +CRIT & Behavior Verification & No mechanism to verify agents behave per declared policies at runtime & 53 \\ +CRIT & Human Override Protocols & No standard for emergency stop, takeover, or constraint of running agents & 7 \\ +\midrule +HIGH & Resource Exhaustion & No agent-specific resource quotas or enforcement mechanisms & 40 \\ +HIGH & Data Provenance & Insufficient tracking of agent-generated data lineage & 4 \\ +HIGH & Capability Degradation & No graceful degradation protocols for model drift or corruption & 45 \\ +HIGH & Coordination Deadlocks & No deadlock detection/resolution for multi-agent circular dependencies & 11 \\ +HIGH & Privacy Preservation & Lack of differential privacy or secure MPC for agent interactions & 11 \\ +\midrule +MED & Cross-Protocol Migration & No state/context migration between different A2A protocols & 3 \\ +MED & Real-time Debugging & No standard interfaces for production agent introspection & 23 \\ +MED & Model Update Security & Missing cryptographically verified, rollback-capable agent updates & 79 \\ +MED & Energy Optimization & No energy-aware agent deployment 
or energy budget enforcement & 17 \\ +\bottomrule +\end{tabularx} +\end{table} + +\subsection{Critical Gap: Agent Behavior Verification} + +While 108 drafts address agent identity and authentication---establishing \emph{who} an agent is---only 44 address AI safety/alignment, and none provides a real-time mechanism to verify that an agent is behaving according to its declared capabilities and policies \emph{while it is operating}. The gap is between policy declaration and policy enforcement: the difference between a speed limit sign and a speed camera. + +Some drafts approach the problem from adjacent angles. \texttt{draft-aylward-daap-v2} (score 4.8) defines a behavioral monitoring framework with cryptographic identity verification. \texttt{draft-birkholz-verifiable-agent-conversations} (score 4.5) proposes verifiable conversation records using COSE signing. \texttt{draft-berlinai-vera} (score 3.9) introduces a zero-trust architecture with five enforcement pillars. But all focus on \emph{recording} behavior for post-hoc audit rather than \emph{detecting deviation in real time}. + +\subsection{Critical Gap: Human Override Protocols} + +Only 30 of 434 drafts address human-agent interaction, compared to 120 A2A protocol drafts and 93 autonomous operations drafts. Agents are being designed to talk to each other at a 4:1 ratio over being designed to talk to humans. The CHEQ protocol (\texttt{draft-rosenberg-aiproto-cheq}, score 3.9) is a rare exception---it defines human confirmation \emph{before} agent execution. But CHEQ is opt-in and pre-execution. No draft standardizes what happens \emph{during} execution: how a human pauses a running workflow, constrains an agent's scope, takes over a task, or issues an emergency stop. + +\subsection{The Zero-Coverage Gap: Cross-Protocol Translation} + +With 120 competing A2A protocols and no translation layer, agents speaking different protocols cannot interoperate. 
Our gap analysis identified this as the starkest absence: essentially zero technical ideas in the corpus address how agents using MCP, A2A Protocol, SLIM, and other competing frameworks could communicate through a translation layer. If the IETF does not build this, the market will---and the result will be vendor-locked ecosystems rather than open interoperability. + +% ── 7. Discussion ──────────────────────────────────────────────────────── \section{Discussion} +\subsection{The Capability-Safety Asymmetry} + +The 4:1 ratio of capability-building to safety proposals is the most consequential finding of this analysis. It mirrors a broader pattern observed across AI development: capabilities consistently outpace governance~\citep{amodei2016}. In the IETF context, this asymmetry has structural causes. Safety proposals require addressing harder, cross-cutting problems (behavior verification spans all protocol categories), while capability proposals can focus narrowly on a single well-defined problem (e.g., extending OAuth with an agent-specific claim). Additionally, organizations contributing drafts are primarily technology vendors with incentives to ship interoperable products, not safety researchers. + +The quality signal offers a counterpoint: the highest-scored drafts in the corpus (\texttt{draft-cowles-volt}, \texttt{draft-aylward-daap-v2}, both 4.8/5.0) are safety-focused. The IETF community clearly values safety work when it appears. The deficit is one of \emph{volume}, not \emph{receptivity}. Targeted calls for safety-focused submissions, similar to IETF BOF sessions on specific topics, could help rebalance this. + \subsection{The Redundancy Problem} -The most striking finding is the degree of thematic overlap. With 2,668 draft pairs exceeding 0.80 cosine similarity (7.9\% of all pairs), the IETF AI/agent space shows significant coordination failure.
Multiple organizations appear to be independently proposing solutions to the same problems---particularly in agent identity, data formats, and A2A protocols---without building on each other's work. This wastes engineering effort and fragments community attention. +With 42 overlap clusters and 120 competing A2A protocol proposals, the IETF AI/agent space shows significant coordination failure. The OAuth-for-agents cluster alone contains 13 independent proposals, none compatible with each other. This fragmentation wastes engineering effort, confuses implementers, and risks incompatible deployments that entrench rather than resolve the problem. -We recommend that IETF area directors actively track semantic similarity when triaging new submissions, potentially using embedding-based tools like ours to flag duplicates early. +We observe that redundancy is partly a natural consequence of the IETF's open submission process---anyone can submit a draft---and partly reflects the ``gold rush'' dynamics where organizations race to establish their preferred approach as the standard. The embedding-based similarity tools developed here could help IETF area directors flag duplicates during triage and actively encourage consolidation. -\subsection{The Safety Deficit} +\subsection{Geopolitical Dimensions} -AI safety and alignment proposals account for only 36 of the 260 drafts (13.8\%), despite being rated as highly relevant ($\mu_{\text{relevance}} = 3.4$). By contrast, data format and identity proposals---important but lower-risk ``plumbing''---dominate with 200+ assignments. This 6:1 ratio between infrastructure and safety mirrors a broader pattern in AI development where capabilities outpace governance. Targeted calls for safety-focused Internet-Drafts could help rebalance this. +The concentration of contributions---approximately 40\% from Chinese organizations, led by Huawei's 15\%---raises questions about geographic diversity in AI standardization.
Our collaboration network analysis reveals two largely separate clusters: Chinese organizations collaborate heavily with each other (China Telecom--Huawei: 8 shared drafts; China Unicom--Huawei: 7; China Mobile--ZTE: 7) while Western organizations form a smaller, separate cluster (Cisco--Google: 5; Bitwave--Five9: 6). Cross-regional bridges are sparse. -\subsection{Organizational Dynamics} +This bifurcation extends to the technical foundations. The Chinese bloc tends to build on YANG/NETCONF for network management, while Western proposals favor COSE/CBOR/CoAP for IoT security and OAuth/JWT for identity. The only shared foundation is OAuth 2.0. Any architectural unification must be genuinely protocol-agnostic to bridge this divide. -The concentration of contributions in a small number of Chinese technology organizations raises questions about geographic diversity in AI standardization. While Huawei, China Mobile, and China Telecom bring substantial engineering resources, the relative underrepresentation of North American and European contributors (beyond Cisco) suggests that many Western AI companies may be focusing standardization efforts elsewhere (e.g., OASIS, W3C, or proprietary protocols). +\subsection{Methodological Contributions} -\subsection{Methodological Considerations} - -\subsubsection{LLM Rating Validity} - -Our LLM-assisted ratings provide scalable assessment but have inherent limitations. Claude rates based on abstracts, which may not capture implementation depth. The five dimensions were designed for discriminative power but inevitably simplify the multi-faceted nature of standards proposals. Validation against human expert ratings (Section~\ref{sec:future}) would strengthen confidence. - -\subsubsection{Embedding Similarity} - -Cosine similarity between nomic-embed-text embeddings correlates with topical similarity but may not capture functional equivalence. 
Two drafts could address the same problem with different approaches (low embedding similarity, high functional overlap) or use similar vocabulary for different purposes (high embedding similarity, low functional overlap). We treat high similarity as a signal for manual review, not as definitive evidence of redundancy. -\subsection{Limitations} -\label{sec:limitations} +The LLM-assisted analysis pipeline itself represents a methodological contribution. Manually rating, categorizing, and extracting ideas from 434 technical documents would be infeasible; delegating this to Claude makes the analysis tractable while keeping results internally consistent and reproducible (via caching). Several design choices merit discussion: \begin{itemize}[nosep] - \item \textbf{Keyword bias}: Our seed keywords may miss relevant drafts that use different terminology. - \item \textbf{Single-LLM assessment}: Ratings from one model may carry systematic biases. - \item \textbf{Snapshot analysis}: The dataset reflects a single point in time; drafts expire, evolve, and merge. - \item \textbf{Author disambiguation}: Datatracker affiliations may be inconsistent (e.g., ``Huawei'' vs.\ ``Huawei Technologies''). - \item \textbf{No citation analysis}: We do not track which drafts reference each other, which would enrich the overlap analysis. + \item \textbf{LLM rating validity}: Claude rates based on abstracts and partial full text, which may not capture implementation depth. We mitigate this by using five orthogonal dimensions that capture different quality facets, and by validating that alternative weighting schemes produce highly correlated rankings (Appendix~\ref{app:sensitivity}, Spearman $\rho \geq 0.93$). + + \item \textbf{Embedding similarity}: Cosine similarity between nomic-embed-text embeddings captures topical similarity but not functional equivalence. Two drafts may address the same problem with different approaches (low similarity, high functional overlap).
We treat high similarity as a signal for manual review, not definitive evidence of redundancy. + + \item \textbf{Cost efficiency}: The entire analysis cost approximately \$3.16 in API fees---orders of magnitude cheaper than equivalent expert analysis, enabling continuous monitoring as new drafts appear. \end{itemize} -% ── 7. Future Work ────────────────────────────────────────────────────── +\subsection{Toward an Architectural Vision} -\section{Future Work} -\label{sec:future} +Our analysis suggests that the 11 gaps are not random absences but structurally related. They point to four missing architectural pillars for the AI agent ecosystem: \begin{enumerate}[nosep] - \item \textbf{Human validation}: Compare LLM ratings against expert assessments for 20--30 drafts. - \item \textbf{Longitudinal monitoring}: Run continuous analysis as new drafts appear. - \item \textbf{Citation network}: Extract inter-draft references to build a citation graph. - \item \textbf{Gap-driven standardization}: Use identified gaps to propose new Internet-Drafts. - \item \textbf{Cross-venue analysis}: Compare IETF activity with W3C, OASIS, and ISO/IEC JTC 1 AI standardization. - \item \textbf{Historical comparison}: Quantitatively compare this wave with IPv6, QUIC, and TLS 1.3 standardization trajectories. + \item \textbf{DAG-based execution model}: Multi-agent workflows as directed acyclic graphs with checkpoints, rollback, and blast-radius containment---addressing error recovery, resource management, and coordination gaps. + + \item \textbf{Human-in-the-loop as first class}: Approval gates, override commands, escalation paths, and explainability tokens as native constructs in the execution model---addressing the human override and explainability gaps. + + \item \textbf{Protocol-agnostic interoperability}: A translation layer letting agents using different A2A protocols communicate through gateways---addressing the cross-protocol gap with zero existing ideas. 
+ + \item \textbf{Assurance profiles}: Named configurations that dial up or down the proof requirements (from best-effort to cryptographic attestation per task)---addressing behavior verification, data provenance, and dynamic trust gaps. \end{enumerate} -% ── 8. Conclusion ──────────────────────────────────────────────────────── +\noindent These pillars build on existing IETF work rather than competing with it: SPIFFE/WIMSE for identity, Execution Context Tokens for evidence, OAuth 2.0 for authorization, and the various A2A protocols for communication. + +\subsection{Limitations} + +\begin{itemize}[nosep] + \item \textbf{Keyword bias}: Our twelve seed keywords may miss relevant drafts using different terminology (e.g., ``cognitive computing,'' ``neural network'' in draft names). + \item \textbf{Single-LLM assessment}: Ratings from Claude may carry systematic biases. Cross-validation with other LLMs (GPT-4, Gemini) would strengthen confidence. + \item \textbf{Snapshot analysis}: The dataset reflects a point in time; drafts expire, evolve, and merge continuously. + \item \textbf{Author disambiguation}: Datatracker affiliations are self-reported and may be inconsistent (e.g., ``Huawei'' vs.\ ``Huawei Technologies'' appear as separate entities). + \item \textbf{No citation analysis}: We do not track inter-draft references, which would reveal influence networks beyond topical similarity. + \item \textbf{Abstract-level assessment}: Rating from abstracts may miss implementation depth in full-text specifications. +\end{itemize} + +% ── 8. Related Work ───────────────────────────────────────────────────── + +\section{Related Work} + +\textbf{Standards landscape analysis.} Baron and Spulber~\citep{baron2019} provide bibliometric analysis of standards organizations but focus on patents and firm-level strategy rather than technical content. Simmons and Thaler~\citep{simmons2019} study IETF participation diversity but do not assess draft content or topical overlap. 
Our work extends this line by applying NLP techniques to the document content itself. + +\textbf{AI governance and safety.} Amodei et al.~\citep{amodei2016} articulate the challenge of aligning AI systems with human values, a concern our safety deficit finding quantifies in the standards context. The EU AI Act~\citep{euaiact2024} and NIST AI Risk Management Framework~\citep{nist2023} provide regulatory perspectives on AI governance, but neither addresses Internet protocol standardization specifically. + +\textbf{LLM-assisted evaluation.} Zheng et al.~\citep{zheng2023} demonstrate that LLM judges can match human evaluation quality for text assessment. Our pipeline extends this approach from evaluating model outputs to evaluating standards documents, using structured prompts for multi-dimensional rating. + +\textbf{Multi-agent systems.} The AAMAS community has long studied multi-agent coordination~\citep{wooldridge2009}. Our analysis reveals that the IETF is now addressing many of the same problems (coordination, trust, resource allocation) but from a protocol standardization perspective rather than an algorithmic one. + +% ── 9. Future Work ────────────────────────────────────────────────────── + +\section{Future Work} + +\begin{enumerate}[nosep] + \item \textbf{Human validation}: Compare LLM ratings against expert assessments for a stratified sample of 30--50 drafts to quantify LLM judge accuracy in this domain. + \item \textbf{Longitudinal monitoring}: Deploy the pipeline for continuous analysis as new drafts appear, tracking the evolution of the safety ratio, overlap clusters, and gap coverage over time. + \item \textbf{Citation network}: Extract inter-draft references to build a citation graph, enabling influence analysis beyond topical similarity. + \item \textbf{Gap-driven standardization}: Use identified gaps to propose new Internet-Drafts---we have already generated five experimental drafts addressing the architectural pillars described in Section~7.4. 
+ \item \textbf{Cross-venue analysis}: Extend the methodology to W3C, OASIS, ISO/IEC JTC 1, and 3GPP AI standardization activities for a comprehensive view of the global AI standards landscape. + \item \textbf{Multi-LLM validation}: Cross-validate ratings using multiple LLM judges (Claude, GPT-4, Gemini) to assess systematic bias. +\end{enumerate} + +% ── 10. Conclusion ──────────────────────────────────────────────────────── \section{Conclusion} -The IETF AI/agent standardization wave represents a unique moment in Internet governance: the community is attempting to standardize the infrastructure for autonomous agents in real time, alongside their deployment. Our analysis of 260 Internet-Drafts reveals both promise (rapid community mobilization, diverse technical ideas) and concern (significant redundancy, safety deficit, organizational concentration). +The IETF AI/agent standardization wave represents a unique moment in Internet governance: the community is attempting to standardize the infrastructure for autonomous agents concurrently with their deployment. Our analysis of 434 Internet-Drafts from 557 authors reveals a landscape characterized by both extraordinary energy and significant structural problems. -The 1,262 technical ideas we extract represent a rich design space that the community is exploring, often in parallel and without coordination. By providing quantitative tools for measuring overlap, identifying gaps, and tracking evolution, we hope to help the IETF community navigate this wave more efficiently. +Three findings demand attention. First, the \textbf{4:1 safety deficit}: the community is building agent capabilities four times faster than safety mechanisms, despite the highest-quality proposals being safety-focused. 
Second, \textbf{extreme fragmentation}: 120 competing A2A protocol proposals, 13 independent OAuth-for-agents drafts, and 96\% of technical ideas appearing in only one draft indicate that coordination mechanisms are failing to keep pace with submission volume. Third, \textbf{organizational concentration}: 15\% of all drafts from a single company and approximately 40\% from Chinese organizations raise questions about geographic diversity in the standards that will govern global AI agent infrastructure. -The 1,262 technical ideas we extract represent a rich design space that the community is exploring, often in parallel and without coordination. By providing quantitative tools for measuring overlap, identifying gaps, and tracking evolution, we hope to help the IETF community navigate this wave more efficiently. +The 1,907 technical ideas we extract represent a rich but disorganized design space. The 11 gaps we identify---from behavior verification to human override protocols to cross-protocol translation---highlight where the community's collective blind spots lie. The architectural vision we sketch, building on existing IETF primitives (WIMSE, ECT, OAuth), suggests a path from fragmentation toward coherence. -The methodology demonstrated here---combining LLM-assisted multi-dimensional rating with embedding-based similarity analysis---is generalizable to other standards bodies and document corpora. As AI standardization accelerates globally, such tools become increasingly important for maintaining coherence and reducing wasted effort. +The methodology demonstrated here---combining LLM-assisted multi-dimensional rating with embedding-based similarity analysis---is itself a contribution. At \$3.16 in API costs, it provides a scalable, reproducible approach to standards landscape analysis that could be applied to any standards body facing a surge in submissions. As AI standardization accelerates globally, such tools become essential for maintaining coherence and directing limited community attention to the areas that matter most. + +The gold rush will not slow down. The question is whether the safety inspectors can catch up.
% ── Acknowledgments ────────────────────────────────────────────────────── \section*{Acknowledgments} -Analysis was performed using Anthropic Claude (Sonnet 4) for rating and idea extraction, and Ollama with nomic-embed-text for embedding generation. We thank the IETF community for maintaining the open Datatracker API. +Analysis was performed using Anthropic Claude (Sonnet 4) for rating, categorization, and idea extraction, and Ollama with nomic-embed-text for embedding generation. We thank the IETF community for maintaining the open Datatracker API that made this analysis possible. % ── References ─────────────────────────────────────────────────────────── \bibliographystyle{plainnat} -\begin{thebibliography}{10} +\begin{thebibliography}{12} \bibitem[RFC2026(1996)]{rfc2026} S.~Bradner. @@ -477,11 +588,41 @@ J.~Simmons and D.~Thaler. \newblock IETF Participation Trends and Diversity. \newblock Presented at IETF 106, 2019. +\bibitem[Baron \& Spulber(2019)]{baron2019} +J.~Baron and D.~Spulber. +\newblock Technology Standards and Standard Setting Organizations: Introduction to the Searle Center Database. +\newblock \emph{Journal of Economics \& Management Strategy}, 27(3):462--503, 2019. + \bibitem[Brown et~al.(2020)]{brown2020} T.~Brown, B.~Mann, N.~Ryder, et~al. \newblock Language Models are Few-Shot Learners. \newblock In \emph{Advances in Neural Information Processing Systems}, 2020. +\bibitem[Zheng et~al.(2023)]{zheng2023} +L.~Zheng, W.-L.~Chiang, Y.~Sheng, et~al. +\newblock Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena. +\newblock In \emph{Advances in Neural Information Processing Systems}, 2023. + +\bibitem[Amodei et~al.(2016)]{amodei2016} +D.~Amodei, C.~Olah, J.~Steinhardt, et~al. +\newblock Concrete Problems in AI Safety. +\newblock \emph{arXiv:1606.06565}, 2016. + +\bibitem[Wooldridge(2009)]{wooldridge2009} +M.~Wooldridge. +\newblock \emph{An Introduction to MultiAgent Systems}. +\newblock John Wiley \& Sons, 2nd edition, 2009. 
+ +\bibitem[EU(2024)]{euaiact2024} +European Parliament and Council. +\newblock Regulation (EU) 2024/1689 laying down harmonised rules on artificial intelligence (AI Act). +\newblock \emph{Official Journal of the European Union}, 2024. + +\bibitem[NIST(2023)]{nist2023} +National Institute of Standards and Technology. +\newblock Artificial Intelligence Risk Management Framework (AI RMF 1.0). +\newblock NIST AI 100-1, January 2023. + \bibitem[Google(2025)]{a2a2025} Google. \newblock Agent-to-Agent (A2A) Protocol Specification. @@ -500,41 +641,6 @@ Anthropic. \appendix -\section{Full Category List} -\label{app:categories} - -\begin{table}[H] -\centering -\small -\begin{tabular}{lr} -\toprule -\textbf{Category} & \textbf{Draft Count} \\ -\midrule -Data formats / interop & 102 \\ -Agent identity / auth & 98 \\ -A2A protocols & 92 \\ -Policy / governance & 60 \\ -Autonomous netops & 60 \\ -Agent discovery / registration & 57 \\ -AI safety / alignment & 36 \\ -ML traffic management & 23 \\ -Human-agent interaction & 22 \\ -Other AI/agent & 21 \\ -Agent-to-agent communication protocols & 16 \\ -Agent discovery / registration (variant) & 14 \\ -Model serving / inference & 13 \\ -Identity / auth for AI agents (variant) & 13 \\ -Autonomous network operations (variant) & 5 \\ -Data formats / semantics (variant) & 3 \\ -Policy / governance (variant) & 2 \\ -AI safety / guardrails (variant) & 1 \\ -ML-based traffic mgmt (variant) & 1 \\ -\bottomrule -\end{tabular} -\caption{Complete list of 19 categories. 
Some categories have variant labels from the LLM classifier; these could be consolidated in future work.} -\label{tab:all-categories} -\end{table} - \section{Composite Score Formula Sensitivity} \label{app:sensitivity} @@ -556,4 +662,58 @@ Novelty-only & 0.50 & 0.20 & 0.10 & 0.10 & 0.10 & 0.93 \\ \label{tab:sensitivity} \end{table} +\section{Keyword Search Terms} +\label{app:keywords} + +\begin{table}[H] +\centering +\begin{tabular}{ll} +\toprule +\textbf{Keyword} & \textbf{Rationale} \\ +\midrule +\texttt{agent} & Core term for AI agent drafts \\ +\texttt{ai-agent} & Specific AI agent proposals \\ +\texttt{llm} & Large language model infrastructure \\ +\texttt{autonomous} & Self-operating systems and agents \\ +\texttt{machine-learning} & ML-related protocol work \\ +\texttt{artificial-intelligence} & General AI drafts \\ +\texttt{mcp} & Model Context Protocol ecosystem \\ +\texttt{agentic} & Agentic AI paradigm \\ +\texttt{inference} & AI inference infrastructure \\ +\texttt{generative} & Generative AI protocols \\ +\texttt{intelligent} & Intelligent networking/systems \\ +\texttt{aipref} & AI preference signaling (AIPREF WG) \\ +\bottomrule +\end{tabular} +\caption{Twelve seed keywords used for Datatracker API queries, with rationale for inclusion.} +\end{table} + +\section{Top Convergent Ideas} +\label{app:convergent} + +\begin{table}[H] +\centering +\small +\begin{tabularx}{\textwidth}{Xrl} +\toprule +\textbf{Idea} & \textbf{Drafts} & \textbf{Primary Type} \\ +\midrule +Multi-Agent Communication Protocol & 8 & protocol \\ +Agentic Network Architecture & 7 & architecture \\ +Cross-Domain Agent Coordination & 6 & mechanism \\ +ELA Protocol (EDHOC Lightweight Auth) & 6 & protocol \\ +Agent-to-Agent Communication Paradigm & 5 & protocol \\ +Action-Based Authorization & 5 & mechanism \\ +AI Agent Communication Network & 5 & architecture \\ +Agent Registration Process & 5 & protocol \\ +AI Gateway & 4 & architecture \\ +MCP Session Establishment over MOQT & 4 & 
protocol \\ +Network Equipment as MCP Servers & 4 & mechanism \\ +Multi-Agent Interaction Model & 4 & pattern \\ +Distributed AI Inference Architecture & 4 & architecture \\ +\bottomrule +\end{tabularx} +\caption{Most frequently occurring convergent ideas (appearing in $\geq$4 drafts independently). These represent areas of implicit community consensus.} +\end{table} + \end{document} diff --git a/src/ietf_analyzer/analyzer.py b/src/ietf_analyzer/analyzer.py index 46945ad..e7a9c90 100644 --- a/src/ietf_analyzer/analyzer.py +++ b/src/ietf_analyzer/analyzer.py @@ -77,7 +77,7 @@ Abstract: {abstract} {text_excerpt} -Return 0-8 ideas. Only include CONCRETE, NOVEL technical contributions — not restatements of the abstract or general goals. If the draft has no substantive technical ideas (e.g. it is a problem statement, administrative document, or off-topic), return an empty array []. +Return 1-4 ideas. Extract only TOP-LEVEL novel contributions. Do NOT list sub-features, optimizations, variants, or extensions as separate ideas. If a draft defines one protocol with multiple features, that is ONE idea, not several. Each idea must be independently novel — could it be its own draft? If not, merge it with the parent idea. Only include CONCRETE, NOVEL technical contributions — not restatements of the abstract or general goals. If the draft has no substantive technical ideas (e.g. it is a problem statement, administrative document, or off-topic), return an empty array []. JSON array only, no fences.""" BATCH_IDEAS_PROMPT = """\ @@ -86,7 +86,7 @@ Per idea: {{"title":"short name","description":"1 sentence","type":"mechanism|pr {drafts_block} -0-8 ideas per draft. Only include CONCRETE, NOVEL technical contributions. If a draft has no substantive ideas, map it to an empty array. Do not pad with restatements of the abstract. +1-4 ideas per draft. Extract only TOP-LEVEL novel contributions. Do NOT list sub-features, optimizations, variants, or extensions as separate ideas. 
If a draft defines one protocol with multiple features, that is ONE idea, not several. Each idea must be independently novel — could it be its own draft? If not, merge it with the parent idea. Only include CONCRETE, NOVEL technical contributions. If a draft has no substantive ideas, map it to an empty array. Do not pad with restatements of the abstract. Return ONLY a JSON object like {{"draft-name":[...], ...}}, no fences.""" GAP_ANALYSIS_PROMPT = """\ @@ -115,6 +115,21 @@ Focus on: JSON array only, no fences.""" +SCORE_NOVELTY_PROMPT = """\ +Rate each idea's novelty/originality on a 1-5 scale. + +1 = Generic building block anyone would include (e.g. "Agent Gateway", "Certificate Authority") +2 = Obvious extension of existing work, minimal originality +3 = Useful and relevant but expected given the problem space +4 = Interesting contribution with some original thinking +5 = Genuinely novel mechanism, protocol, or architectural insight + +Ideas to score: +{ideas_block} + +Return ONLY a JSON object mapping idea ID to score, like {{"123": 3, "456": 1, ...}}. +No fences, no explanation.""" + def _prompt_hash(text: str) -> str: return hashlib.sha256(text.encode()).hexdigest()[:16] @@ -558,3 +573,222 @@ class Analyzer: return text except anthropic.APIError as e: return f"Error: {e}" + + def dedup_ideas(self, threshold: float = 0.85, dry_run: bool = True, + draft_name: str | None = None) -> dict: + """Deduplicate ideas within each draft using embedding similarity. + + For each draft, computes pairwise cosine similarity of idea embeddings. + Ideas above the threshold are merged (keeping the one with the longer + description). + + Args: + threshold: Cosine similarity threshold for merging (default 0.85). + dry_run: If True, report what would be merged without deleting. + draft_name: If provided, only dedup ideas for this draft. + + Returns: + Dict with keys: total_before, total_after, merged_count, examples. 
+ """ + import numpy as np + import ollama as ollama_lib + + client = ollama_lib.Client(host=self.config.ollama_url) + + # Get list of drafts to process + if draft_name: + draft_names = [draft_name] + else: + rows = self.db.conn.execute( + "SELECT DISTINCT draft_name FROM ideas ORDER BY draft_name" + ).fetchall() + draft_names = [r["draft_name"] for r in rows] + + total_before = 0 + merged_count = 0 + examples = [] + ids_to_delete = [] + + for dname in draft_names: + ideas = self.db.get_ideas_for_draft(dname) + if len(ideas) < 2: + total_before += len(ideas) + continue + + total_before += len(ideas) + + # Embed each idea: "title: description" + texts = [f"{idea['title']}: {idea['description']}" for idea in ideas] + try: + resp = client.embed( + model=self.config.ollama_embed_model, input=texts + ) + vectors = [ + np.array(v, dtype=np.float32) + for v in resp["embeddings"] + ] + except Exception as e: + console.print(f"[red]Failed to embed ideas for {dname}: {e}[/]") + continue + + # Track which ideas are already marked for deletion in this draft + deleted_in_draft = set() + + # Compare all pairs within this draft + for i in range(len(ideas)): + if ideas[i]["id"] in deleted_in_draft: + continue + for j in range(i + 1, len(ideas)): + if ideas[j]["id"] in deleted_in_draft: + continue + + # Cosine similarity + dot = np.dot(vectors[i], vectors[j]) + norm = np.linalg.norm(vectors[i]) * np.linalg.norm(vectors[j]) + sim = float(dot / norm) if norm > 0 else 0.0 + + if sim >= threshold: + # Keep the idea with the longer description + keep = ideas[i] if len(ideas[i]["description"]) >= len(ideas[j]["description"]) else ideas[j] + drop = ideas[j] if keep is ideas[i] else ideas[i] + + ids_to_delete.append(drop["id"]) + deleted_in_draft.add(drop["id"]) + merged_count += 1 + + if len(examples) < 20: + examples.append({ + "draft": dname, + "keep": keep["title"], + "drop": drop["title"], + "similarity": round(sim, 3), + }) + + if not dry_run: + for idea_id in ids_to_delete: + 
self.db.delete_idea(idea_id) + + total_after = total_before - merged_count + return { + "total_before": total_before, + "total_after": total_after, + "merged_count": merged_count, + "examples": examples, + } + + def score_idea_novelty(self, batch_size: int = 20, cheap: bool = True) -> dict: + """Score all unscored ideas for novelty (1-5) using Claude. + + Args: + batch_size: Number of ideas per API call (default 20). + cheap: Use Haiku model for lower cost (default True). + + Returns: + Dict with keys: scored_count, avg_score, distribution. + """ + unscored = self.db.ideas_with_drafts(unscored_only=True) + if not unscored: + console.print("All ideas already scored.") + return {"scored_count": 0, "avg_score": 0.0, "distribution": {}} + + model_label = "Haiku" if cheap else "Sonnet" + console.print( + f"Scoring [bold]{len(unscored)}[/] ideas for novelty " + f"(batches of {batch_size}, {model_label})..." + ) + + scored_count = 0 + all_scores: list[int] = [] + + with Progress( + SpinnerColumn(), + TextColumn("[progress.description]{task.description}"), + BarColumn(), + MofNCompleteColumn(), + console=console, + ) as progress: + task = progress.add_task("Scoring novelty...", total=len(unscored)) + + for i in range(0, len(unscored), batch_size): + batch = unscored[i:i + batch_size] + progress.update(task, description=f"Batch {i // batch_size + 1}") + + # Build ideas block for prompt + ideas_block = "" + for idea in batch: + ideas_block += ( + f"\n---\nID: {idea['id']}\n" + f"Draft: {idea['draft_title']}\n" + f"Idea: {idea['title']}\n" + f"Description: {idea['description']}\n" + ) + + prompt = SCORE_NOVELTY_PROMPT.format(ideas_block=ideas_block) + phash = _prompt_hash(prompt) + + # Check cache + cached = self.db.get_cached_response("_novelty_score_", phash) + if cached: + try: + scores = json.loads(cached) + if isinstance(scores, dict): + batch_scores = {int(k): int(v) for k, v in scores.items()} + self.db.update_idea_scores_bulk(batch_scores) + scored_count += 
len(batch_scores) + all_scores.extend(batch_scores.values()) + progress.advance(task, advance=len(batch)) + continue + except (json.JSONDecodeError, KeyError, ValueError): + pass + + try: + text, in_tok, out_tok = self._call_claude( + prompt, max_tokens=50 * len(batch), cheap=cheap + ) + text = self._extract_json(text) + scores = json.loads(text) + + if not isinstance(scores, dict): + console.print(f"[red]Batch {i // batch_size + 1}: unexpected response format[/]") + progress.advance(task, advance=len(batch)) + continue + + # Cache the raw response + self.db.cache_response( + "_novelty_score_", phash, + self.config.claude_model_cheap if cheap else self.config.claude_model, + prompt, text, in_tok, out_tok, + ) + + # Parse and store scores + batch_scores = {} + for k, v in scores.items(): + try: + idea_id = int(k) + score = max(1, min(5, int(v))) + batch_scores[idea_id] = score + except (ValueError, TypeError): + continue + + self.db.update_idea_scores_bulk(batch_scores) + scored_count += len(batch_scores) + all_scores.extend(batch_scores.values()) + + except (json.JSONDecodeError, anthropic.APIError) as e: + console.print(f"[red]Batch {i // batch_size + 1} failed: {e}[/]") + + progress.advance(task, advance=len(batch)) + + # Build distribution + distribution: dict[int, int] = {} + for s in all_scores: + distribution[s] = distribution.get(s, 0) + 1 + + avg = sum(all_scores) / len(all_scores) if all_scores else 0.0 + + in_tok, out_tok = self.db.total_tokens_used() + console.print( + f"Scored [bold green]{scored_count}[/] ideas " + f"(avg: {avg:.1f}) | Tokens: {in_tok:,} in + {out_tok:,} out" + ) + return {"scored_count": scored_count, "avg_score": round(avg, 2), "distribution": distribution} diff --git a/src/ietf_analyzer/cli.py b/src/ietf_analyzer/cli.py index d811446..ccf3361 100644 --- a/src/ietf_analyzer/cli.py +++ b/src/ietf_analyzer/cli.py @@ -256,6 +256,60 @@ def embed(): db.close() +# ── embed-ideas 
────────────────────────────────────────────────────────────── + + +@main.command("embed-ideas") +@click.option("--limit", default=0, help="Max ideas to embed (0=all)") +@click.option("--batch-size", default=50, help="Batch size for Ollama") +def embed_ideas(limit: int, batch_size: int): + """Generate embeddings for extracted ideas via Ollama.""" + import ollama as ollama_lib + from rich.progress import Progress, SpinnerColumn, TextColumn, BarColumn, MofNCompleteColumn + + cfg = _get_config() + db = Database(cfg) + client = ollama_lib.Client(host=cfg.ollama_url) + + try: + missing = db.ideas_without_embeddings(limit=limit if limit > 0 else 10000) + if not missing: + console.print("All ideas already have embeddings.") + return + + total = len(missing) + console.print(f"Embedding [bold]{total}[/] ideas in batches of {batch_size}...") + + count = 0 + with Progress( + SpinnerColumn(), + TextColumn("[progress.description]{task.description}"), + BarColumn(), + MofNCompleteColumn(), + console=console, + ) as progress: + task = progress.add_task("Embedding ideas...", total=total) + for start in range(0, total, batch_size): + batch = missing[start:start + batch_size] + texts = [f"{idea['title']}. 
{idea['description']}" for idea in batch] + try: + resp = client.embed(model=cfg.ollama_embed_model, input=texts) + for i, idea in enumerate(batch): + import numpy as np + vec = np.array(resp["embeddings"][i], dtype=np.float32) + db.store_idea_embedding(idea["id"], cfg.ollama_embed_model, vec) + count += 1 + progress.advance(task) + except Exception as e: + console.print(f"[red]Batch failed: {e}[/]") + for _ in batch: + progress.advance(task) + + console.print(f"Embedded [bold green]{count}[/] ideas") + finally: + db.close() + + # ── similar ────────────────────────────────────────────────────────────────── @@ -531,6 +585,261 @@ def co_occurrence_report(): db.close() +@report.command("wg") +def wg_report(): + """Working group analysis report — overlaps, alignment, submission targets.""" + from .reports import Reporter + cfg = _get_config() + db = Database(cfg) + reporter = Reporter(cfg, db) + try: + path = reporter.wg_report() + console.print(f"Report saved: [bold]{path}[/]") + finally: + db.close() + + +# ── wg (working group analysis) ───────────────────────────────────────── + + +@main.group() +def wg(): + """Working group analysis — overlaps, alignment opportunities, submission targets.""" + pass + + +@wg.command("list") +@click.option("--min-drafts", default=1, help="Minimum drafts to show a WG") +def wg_list(min_drafts: int): + """List working groups with draft counts and average scores.""" + cfg = _get_config() + db = Database(cfg) + try: + summaries = db.wg_summary() + if not summaries: + console.print("[yellow]No WG data. 
Run: python scripts/backfill-wg-names.py[/]") + return + + summaries = [s for s in summaries if s["draft_count"] >= min_drafts] + + table = Table(title=f"Working Groups ({len(summaries)} with >= {min_drafts} drafts)") + table.add_column("WG", style="cyan", width=12) + table.add_column("#", justify="right", width=4) + table.add_column("Ideas", justify="right", width=5) + table.add_column("Nov", justify="center", width=4) + table.add_column("Mat", justify="center", width=4) + table.add_column("Ovl", justify="center", width=4) + table.add_column("Mom", justify="center", width=4) + table.add_column("Rel", justify="center", width=4) + table.add_column("Top Categories") + + for s in summaries: + top_cats = sorted(s["categories"].items(), key=lambda x: x[1], reverse=True)[:3] + cats_str = ", ".join(f"{c}({n})" for c, n in top_cats) if top_cats else "-" + table.add_row( + s["wg"], str(s["draft_count"]), str(s["idea_count"]), + str(s["avg_novelty"]), str(s["avg_maturity"]), + str(s["avg_overlap"]), str(s["avg_momentum"]), + str(s["avg_relevance"]), cats_str, + ) + + console.print(table) + + # Also show individual submission count + indiv = db.conn.execute( + 'SELECT COUNT(*) FROM drafts WHERE "group" = \'none\' OR "group" IS NULL' + ).fetchone()[0] + console.print(f"\n[dim]Individual submissions (no WG): {indiv}[/]") + finally: + db.close() + + +@wg.command("show") +@click.argument("name") +def wg_show(name: str): + """Show details for a specific working group.""" + cfg = _get_config() + db = Database(cfg) + try: + drafts = db.wg_drafts(name) + if not drafts: + console.print(f"[red]No drafts found for WG: {name}[/]") + return + + console.print(f"\n[bold]Working Group: {name}[/] ({len(drafts)} drafts)\n") + + table = Table() + table.add_column("Date", style="dim", width=10) + table.add_column("Name", style="cyan") + table.add_column("Title", max_width=50) + table.add_column("Score", justify="right", width=6) + + for d in drafts: + rating = db.get_rating(d.name) + score = 
f"{rating.composite_score:.1f}" if rating else "-" + table.add_row(d.date, d.name, d.title[:50], score) + + console.print(table) + + # Show ideas for this WG + ideas = [] + for d in drafts: + ideas.extend(db.get_ideas_for_draft(d.name)) + if ideas: + console.print(f"\n[bold]Ideas ({len(ideas)}):[/]") + for idea in ideas[:15]: + console.print(f" - [cyan]{idea['title']}[/]: {idea['description'][:80]}") + if len(ideas) > 15: + console.print(f" [dim]... and {len(ideas) - 15} more[/]") + finally: + db.close() + + +@wg.command("overlaps") +@click.option("--min-wgs", default=2, help="Minimum WGs sharing a category to show") +def wg_overlaps(min_wgs: int): + """Find categories and ideas that span multiple WGs — alignment opportunities.""" + cfg = _get_config() + db = Database(cfg) + try: + # Category spread across WGs + spread = db.category_wg_spread() + multi = [s for s in spread if s["wg_count"] >= min_wgs + and not all(w["wg"] == "none" for w in s["wgs"])] + + if multi: + console.print(f"\n[bold]Categories spanning {min_wgs}+ WGs[/]\n") + for s in multi: + wg_strs = [f"{w['wg']}({w['count']})" for w in s["wgs"] if w["wg"] != "none"] + if wg_strs: + console.print(f" [cyan]{s['category']}[/] — {s['total_drafts']} drafts across {s['wg_count']} WGs") + console.print(f" WGs: {', '.join(wg_strs)}") + + # Idea overlap across WGs + idea_overlaps = db.wg_idea_overlap() + cross_wg = [o for o in idea_overlaps + if not all(w == "none" for w in o["wg_names"])] + + if cross_wg: + console.print(f"\n[bold]Ideas appearing in {min_wgs}+ WGs ({len(cross_wg)} found)[/]\n") + for o in cross_wg[:20]: + real_wgs = [w for w in o["wg_names"] if w != "none"] + console.print(f" [cyan]{o['idea_title']}[/] — WGs: {', '.join(real_wgs)}") + for entry in o["wgs"]: + if entry["wg"] != "none": + console.print(f" - [{entry['wg']}] {entry['draft_name']}") + if len(cross_wg) > 20: + console.print(f"\n [dim]... 
and {len(cross_wg) - 20} more[/]") + + if not multi and not cross_wg: + console.print("[yellow]No cross-WG overlaps found.[/]") + finally: + db.close() + + +@wg.command("alignment") +def wg_alignment(): + """Identify where individual drafts should be consolidated into WG standards.""" + cfg = _get_config() + db = Database(cfg) + try: + # Compare individual vs WG category distribution + dist = db.individual_vs_wg_categories() + indiv = dist["individual"] + adopted = dist["wg_adopted"] + + console.print("\n[bold]Individual vs WG-Adopted Category Distribution[/]\n") + + table = Table() + table.add_column("Category", width=25) + table.add_column("Individual", justify="right", width=10) + table.add_column("WG-Adopted", justify="right", width=10) + table.add_column("Signal", width=40) + + all_cats = sorted(set(list(indiv.keys()) + list(adopted.keys()))) + for cat in all_cats: + i_count = indiv.get(cat, 0) + w_count = adopted.get(cat, 0) + signal = "" + if i_count >= 5 and w_count == 0: + signal = "[yellow]High individual activity, no WG — needs WG?[/]" + elif i_count >= 3 and w_count >= 1: + signal = "[green]WG exists, individual drafts could target it[/]" + elif w_count > i_count and i_count > 0: + signal = "[dim]WG leading, some individual work[/]" + table.add_row(cat, str(i_count), str(w_count), signal) + + console.print(table) + + # Find overlap clusters within individual submissions that might warrant a WG + console.print("\n[bold]Consolidation Candidates[/]") + console.print("[dim]Categories with many individual drafts but no WG adoption — " + "potential for new WG or BoF[/]\n") + + candidates = [] + for cat in all_cats: + i_count = indiv.get(cat, 0) + w_count = adopted.get(cat, 0) + if i_count >= 5 and w_count == 0: + candidates.append((cat, i_count)) + + if candidates: + for cat, count in sorted(candidates, key=lambda x: x[1], reverse=True): + console.print(f" [yellow]{cat}[/]: {count} individual drafts, no WG home") + # Show sample drafts + rows = 
db.conn.execute(""" + SELECT d.name, d.title FROM drafts d + JOIN ratings r ON d.name = r.draft_name + WHERE (d."group" = 'none' OR d."group" IS NULL) + AND r.categories LIKE ? + ORDER BY (r.novelty * 0.30 + r.relevance * 0.25 + r.maturity * 0.20 + + r.momentum * 0.15 + (6 - r.overlap) * 0.10) DESC + LIMIT 5 + """, (f"%{cat}%",)).fetchall() + for row in rows: + console.print(f" - {row['name']}: {row['title'][:60]}") + console.print() + else: + console.print(" [green]All active categories have WG representation.[/]") + finally: + db.close() + + +@wg.command("targets") +def wg_targets(): + """Suggest best WGs for submitting new work in each category.""" + cfg = _get_config() + db = Database(cfg) + try: + spread = db.category_wg_spread() + summaries = {s["wg"]: s for s in db.wg_summary()} + + console.print("\n[bold]Recommended Submission Targets by Category[/]\n") + + for s in spread: + cat = s["category"] + # Filter to real WGs (not 'none') + real_wgs = [w for w in s["wgs"] if w["wg"] != "none"] + if not real_wgs: + console.print(f" [cyan]{cat}[/]: [yellow]No active WG — individual submission[/]") + continue + + best = real_wgs[0] + wg_info = summaries.get(best["wg"], {}) + console.print( + f" [cyan]{cat}[/]: [bold green]{best['wg']}[/] " + f"({best['count']} drafts" + f"{', avg relevance ' + str(wg_info.get('avg_relevance', '?')) if wg_info else ''})" + ) + if len(real_wgs) > 1: + alts = ", ".join(f"{w['wg']}({w['count']})" for w in real_wgs[1:3]) + console.print(f" Also: {alts}") + + console.print() + finally: + db.close() + + # ── visualize ──────────────────────────────────────────────────────────── @@ -808,14 +1117,21 @@ def network(top: int): # ── ideas ─────────────────────────────────────────────────────────────── -@main.command() -@click.argument("name", required=False) +@main.group(invoke_without_command=True) +@click.option("--name", default=None, help="Extract ideas from a specific draft") @click.option("--all", "extract_all", is_flag=True, help="Extract 
ideas from all drafts") @click.option("--limit", "-n", default=50, help="Max drafts to extract (with --all)") @click.option("--batch", "-b", default=5, help="Drafts per API call (default 5, set 1 for individual)") @click.option("--cheap/--quality", default=True, help="Use Haiku (cheap) vs Sonnet (quality)") -def ideas(name: str | None, extract_all: bool, limit: int, batch: int, cheap: bool): - """Extract technical ideas from drafts using Claude.""" +@click.option("--reextract", is_flag=True, help="Clear existing ideas and re-extract with current prompt") +@click.option("--draft", "reextract_draft", default=None, help="Specific draft to re-extract (with --reextract)") +@click.pass_context +def ideas(ctx, name: str | None, extract_all: bool, limit: int, batch: int, cheap: bool, + reextract: bool, reextract_draft: str | None): + """Extract, score, and filter technical ideas from drafts.""" + if ctx.invoked_subcommand is not None: + return + from .analyzer import Analyzer cfg = _get_config() @@ -823,7 +1139,24 @@ def ideas(name: str | None, extract_all: bool, limit: int, batch: int, cheap: bo analyzer = Analyzer(cfg, db) try: - if extract_all: + if reextract: + # Clear existing ideas, then re-extract + deleted = db.delete_ideas(draft_name=reextract_draft) + if reextract_draft: + console.print(f"Cleared [bold]{deleted}[/] ideas for {reextract_draft}") + idea_list = analyzer.extract_ideas(reextract_draft, use_cache=True) + if idea_list: + console.print(f"Re-extracted [bold green]{len(idea_list)}[/] ideas:") + for idea in idea_list: + console.print(f" [{idea.get('type', '?')}] [bold]{idea['title']}[/]") + console.print(f" {idea['description']}\n") + else: + console.print("[red]Re-extraction failed or no ideas found[/]") + else: + console.print(f"Cleared [bold]{deleted}[/] ideas from all drafts") + count = analyzer.extract_all_ideas(limit=limit, batch_size=batch, cheap=cheap) + console.print(f"Re-extracted ideas from [bold green]{count}[/] drafts") + elif extract_all: 
count = analyzer.extract_all_ideas(limit=limit, batch_size=batch, cheap=cheap) console.print(f"Extracted ideas from [bold green]{count}[/] drafts") elif name: @@ -836,7 +1169,166 @@ def ideas(name: str | None, extract_all: bool, limit: int, batch: int, cheap: bo else: console.print("[red]Extraction failed or no ideas found[/]") else: - console.print("Provide a draft name or use --all") + console.print("Use --name DRAFT, --all, or a subcommand: ideas score / ideas filter") + finally: + db.close() + + +@ideas.command("score") +@click.option("--cheap/--quality", default=True, help="Use Haiku (cheap) vs Sonnet (quality)") +@click.option("--batch", "-b", default=20, help="Ideas per API call (default 20)") +def ideas_score(cheap: bool, batch: int): + """Score ideas for novelty (1=generic, 5=genuinely novel).""" + from .analyzer import Analyzer + + cfg = _get_config() + db = Database(cfg) + analyzer = Analyzer(cfg, db) + + try: + stats = analyzer.score_idea_novelty(batch_size=batch, cheap=cheap) + + if stats["scored_count"] == 0: + return + + # Show distribution table + dist = db.idea_score_distribution() + table = Table(title="Novelty Score Distribution") + table.add_column("Score", style="bold", justify="center") + table.add_column("Label", style="dim") + table.add_column("Count", justify="right") + table.add_column("Bar", min_width=30) + + labels = { + 1: "Generic building block", + 2: "Obvious extension", + 3: "Useful but expected", + 4: "Interesting contribution", + 5: "Genuinely novel", + } + max_count = max(dist.values()) if dist else 1 + for score in range(1, 6): + count = dist.get(score, 0) + bar_len = int(30 * count / max_count) if max_count > 0 else 0 + table.add_row( + str(score), labels[score], str(count), + "[green]" + "#" * bar_len + "[/]" + ) + + total = sum(dist.values()) + unscored = db.idea_count() - total + console.print(table) + console.print(f"\nTotal scored: [bold]{total}[/] | Unscored: {unscored} | Avg: [bold]{stats['avg_score']:.1f}[/]") + 
finally: + db.close() + + +@ideas.command("filter") +@click.option("--min-score", "-m", default=2, help="Remove ideas below this score (default 2)") +@click.option("--dry-run/--execute", default=True, help="Preview (default) or actually delete") +def ideas_filter(min_score: int, dry_run: bool): + """Filter out low-novelty ideas by score threshold.""" + cfg = _get_config() + db = Database(cfg) + + try: + candidates = db.ideas_below_score(min_score) + if not candidates: + console.print(f"No ideas with novelty_score < {min_score}.") + return + + # Show what would be removed + table = Table( + title=f"Ideas with novelty_score < {min_score} " + f"({'DRY RUN' if dry_run else 'WILL DELETE'})" + ) + table.add_column("Score", style="bold", justify="center") + table.add_column("Idea", style="cyan", max_width=40) + table.add_column("Draft", max_width=50) + table.add_column("Description", max_width=60) + + for idea in candidates[:50]: # Show first 50 + table.add_row( + str(idea["novelty_score"]), + idea["title"], + idea["draft_title"], + idea["description"][:60] + ("..." if len(idea["description"]) > 60 else ""), + ) + + console.print(table) + + if len(candidates) > 50: + console.print(f" ... 
and {len(candidates) - 50} more") + + console.print(f"\nTotal to remove: [bold red]{len(candidates)}[/] / {db.idea_count()} ideas") + + if not dry_run: + deleted = db.delete_low_score_ideas(min_score) + console.print(f"[bold red]Deleted {deleted} low-novelty ideas.[/]") + console.print(f"Remaining ideas: [bold green]{db.idea_count()}[/]") + else: + console.print("[dim]Use --execute to actually delete.[/]") + finally: + db.close() + + +# ── dedup-ideas ───────────────────────────────────────────────────────── + + +@main.command("dedup-ideas") +@click.option("--threshold", "-t", default=0.85, type=float, + help="Cosine similarity threshold for merging (default 0.85)") +@click.option("--dry-run/--execute", default=True, + help="Preview merges (default) vs actually delete duplicates") +@click.option("--draft", "draft_name", default=None, + help="Limit to a single draft name") +def dedup_ideas(threshold: float, dry_run: bool, draft_name: str | None): + """Deduplicate similar ideas within each draft using embedding similarity.""" + from .analyzer import Analyzer + + cfg = _get_config() + db = Database(cfg) + analyzer = Analyzer(cfg, db) + + try: + mode = "[bold yellow]DRY RUN[/]" if dry_run else "[bold red]EXECUTE[/]" + console.print(f"\n{mode} — Deduplicating ideas (threshold={threshold})") + if draft_name: + console.print(f"Limiting to draft: [bold]{draft_name}[/]") + console.print() + + result = analyzer.dedup_ideas( + threshold=threshold, dry_run=dry_run, draft_name=draft_name + ) + + if result["examples"]: + table = Table(title="Merge Candidates" if dry_run else "Merged Ideas") + table.add_column("Draft", style="dim", max_width=40) + table.add_column("Keep", style="green") + table.add_column("Drop", style="red") + table.add_column("Similarity", justify="right") + + for ex in result["examples"]: + table.add_row( + ex["draft"].split("/")[-1][:40], + ex["keep"], + ex["drop"], + f"{ex['similarity']:.3f}", + ) + console.print(table) + console.print() + + action = "Would 
remove" if dry_run else "Removed" + console.print( + f"Ideas before: [bold]{result['total_before']}[/] | " + f"{action}: [bold]{result['merged_count']}[/] | " + f"After: [bold]{result['total_after']}[/]" + ) + + if dry_run and result["merged_count"] > 0: + console.print( + "\n[dim]Run with --execute to apply these merges.[/]" + ) finally: db.close() @@ -2024,3 +2516,163 @@ def observatory_diff(since: str | None): console.print(f" [{d.get('source', '?')}] {d.get('name', '?')}: {d.get('title', '')[:60]}") finally: db.close() + + +# ── monitor ───────────────────────────────────────────────────────────── + + +@main.group() +def monitor(): + """Monitor IETF Datatracker for new AI/agent drafts.""" + pass + + +@monitor.command("run") +@click.option("--analyze/--no-analyze", default=True, help="Analyze new drafts") +@click.option("--embed/--no-embed", default=True, help="Generate embeddings") +@click.option("--ideas/--no-ideas", default=True, help="Extract ideas") +def monitor_run(analyze, embed, ideas): + """Run one monitoring cycle: fetch -> analyze -> embed -> ideas.""" + from .analyzer import Analyzer + from .embeddings import Embedder + from .fetcher import Fetcher + + cfg = _get_config() + db = Database(cfg) + run_id = db.start_monitor_run() + stats = { + "new_drafts_found": 0, + "drafts_analyzed": 0, + "drafts_embedded": 0, + "ideas_extracted": 0, + } + + try: + console.print("[bold]Monitor run started[/]") + + # Determine since date from last successful run + last_run = db.get_last_successful_run() + since = last_run["completed_at"][:10] if last_run and last_run.get("completed_at") else cfg.fetch_since + console.print(f" Fetching drafts since: [cyan]{since}[/]") + + # Fetch new drafts + fetcher = Fetcher(cfg) + try: + existing_count = db.count_drafts() + drafts = fetcher.search_drafts(keywords=list(cfg.search_keywords), since=since) + for draft in drafts: + db.upsert_draft(draft) + + # Download text for any missing + missing_text = db.drafts_without_text() + if 
missing_text: + console.print(f" Downloading text for [bold]{len(missing_text)}[/] drafts...") + texts = fetcher.download_texts(missing_text) + for name, text in texts.items(): + draft = db.get_draft(name) + if draft: + draft.full_text = text + db.upsert_draft(draft) + finally: + fetcher.close() + + new_count = db.count_drafts() - existing_count + stats["new_drafts_found"] = max(new_count, 0) + console.print(f" New drafts found: [bold green]{stats['new_drafts_found']}[/]") + + # Analyze unrated drafts + if analyze: + unrated = db.unrated_drafts(limit=200) + if unrated: + console.print(f" Analyzing [bold]{len(unrated)}[/] unrated drafts...") + analyzer = Analyzer(cfg, db) + count = analyzer.rate_all_unrated(limit=200) + stats["drafts_analyzed"] = count + console.print(f" Analyzed: [bold green]{count}[/]") + + # Embed missing drafts + if embed: + missing_embed = db.drafts_without_embeddings(limit=500) + if missing_embed: + console.print(f" Embedding [bold]{len(missing_embed)}[/] drafts...") + embedder = Embedder(cfg, db) + count = embedder.embed_all_missing() + stats["drafts_embedded"] = count + console.print(f" Embedded: [bold green]{count}[/]") + + # Extract ideas + if ideas: + missing_ideas = db.drafts_without_ideas(limit=500) + if missing_ideas: + console.print(f" Extracting ideas from [bold]{len(missing_ideas)}[/] drafts...") + analyzer = Analyzer(cfg, db) + count = analyzer.extract_all_ideas(limit=500, batch_size=5, cheap=True) + stats["ideas_extracted"] = count + console.print(f" Ideas extracted from: [bold green]{count}[/] drafts") + + db.complete_monitor_run(run_id, stats) + console.print("\n[bold green]Monitor run completed successfully[/]") + + except Exception as e: + db.fail_monitor_run(run_id, str(e)) + console.print(f"\n[bold red]Monitor run failed:[/] {e}") + raise + finally: + db.close() + + +@monitor.command("status") +def monitor_status(): + """Show monitoring status and recent runs.""" + cfg = _get_config() + db = Database(cfg) + + try: + runs = 
db.get_monitor_runs(limit=20) + last = db.get_last_successful_run() + + # Unprocessed counts + unrated = len(db.unrated_drafts(limit=9999)) + unembedded = len(db.drafts_without_embeddings(limit=9999)) + no_ideas = len(db.drafts_without_ideas(limit=9999)) + + console.print("\n[bold]Monitor Status[/]\n") + + if last: + console.print(f" Last successful run: [green]{last['completed_at']}[/]") + console.print(f" Duration: {last['duration_seconds']:.1f}s") + console.print(f" New drafts: {last['new_drafts_found']}") + else: + console.print(" [yellow]No successful runs yet[/]") + + console.print(f"\n[bold]Unprocessed[/]") + console.print(f" Unrated: [{'yellow' if unrated > 0 else 'green'}]{unrated}[/]") + console.print(f" Unembedded: [{'yellow' if unembedded > 0 else 'green'}]{unembedded}[/]") + console.print(f" No ideas: [{'yellow' if no_ideas > 0 else 'green'}]{no_ideas}[/]") + + if runs: + console.print(f"\n[bold]Recent Runs[/] ({len(runs)} total)\n") + table = Table() + table.add_column("#", justify="right", width=4) + table.add_column("Started", width=20) + table.add_column("Duration", justify="right", width=8) + table.add_column("Status", width=10) + table.add_column("New", justify="right", width=5) + table.add_column("Analyzed", justify="right", width=8) + table.add_column("Embedded", justify="right", width=8) + table.add_column("Ideas", justify="right", width=6) + for r in runs: + status_style = {"completed": "green", "failed": "red", "running": "yellow"}.get(r["status"], "dim") + table.add_row( + str(r["id"]), + r["started_at"][:19] if r["started_at"] else "", + f"{r['duration_seconds']:.1f}s" if r["duration_seconds"] else "-", + f"[{status_style}]{r['status']}[/{status_style}]", + str(r["new_drafts_found"]), + str(r["drafts_analyzed"]), + str(r["drafts_embedded"]), + str(r["ideas_extracted"]), + ) + console.print(table) + finally: + db.close() diff --git a/src/ietf_analyzer/db.py b/src/ietf_analyzer/db.py index 590d558..9237092 100644 --- 
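The `dedup-ideas` command earlier in this hunk delegates the actual merging to `Analyzer.dedup_ideas`, which this diff does not show. Below is a minimal, self-contained sketch of the greedy cosine-similarity merge that the CLI output (keep/drop pairs plus a similarity column) implies; `greedy_dedup` and its signature are hypothetical illustrations, not the project's real API:

```python
import numpy as np

def greedy_dedup(ideas, embeddings, threshold=0.85):
    """Greedy embedding dedup: the first idea seen in each similarity group wins.

    ideas: list of dicts with an "id" key (hypothetical shape).
    embeddings: {idea_id: np.ndarray} as returned by all_idea_embeddings().
    Returns (keep_ids, merges) where merges is [(keep_id, drop_id, similarity)].
    """
    keep, merges = [], []
    for idea in ideas:
        vec = embeddings.get(idea["id"])
        if vec is None:
            # No embedding yet: keep the idea rather than risk a bad merge.
            keep.append(idea["id"])
            continue
        unit = vec / (np.linalg.norm(vec) or 1.0)
        for kid in keep:
            kvec = embeddings.get(kid)
            if kvec is None:
                continue
            sim = float(unit @ (kvec / (np.linalg.norm(kvec) or 1.0)))
            if sim >= threshold:
                merges.append((kid, idea["id"], sim))
                break
        else:
            keep.append(idea["id"])
    return keep, merges
```

Under this sketch the survivor is simply the first idea encountered; a real implementation might instead keep the idea with the longer description or the higher novelty score.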
a/src/ietf_analyzer/db.py +++ b/src/ietf_analyzer/db.py @@ -106,6 +106,14 @@ CREATE TABLE IF NOT EXISTS ideas ( CREATE INDEX IF NOT EXISTS idx_ideas_draft ON ideas(draft_name); +-- Idea embeddings (for clustering) +CREATE TABLE IF NOT EXISTS idea_embeddings ( + idea_id INTEGER PRIMARY KEY REFERENCES ideas(id), + model TEXT NOT NULL, + vector BLOB NOT NULL, + created_at TEXT +); + -- Gap analysis results CREATE TABLE IF NOT EXISTS gaps ( id INTEGER PRIMARY KEY AUTOINCREMENT, @@ -184,6 +192,20 @@ CREATE TABLE IF NOT EXISTS gap_history ( recorded_at TEXT ); +-- Monitor runs +CREATE TABLE IF NOT EXISTS monitor_runs ( + id INTEGER PRIMARY KEY AUTOINCREMENT, + started_at TEXT NOT NULL, + completed_at TEXT, + status TEXT DEFAULT 'running', + new_drafts_found INTEGER DEFAULT 0, + drafts_analyzed INTEGER DEFAULT 0, + drafts_embedded INTEGER DEFAULT 0, + ideas_extracted INTEGER DEFAULT 0, + error_message TEXT DEFAULT '', + duration_seconds REAL DEFAULT 0 +); + -- Triggers to keep FTS index in sync CREATE TRIGGER IF NOT EXISTS drafts_ai AFTER INSERT ON drafts BEGIN INSERT INTO drafts_fts(rowid, name, title, abstract, full_text) @@ -234,6 +256,12 @@ class Database: for col, typedef in migrations: if col not in cols: self._conn.execute(f"ALTER TABLE drafts ADD COLUMN {col} {typedef}") + + # ideas table migrations + idea_cols = {r[1] for r in self._conn.execute("PRAGMA table_info(ideas)").fetchall()} + if "novelty_score" not in idea_cols: + self._conn.execute("ALTER TABLE ideas ADD COLUMN novelty_score INTEGER") + self._conn.commit() def close(self) -> None: @@ -501,12 +529,13 @@ class Database: ORDER BY da.author_order""", (draft_name,), ).fetchall() + cols = rows[0].keys() if rows else [] return [Author( person_id=r["person_id"], name=r["name"], - ascii_name=r.get("ascii_name", ""), - affiliation=r.get("affiliation", ""), - resource_uri=r.get("resource_uri", ""), - fetched_at=r.get("fetched_at"), + ascii_name=r["ascii_name"] if "ascii_name" in cols else "", + 
affiliation=r["affiliation"] if "affiliation" in cols else "", + resource_uri=r["resource_uri"] if "resource_uri" in cols else "", + fetched_at=r["fetched_at"] if "fetched_at" in cols else None, ) for r in rows] def drafts_without_authors(self, limit: int = 500) -> list[str]: @@ -624,13 +653,42 @@ class Database: ) self.conn.commit() + def delete_ideas(self, draft_name: str | None = None) -> int: + """Delete ideas from the ideas table. + + Args: + draft_name: If provided, delete only ideas for this draft. + If None, delete all ideas. + + Returns: + Number of rows deleted. + """ + if draft_name: + self.conn.execute( + "DELETE FROM idea_embeddings WHERE idea_id IN (SELECT id FROM ideas WHERE draft_name = ?)", (draft_name,) + ) + cursor = self.conn.execute( + "DELETE FROM ideas WHERE draft_name = ?", (draft_name,) + ) + else: + self.conn.execute("DELETE FROM idea_embeddings") + cursor = self.conn.execute("DELETE FROM ideas") + self.conn.commit() + return cursor.rowcount + def get_ideas_for_draft(self, draft_name: str) -> list[dict]: rows = self.conn.execute( "SELECT * FROM ideas WHERE draft_name = ?", (draft_name,) ).fetchall() - return [{"title": r["title"], "description": r["description"], + return [{"id": r["id"], "title": r["title"], "description": r["description"], "type": r["idea_type"], "draft_name": r["draft_name"]} for r in rows] + def delete_idea(self, idea_id: int) -> None: + """Delete a single idea and its embedding by ID.""" + self.conn.execute("DELETE FROM idea_embeddings WHERE idea_id = ?", (idea_id,)) + self.conn.execute("DELETE FROM ideas WHERE id = ?", (idea_id,)) + self.conn.commit() + def drafts_without_ideas(self, limit: int = 500) -> list[str]: rows = self.conn.execute( """SELECT d.name FROM drafts d @@ -653,6 +711,103 @@ class Database: def idea_count(self) -> int: return self.conn.execute("SELECT COUNT(*) FROM ideas").fetchone()[0] + def ideas_with_drafts(self, unscored_only: bool = False, limit: int = 5000) -> list[dict]: + """Return ideas 
joined with draft title, optionally only unscored ones.""" + where = "WHERE i.novelty_score IS NULL" if unscored_only else "" + rows = self.conn.execute( + f"""SELECT i.id, i.draft_name, i.title, i.description, i.idea_type, + i.novelty_score, d.title AS draft_title + FROM ideas i JOIN drafts d ON i.draft_name = d.name + {where} + ORDER BY i.id LIMIT ?""", + (limit,), + ).fetchall() + return [dict(r) for r in rows] + + def update_idea_score(self, idea_id: int, score: int) -> None: + """Set the novelty_score for a single idea.""" + self.conn.execute( + "UPDATE ideas SET novelty_score = ? WHERE id = ?", + (score, idea_id), + ) + self.conn.commit() + + def update_idea_scores_bulk(self, scores: dict[int, int]) -> None: + """Bulk-update novelty scores. scores maps idea_id -> score.""" + self.conn.executemany( + "UPDATE ideas SET novelty_score = ? WHERE id = ?", + [(score, idea_id) for idea_id, score in scores.items()], + ) + self.conn.commit() + + def delete_low_score_ideas(self, min_score: int) -> int: + """Delete ideas with novelty_score below min_score. 
Returns count deleted.""" + # Also clean up associated idea embeddings + self.conn.execute( + """DELETE FROM idea_embeddings WHERE idea_id IN + (SELECT id FROM ideas WHERE novelty_score IS NOT NULL AND novelty_score < ?)""", + (min_score,), + ) + cursor = self.conn.execute( + "DELETE FROM ideas WHERE novelty_score IS NOT NULL AND novelty_score < ?", + (min_score,), + ) + self.conn.commit() + return cursor.rowcount + + def idea_score_distribution(self) -> dict[int, int]: + """Return {score: count} for scored ideas.""" + rows = self.conn.execute( + "SELECT novelty_score, COUNT(*) as cnt FROM ideas " + "WHERE novelty_score IS NOT NULL GROUP BY novelty_score ORDER BY novelty_score" + ).fetchall() + return {r["novelty_score"]: r["cnt"] for r in rows} + + def ideas_below_score(self, min_score: int) -> list[dict]: + """Return ideas with novelty_score below min_score.""" + rows = self.conn.execute( + """SELECT i.id, i.draft_name, i.title, i.description, i.novelty_score, + d.title AS draft_title + FROM ideas i JOIN drafts d ON i.draft_name = d.name + WHERE i.novelty_score IS NOT NULL AND i.novelty_score < ? + ORDER BY i.novelty_score, i.title""", + (min_score,), + ).fetchall() + return [dict(r) for r in rows] + + # --- Idea Embeddings --- + + def store_idea_embedding(self, idea_id: int, model: str, vector: np.ndarray) -> None: + self.conn.execute( + """INSERT INTO idea_embeddings (idea_id, model, vector, created_at) + VALUES (?, ?, ?, ?) 
+ ON CONFLICT(idea_id) DO UPDATE SET + model=excluded.model, vector=excluded.vector, created_at=excluded.created_at + """, + (idea_id, model, vector.astype(np.float32).tobytes(), + datetime.now(timezone.utc).isoformat()), + ) + self.conn.commit() + + def all_idea_embeddings(self) -> dict[int, np.ndarray]: + rows = self.conn.execute("SELECT idea_id, vector FROM idea_embeddings").fetchall() + return { + r["idea_id"]: np.frombuffer(r["vector"], dtype=np.float32) + for r in rows + } + + def ideas_without_embeddings(self, limit: int = 500) -> list[dict]: + rows = self.conn.execute( + """SELECT i.id, i.title, i.description, i.idea_type, i.draft_name + FROM ideas i + LEFT JOIN idea_embeddings ie ON i.id = ie.idea_id + WHERE ie.idea_id IS NULL + LIMIT ?""", + (limit,), + ).fetchall() + return [{"id": r["id"], "title": r["title"], "description": r["description"], + "type": r["idea_type"], "draft_name": r["draft_name"]} for r in rows] + # --- Gaps --- def insert_gaps(self, gaps: list[dict]) -> None: @@ -981,6 +1136,250 @@ class Database: for r in rows ] + # --- Working Groups --- + + def wg_summary(self) -> list[dict]: + """Return per-WG summary: group, draft_count, avg scores, categories, idea_count. + + Excludes 'none' (individual submissions) — those are returned separately. 
+ """ + rows = self.conn.execute(""" + SELECT d."group" as wg, COUNT(*) as draft_count, + AVG(r.novelty) as avg_novelty, AVG(r.maturity) as avg_maturity, + AVG(r.overlap) as avg_overlap, AVG(r.momentum) as avg_momentum, + AVG(r.relevance) as avg_relevance, + (SELECT COUNT(*) FROM ideas i WHERE i.draft_name IN + (SELECT name FROM drafts WHERE "group" = d."group")) as idea_count + FROM drafts d + LEFT JOIN ratings r ON d.name = r.draft_name + WHERE d."group" IS NOT NULL AND d."group" != '' AND d."group" != 'none' + GROUP BY d."group" + ORDER BY draft_count DESC + """).fetchall() + + # Build categories per WG from a separate query + cat_rows = self.conn.execute(""" + SELECT d."group" as wg, r.categories + FROM drafts d JOIN ratings r ON d.name = r.draft_name + WHERE d."group" IS NOT NULL AND d."group" != '' AND d."group" != 'none' + """).fetchall() + wg_cats: dict[str, dict[str, int]] = {} + for cr in cat_rows: + wg = cr["wg"] + if wg not in wg_cats: + wg_cats[wg] = {} + try: + for c in json.loads(cr["categories"]): + c = normalize_category(c) + wg_cats[wg][c] = wg_cats[wg].get(c, 0) + 1 + except (json.JSONDecodeError, TypeError): + pass + + results = [] + for r in rows: + results.append({ + "wg": r["wg"], + "draft_count": r["draft_count"], + "avg_novelty": round(r["avg_novelty"] or 0, 1), + "avg_maturity": round(r["avg_maturity"] or 0, 1), + "avg_overlap": round(r["avg_overlap"] or 0, 1), + "avg_momentum": round(r["avg_momentum"] or 0, 1), + "avg_relevance": round(r["avg_relevance"] or 0, 1), + "categories": wg_cats.get(r["wg"], {}), + "idea_count": r["idea_count"], + }) + return results + + def wg_drafts(self, wg: str) -> list[Draft]: + """Return all drafts for a specific working group.""" + rows = self.conn.execute( + 'SELECT * FROM drafts WHERE "group" = ? 
ORDER BY time DESC', (wg,) + ).fetchall() + return [self._row_to_draft(r) for r in rows] + + def wg_category_matrix(self) -> dict[str, dict[str, int]]: + """Return {wg: {category: count}} matrix for all WGs (excluding 'none').""" + rows = self.conn.execute(""" + SELECT d."group" as wg, r.categories + FROM drafts d + JOIN ratings r ON d.name = r.draft_name + WHERE d."group" IS NOT NULL AND d."group" != '' AND d."group" != 'none' + """).fetchall() + matrix: dict[str, dict[str, int]] = {} + for r in rows: + wg = r["wg"] + if wg not in matrix: + matrix[wg] = {} + try: + for c in json.loads(r["categories"]): + c = normalize_category(c) + matrix[wg][c] = matrix[wg].get(c, 0) + 1 + except (json.JSONDecodeError, TypeError): + pass + return matrix + + def wg_idea_overlap(self) -> list[dict]: + """Find ideas that appear across multiple WGs — signals for alignment. + + Returns list of {idea_title, wgs: [{wg, draft_name, draft_title}], wg_count}. + """ + rows = self.conn.execute(""" + SELECT i.title as idea_title, i.description, d."group" as wg, + d.name as draft_name, d.title as draft_title + FROM ideas i + JOIN drafts d ON i.draft_name = d.name + WHERE d."group" IS NOT NULL AND d."group" != '' + ORDER BY i.title, d."group" + """).fetchall() + + # Group by idea title + from collections import defaultdict + idea_groups: dict[str, list[dict]] = defaultdict(list) + for r in rows: + idea_groups[r["idea_title"]].append({ + "wg": r["wg"], + "draft_name": r["draft_name"], + "draft_title": r["draft_title"], + }) + + # Only keep ideas spanning 2+ distinct WGs + results = [] + for title, entries in idea_groups.items(): + wgs = set(e["wg"] for e in entries) + if len(wgs) >= 2: + results.append({ + "idea_title": title, + "wgs": entries, + "wg_count": len(wgs), + "wg_names": sorted(wgs), + }) + return sorted(results, key=lambda x: x["wg_count"], reverse=True) + + def individual_vs_wg_categories(self) -> dict[str, dict[str, int]]: + """Compare category distribution: individual submissions 
vs WG-adopted. + + Returns {"individual": {cat: count}, "wg_adopted": {cat: count}}. + """ + rows = self.conn.execute(""" + SELECT CASE WHEN d."group" = 'none' OR d."group" IS NULL THEN 'individual' + ELSE 'wg_adopted' END as stream, + r.categories + FROM drafts d + JOIN ratings r ON d.name = r.draft_name + """).fetchall() + result: dict[str, dict[str, int]] = {"individual": {}, "wg_adopted": {}} + for r in rows: + stream = r["stream"] + try: + for c in json.loads(r["categories"]): + c = normalize_category(c) + result[stream][c] = result[stream].get(c, 0) + 1 + except (json.JSONDecodeError, TypeError): + pass + return result + + def category_wg_spread(self) -> list[dict]: + """For each category, which WGs contribute drafts? High spread = alignment opportunity. + + Returns [{category, wgs: [{wg, count}], wg_count, total_drafts}]. + """ + rows = self.conn.execute(""" + SELECT d."group" as wg, r.categories + FROM drafts d + JOIN ratings r ON d.name = r.draft_name + WHERE d."group" IS NOT NULL AND d."group" != '' + """).fetchall() + + from collections import defaultdict + cat_wgs: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int)) + for r in rows: + wg = r["wg"] + try: + for c in json.loads(r["categories"]): + c = normalize_category(c) + cat_wgs[c][wg] += 1 + except (json.JSONDecodeError, TypeError): + pass + + results = [] + for cat, wg_counts in cat_wgs.items(): + wg_list = sorted(wg_counts.items(), key=lambda x: x[1], reverse=True) + results.append({ + "category": cat, + "wgs": [{"wg": wg, "count": cnt} for wg, cnt in wg_list], + "wg_count": len(wg_list), + "total_drafts": sum(wg_counts.values()), + }) + return sorted(results, key=lambda x: x["wg_count"], reverse=True) + + # --- Monitor Runs --- + + def start_monitor_run(self) -> int: + now = datetime.now(timezone.utc).isoformat() + cur = self.conn.execute( + "INSERT INTO monitor_runs (started_at, status) VALUES (?, 'running')", + (now,), + ) + self.conn.commit() + return cur.lastrowid + + def 
complete_monitor_run(self, run_id: int, stats: dict) -> None: + now = datetime.now(timezone.utc).isoformat() + started = self.conn.execute( + "SELECT started_at FROM monitor_runs WHERE id = ?", (run_id,) + ).fetchone() + duration = 0.0 + if started: + try: + start_dt = datetime.fromisoformat(started["started_at"]) + duration = (datetime.now(timezone.utc) - start_dt).total_seconds() + except (ValueError, TypeError): + pass + self.conn.execute( + """UPDATE monitor_runs SET + status='completed', completed_at=?, + new_drafts_found=?, drafts_analyzed=?, + drafts_embedded=?, ideas_extracted=?, + duration_seconds=? + WHERE id=?""", + (now, stats.get("new_drafts_found", 0), stats.get("drafts_analyzed", 0), + stats.get("drafts_embedded", 0), stats.get("ideas_extracted", 0), + duration, run_id), + ) + self.conn.commit() + + def fail_monitor_run(self, run_id: int, error: str) -> None: + now = datetime.now(timezone.utc).isoformat() + started = self.conn.execute( + "SELECT started_at FROM monitor_runs WHERE id = ?", (run_id,) + ).fetchone() + duration = 0.0 + if started: + try: + start_dt = datetime.fromisoformat(started["started_at"]) + duration = (datetime.now(timezone.utc) - start_dt).total_seconds() + except (ValueError, TypeError): + pass + self.conn.execute( + """UPDATE monitor_runs SET + status='failed', completed_at=?, error_message=?, duration_seconds=? 
+ WHERE id=?""", + (now, error, duration, run_id), + ) + self.conn.commit() + + def get_monitor_runs(self, limit: int = 20) -> list[dict]: + rows = self.conn.execute( + "SELECT * FROM monitor_runs ORDER BY started_at DESC LIMIT ?", (limit,) + ).fetchall() + return [dict(r) for r in rows] + + def get_last_successful_run(self) -> dict | None: + row = self.conn.execute( + "SELECT * FROM monitor_runs WHERE status='completed' ORDER BY started_at DESC LIMIT 1" + ).fetchone() + return dict(row) if row else None + # --- Helpers --- @staticmethod diff --git a/src/ietf_analyzer/reports.py b/src/ietf_analyzer/reports.py index 4cf1797..921a3d2 100644 --- a/src/ietf_analyzer/reports.py +++ b/src/ietf_analyzer/reports.py @@ -1673,3 +1673,143 @@ class Reporter: path = self.output_dir / "co-occurrence.md" path.write_text(report) return str(path) + + def wg_report(self) -> str: + """Generate working group analysis report.""" + now = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC") + summaries = self.db.wg_summary() + spread = self.db.category_wg_spread() + idea_overlaps = self.db.wg_idea_overlap() + indiv_vs_wg = self.db.individual_vs_wg_categories() + total = self.db.count_drafts() + + indiv_count = self.db.conn.execute( + 'SELECT COUNT(*) FROM drafts WHERE "group" = \'none\' OR "group" IS NULL' + ).fetchone()[0] + wg_count = total - indiv_count + + lines = [ + f"# Working Group Analysis", + f"*Generated {now} — {total} drafts ({wg_count} WG-adopted, {indiv_count} individual)*\n", + ] + + # WG summary table + lines.extend([ + "## Working Group Overview\n", + "| WG | Drafts | Ideas | Novelty | Maturity | Overlap | Momentum | Relevance |", + "|:---|-------:|------:|--------:|---------:|--------:|---------:|----------:|", + ]) + for s in summaries: + lines.append( + f"| **{s['wg']}** | {s['draft_count']} | {s['idea_count']} " + f"| {s['avg_novelty']} | {s['avg_maturity']} | {s['avg_overlap']} " + f"| {s['avg_momentum']} | {s['avg_relevance']} |" + ) + + # Category spread — 
where topics live across WGs + multi_wg = [s for s in spread if s["wg_count"] >= 2 + and not all(w["wg"] == "none" for w in s["wgs"])] + if multi_wg: + lines.extend([ + "\n## Cross-WG Category Spread\n", + "Categories appearing in multiple WGs — potential coordination or alignment needed.\n", + "| Category | WG Count | Total Drafts | WGs |", + "|:---------|:--------:|-------------:|:----|", + ]) + for s in multi_wg: + real_wgs = [f"{w['wg']}({w['count']})" for w in s["wgs"] if w["wg"] != "none"] + lines.append( + f"| {s['category']} | {s['wg_count']} | {s['total_drafts']} " + f"| {', '.join(real_wgs)} |" + ) + + # Idea overlap across WGs + cross_wg_ideas = [o for o in idea_overlaps + if not all(w == "none" for w in o["wg_names"])] + if cross_wg_ideas: + lines.extend([ + "\n## Cross-WG Idea Overlap\n", + "Same technical ideas appearing in different WGs — strongest signals for alignment.\n", + ]) + for o in cross_wg_ideas[:30]: + real_wgs = [w for w in o["wg_names"] if w != "none"] + lines.append(f"### {o['idea_title']} ({len(real_wgs)} WGs: {', '.join(real_wgs)})\n") + for entry in o["wgs"]: + if entry["wg"] != "none": + lines.append(f"- **[{entry['wg']}]** [{entry['draft_name']}]" + f"(https://datatracker.ietf.org/doc/{entry['draft_name']}/) — " + f"{entry['draft_title']}") + lines.append("") + + # Individual vs WG comparison + indiv = indiv_vs_wg["individual"] + adopted = indiv_vs_wg["wg_adopted"] + all_cats = sorted(set(list(indiv.keys()) + list(adopted.keys()))) + + lines.extend([ + "\n## Individual vs WG-Adopted Distribution\n", + "| Category | Individual | WG-Adopted | Assessment |", + "|:---------|----------:|-----------:|:-----------|", + ]) + consolidation_candidates = [] + for cat in all_cats: + i_count = indiv.get(cat, 0) + w_count = adopted.get(cat, 0) + if i_count >= 5 and w_count == 0: + assessment = "**Needs WG** — high individual activity, no WG home" + consolidation_candidates.append((cat, i_count)) + elif i_count >= 3 and w_count >= 1: + assessment 
= "WG exists — individual drafts could target it" + elif w_count > i_count and i_count > 0: + assessment = "WG leading" + elif w_count == 0 and i_count > 0: + assessment = "Individual only" + else: + assessment = "-" + lines.append(f"| {cat} | {i_count} | {w_count} | {assessment} |") + + # Consolidation recommendations + if consolidation_candidates: + lines.extend([ + "\n## Consolidation Candidates\n", + "Categories with significant individual draft activity but no WG — " + "candidates for new WG charter or BoF.\n", + ]) + for cat, count in sorted(consolidation_candidates, key=lambda x: x[1], reverse=True): + lines.append(f"### {cat} ({count} individual drafts)\n") + rows = self.db.conn.execute(""" + SELECT d.name, d.title, r.summary FROM drafts d + JOIN ratings r ON d.name = r.draft_name + WHERE (d."group" = 'none' OR d."group" IS NULL) + AND r.categories LIKE ? + ORDER BY (r.novelty * 0.30 + r.relevance * 0.25 + r.maturity * 0.20 + + r.momentum * 0.15 + (6 - r.overlap) * 0.10) DESC + LIMIT 8 + """, (f"%{cat}%",)).fetchall() + for row in rows: + lines.append( + f"- [{row['name']}](https://datatracker.ietf.org/doc/{row['name']}/) — " + f"{row['title']}" + ) + lines.append("") + + # Submission targets + lines.extend([ + "\n## Recommended Submission Targets\n", + "For each category, the best WG to submit new work to.\n", + "| Category | Best WG | Alternatives |", + "|:---------|:--------|:-------------|", + ]) + for s in spread: + real_wgs = [w for w in s["wgs"] if w["wg"] != "none"] + if not real_wgs: + lines.append(f"| {s['category']} | *Individual submission* | - |") + else: + best = real_wgs[0]["wg"] + alts = ", ".join(f"{w['wg']}({w['count']})" for w in real_wgs[1:3]) or "-" + lines.append(f"| {s['category']} | **{best}** | {alts} |") + + report = "\n".join(lines) + path = self.output_dir / "wg-analysis.md" + path.write_text(report) + return str(path) diff --git a/src/webui/PLAN.md b/src/webui/PLAN.md new file mode 100644 index 0000000..e4e406a --- /dev/null +++ 
b/src/webui/PLAN.md @@ -0,0 +1,80 @@ +# IETF Draft Analyzer — Web Dashboard Architecture + +## Overview + +A read-only Flask dashboard for exploring and visualizing 361+ IETF Internet-Drafts on AI/agent topics. All data comes from the existing SQLite database (`data/drafts.db`) via the `Database` class from `src/ietf_analyzer/db.py`. + +## Tech Stack + +- **Backend**: Flask (simple routes, no blueprints) +- **Database**: Existing SQLite via `ietf_analyzer.db.Database` (read-only) +- **CSS**: Tailwind CSS via CDN (dark theme: slate/gray palette) +- **Charts**: Plotly.js via CDN (all interactive charts rendered client-side) +- **Fonts**: Inter via Google Fonts CDN + +## File Structure + +``` +src/webui/ + __init__.py # Empty package init + app.py # Flask app, all routes + data.py # Data access layer (wraps Database queries, returns JSON-ready dicts) + templates/ + base.html # Dark-themed base with sidebar nav, Tailwind, Plotly CDN + overview.html # Dashboard home: key stats, charts + drafts.html # Draft explorer: search, filter, sortable table + draft_detail.html # Single draft detail page + ideas.html # Ideas explorer with type breakdown + gaps.html # Gap analysis display + ratings.html # Rating distributions and comparisons + landscape.html # UMAP/t-SNE scatter (embeddings) + authors.html # Author network and top contributors + about.html # About page with project info +``` + +## Pages & Routes + +| Route | Template | Description | +|-------|----------|-------------| +| `/` | `overview.html` | Dashboard home: total drafts, rated count, author count, idea count, gap count. Charts: category treemap, timeline, score distribution histogram. | +| `/drafts` | `drafts.html` | Searchable, filterable, sortable table of all drafts with ratings. Pagination. Category chip filters. Score range slider. | +| `/drafts/<name>` | `draft_detail.html` | Single draft: all rating dimensions with notes, categories, authors, ideas extracted, references. 
| +| `/ideas` | `ideas.html` | All extracted ideas grouped by type. Bar chart of idea types. Searchable. | +| `/gaps` | `gaps.html` | Gap analysis results: severity badges, categories, evidence. | +| `/ratings` | `ratings.html` | Rating analytics: dimension distributions (violin/box), category radar profiles, top-scored drafts. | +| `/landscape` | `landscape.html` | Embedding scatter plot (pre-computed coordinates served as JSON). | +| `/authors` | `authors.html` | Top authors table, org contributions bar chart, co-author network graph. | +| `/about` | `about.html` | Project description, data freshness, counts. | + +## Data Layer (`data.py`) + +Thin wrapper around `Database` that returns plain dicts/lists ready for `jsonify()` or template rendering: + +- `get_overview_stats()` — counts for drafts, ratings, authors, ideas, gaps +- `get_drafts_page(page, per_page, search, category, min_score, sort)` — paginated draft list with ratings +- `get_draft_detail(name)` — single draft + rating + authors + ideas + refs +- `get_category_counts()` — {category: count} for filter chips +- `get_rating_distributions()` — arrays for each dimension for Plotly +- `get_timeline_data()` — monthly counts by category for stacked area +- `get_ideas_by_type()` — grouped idea counts +- `get_all_gaps()` — gap list with severity +- `get_top_authors(limit)` — author leaderboard +- `get_org_data(limit)` — organization contributions +- `get_landscape_coords()` — pre-computed 2D coordinates + metadata + +## Design System + +- **Dark theme**: `bg-slate-900` body, `bg-slate-800` cards, `bg-slate-700` hover states +- **Accent**: Blue-500 (`#3b82f6`) for links, active states, charts +- **Text**: `text-slate-100` primary, `text-slate-400` secondary +- **Cards**: Rounded corners (`rounded-xl`), subtle border (`border-slate-700`) +- **Sidebar**: Fixed left, 240px wide, collapsible on mobile +- **Charts**: Plotly dark theme (`plotly_dark` template), consistent color palette + +## Key Decisions + +1. 
**Read-only**: No writes to DB. All data comes from CLI pipeline runs. +2. **Server-side rendering**: Templates with Jinja2, chart data passed as JSON. +3. **No build step**: All CSS/JS from CDN. Zero npm/webpack complexity. +4. **Reuse existing queries**: `data.py` calls `Database` methods directly. +5. **Responsive**: Tailwind responsive utilities, sidebar collapses to hamburger. diff --git a/src/webui/__init__.py b/src/webui/__init__.py new file mode 100644 index 0000000..1493712 --- /dev/null +++ b/src/webui/__init__.py @@ -0,0 +1 @@ +# IETF Draft Analyzer — Web Dashboard diff --git a/src/webui/app.py b/src/webui/app.py new file mode 100644 index 0000000..0c7fdef --- /dev/null +++ b/src/webui/app.py @@ -0,0 +1,297 @@ +"""IETF Draft Analyzer — Web Dashboard. + +Run with: python src/webui/app.py +""" + +from __future__ import annotations + +import sys +from pathlib import Path + +# Ensure project src is on path +_project_root = Path(__file__).resolve().parent.parent.parent +sys.path.insert(0, str(_project_root / "src")) + +from flask import Flask, render_template, request, jsonify, abort, g + +from webui.data import ( + get_db, + get_overview_stats, + get_category_counts, + get_drafts_page, + get_draft_detail, + get_rating_distributions, + get_timeline_data, + get_ideas_by_type, + get_all_gaps, + get_gap_detail, + get_generated_drafts, + read_generated_draft, + get_top_authors, + get_org_data, + get_category_radar_data, + get_score_histogram, + get_coauthor_network, + get_cross_org_data, + get_landscape_tsne, + get_similarity_graph, + get_timeline_animation_data, + get_idea_clusters, + get_monitor_status, + get_author_network_full, +) + +app = Flask( + __name__, + template_folder=str(Path(__file__).parent / "templates"), +) +app.config["SECRET_KEY"] = "ietf-dashboard-dev" + + +# --- Database lifecycle (per-request to avoid SQLite threading issues) --- + + +def db(): + if "db" not in g: + g.db = get_db() + return g.db + + +# --- Routes --- + + +@app.route("/") 
+def overview(): + stats = get_overview_stats(db()) + categories = get_category_counts(db()) + timeline = get_timeline_data(db()) + scores = get_score_histogram(db()) + radar = get_category_radar_data(db()) + return render_template( + "overview.html", + stats=stats, + categories=categories, + timeline=timeline, + scores=scores, + radar=radar, + ) + + +@app.route("/drafts") +def drafts(): + page = request.args.get("page", 1, type=int) + search = request.args.get("q", "") + category = request.args.get("cat", "") + min_score = request.args.get("min_score", 0.0, type=float) + sort = request.args.get("sort", "score") + sort_dir = request.args.get("dir", "desc") + + result = get_drafts_page( + db(), + page=page, + search=search, + category=category, + min_score=min_score, + sort=sort, + sort_dir=sort_dir, + ) + categories = get_category_counts(db()) + return render_template( + "drafts.html", + result=result, + categories=categories, + search=search, + current_cat=category, + min_score=min_score, + sort=sort, + sort_dir=sort_dir, + ) + + +@app.route("/drafts/<name>") +def draft_detail(name: str): + detail = get_draft_detail(db(), name) + if not detail: + abort(404) + return render_template("draft_detail.html", draft=detail) + + +@app.route("/ideas") +def ideas(): + data = get_ideas_by_type(db()) + return render_template("ideas.html", data=data) + + +@app.route("/gaps") +def gaps(): + gap_list = get_all_gaps(db()) + generated = get_generated_drafts() + return render_template("gaps.html", gaps=gap_list, generated_drafts=generated) + + +@app.route("/gaps/demo") +def gaps_demo(): + """Show a pre-generated example draft so users can see output without API calls.""" + generated = get_generated_drafts() + # Default to the first generated draft, or allow selection via query param + selected = request.args.get("file", "") + draft_text = None + draft_info = None + if selected: + draft_text = read_generated_draft(selected) + for g in generated: + if g["filename"] == selected: + draft_info
= g + break + elif generated: + draft_info = generated[0] + draft_text = read_generated_draft(draft_info["filename"]) + return render_template( + "gap_demo.html", + generated_drafts=generated, + draft_text=draft_text, + draft_info=draft_info, + selected=selected, + ) + + +@app.route("/gaps/<int:gap_id>") +def gap_detail(gap_id: int): + gap = get_gap_detail(db(), gap_id) + if not gap: + abort(404) + generated = get_generated_drafts() + return render_template("gap_detail.html", gap=gap, generated_drafts=generated) + + +@app.route("/gaps/<int:gap_id>/generate", methods=["POST"]) +def gap_generate(gap_id: int): + """Trigger draft generation for a gap. Returns JSON with the generated text.""" + gap = get_gap_detail(db(), gap_id) + if not gap: + return jsonify({"error": "Gap not found"}), 404 + + try: + from ietf_analyzer.config import Config + from ietf_analyzer.analyzer import Analyzer + from ietf_analyzer.draftgen import DraftGenerator + + cfg = Config.load() + database = db() + analyzer = Analyzer(cfg, database) + generator = DraftGenerator(cfg, database, analyzer) + + # Generate into a file named after the gap + slug = gap["topic"].lower().replace(" ", "-")[:40] + output_path = str(_project_root / "data" / "reports" / "generated-drafts" / f"draft-gap-{gap_id}-{slug}.txt") + path = generator.generate(gap["topic"], output_path=output_path) + draft_text = Path(path).read_text(errors="replace") + + return jsonify({ + "success": True, + "text": draft_text, + "filename": Path(path).name, + "path": path, + }) + except Exception as e: + return jsonify({"error": str(e)}), 500 + + +@app.route("/ratings") +def ratings(): + distributions = get_rating_distributions(db()) + radar = get_category_radar_data(db()) + return render_template( + "ratings.html", + dist=distributions, + radar=radar, + ) + + +@app.route("/landscape") +def landscape(): + distributions = get_rating_distributions(db()) + tsne_data = get_landscape_tsne(db()) + return render_template( + "landscape.html", + dist=distributions, +
tsne_data=tsne_data, + ) + + +@app.route("/timeline") +def timeline_animation(): + data = get_timeline_animation_data(db()) + return render_template("timeline.html", animation=data) + + +@app.route("/idea-clusters") +def idea_clusters(): + data = get_idea_clusters(db()) + return render_template("idea_clusters.html", clusters=data) + + +@app.route("/similarity") +def similarity(): + network = get_similarity_graph(db()) + return render_template("similarity.html", network=network) + + +@app.route("/authors") +def authors(): + top = get_top_authors(db(), limit=50) + orgs = get_org_data(db(), limit=20) + network = get_author_network_full(db()) + cross_org = get_cross_org_data(db(), limit=20) + return render_template( + "authors.html", + authors=top, + orgs=orgs, + orgs_data=orgs, + network=network, + cross_org=cross_org, + ) + + +@app.route("/monitor") +def monitor_page(): + status = get_monitor_status(db()) + return render_template("monitor.html", status=status) + + +@app.route("/about") +def about(): + stats = get_overview_stats(db()) + return render_template("about.html", stats=stats) + + +# --- API endpoints for AJAX (used by client-side charts) --- + + +@app.route("/api/drafts") +def api_drafts(): + page = request.args.get("page", 1, type=int) + search = request.args.get("q", "") + category = request.args.get("cat", "") + min_score = request.args.get("min_score", 0.0, type=float) + sort = request.args.get("sort", "score") + sort_dir = request.args.get("dir", "desc") + return jsonify( + get_drafts_page(db(), page=page, search=search, category=category, + min_score=min_score, sort=sort, sort_dir=sort_dir) + ) + + +@app.route("/api/stats") +def api_stats(): + return jsonify(get_overview_stats(db())) + + +@app.route("/api/authors/network") +def api_author_network(): + return jsonify(get_author_network_full(db())) + + +if __name__ == "__main__": + print("Starting IETF Draft Analyzer Dashboard on http://127.0.0.1:5000") + app.run(debug=True, host="127.0.0.1", port=5000) 
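The cluster detection inside `get_author_network_full` (in `data.py`, below) is a plain breadth-first search over connected components of the co-author graph. A minimal standalone sketch of the same idea — `find_clusters` is a hypothetical helper, not part of this codebase, and it uses `collections.deque` so dequeuing is O(1) instead of the O(n) `list.pop(0)`:

```python
from collections import defaultdict, deque


def find_clusters(edges):
    """Group nodes into connected components via BFS.

    edges: iterable of (a, b) pairs. Returns components as lists,
    sorted largest-first, mirroring the cluster list the dashboard renders.
    """
    adjacency = defaultdict(set)
    nodes = set()
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
        nodes.update((a, b))

    visited, clusters = set(), []
    for start in sorted(nodes):
        if start in visited:
            continue
        component = []
        queue = deque([start])
        while queue:
            current = queue.popleft()  # O(1), unlike list.pop(0)
            if current in visited:
                continue
            visited.add(current)
            component.append(current)
            # Duplicates may be enqueued; the visited check above skips them.
            queue.extend(n for n in adjacency[current] if n not in visited)
        clusters.append(component)

    clusters.sort(key=len, reverse=True)
    return clusters
```

The in-repo version additionally attaches an `org_mix` counter per component and drops singletons; this sketch keeps only the traversal.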
diff --git a/src/webui/data.py b/src/webui/data.py new file mode 100644 index 0000000..3b9f04a --- /dev/null +++ b/src/webui/data.py @@ -0,0 +1,767 @@ +"""Data access layer for the web dashboard. + +Thin wrapper around ietf_analyzer.db.Database that returns plain dicts +ready for JSON serialization or Jinja2 template rendering. +""" + +from __future__ import annotations + +import json +import sys +from collections import Counter, defaultdict +from pathlib import Path + +# Ensure project src is on path so we can import ietf_analyzer +_project_root = Path(__file__).resolve().parent.parent.parent +if str(_project_root / "src") not in sys.path: + sys.path.insert(0, str(_project_root / "src")) + +from ietf_analyzer.config import Config +from ietf_analyzer.db import Database + + +def get_db() -> Database: + """Get a Database instance using default config.""" + config = Config.load() + return Database(config) + + +def get_overview_stats(db: Database) -> dict: + """Return high-level stats for the dashboard home page.""" + total_drafts = db.count_drafts() + rated_pairs = db.drafts_with_ratings(limit=1000) + rated_count = len(rated_pairs) + author_count = db.author_count() + idea_count = db.idea_count() + gaps = db.all_gaps() + input_tok, output_tok = db.total_tokens_used() + + return { + "total_drafts": total_drafts, + "rated_count": rated_count, + "author_count": author_count, + "idea_count": idea_count, + "gap_count": len(gaps), + "input_tokens": input_tok, + "output_tokens": output_tok, + } + + +def get_category_counts(db: Database) -> dict[str, int]: + """Return {category: draft_count} for all categories.""" + pairs = db.drafts_with_ratings(limit=1000) + counts: dict[str, int] = Counter() + for _, rating in pairs: + for cat in rating.categories: + counts[cat] += 1 + return dict(counts.most_common()) + + +def get_drafts_page( + db: Database, + page: int = 1, + per_page: int = 50, + search: str = "", + category: str = "", + min_score: float = 0.0, + sort: str = "score", + sort_dir:
str = "desc", +) -> dict: + """Return a paginated, filtered list of drafts with ratings. + + Returns dict with keys: drafts, total, page, per_page, pages. + """ + pairs = db.drafts_with_ratings(limit=1000) + + # Filter + filtered = [] + for draft, rating in pairs: + if min_score > 0 and rating.composite_score < min_score: + continue + if category and category not in rating.categories: + continue + if search: + haystack = f"{draft.name} {draft.title} {rating.summary}".lower() + if not all(w in haystack for w in search.lower().split()): + continue + filtered.append((draft, rating)) + + # Sort + sort_keys = { + "score": lambda p: p[1].composite_score, + "name": lambda p: p[0].name, + "date": lambda p: p[0].time or "", + "novelty": lambda p: p[1].novelty, + "maturity": lambda p: p[1].maturity, + "relevance": lambda p: p[1].relevance, + "overlap": lambda p: p[1].overlap, + "momentum": lambda p: p[1].momentum, + } + key_fn = sort_keys.get(sort, sort_keys["score"]) + reverse = sort_dir == "desc" + filtered.sort(key=key_fn, reverse=reverse) + + total = len(filtered) + pages = max(1, (total + per_page - 1) // per_page) + page = max(1, min(page, pages)) + start = (page - 1) * per_page + page_items = filtered[start : start + per_page] + + drafts = [] + for draft, rating in page_items: + drafts.append({ + "name": draft.name, + "title": draft.title, + "date": draft.date, + "url": draft.datatracker_url, + "pages": draft.pages or 0, + "group": draft.group or "individual", + "score": round(rating.composite_score, 2), + "novelty": rating.novelty, + "maturity": rating.maturity, + "overlap": rating.overlap, + "momentum": rating.momentum, + "relevance": rating.relevance, + "categories": rating.categories, + "summary": rating.summary, + }) + + return { + "drafts": drafts, + "total": total, + "page": page, + "per_page": per_page, + "pages": pages, + } + + +def get_draft_detail(db: Database, name: str) -> dict | None: + """Return full detail for a single draft.""" + draft = 
db.get_draft(name) + if not draft: + return None + + rating = db.get_rating(name) + authors = db.get_authors_for_draft(name) + ideas = db.get_ideas_for_draft(name) + refs = db.get_refs_for_draft(name) + + result = { + "name": draft.name, + "title": draft.title, + "rev": draft.rev, + "abstract": draft.abstract, + "date": draft.date, + "time": draft.time, + "url": draft.datatracker_url, + "text_url": draft.text_url, + "pages": draft.pages, + "words": draft.words, + "group": draft.group or "individual", + "categories": draft.categories, + "tags": draft.tags, + "authors": [ + {"name": a.name, "affiliation": a.affiliation, "person_id": a.person_id} + for a in authors + ], + "ideas": ideas, + "refs": [{"type": t, "id": rid} for t, rid in refs], + } + + if rating: + result["rating"] = { + "score": round(rating.composite_score, 2), + "novelty": rating.novelty, + "maturity": rating.maturity, + "overlap": rating.overlap, + "momentum": rating.momentum, + "relevance": rating.relevance, + "summary": rating.summary, + "novelty_note": rating.novelty_note, + "maturity_note": rating.maturity_note, + "overlap_note": rating.overlap_note, + "momentum_note": rating.momentum_note, + "relevance_note": rating.relevance_note, + "categories": rating.categories, + } + + return result + + +def get_rating_distributions(db: Database) -> dict: + """Return arrays for each rating dimension, suitable for Plotly.""" + pairs = db.drafts_with_ratings(limit=1000) + dims = { + "novelty": [], + "maturity": [], + "overlap": [], + "momentum": [], + "relevance": [], + "scores": [], + "categories": [], + "names": [], + } + for draft, rating in pairs: + dims["novelty"].append(rating.novelty) + dims["maturity"].append(rating.maturity) + dims["overlap"].append(rating.overlap) + dims["momentum"].append(rating.momentum) + dims["relevance"].append(rating.relevance) + dims["scores"].append(round(rating.composite_score, 2)) + dims["categories"].append(rating.categories[0] if rating.categories else "Other") + 
dims["names"].append(draft.name) + return dims + + +def get_timeline_data(db: Database) -> dict: + """Return monthly counts by category for timeline chart.""" + pairs = db.drafts_with_ratings(limit=1000) + all_drafts = db.list_drafts(limit=1000, order_by="time ASC") + rating_map = {d.name: r for d, r in pairs} + + month_cat: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int)) + for d in all_drafts: + month = d.time[:7] if d.time else "unknown" + r = rating_map.get(d.name) + if r: + cat = r.categories[0] if r.categories else "Other" + month_cat[month][cat] += 1 + + months = sorted(month_cat.keys()) + cat_totals: Counter = Counter() + for mc in month_cat.values(): + for c, cnt in mc.items(): + cat_totals[c] += cnt + top_cats = [c for c, _ in cat_totals.most_common(10)] + + series = {} + for cat in top_cats: + series[cat] = [month_cat[m].get(cat, 0) for m in months] + + return {"months": months, "series": series, "categories": top_cats} + + +def get_ideas_by_type(db: Database) -> dict: + """Return ideas grouped by type with counts.""" + all_ideas = db.all_ideas() + type_counts = Counter(i.get("type", "other") or "other" for i in all_ideas) + return { + "total": len(all_ideas), + "by_type": dict(type_counts.most_common()), + "ideas": all_ideas, + } + + +def get_all_gaps(db: Database) -> list[dict]: + """Return all gap analysis results.""" + return db.all_gaps() + + +def get_gap_detail(db: Database, gap_id: int) -> dict | None: + """Return a single gap by ID, or None if not found.""" + gaps = db.all_gaps() + for g in gaps: + if g["id"] == gap_id: + return g + return None + + +def get_generated_drafts() -> list[dict]: + """Return list of pre-generated draft files in data/reports/generated-drafts/.""" + drafts_dir = _project_root / "data" / "reports" / "generated-drafts" + if not drafts_dir.exists(): + return [] + results = [] + for f in sorted(drafts_dir.glob("draft-*.txt")): + # Extract title from first non-empty content line after header + title = f.stem 
+ text = f.read_text(errors="replace") + for line in text.splitlines(): + stripped = line.strip() + if stripped and not stripped.startswith("Internet-Draft") and \ + not stripped.startswith("Intended status") and \ + not stripped.startswith("Expires:") and stripped != "": + title = stripped + break + results.append({ + "filename": f.name, + "stem": f.stem, + "title": title, + "size": f.stat().st_size, + "path": str(f), + }) + return results + + +def read_generated_draft(filename: str) -> str | None: + """Read a generated draft file by filename. Returns text or None.""" + drafts_dir = _project_root / "data" / "reports" / "generated-drafts" + path = drafts_dir / filename + if not path.exists() or not path.is_file(): + return None + # Safety: ensure we're not reading outside the directory + if not str(path.resolve()).startswith(str(drafts_dir.resolve())): + return None + return path.read_text(errors="replace") + + +def get_top_authors(db: Database, limit: int = 30) -> list[dict]: + """Return top authors by draft count.""" + rows = db.top_authors(limit=limit) + return [ + {"name": name, "affiliation": aff, "draft_count": cnt, "drafts": drafts} + for name, aff, cnt, drafts in rows + ] + + +def get_org_data(db: Database, limit: int = 20) -> list[dict]: + """Return organization contribution data.""" + rows = db.top_orgs(limit=limit) + return [ + {"org": org, "author_count": authors, "draft_count": drafts} + for org, authors, drafts in rows + ] + + +def get_category_radar_data(db: Database) -> dict: + """Return average rating profiles per category for radar chart.""" + pairs = db.drafts_with_ratings(limit=1000) + cat_ratings: dict[str, list] = defaultdict(list) + for _, r in pairs: + for c in r.categories: + cat_ratings[c].append(r) + + top_cats = sorted(cat_ratings.keys(), key=lambda c: len(cat_ratings[c]), reverse=True)[:8] + result = {} + for cat in top_cats: + ratings = cat_ratings[cat] + n = len(ratings) + result[cat] = { + "count": n, + "novelty": round(sum(r.novelty 
for r in ratings) / n, 2), + "maturity": round(sum(r.maturity for r in ratings) / n, 2), + "relevance": round(sum(r.relevance for r in ratings) / n, 2), + "momentum": round(sum(r.momentum for r in ratings) / n, 2), + "low_overlap": round(sum(6 - r.overlap for r in ratings) / n, 2), + } + return result + + +def get_score_histogram(db: Database) -> list[float]: + """Return list of composite scores for histogram.""" + pairs = db.drafts_with_ratings(limit=1000) + return [round(r.composite_score, 2) for _, r in pairs] + + +def get_coauthor_network(db: Database, min_shared: int = 1) -> dict: + """Return co-authorship network data for force-directed graph. + + Returns {nodes: [{id, name, org, draft_count}], edges: [{source, target, weight}]} + """ + pairs = db.coauthor_pairs() + top = db.top_authors(limit=100) + + # Build node set from authors who have co-authorships + author_info = {name: {"org": aff, "draft_count": cnt} for name, aff, cnt, _ in top} + node_set = set() + edges = [] + for a, b, shared in pairs: + if shared >= min_shared: + node_set.add(a) + node_set.add(b) + edges.append({"source": a, "target": b, "weight": shared}) + + nodes = [] + for name in node_set: + info = author_info.get(name, {"org": "", "draft_count": 1}) + nodes.append({ + "id": name, + "name": name, + "org": info["org"], + "draft_count": info["draft_count"], + }) + + return {"nodes": nodes, "edges": edges} + + +def get_similarity_graph(db: Database, threshold: float = 0.75) -> dict: + """Return draft similarity network for force-directed graph. 
+ + Returns {nodes: [{name, title, category, score}], + edges: [{source, target, similarity}], + stats: {node_count, edge_count, avg_similarity}} + """ + import numpy as np + + embeddings = db.all_embeddings() + if len(embeddings) < 2: + return {"nodes": [], "edges": [], "stats": {"node_count": 0, "edge_count": 0, "avg_similarity": 0}} + + pairs = db.drafts_with_ratings(limit=1000) + rating_map = {d.name: r for d, r in pairs} + draft_map = {d.name: d for d, _ in pairs} + + # Filter to drafts with both embeddings and ratings + names = [n for n in embeddings if n in rating_map] + if len(names) < 2: + return {"nodes": [], "edges": [], "stats": {"node_count": 0, "edge_count": 0, "avg_similarity": 0}} + + matrix = np.array([embeddings[n] for n in names]) + + # L2-normalize and compute cosine similarity + norms = np.linalg.norm(matrix, axis=1, keepdims=True) + norms[norms == 0] = 1.0 + normalized = matrix / norms + sim_matrix = normalized @ normalized.T + + # Find pairs above threshold (upper triangle only) + edges = [] + node_set = set() + for i in range(len(names)): + for j in range(i + 1, len(names)): + sim = float(sim_matrix[i, j]) + if sim >= threshold: + edges.append({"source": names[i], "target": names[j], "similarity": round(sim, 4)}) + node_set.add(names[i]) + node_set.add(names[j]) + + # Build nodes from connected drafts only + nodes = [] + for name in names: + if name not in node_set: + continue + r = rating_map[name] + d = draft_map.get(name) + nodes.append({ + "name": name, + "title": d.title if d else name, + "category": r.categories[0] if r.categories else "Other", + "score": round(r.composite_score, 2), + }) + + avg_sim = round(sum(e["similarity"] for e in edges) / max(len(edges), 1), 4) + + return { + "nodes": nodes, + "edges": edges, + "stats": {"node_count": len(nodes), "edge_count": len(edges), "avg_similarity": avg_sim}, + } + + +def get_cross_org_data(db: Database, limit: int = 20) -> list[dict]: + """Return cross-org collaboration pairs.""" + rows 
= db.cross_org_collaborations(limit=limit) + return [ + {"org_a": a, "org_b": b, "shared_drafts": cnt} + for a, b, cnt in rows + ] + + +def get_author_network_full(db: Database) -> dict: + """Return enriched co-authorship network with avg scores and cluster info. + + Returns { + nodes: [{id, name, org, draft_count, avg_score, drafts: [name,...]}], + edges: [{source, target, weight}], + clusters: [{id, members: [name,...], org_mix: {org: count}, size}], + } + """ + pairs = db.coauthor_pairs() + top = db.top_authors(limit=500) + + # Build rating lookup for avg scores + rated = db.drafts_with_ratings(limit=2000) + draft_score = {d.name: r.composite_score for d, r in rated} + + # Author info map + author_info = {} + for name, aff, cnt, drafts in top: + scores = [draft_score[dn] for dn in drafts if dn in draft_score] + avg = round(sum(scores) / len(scores), 2) if scores else 0 + author_info[name] = { + "org": aff, "draft_count": cnt, "drafts": drafts, "avg_score": avg + } + + # Build node set: authors with 2+ drafts OR 1+ co-authorship + node_set = set() + edges = [] + for a, b, shared in pairs: + if shared >= 1: + node_set.add(a) + node_set.add(b) + edges.append({"source": a, "target": b, "weight": shared}) + + # Also include authors with 2+ drafts even if no co-authorships + for name, info in author_info.items(): + if info["draft_count"] >= 2: + node_set.add(name) + + nodes = [] + for name in node_set: + info = author_info.get(name, {"org": "", "draft_count": 1, "drafts": [], "avg_score": 0}) + nodes.append({ + "id": name, + "name": name, + "org": info["org"], + "draft_count": info["draft_count"], + "avg_score": info["avg_score"], + "drafts": info["drafts"][:8], # cap for JSON size + }) + + # Cluster detection via connected components (BFS) + adjacency: dict[str, set[str]] = defaultdict(set) + for e in edges: + adjacency[e["source"]].add(e["target"]) + adjacency[e["target"]].add(e["source"]) + + visited: set[str] = set() + clusters = [] + + for node in 
sorted(node_set): + if node in visited: + continue + component: list[str] = [] + queue = [node] + while queue: + current = queue.pop(0) + if current in visited: + continue + visited.add(current) + component.append(current) + for neighbor in adjacency.get(current, []): + if neighbor not in visited: + queue.append(neighbor) + + if len(component) >= 2: + org_mix: dict[str, int] = Counter() + for m in component: + org = author_info.get(m, {}).get("org", "") + if org: + org_mix[org] += 1 + clusters.append({ + "id": len(clusters), + "members": component, + "org_mix": dict(org_mix.most_common()), + "size": len(component), + }) + + clusters.sort(key=lambda c: c["size"], reverse=True) + + return {"nodes": nodes, "edges": edges, "clusters": clusters} + + +def get_idea_clusters(db: Database) -> dict: + """Cluster ideas by embedding similarity, return clusters + t-SNE scatter.""" + import numpy as np + + embeddings = db.all_idea_embeddings() + if not embeddings: + return {"clusters": [], "scatter": [], "stats": {"total": 0, "clustered": 0, "num_clusters": 0}, "empty": True} + + # Fetch ideas with IDs for metadata lookup + rows = db.conn.execute("SELECT id, title, description, idea_type, draft_name FROM ideas").fetchall() + idea_map = {r["id"]: {"title": r["title"], "description": r["description"], + "type": r["idea_type"], "draft_name": r["draft_name"]} for r in rows} + + # Build matrix from embeddings that have matching ideas + idea_ids = [iid for iid in embeddings if iid in idea_map] + if len(idea_ids) < 5: + return {"clusters": [], "scatter": [], "stats": {"total": len(idea_ids), "clustered": 0, "num_clusters": 0}, "empty": True} + + matrix = np.array([embeddings[iid] for iid in idea_ids]) + + # Agglomerative clustering with cosine distance + try: + from sklearn.cluster import AgglomerativeClustering + clustering = AgglomerativeClustering( + n_clusters=None, distance_threshold=0.5, + metric='cosine', linkage='average', + ) + labels = clustering.fit_predict(matrix) + except 
Exception: + return {"clusters": [], "scatter": [], "stats": {"total": len(idea_ids), "clustered": 0, "num_clusters": 0}, "empty": True} + + # Build cluster data + cluster_ideas: dict[int, list] = defaultdict(list) + for idx, iid in enumerate(idea_ids): + cluster_ideas[labels[idx]].append(iid) + + # Filter to clusters with 2+ ideas + stop = {"a", "an", "the", "of", "for", "in", "to", "and", "or", "with", "on", "by", "is", "as", "at", "from", "that", "this", "it"} + clusters = [] + for cid in sorted(cluster_ideas.keys()): + members = cluster_ideas[cid] + if len(members) < 2: + continue + ideas_in_cluster = [idea_map[iid] for iid in members if iid in idea_map] + # Theme: most common significant words in titles + words = Counter() + for idea in ideas_in_cluster: + for w in idea["title"].lower().split(): + w_clean = w.strip("()[].,;:-\"'") + if len(w_clean) > 2 and w_clean not in stop: + words[w_clean] += 1 + top_words = [w for w, _ in words.most_common(4)] + theme = " ".join(top_words).title() if top_words else f"Cluster {cid}" + + drafts = list({idea["draft_name"] for idea in ideas_in_cluster}) + clusters.append({ + "id": len(clusters), + "theme": theme, + "size": len(ideas_in_cluster), + "ideas": ideas_in_cluster[:20], + "drafts": drafts, + }) + + # t-SNE for scatter + scatter = [] + try: + from sklearn.manifold import TSNE + perp = min(30, len(idea_ids) - 1) + tsne = TSNE(n_components=2, perplexity=perp, random_state=42, max_iter=500) + coords = tsne.fit_transform(matrix) + for idx, iid in enumerate(idea_ids): + info = idea_map.get(iid, {}) + scatter.append({ + "x": round(float(coords[idx, 0]), 3), + "y": round(float(coords[idx, 1]), 3), + "cluster_id": int(labels[idx]), + "title": info.get("title", ""), + "draft_name": info.get("draft_name", ""), + }) + except Exception: + pass + + total = len(idea_ids) + clustered = sum(c["size"] for c in clusters) + return { + "clusters": clusters, + "scatter": scatter, + "stats": {"total": total, "clustered": clustered, 
"num_clusters": len(clusters)}, + "empty": False, + } + + +def get_timeline_animation_data(db: Database) -> dict: + """Compute t-SNE on all drafts, return points with month info + category_monthly. + + t-SNE is computed once on ALL drafts so coordinates are stable across + animation frames. Each point carries a ``month`` field (YYYY-MM) so the + front-end can build cumulative animation frames. + """ + import numpy as np + + embeddings = db.all_embeddings() + if len(embeddings) < 5: + return {"points": [], "months": [], "category_monthly": {}} + + pairs = db.drafts_with_ratings(limit=1000) + rating_map = {d.name: r for d, r in pairs} + draft_map = {d.name: d for d, _ in pairs} + + # Filter to drafts that have both embeddings and ratings + names = [n for n in embeddings if n in rating_map] + if len(names) < 5: + return {"points": [], "months": [], "category_monthly": {}} + + matrix = np.array([embeddings[n] for n in names]) + + try: + from sklearn.manifold import TSNE + tsne = TSNE(n_components=2, perplexity=min(30, len(names) - 1), + random_state=42, max_iter=500) + coords = tsne.fit_transform(matrix) + except Exception: + return {"points": [], "months": [], "category_monthly": {}} + + # Build points with month + points = [] + month_set: set[str] = set() + category_monthly: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int)) + + for i, name in enumerate(names): + r = rating_map[name] + d = draft_map.get(name) + month = (d.time[:7] if d and d.time else "unknown") + cat = r.categories[0] if r.categories else "Other" + month_set.add(month) + category_monthly[month][cat] += 1 + points.append({ + "name": name, + "title": d.title if d else name, + "x": round(float(coords[i, 0]), 3), + "y": round(float(coords[i, 1]), 3), + "category": cat, + "score": round(r.composite_score, 2), + "month": month, + }) + + months = sorted(month_set) + # Convert defaultdict to plain dict for JSON + cat_monthly_plain = {m: dict(cats) for m, cats in category_monthly.items()} + + 
return { + "points": points, + "months": months, + "category_monthly": cat_monthly_plain, + } + + +def get_monitor_status(db: Database) -> dict: + """Return monitoring status data for dashboard.""" + runs = db.get_monitor_runs(limit=20) + last = runs[0] if runs else None + unrated = len(db.unrated_drafts(limit=9999)) + unembedded = len(db.drafts_without_embeddings(limit=9999)) + no_ideas = len(db.drafts_without_ideas(limit=9999)) + return { + "last_run": last, + "runs": runs, + "unprocessed": {"unrated": unrated, "unembedded": unembedded, "no_ideas": no_ideas}, + "total_runs": len(runs), + } + + +def get_landscape_tsne(db: Database) -> list[dict]: + """Compute t-SNE from embeddings, return [{name, title, x, y, category, score}]. + + Returns an empty list when there are too few embeddings or t-SNE fails. + """ + import numpy as np + + embeddings = db.all_embeddings() + if len(embeddings) < 5: + return [] + + pairs = db.drafts_with_ratings(limit=1000) + rating_map = {d.name: r for d, r in pairs} + draft_map = {d.name: d for d, _ in pairs} + + # Filter to drafts that have both embeddings and ratings + names = [n for n in embeddings if n in rating_map] + if len(names) < 5: + return [] + + matrix = np.array([embeddings[n] for n in names]) + + try: + from sklearn.manifold import TSNE + tsne = TSNE(n_components=2, perplexity=min(30, len(names) - 1), + random_state=42, max_iter=500) + coords = tsne.fit_transform(matrix) + except Exception: + return [] + + result = [] + for i, name in enumerate(names): + r = rating_map[name] + d = draft_map.get(name) + result.append({ + "name": name, + "title": d.title if d else name, + "x": round(float(coords[i, 0]), 3), + "y": round(float(coords[i, 1]), 3), + "category": r.categories[0] if r.categories else "Other", + "score": round(r.composite_score, 2), + }) + return result diff --git a/src/webui/templates/about.html b/src/webui/templates/about.html new file mode 100644 index 0000000..163d625 --- /dev/null +++ b/src/webui/templates/about.html
@@ -0,0 +1,65 @@ +{% extends "base.html" %} +{% set active_page = "about" %} + +{% block title %}About — IETF Draft Analyzer{% endblock %} + +{% block content %} +
+

About IETF Draft Analyzer

+ +
+

What is this?

+

+ A tool for tracking, categorizing, rating, and mapping IETF Internet-Drafts + focused on AI and agent-related topics. It uses Claude for analysis and rating, + Ollama for embeddings, and SQLite for storage. +

+

+ The dashboard provides interactive visualizations of the draft landscape, + including category breakdowns, rating distributions, author networks, + extracted ideas, and gap analysis. +

+
+ +
+

Current Data

+
+
+
Total Drafts
+
{{ stats.total_drafts }}
+
+
+
Rated Drafts
+
{{ stats.rated_count }}
+
+
+
Authors Tracked
+
{{ stats.author_count }}
+
+
+
Ideas Extracted
+
{{ stats.idea_count }}
+
+
+
Gaps Identified
+
{{ stats.gap_count }}
+
+
+
API Tokens Used
+
{{ "{:,}".format(stats.input_tokens + stats.output_tokens) }}
+
+
+
+ +
+

Tech Stack

+
    +
  • Analysis: Claude (Sonnet for analysis, Haiku for bulk)
  • +
  • Embeddings: Ollama (nomic-embed-text)
  • +
  • Storage: SQLite with FTS5 full-text search
  • +
  • Dashboard: Flask, Tailwind CSS, Plotly.js
  • +
  • Data source: IETF Datatracker API
  • +
+
+
+{% endblock %} diff --git a/src/webui/templates/authors.html b/src/webui/templates/authors.html new file mode 100644 index 0000000..4104c0c --- /dev/null +++ b/src/webui/templates/authors.html @@ -0,0 +1,598 @@ +{% extends "base.html" %} +{% set active_page = "authors" %} + +{% block title %}Author Network — IETF Draft Analyzer{% endblock %} + +{% block extra_head %} + + +{% endblock %} + +{% block content %} +
+

Author Network

+

Interactive collaboration graph of {{ network.nodes | length }} authors across {{ orgs | length }} organizations

+
+ + +
+
+
+
Authors Shown
+
{{ network.nodes | length }}
+
+
+
+
Organizations
+
{{ orgs | length }}
+
+
+
+
Co-Author Links
+
{{ network.edges | length }}
+
+
+
+
Clusters
+
{{ network.clusters | length }}
+
+
+
+
Multi-Draft
+
{{ authors | selectattr('draft_count', 'gt', 1) | list | length }}
+
+
+ + +
+
+
+

Co-Authorship Network

+

Node size = draft count. Color = organization. Edge thickness = shared drafts. Drag nodes to rearrange. Scroll to zoom.

+
+
+ + +
+
+
+ +
+
+ +
+
+ +
+ +
+

Organizations by Draft Count

+

Color intensity = number of authors from that org.

+
+
+ + +
+

Cross-Organization Collaboration

+

Organizations co-authoring drafts together.

+
+
+
+ + +{% if network.clusters %} +
+

Collaboration Clusters

+

Connected groups of authors who co-author drafts. Click a cluster to highlight it in the graph.

+
+ {% for c in network.clusters[:12] %} +
+
+ Cluster #{{ c.id + 1 }} + {{ c.size }} authors +
+
+ {% for org, count in c.org_mix.items() %} + {{ org }} ({{ count }}) + {% endfor %} +
+
+ {{ c.members[:5] | join(', ') }}{% if c.members | length > 5 %} +{{ c.members | length - 5 }} more{% endif %} +
+
+ {% endfor %} +
+
+{% endif %} + + +
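The cluster cards are derived from connected components of the co-authorship graph (the journal entry reports 68 of them). A dependency-free sketch of that detection step, with assumed data shapes rather than the actual `get_author_network_full` code:

```python
from collections import defaultdict

def find_clusters(nodes, edges):
    """Group authors into connected components of the co-authorship graph.

    nodes: list of author ids; edges: list of (a, b) co-author pairs.
    Returns a list of components (sets of author ids), largest first.
    """
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)

    seen, clusters = set(), []
    for start in nodes:
        if start in seen:
            continue
        # walk outward from `start` to collect one whole component
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adj[node] - component)
        seen |= component
        clusters.append(component)
    return sorted(clusters, key=len, reverse=True)
```

Sorting by size puts the giant 165-member component first, matching the "Cluster #1" card.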
+ +
+
+

Top Authors

+ Showing top {{ authors | length }} +
+
+ + + + + + + + + + + {% for a in authors %} + + + + + + + {% endfor %} + +
#AuthorOrganizationDrafts
{{ loop.index }} + {{ a.name }} + {{ a.affiliation }} + + {{ a.draft_count }} + +
+
+
+ + +
+
+

Organization Stats

+
+
+ {% for o in orgs %} +
+
+ {{ o.org }} + {{ o.draft_count }} drafts +
+
+ {{ o.author_count }} author{{ 's' if o.author_count != 1 }} + | + {{ (o.draft_count / o.author_count) | round(1) }} drafts/author +
+
+
+
+
+ {% endfor %} +
+
+
+{% endblock %} + +{% block extra_scripts %} + +{% endblock %} diff --git a/src/webui/templates/base.html b/src/webui/templates/base.html new file mode 100644 index 0000000..3966059 --- /dev/null +++ b/src/webui/templates/base.html @@ -0,0 +1,165 @@ + + + + + + {% block title %}IETF Draft Analyzer{% endblock %} + + + + + + + {% block extra_head %}{% endblock %} + + + + + + + + + +
+
+ {% block content %}{% endblock %} +
+
+ + + {% block extra_scripts %}{% endblock %} + + diff --git a/src/webui/templates/draft_detail.html b/src/webui/templates/draft_detail.html new file mode 100644 index 0000000..5486d17 --- /dev/null +++ b/src/webui/templates/draft_detail.html @@ -0,0 +1,298 @@ +{% extends "base.html" %} +{% set active_page = "drafts" %} + +{% block title %}{{ draft.name }} — IETF Draft Analyzer{% endblock %} + +{% block extra_head %} + +{% endblock %} + +{% block content %} + +
+ + + + + Back to Explorer + +

{{ draft.title }}

+
+ {{ draft.name }} + {% if draft.rev %} + rev {{ draft.rev }} + {% endif %} + {{ draft.date }} + {% if draft.rating %} + + {{ draft.rating.score }} + + {% endif %} +
+
+ +
+ +
+ +
+

+ + Abstract +

+

{{ draft.abstract or "No abstract available." }}

+
+ + + {% if draft.rating %} +
+

+ + AI Rating Analysis +

+ {% if draft.rating.summary %} +

{{ draft.rating.summary }}

+ {% endif %} +
+ {% for dim, label, icon in [ + ("novelty", "Novelty", "M13 10V3L4 14h7v7l9-11h-7z"), + ("maturity", "Maturity", "M9 12l2 2 4-4m6 2a9 9 0 11-18 0 9 9 0 0118 0z"), + ("overlap", "Overlap", "M8 16H6a2 2 0 01-2-2V6a2 2 0 012-2h8a2 2 0 012 2v2m-6 12h8a2 2 0 002-2v-8a2 2 0 00-2-2h-8a2 2 0 00-2 2v8a2 2 0 002 2z"), + ("momentum", "Momentum", "M13 7h8m0 0v8m0-8l-8 8-4-4-6 6"), + ("relevance", "Relevance", "M5 3v4M3 5h4M6 17v4m-2-2h4m5-16l2.286 6.857L21 12l-5.714 2.143L13 21l-2.286-6.857L5 12l5.714-2.143L13 3z") + ] %} + {% set val = draft.rating[dim] %} +
+
+
+ + {{ label }} +
+ {{ val }}/5 +
+
+
+
+ {% if draft.rating[dim + '_note'] %} +

{{ draft.rating[dim + '_note'] }}

+ {% endif %} +
+ {% endfor %} +
+
+ {% endif %} + + + {% if draft.ideas %} +
+

+ + Extracted Ideas ({{ draft.ideas|length }}) +

+
+ {% for idea in draft.ideas %} +
+
+ {{ idea.title }} + {% if idea.type %} + {% set type_lower = idea.type|lower %} + + {{ idea.type }} + + {% endif %} +
+ {% if idea.description %} +

{{ idea.description }}

+ {% endif %} +
+ {% endfor %} +
+
+ {% endif %} +
+ + +
+ + {% if draft.rating %} +
+
+
+
+ {{ draft.rating.score }} +
+
Score
+
+
+ +
+ {% for dim, abbr in [("novelty","N"), ("maturity","M"), ("overlap","O"), ("momentum","Mo"), ("relevance","R")] %} + {% set v = draft.rating[dim] %} +
+
{{ v }}
+
{{ abbr }}
+
+ {% endfor %} +
+
+ {% endif %} + + +
+

+ + Metadata +

+
+
Date
{{ draft.date }}
+
Revision
{{ draft.rev or 'N/A' }}
+
Pages
{{ draft.pages or 'N/A' }}
+
Words
{{ '{:,}'.format(draft.words) if draft.words else 'N/A' }}
+
Working Group
{{ draft.group }}
+
+
+ + + View on Datatracker + + {% if draft.text_url %} + + + Read Full Text + + {% endif %} +
+
+ + + {% if draft.authors %} +
+

+ + Authors ({{ draft.authors|length }}) +

+
    + {% for a in draft.authors %} +
  • +
    + {{ a.name[0]|upper if a.name else '?' }} +
    +
    + {{ a.name }} + {% if a.affiliation %} +
    {{ a.affiliation }}
    + {% endif %} +
    +
  • + {% endfor %} +
+
+ {% endif %} + + + {% if draft.rating and draft.rating.categories %} +
+

+ + Categories +

+
+ {% for cat in draft.rating.categories %} + + {{ cat }} + + {% endfor %} +
+
+ {% endif %} + + + {% if draft.refs %} +
+

+ + References ({{ draft.refs|length }}) +

+
+ {% for ref in draft.refs %} + {% if ref.type == 'rfc' %} + + RFC {{ ref.id.replace('rfc', '') }} + + {% elif ref.type == 'draft' %} + + {{ ref.id }} + + {% else %} + + {{ ref.type|upper }} {{ ref.id }} + + {% endif %} + {% endfor %} +
+
+ {% endif %} +
+
+{% endblock %} diff --git a/src/webui/templates/drafts.html b/src/webui/templates/drafts.html new file mode 100644 index 0000000..04d6e14 --- /dev/null +++ b/src/webui/templates/drafts.html @@ -0,0 +1,369 @@ +{% extends "base.html" %} +{% set active_page = "drafts" %} + +{% block title %}Draft Explorer — IETF Draft Analyzer{% endblock %} + +{% block extra_head %} + +{% endblock %} + +{% block content %} + +
+

Draft Explorer

+

Browse, search, and filter {{ result.total }} rated Internet-Drafts on AI and agent topics.

+
+ + +
+
+ +
+ +
+ + +
+ +
+ + +
+ +
+ + +
+ +
+ + +
+
+ + +
+
+ + + {{ min_score }} +
+
+ + + Reset + +
+
+ + + {% if categories %} +
+
+ All + {% for cat, count in categories.items() %} + + {{ cat }} {{ count }} + + {% endfor %} +
+
+ {% endif %} +
+
+ + +
+

+ Showing {{ result.drafts|length }} of + {{ result.total }} drafts + {% if search %} matching "{{ search }}"{% endif %} + {% if current_cat %} in {{ current_cat }}{% endif %} + {% if min_score > 0 %} with score >= {{ min_score }}{% endif %} +

+ {% if result.pages > 1 %} +

Page {{ result.page }} of {{ result.pages }}

+ {% endif %} +
+ + +
+
+ + + + {% macro sort_header(field, label, extra_class="", title="") %} + {% set is_active = sort == field %} + {% set next_dir = 'asc' if (is_active and sort_dir == 'desc') else 'desc' %} + + {% endmacro %} + {{ sort_header("score", "Score", "w-20") }} + {{ sort_header("name", "Draft") }} + {{ sort_header("date", "Date", "w-24 hidden md:table-cell") }} + {{ sort_header("novelty", "Nov", "w-20 hidden lg:table-cell", "Novelty") }} + {{ sort_header("maturity", "Mat", "w-20 hidden lg:table-cell", "Maturity") }} + {{ sort_header("relevance", "Rel", "w-20 hidden lg:table-cell", "Relevance") }} + {{ sort_header("momentum", "Mom", "w-20 hidden xl:table-cell", "Momentum") }} + {{ sort_header("overlap", "Ovl", "w-20 hidden xl:table-cell", "Overlap") }} + + + + + {% for d in result.drafts %} + + + + + + + + + {% macro dim_cell(value) %} + + {% endmacro %} + {{ dim_cell(d.novelty) }} + {{ dim_cell(d.maturity) }} + {{ dim_cell(d.relevance) }} + + + + + + {% endfor %} + {% if not result.drafts %} + + + + {% endif %} + +
+ + {{ label }} + {% if is_active %} + + + + {% endif %} + +
+ + {{ d.score }} + + + + {{ d.title }} + +
{{ d.name }}
+ {% if d.summary %} +
{{ d.summary }}
+ {% endif %} +
+ + + +

No drafts match your filters.

+ Clear all filters +
+
+
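The sortable headers are built by the `sort_header` macro above: clicking the active column flips its direction, clicking any other column starts a fresh descending sort. The same toggle logic, expressed in Python (the `SORTABLE` whitelist is an assumption mirroring the visible columns):

```python
SORTABLE = {"score", "name", "date", "novelty", "maturity",
            "relevance", "momentum", "overlap"}

def next_sort(current_sort, current_dir, clicked):
    """Return the (sort, dir) query params a header link should encode."""
    if clicked not in SORTABLE:
        # ignore unknown fields rather than pass them to the query layer
        return current_sort, current_dir
    if clicked == current_sort:
        # same column clicked again: toggle desc -> asc -> desc
        return clicked, "asc" if current_dir == "desc" else "desc"
    # new column: start descending, as the macro's next_dir does
    return clicked, "desc"
```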
+ + +{% if result.pages > 1 %} + +{% endif %} +{% endblock %} diff --git a/src/webui/templates/gap_demo.html b/src/webui/templates/gap_demo.html new file mode 100644 index 0000000..05581ad --- /dev/null +++ b/src/webui/templates/gap_demo.html @@ -0,0 +1,118 @@ +{% extends "base.html" %} +{% set active_page = "gaps" %} + +{% block title %}Draft Demo — Gap Explorer{% endblock %} + +{% block extra_head %} + +{% endblock %} + +{% block content %} + + + +
+

Generated Draft Demo

+

+ Pre-generated Internet-Drafts addressing identified gaps. Each was written section by section by the gap-to-draft pipeline using Claude.

+
+ +{% if not generated_drafts %} +
+ +

No generated drafts found yet.

+

Use the gap detail page to generate one, or run ietf draft-gen from the CLI.

+
+{% else %} + +
+ +
+
+
+

{{ generated_drafts | length }} Generated Draft{{ 's' if generated_drafts | length != 1 }}

+
+
+ {% for gd in generated_drafts %} + +
{{ gd.title }}
+
{{ gd.stem }}
+
{{ (gd.size / 1024) | round(1) }} KB
+
+ {% endfor %} +
+
+
+ + +
+ {% if draft_text %} +
+
+
+

{{ draft_info.title if draft_info else 'Draft' }}

+ {{ draft_info.filename if draft_info }} +
+ +
+
+
{{ draft_text }}
+
+
+ {% else %} +
+

Select a draft from the list to view it.

+
+ {% endif %} +
+
+{% endif %} +{% endblock %} + +{% block extra_scripts %} + +{% endblock %} diff --git a/src/webui/templates/gap_detail.html b/src/webui/templates/gap_detail.html new file mode 100644 index 0000000..51d7cc3 --- /dev/null +++ b/src/webui/templates/gap_detail.html @@ -0,0 +1,211 @@ +{% extends "base.html" %} +{% set active_page = "gaps" %} + +{% block title %}{{ gap.topic }} — Gap Explorer{% endblock %} + +{% block extra_head %} + +{% endblock %} + +{% block content %} + + + + +
+ +
+

{{ gap.topic }}

+ + {{ gap.severity | upper }} + +
+ + {% if gap.category %} + {{ gap.category }} + {% endif %} + +
+
+

Description

+

{{ gap.description }}

+
+ + {% if gap.evidence %} +
+

Evidence

+

{{ gap.evidence }}

+
+ {% endif %} +
+ + +
+ + + Search related drafts + + {% if gap.category %} + | + + + Browse {{ gap.category }} drafts + + {% endif %} +
+
+ + +
+
+
+

Generate Internet-Draft

+

Use AI to generate a full Internet-Draft addressing this gap

+
+ +
+ + + + + + + + + + + +
+ Want to see what generated drafts look like without waiting? + View the demo page + with {{ generated_drafts | length }} pre-generated examples. +
+
+{% endblock %} + +{% block extra_scripts %} + +{% endblock %} diff --git a/src/webui/templates/gaps.html b/src/webui/templates/gaps.html new file mode 100644 index 0000000..c19d814 --- /dev/null +++ b/src/webui/templates/gaps.html @@ -0,0 +1,89 @@ +{% extends "base.html" %} +{% set active_page = "gaps" %} + +{% block title %}Gap Explorer — IETF Draft Analyzer{% endblock %} + +{% block content %} +
+

Gap Explorer

+

{{ gaps | length }} identified gaps in AI/agent standards coverage — click any gap to explore details or generate a draft

+
+ + +
+ + + View Demo Draft + + {% if generated_drafts %} + + {{ generated_drafts | length }} draft{{ 's' if generated_drafts | length != 1 }} already generated + + {% endif %} +
+ + +{% set ns = namespace(critical=0, high=0, medium=0, low=0) %} +{% for gap in gaps %} + {% if gap.severity == 'critical' %}{% set ns.critical = ns.critical + 1 %} + {% elif gap.severity == 'high' %}{% set ns.high = ns.high + 1 %} + {% elif gap.severity == 'medium' %}{% set ns.medium = ns.medium + 1 %} + {% else %}{% set ns.low = ns.low + 1 %} + {% endif %} +{% endfor %} + +
+
+
{{ gaps | length }}
+
Total Gaps
+
+
+
{{ ns.critical }}
+
Critical
+
+
+
{{ ns.high }}
+
High
+
+
+
{{ ns.medium }}
+
Medium
+
+
+
{{ ns.low }}
+
Low
+
+
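The severity cards are tallied with a Jinja `namespace` loop; in backend terms it is a plain counter. A sketch, assuming each gap is a dict with a `severity` key:

```python
from collections import Counter

def severity_counts(gaps):
    """Tally gaps by severity, defaulting unknown or missing values
    to 'low' (mirroring the template's else branch)."""
    counts = Counter()
    for gap in gaps:
        sev = gap.get("severity")
        counts[sev if sev in ("critical", "high", "medium") else "low"] += 1
    return counts
```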
+ + + +{% endblock %} diff --git a/src/webui/templates/idea_clusters.html b/src/webui/templates/idea_clusters.html new file mode 100644 index 0000000..531ac75 --- /dev/null +++ b/src/webui/templates/idea_clusters.html @@ -0,0 +1,200 @@ +{% extends "base.html" %} +{% set active_page = "idea_clusters" %} + +{% block title %}Idea Clusters — IETF Draft Analyzer{% endblock %} + +{% block content %} +
+

Idea Clusters

+

Extracted ideas grouped by semantic similarity using embedding-based clustering

+
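The grouping described here can be done several ways; one minimal, dependency-free approach is greedy centroid clustering over cosine similarity. This is a sketch of the general technique, not the project's actual clustering code, and the 0.75 threshold is illustrative:

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def cluster_ideas(embeddings, threshold=0.75):
    """Greedy single-pass clustering: each idea joins the first cluster
    whose centroid it resembles, otherwise starts a new cluster.

    embeddings: list of (idea_id, vector). Returns lists of idea_ids.
    """
    clusters = []  # each: {"ids": [...], "centroid": [...]}
    for idea_id, vec in embeddings:
        best = next((c for c in clusters
                     if cosine(vec, c["centroid"]) >= threshold), None)
        if best is None:
            clusters.append({"ids": [idea_id], "centroid": list(vec)})
        else:
            # running mean keeps the centroid representative of members
            n = len(best["ids"])
            best["centroid"] = [(cv * n + x) / (n + 1)
                                for cv, x in zip(best["centroid"], vec)]
            best["ids"].append(idea_id)
    return [c["ids"] for c in clusters]
```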
+ + + + +{% endblock %} + +{% block extra_scripts %} + +{% endblock %} diff --git a/src/webui/templates/ideas.html b/src/webui/templates/ideas.html new file mode 100644 index 0000000..8c651cf --- /dev/null +++ b/src/webui/templates/ideas.html @@ -0,0 +1,124 @@ +{% extends "base.html" %} +{% set active_page = "ideas" %} + +{% block title %}Ideas — IETF Draft Analyzer{% endblock %} + +{% block content %} +
+

Extracted Ideas

+

{{ data.total }} technical ideas extracted from rated drafts

+
+ + +
+
+
{{ data.total }}
+
Total Ideas
+
+
+
{{ data.by_type | length }}
+
Idea Types
+
+ {% set top_type = data.by_type.keys() | list %} + {% if top_type %} +
+
{{ top_type[0] }}
+
Most Common Type
+
+
+
{{ data.by_type[top_type[0]] }}
+
{{ top_type[0] }} Count
+
+ {% endif %} +
+ + +
+

Ideas by Type

+
+
+ + +
+
+ + +
+
+ {{ data.ideas | length }} ideas shown +
+
+ {% for idea in data.ideas %} +
+
+ {{ idea.title }} + {% if idea.type %} + {{ idea.type }} + {% endif %} +
+

{{ idea.description }}

+ {{ idea.draft_name }} +
+ {% endfor %} +
+
+{% endblock %} + +{% block extra_scripts %} + +{% endblock %} diff --git a/src/webui/templates/landscape.html b/src/webui/templates/landscape.html new file mode 100644 index 0000000..810c86d --- /dev/null +++ b/src/webui/templates/landscape.html @@ -0,0 +1,232 @@ +{% extends "base.html" %} +{% set active_page = "landscape" %} + +{% block title %}Landscape — IETF Draft Analyzer{% endblock %} + +{% block content %} +
+

Draft Landscape

+

Multi-dimensional visualization of the AI/agent draft space

+
+ + +
+

Embedding Landscape (t-SNE)

+

768-dim embeddings projected to 2D. Color = category, size = composite score. Click for draft detail.

+
+
+ + +
+

Novelty vs Maturity

+

Bubble size = composite score, color = category. Hover for details.

+
+
+ +
+ +
+

Innovation-Uniqueness Quadrant

+

Novelty vs Overlap — find the novel and unique drafts.

+
+
+ + +
+

Score Distributions

+

Violin plots for each rating dimension.

+
+
+
+ + +
+

Category Distribution

+

Number of rated drafts per primary category.

+
+
+{% endblock %} + +{% block extra_scripts %} + +{% endblock %} diff --git a/src/webui/templates/monitor.html b/src/webui/templates/monitor.html new file mode 100644 index 0000000..9e5df7a --- /dev/null +++ b/src/webui/templates/monitor.html @@ -0,0 +1,191 @@ +{% extends "base.html" %} +{% set active_page = "monitor" %} + +{% block title %}Monitor — IETF Draft Analyzer{% endblock %} + +{% block content %} +
+

Live Monitor

+

Track automated monitoring runs and pipeline status

+
+ +
+ +{% endblock %} + +{% block extra_scripts %} + +{% endblock %} diff --git a/src/webui/templates/overview.html b/src/webui/templates/overview.html new file mode 100644 index 0000000..2065cd6 --- /dev/null +++ b/src/webui/templates/overview.html @@ -0,0 +1,205 @@ +{% extends "base.html" %} +{% set active_page = "overview" %} + +{% block title %}Overview — IETF Draft Analyzer{% endblock %} + +{% block content %} +
+

Dashboard Overview

+

IETF AI/Agent Internet-Drafts at a glance

+
+ + + + + +
+
+

Composite Score Distribution

+
+
+
+

Drafts by Category

+
+
+
+ + +
+

Submissions Over Time

+
+
+ + +
+

Category Rating Profiles

+
+
+{% endblock %} + +{% block extra_scripts %} + +{% endblock %} diff --git a/src/webui/templates/ratings.html b/src/webui/templates/ratings.html new file mode 100644 index 0000000..6c297ad --- /dev/null +++ b/src/webui/templates/ratings.html @@ -0,0 +1,211 @@ +{% extends "base.html" %} +{% set active_page = "ratings" %} + +{% block title %}Ratings — IETF Draft Analyzer{% endblock %} + +{% block content %} +
+

Rating Analytics

+

Distribution and analysis of AI-generated ratings

+
+ + +
+

Composite Score Distribution

+
+
+ +
+ +
+

Score Distributions by Dimension

+
+
+ +
+

Category Rating Profiles

+
+
+
+ + +
+

Novelty vs Maturity (bubble = relevance)

+
+
+ + +
+
+

Top 20 Drafts by Composite Score

+
+
+ + + + + + + + + + + + + + + + +
#DraftScoreNoveltyMaturityRelevanceMomentumOverlapCategory
+
+
+{% endblock %} + +{% block extra_scripts %} + +{% endblock %} diff --git a/src/webui/templates/similarity.html b/src/webui/templates/similarity.html new file mode 100644 index 0000000..771fcfd --- /dev/null +++ b/src/webui/templates/similarity.html @@ -0,0 +1,249 @@ +{% extends "base.html" %} +{% set active_page = "similarity" %} + +{% block title %}Similarity — IETF Draft Analyzer{% endblock %} + +{% block content %} +
+

Draft Similarity Graph

+

Force-directed graph of draft-to-draft semantic similarity based on embeddings

+
+ + +
+
+
+
Connected Drafts
+
0
+
+
+
+
Similarity Links
+
0
+
+
+
+
Avg Similarity
+
0
+
+
+ + +
+
+ + + 0.75 + (0 edges visible) +
+
+ + +
+

Similarity Network

+

Node size = composite score, color = category. Edge opacity = similarity strength. Click a node to view draft detail.

+
+
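Edges in this graph are draft pairs whose embedding cosine similarity clears the slider threshold (0.75 by default). A small sketch of how such an edge list can be computed; the function name and shapes are illustrative, not the app's actual code:

```python
import math
from itertools import combinations

def _cos(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    den = (math.sqrt(sum(a * a for a in u))
           * math.sqrt(sum(b * b for b in v)))
    return dot / den if den else 0.0

def similarity_edges(embeddings, threshold=0.75):
    """embeddings: {draft_name: vector}. Returns (a, b, sim) tuples for
    every pair at or above the threshold, strongest first."""
    pairs = [(a, b, _cos(embeddings[a], embeddings[b]))
             for a, b in combinations(sorted(embeddings), 2)]
    return sorted((p for p in pairs if p[2] >= threshold),
                  key=lambda p: p[2], reverse=True)
```

Note this is O(n²) in the number of drafts, which is fine at this corpus size but would need an approximate-nearest-neighbor index at larger scale.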
+{% endblock %} + +{% block extra_scripts %} + +{% endblock %} diff --git a/src/webui/templates/timeline.html b/src/webui/templates/timeline.html new file mode 100644 index 0000000..417634e --- /dev/null +++ b/src/webui/templates/timeline.html @@ -0,0 +1,241 @@ +{% extends "base.html" %} +{% set active_page = "timeline" %} + +{% block title %}Timeline — IETF Draft Analyzer{% endblock %} + +{% block content %} +
+

Timeline Animation

+

Watch the AI/agent draft landscape evolve month by month

+
+ + +
+
+ + +
+

Animated Embedding Landscape

+

t-SNE projection with cumulative drafts per month. Color = category, size = composite score. Press Play to animate.

+
+ +
+
+
+ + +
+

Category Submissions Over Time

+

Stacked area chart showing draft submissions by category per month.

+
+
+{% endblock %} + +{% block extra_scripts %} + +{% endblock %}