From 8515e46d5d8ae3576581a2bbbed5361e45e76fba Mon Sep 17 00:00:00 2001 From: Christian Nennemann Date: Sun, 8 Mar 2026 19:58:40 +0100 Subject: [PATCH] Architecture designer, author cluster names, FP filtering, new pages - Add /architecture page: system-of-systems view with 8 layers, component cards, gap markers, source coverage chart, and clickable detail sidebar - Give author clusters meaningful names from orgs + draft topic keywords - Filter false positives (73 drafts, 54 ideas) from idea clusters, architecture, ideas listing, and search results - Add NIST source fetcher with curated catalog of 11 AI publications - New pages: trends, complexity, sources, false positives, idea analysis - Clickable gap cards with full details (evidence, priority, nearby work) - Component detail panel with linked drafts and top ideas Co-Authored-By: Claude Opus 4.6 --- data/reports/complexity.md | 53 + data/reports/false-positives.md | 152 ++ data/reports/sources.md | 44 + data/reports/trends.md | 190 ++- src/ietf_analyzer/cli.py | 45 + src/ietf_analyzer/db.py | 26 +- src/ietf_analyzer/reports.py | 842 +++++++++++ src/ietf_analyzer/sources/nist.py | 227 +++ src/webui/app.py | 105 +- src/webui/data.py | 1693 +++++++++++++++++++++- src/webui/templates/architecture.html | 465 ++++++ src/webui/templates/authors.html | 2 +- src/webui/templates/base.html | 24 + src/webui/templates/citations.html | 647 +++++++-- src/webui/templates/complexity.html | 332 +++++ src/webui/templates/false_positives.html | 215 +++ src/webui/templates/idea_analysis.html | 330 +++++ src/webui/templates/sources.html | 198 +++ src/webui/templates/trends_analysis.html | 284 ++++ 19 files changed, 5672 insertions(+), 202 deletions(-) create mode 100644 data/reports/complexity.md create mode 100644 data/reports/false-positives.md create mode 100644 data/reports/sources.md create mode 100644 src/ietf_analyzer/sources/nist.py create mode 100644 src/webui/templates/architecture.html create mode 100644 src/webui/templates/complexity.html create mode 100644 src/webui/templates/false_positives.html create mode 100644 src/webui/templates/idea_analysis.html create mode 100644 src/webui/templates/sources.html create mode 100644 src/webui/templates/trends_analysis.html diff --git a/data/reports/complexity.md b/data/reports/complexity.md new file mode 100644 index 0000000..8779886 --- /dev/null +++ b/data/reports/complexity.md @@ -0,0 +1,53 @@ +# Draft Complexity Matrix +*Generated 2026-03-08 18:05 UTC — 688 rated drafts (57.6% have page data)* + +## Correlation Matrix + +Pearson r between complexity metrics and rating dimensions. 
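+
+A minimal sketch of the computation behind each cell (this mirrors the `_pearson` helper this patch adds to `reports.py`; the standalone function name here is local to the example):
+
+```python
+def pearson(xs: list[float], ys: list[float]) -> float:
+    """Pearson correlation coefficient of two equal-length samples."""
+    n = len(xs)
+    if n < 3:
+        return 0.0
+    mx, my = sum(xs) / n, sum(ys) / n
+    # Unnormalized covariance and standard deviations; the 1/n factors cancel.
+    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
+    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
+    sy = sum((y - my) ** 2 for y in ys) ** 0.5
+    return round(cov / (sx * sy), 3) if sx and sy else 0.0
+```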
+
+| Metric | Novelty | Maturity | Overlap | Momentum | Relevance |
+|--------|------: | ------: | ------: | ------: | ------: |
+| Pages | +0.164 | +0.475 | -0.072 | +0.260 | +0.140 |
+| Author Count | +0.151 | -0.152 | +0.011 | +0.016 | +0.169 |
+| Citation Count | +0.254 | +0.033 | -0.058 | +0.069 | +0.212 |
+| Idea Count | +0.187 | +0.019 | -0.033 | +0.137 | +0.127 |
+| Category Count | +0.236 | -0.021 | +0.031 | +0.165 | +0.301 |
+
+## Top 10 Most Complex Drafts
+
+| # | Draft | Pages | Authors | Citations | Ideas | Score | Complexity |
+|---|-------|------:|--------:|----------:|------:|------:|-----------:|
+| 1 | draft-templin-intarea-aero2 | 113 | 1 | 83 | 1 | 3.40 | 0.572 |
+| 2 | draft-templin-intarea-aero | 120 | 1 | 73 | 1 | 3.40 | 0.556 |
+| 3 | draft-templin-6man-aero3 | 99 | 1 | 82 | 1 | 3.40 | 0.542 |
+| 4 | draft-ietf-anima-constrained-voucher | 93 | 4 | 62 | 1 | 4.20 | 0.520 |
+| 5 | draft-ietf-ace-edhoc-oscore-profile | 89 | 4 | 52 | 1 | 3.60 | 0.482 |
+| 6 | draft-ietf-rats-corim | 127 | 5 | 0 | 1 | 3.40 | 0.417 |
+| 7 | draft-barnes-hpke-hpke | 125 | 4 | 0 | 1 | 3.80 | 0.396 |
+| 8 | draft-xu-rtgwg-fare-in-sun | 9 | 15 | 6 | 1 | 2.80 | 0.369 |
+| 9 | draft-cui-ai-agent-discovery-invocation | 18 | 3 | 10 | 3 | 3.80 | 0.366 |
+| 10 | draft-narajala-ans | 48 | 4 | 12 | 2 | 3.60 | 0.364 |
+
+## Top 10 Most Efficient Drafts
+
+*High ratings despite low structural complexity.*
+
+| # | Draft | Pages | Authors | Score | Efficiency |
+|---|-------|------:|--------:|------:|-----------:|
+| 1 | w3c-UAAG20-Reference | - | 0 | 3.60 | 20.6 |
+| 2 | iso-ts-23860-2022 | - | 0 | 3.40 | 19.4 |
+| 3 | iso-ts-17575-3-2011-cor-1-2013 | - | 0 | 3.20 | 18.3 |
+| 4 | iso-awi-tr-24492 | - | 0 | 3.20 | 18.3 |
+| 5 | iso-iec-22989-2022-awi-amd-2 | - | 0 | 3.20 | 18.3 |
+| 6 | iso-iec-23053-2022-awi-amd-2 | - | 0 | 3.20 | 18.3 |
+| 7 | iso-iec-15938-18-2023 | - | 0 | 3.00 | 17.1 |
+| 8 | iso-iec-tr-24030-2024 | - | 0 | 3.00 | 17.1 |
+| 9 | iso-iec-cd-ts-22440-3 | - | 0 | 3.00 | 17.1 |
+| 10 | iso-37181-2022 | - | 0 | 4.40 | 17.1 |
+
+## Summary Statistics
+
+- **Average pages**: 20.2 (57.6% coverage)
+- **Average authors**: 1.7
+- **Average citations**: 4.9
+- **Total drafts analyzed**: 688
\ No newline at end of file
diff --git a/data/reports/false-positives.md b/data/reports/false-positives.md
new file mode 100644
index 0000000..d5a4b96
--- /dev/null
+++ b/data/reports/false-positives.md
@@ -0,0 +1,152 @@
+# False Positive Profile Report
+*Generated 2026-03-08 18:04 UTC*
+
+## Overview
+
+- **False positives**: 73 (9.6% of 761 total drafts)
+- **Share of rated drafts**: 9.6% of 761 rated drafts
+- These drafts matched AI/agent search keywords but were flagged as not genuinely about AI agent infrastructure.
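+
+The same flag drives the filtering added across the idea clusters, architecture view, and search in this change. A minimal sketch of the anti-join that `Database.all_ideas()` now applies (the SQL string is taken verbatim from the `db.py` hunk below; the standalone wrapper function is illustrative):
+
+```python
+import sqlite3
+
+def ideas_without_false_positives(conn: sqlite3.Connection) -> list[sqlite3.Row]:
+    # Assumes conn.row_factory = sqlite3.Row, as the app's Database class uses.
+    # Anti-join: drop ideas whose parent draft was flagged false_positive = 1.
+    return conn.execute(
+        "SELECT i.* FROM ideas i "
+        "WHERE i.draft_name NOT IN "
+        "(SELECT draft_name FROM ratings WHERE false_positive = 1) "
+        "ORDER BY i.draft_name"
+    ).fetchall()
+```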
+ +## By Source + +| Source | FP Count | % of FPs | +|--------|--------:|---------:| +| IETF | 73 | 100.0% | + +## Rating Comparison: FP vs Non-FP + +| Dimension | FP Avg | Non-FP Avg | Delta | +|-----------|------:|-----------:|------:| +| Novelty | 2.51 | 3.16 | -0.66 | +| Maturity | 3.26 | 3.13 | +0.13 | +| Overlap | 2.68 | 2.61 | +0.07 | +| Momentum | 2.68 | 3.07 | -0.39 | +| Relevance | 2.51 | 3.77 | -1.27 | + +## Categories Assigned to False Positives + +| Category | Count | +|----------|------:| +| Data formats/interop | 28 | +| Agent identity/auth | 25 | +| Other AI/agent | 16 | +| Policy/governance | 12 | +| Autonomous netops | 11 | +| A2A protocols | 9 | +| Agent discovery/reg | 7 | +| AI safety/alignment | 3 | +| ML traffic mgmt | 2 | +| Human-agent interaction | 1 | + +## Top Terms in FP Abstracts + +| Term | Occurrences | +|------|------------:| +| document | 79 | +| protocol | 60 | +| key | 60 | +| agent | 38 | +| network | 33 | +| data | 29 | +| defines | 27 | +| ipv | 27 | +| edhoc | 27 | +| information | 26 | +| rfc | 26 | +| diffie | 25 | +| hellman | 25 | +| list | 25 | +| security | 23 | +| user | 20 | +| authentication | 19 | +| public | 18 | +| control | 17 | +| configuration | 17 | +| mechanism | 17 | +| provides | 16 | +| header | 16 | +| specifies | 16 | +| ephemeral | 16 | +| secure | 15 | +| server | 15 | +| internet | 15 | +| certificate | 15 | +| multi | 15 | + +## All False Positives + +| Draft | Title | Source | Relevance | Categories | +|-------|-------|--------|----------:|------------| +| draft-ahc-green-smartpdu-yang | A YANG Model for SmartPDU Monitoring and Control | IETF | 3 | Data formats/interop, Policy/governance | +| draft-allman-tcpx2-hack | TCPx2: Don't Fence Me In | IETF | 3 | Other AI/agent | +| draft-amsuess-ace-brski-ace | Provisioning ACE credentials through BRSKI | IETF | 3 | Agent identity/auth, Autonomous netops | +| draft-bastian-jose-dvs | Public Key Derived HMAC for JOSE | IETF | 3 | Agent identity/auth, Data formats/interop | +| draft-bastian-jose-pkdh | Public Key Derived HMAC for JOSE | IETF | 3 | Data formats/interop | +| draft-condrey-rats-witnessd-enrollment | Trust Anchor Bootstrap Protocol for Proof of Proce | IETF | 3 | Agent identity/auth, Policy/governance | +| draft-contario-totp-secure-enrollment | TOTP Secure Enrollment | IETF | 3 | Agent identity/auth, AI safety/alignment | +| draft-daviel-html-geo-tag | Geographic registration of HTML documents | IETF | 2 | Data formats/interop, Agent discovery/reg | +| draft-doujiali-cloudnetwork-intelligentoperation | An requirement of Cloud Network Intelligent Operat | IETF | 2 | Autonomous netops, Policy/governance | +| draft-ecdh-psi | PSI based on ECDH | IETF | 3 | Data formats/interop | +| draft-eggert-mailmaint-uaautoconf | Automatic Configuration of Email, Calendar, and Co | IETF | 4 | Agent discovery/reg, Data formats/interop | +| draft-gont-dhcwg-dhcpv6-iids | A Method for Generating Semantically Opaque IPv6 I | IETF | 2 | Agent identity/auth, Data formats/interop | +| draft-hackett-ures | Unified Rendering of Email Standard (URES) | IETF | 2 | Data formats/interop, Other AI/agent | +| draft-he-yi-srv6ops-ipv6-enhancemnet-in-cloud-uc | Use Cases and Requirements for IPv6 enhancement te | IETF | 3 | Agent identity/auth, Autonomous netops | +| draft-housley-lamps-private-key-attest-attr | An Attribute for Statement of Possession of a Priv | IETF | 3 | Agent identity/auth, Data formats/interop | +| draft-hy-srv6ops-sfc-in-cloud-uc | Use Cases and Requirements for Service 
Function Ch | IETF | 3 | Autonomous netops, Data formats/interop | +| draft-ietf-anima-brski-prm | BRSKI with Pledge in Responder Mode (BRSKI-PRM) | IETF | 2 | Agent identity/auth, Autonomous netops | +| draft-ietf-bgp-idrp-usage | Application of the Border Gateway Protocol and IDR | IETF | 2 | A2A protocols, Policy/governance | +| draft-ietf-dnsop-ds-automation | Operational Recommendations for DS Automation | IETF | 1 | Other AI/agent | +| draft-ietf-dtn-bpv7-admin-iana | Bundle Protocol Version 7 Administrative Record Ty | IETF | 2 | Data formats/interop | +| draft-ietf-emu-eap-edhoc | Using the Extensible Authentication Protocol (EAP) | IETF | 3 | Agent identity/auth | +| draft-ietf-hpke-hpke | Hybrid Public Key Encryption | IETF | 5 | Agent identity/auth, Data formats/interop | +| draft-ietf-httpbis-layered-cookies | Cookies: HTTP State Management Mechanism | IETF | 2 | Other AI/agent | +| draft-ietf-httpbis-rfc6265bis | Cookies: HTTP State Management Mechanism | IETF | 2 | Other AI/agent | +| draft-ietf-idr-bgp-dpa | Destination Preference Attribute for BGP | IETF | 3 | A2A protocols | +| draft-ietf-lake-edhoc-impl-cons | Implementation Considerations for Ephemeral Diffie | IETF | 3 | Agent identity/auth | +| draft-ietf-lake-traces | Traces of EDHOC | IETF | 3 | Data formats/interop | +| draft-ietf-lamps-e2e-mail-guidance | Guidance on End-to-End E-mail Security | IETF | 2 | Data formats/interop, Policy/governance | +| draft-ietf-lamps-private-key-stmt-attr | An Attribute for Statement of Possession of a Priv | IETF | 3 | Data formats/interop | +| draft-ietf-lamps-rfc5274bis | Certificate Management Messages over CMS (CMC): Co | IETF | 3 | Data formats/interop | +| draft-ietf-mailmaint-pacc | Automatic Configuration of Email, Calendar, and Co | IETF | 2 | Data formats/interop | +| draft-ietf-pim-zeroconf-mcast-addr-alloc-ps | Zeroconf Multicast Address Allocation Problem Stat | IETF | 2 | Agent discovery/reg, Data formats/interop | +| draft-ietf-roll-enrollment-priority | Controlling Secure Network Enrollment in RPL netwo | IETF | 3 | Agent discovery/reg, Autonomous netops | +| draft-ietf-sip-location-conveyance | Location Conveyance for the Session Initiation Pro | IETF | 3 | A2A protocols, Agent discovery/reg | +| draft-ietf-sshm-ssh-agent | SSH Agent Protocol | IETF | 2 | Agent identity/auth | +| draft-ietf-suit-firmware-encryption | Encrypted Payloads in SUIT Manifests | IETF | 4 | Data formats/interop | +| draft-ingles-eap-edhoc | Using the Extensible Authentication Protocol with | IETF | 3 | Agent identity/auth, Data formats/interop | +| draft-jaju-httpbis-zstd-window-size | Window Sizing for Zstandard Content Encoding | IETF | 2 | Data formats/interop | +| draft-khatri-sipcore-call-transfer-fail-response | A SIP Response Code (497) for Call Transfer Failur | IETF | 3 | Other AI/agent | +| draft-kompella-lsr-mptecap | Multipath Traffic Engineering Capabilities | IETF | 2 | ML traffic mgmt | +| draft-lenders-core-dnr | Discovery of Network-designated OSCORE-based Resol | IETF | 3 | Agent discovery/reg, Data formats/interop | +| draft-leon-distributed-multi-signer | Distributed DNSSEC Multi-Signer Bootstrap | IETF | 2 | Autonomous netops, Data formats/interop | +| draft-liu-access-collaboration-agent | Ubiquitous Access Collaboration Requirements for A | IETF | 2 | A2A protocols, Other AI/agent | +| draft-lopez-lake-edhoc-psk | EDHOC PSK authentication | IETF | 3 | Agent identity/auth | +| draft-ma-v6ops-pe-ipv6only-reqs | Requirements for Provider Edge in IPv6-only Underl | IETF 
| 3 | Other AI/agent | +| draft-men-rtgwg-agent-networking-in-digibank | Agent Networking Scenarios of Digital Banking | IETF | 2 | A2A protocols, Agent discovery/reg | +| draft-meyerzuselha-oauth-web-message-response-mode | OAuth 2.0 Web Message Response Mode for Popup- and | IETF | 2 | Agent identity/auth, Human-agent interaction | +| draft-moonesamy-rfc2369bis | The Use of URIs as Meta-Syntax for Core Mail List | IETF | 2 | Other AI/agent | +| draft-moonesamy-rfc2919bis | List-Id: A Structured Field and Namespace for the | IETF | 2 | Other AI/agent | +| draft-moreno-lisp-uberlay | Uberlay Interconnection of Multiple LISP overlays | IETF | 3 | Data formats/interop, Autonomous netops | +| draft-mvieuille-kerpass-ephemsec | KerPass EPHEMSEC One-Time Password Algorithm | IETF | 3 | Agent identity/auth, AI safety/alignment | +| draft-mzhang-nfsv4-recursively-setting | Recursively Setting Attributes of Subdirectories a | IETF | 2 | Data formats/interop | +| draft-nsiangani-authenticatedsecuredlayer | ASL Authenticated Secure Layer Protocol | IETF | 1 | Other AI/agent | +| draft-ounsworth-lamps-cms-dhkem | Use of the DH-Based KEM (DHKEM) in the Cryptograph | IETF | 3 | Data formats/interop, A2A protocols | +| draft-pan-aqm-pie | PIE: A Lightweight Control Scheme To Address the B | IETF | 2 | Other AI/agent | +| draft-pan-tsvwg-pie | PIE: A Lightweight Control Scheme To Address the B | IETF | 3 | ML traffic mgmt | +| draft-ruas-cfrg-ecdp | ECDP: Elliptic Curve Data Protocol | IETF | 3 | A2A protocols, Agent identity/auth | +| draft-serafin-lake-ta-hint | Trust Anchor Hints in Ephemeral Diffie-Hellman Ove | IETF | 3 | Agent identity/auth | +| draft-sipos-dtn-bp-safe | Bundle Protocol (BP) Security Associations with Fe | IETF | 2 | Agent identity/auth, Autonomous netops | +| draft-steckbeck-ua-conn-sec | User Agent Connection Security | IETF | 2 | Policy/governance, Other AI/agent | +| draft-takagi-srta-trinity | SRTA and the Trinity Configuration: A Conceptual A | IETF | 2 | AI safety/alignment, Policy/governance | +| draft-templin-6man-mla | IPv6 Addresses for Ad Hoc Networks | IETF | 3 | A2A protocols | +| draft-templin-manet-inet | MANET Internetworking: Problem Statement and Gap A | IETF | 2 | Other AI/agent | +| draft-tiloca-lake-exporter-output-length | In-band Agreement of Output Lengths for the EDHOC_ | IETF | 2 | Agent identity/auth | +| draft-tjw-dbound2-problem-statement | Domain Boundaries 2.0 Problem Statement | IETF | 2 | Agent identity/auth, Policy/governance | +| draft-tojens-dhcp-option-concat-considerations | DHCP Option Concatenation Considerations | IETF | 3 | Data formats/interop | +| draft-tu-nmrg-blockchain-trusted-protocol | A Blockchain Trusted Protocol for Intelligent Comm | IETF | 2 | Policy/governance, Agent identity/auth | +| draft-vattaparambil-positioning-of-poa | Positioning of PoA | IETF | 2 | Agent identity/auth, Policy/governance | +| draft-wang-data-transmission-security-irii | Data Transmission Security of Identity Resolution | IETF | 2 | Agent identity/auth, Policy/governance | +| draft-wendt-stir-vesper | VESPER - Framework for VErifiable STI Personas | IETF | 2 | Agent identity/auth, Data formats/interop | +| draft-willman-rtgwg-conduit-tunnels | Underlay for IPsec Transport | IETF | 2 | Autonomous netops, Policy/governance | +| draft-zhul-dhc-bnc-up-specific-suboption | Broadband Network UP-Specific Information Suboptio | IETF | 2 | Other AI/agent | +| draft-zhul-intarea-bnc-up-specific-suboption | Broadband Network UP-Specific Information Suboptio | IETF 
| 2 | Other AI/agent | \ No newline at end of file diff --git a/data/reports/sources.md b/data/reports/sources.md new file mode 100644 index 0000000..c61f22b --- /dev/null +++ b/data/reports/sources.md @@ -0,0 +1,44 @@ +# Cross-Source Comparison Report +*Generated 2026-03-08 18:04 UTC — 761 drafts across 6 sources* + +## Summary + +| Source | Drafts | Rated | Authors | Ideas | Avg Score | Top Category | +|--------|-------:|------:|--------:|------:|----------:|--------------| +| ETSI | 12 | 12 | 0 | 13 | 3.16 | Policy/governance | +| IETF | 469 | 396 | 722 | 495 | 3.43 | Data formats/interop | +| ISO | 238 | 238 | 0 | 245 | 3.12 | Policy/governance | +| ITU | 20 | 20 | 0 | 20 | 3.52 | Policy/governance | +| NIST | 11 | 11 | 0 | 12 | 3.89 | AI safety/alignment | +| W3C | 11 | 11 | 0 | 11 | 2.61 | Human-agent interaction | + +## Rating Dimensions by Source + +| Source | Novelty | Maturity | Overlap | Momentum | Relevance | +|--------|--------:|---------:|--------:|---------:|----------:| +| ETSI | 2.58 | 3.08 | 2.75 | 3.08 | 3.92 | +| IETF | 3.41 | 2.98 | 2.55 | 3.09 | 4.03 | +| ISO | 2.84 | 3.26 | 2.65 | 2.96 | 3.35 | +| ITU | 2.95 | 3.85 | 2.95 | 3.80 | 3.95 | +| NIST | 3.36 | 4.00 | 2.64 | 4.18 | 4.45 | +| W3C | 2.09 | 3.55 | 3.64 | 2.36 | 2.73 | + +## Category Distribution by Source + +| Category | ETSI | IETF | ISO | ITU | NIST | W3C | +|----------|------:|------:|------:|------:|------:|------:| +| A2A protocols | 1 | 151 | 27 | 2 | 1 | 0 | +| AI safety/alignment | 7 | 46 | 90 | 8 | 9 | 0 | +| Agent discovery/reg | 1 | 86 | 6 | 4 | 0 | 1 | +| Agent identity/auth | 2 | 148 | 20 | 3 | 1 | 0 | +| Autonomous netops | 3 | 113 | 39 | 8 | 2 | 0 | +| Data formats/interop | 4 | 162 | 118 | 8 | 3 | 4 | +| Human-agent interaction | 0 | 32 | 39 | 3 | 2 | 9 | +| ML traffic mgmt | 2 | 79 | 10 | 3 | 0 | 0 | +| Model serving/inference | 0 | 44 | 41 | 5 | 6 | 0 | +| Other AI/agent | 7 | 22 | 69 | 5 | 3 | 1 | +| Policy/governance | 10 | 114 | 157 | 17 | 8 | 5 | + +## Category Coverage Analysis + +**Shared categories** (covered by 2+ bodies): A2A protocols, AI safety/alignment, Agent discovery/reg, Agent identity/auth, Autonomous netops, Data formats/interop, Human-agent interaction, ML traffic mgmt, Model serving/inference, Other AI/agent, Policy/governance diff --git a/data/reports/trends.md b/data/reports/trends.md index dc76431..a44ad1c 100644 --- a/data/reports/trends.md +++ b/data/reports/trends.md @@ -1,72 +1,132 @@ -# Category Trend Analysis -*Generated 2026-03-03 19:59 UTC — 361 drafts, 19 months, 19 categories* +# Temporal Evolution Report +*Generated 2026-03-08 18:05 UTC — 500 drafts, 87 months* -## Growth Summary +## Monthly Overview + +| Month | Submissions | New Authors | Cum. 
Ideas | Avg Novelty | Avg Maturity | Avg Relevance | Safety Ratio | +|-------|------------:|------------:|-----------:|------------:|-------------:|--------------:|-------------:| +| 1995-12 | 1 | 0 | 0 | 0.0 | 0.0 | 0.0 | - | +| 1996112 | 1 → | 0 | 1 | 0.0 | 0.0 | 0.0 | - | +| 1997-12 | 1 → | 0 | 0 | 0.0 | 0.0 | 0.0 | - | +| 1997123 | 1 → | 0 | 2 | 0.0 | 0.0 | 0.0 | - | +| 1999-08 | 2 ↑ | 0 | 0 | 0.0 | 0.0 | 0.0 | - | +| 1999020 | 1 ↓ | 0 | 3 | 0.0 | 0.0 | 0.0 | - | +| 2002121 | 2 ↑ | 0 | 5 | 3.0 | 5.0 | 3.0 | - | +| 2003012 | 1 ↓ | 0 | 7 | 0.0 | 0.0 | 0.0 | - | +| 2004-01 | 1 → | 0 | 9 | 0.0 | 0.0 | 0.0 | - | +| 2007-02 | 1 → | 0 | 0 | 0.0 | 0.0 | 0.0 | - | +| 2007103 | 1 → | 0 | 10 | 0.0 | 0.0 | 0.0 | - | +| 2009-10 | 1 → | 0 | 11 | 0.0 | 0.0 | 0.0 | - | +| 2009-11 | 1 → | 0 | 12 | 0.0 | 0.0 | 0.0 | - | +| 2010-02 | 1 → | 0 | 13 | 0.0 | 0.0 | 0.0 | - | +| 2010-06 | 2 ↑ | 0 | 16 | 0.0 | 0.0 | 0.0 | - | +| 2011-04 | 2 → | 0 | 18 | 0.0 | 0.0 | 0.0 | - | +| 2012-02 | 1 ↓ | 0 | 19 | 2.0 | 5.0 | 2.0 | - | +| 2013-03 | 3 ↑ | 0 | 0 | 0.0 | 0.0 | 0.0 | - | +| 2014-03 | 1 ↓ | 0 | 20 | 0.0 | 0.0 | 0.0 | - | +| 2014-10 | 1 → | 0 | 21 | 0.0 | 0.0 | 0.0 | - | +| 2014032 | 1 → | 0 | 22 | 2.0 | 5.0 | 4.0 | - | +| 2015-11 | 2 ↑ | 0 | 25 | 0.0 | 0.0 | 0.0 | - | +| 2015/CD | 1 ↓ | 0 | 0 | 0.0 | 0.0 | 0.0 | - | +| 2015121 | 2 ↑ | 0 | 26 | 2.5 | 5.0 | 4.0 | - | +| 2016-01 | 3 ↑ | 0 | 30 | 0.0 | 0.0 | 0.0 | - | +| 2017-05 | 1 ↓ | 0 | 31 | 2.0 | 4.0 | 3.0 | - | +| 2017-06 | 3 ↑ | 0 | 32 | 0.0 | 0.0 | 0.0 | - | +| 2017-10 | 1 ↓ | 0 | 33 | 0.0 | 0.0 | 0.0 | - | +| 2018-01 | 1 → | 0 | 35 | 3.0 | 4.0 | 3.0 | - | +| 2018-02 | 1 → | 0 | 36 | 0.0 | 0.0 | 0.0 | - | +| 2019-02 | 1 → | 0 | 37 | 3.0 | 5.0 | 2.0 | - | +| 2019-07 | 1 → | 0 | 38 | 3.0 | 5.0 | 3.0 | - | +| 2019-11 | 1 → | 0 | 40 | 3.0 | 5.0 | 2.0 | - | +| 2020 | 1 → | 0 | 41 | 3.0 | 3.0 | 4.0 | - | +| 2020-03 | 1 → | 0 | 42 | 3.0 | 5.0 | 4.0 | - | +| 2020-08 | 1 → | 0 | 43 | 3.0 | 4.0 | 3.0 | - | +| 2020-09 | 1 → | 0 | 44 | 4.0 | 4.0 | 4.0 | - | +| 2021 | 2 ↑ | 0 | 45 | 0.0 | 0.0 | 0.0 | - | +| 2021-01 | 1 ↓ | 0 | 47 | 3.0 | 4.0 | 4.0 | - | +| 2021-03 | 1 → | 0 | 49 | 0.0 | 0.0 | 0.0 | - | +| 2021-07 | 1 → | 0 | 51 | 4.0 | 4.0 | 5.0 | - | +| 2021-08 | 1 → | 0 | 52 | 3.0 | 4.0 | 4.0 | - | +| 2021-11 | 1 → | 0 | 53 | 4.0 | 3.0 | 4.0 | - | +| 2021-12 | 1 → | 0 | 54 | 0.0 | 0.0 | 0.0 | - | +| 2022 | 3 ↑ | 0 | 57 | 3.0 | 4.3 | 4.3 | - | +| 2022-05 | 1 ↓ | 0 | 0 | 3.0 | 4.0 | 3.0 | - | +| 2022-06 | 2 ↑ | 0 | 59 | 4.0 | 5.0 | 4.5 | - | +| 2022-07 | 1 ↓ | 0 | 60 | 3.0 | 4.0 | 3.0 | - | +| 2022-08 | 3 ↑ | 0 | 64 | 2.5 | 4.5 | 2.5 | - | +| 2022-09 | 1 ↓ | 0 | 66 | 0.0 | 0.0 | 0.0 | - | +| 2022-10 | 2 ↑ | 0 | 68 | 3.0 | 4.0 | 4.0 | - | +| 2022-11 | 1 ↓ | 0 | 69 | 3.0 | 4.0 | 2.0 | - | +| 2022/AW | 2 ↑ | 0 | 0 | 0.0 | 0.0 | 0.0 | - | +| 2022/DA | 2 → | 0 | 71 | 2.0 | 4.0 | 3.5 | - | +| 2023 | 3 ↑ | 0 | 74 | 3.7 | 4.7 | 4.7 | - | +| 2023-01 | 2 ↓ | 0 | 76 | 3.0 | 4.5 | 5.0 | - | +| 2023-03 | 1 ↓ | 0 | 77 | 3.0 | 5.0 | 4.0 | - | +| 2023-05 | 1 → | 0 | 78 | 3.0 | 4.0 | 5.0 | - | +| 2023-06 | 1 → | 0 | 79 | 3.0 | 4.0 | 5.0 | - | +| 2023-07 | 3 ↑ | 0 | 81 | 3.5 | 4.0 | 4.0 | - | +| 2023-08 | 1 ↓ | 0 | 82 | 4.0 | 3.0 | 4.0 | - | +| 2023-11 | 1 → | 0 | 84 | 2.0 | 4.0 | 3.0 | - | +| 2024 | 3 ↑ | 0 | 87 | 3.7 | 3.7 | 5.0 | - | +| 2024-01 | 10 ↑ | 17 | 98 | 3.0 | 3.8 | 4.0 | - | +| 2024-02 | 5 ↓ | 5 | 103 | 3.0 | 3.5 | 4.0 | - | +| 2024-03 | 3 ↓ | 5 | 106 | 3.0 | 4.5 | 4.0 | - | +| 2024-04 | 9 ↑ | 9 | 114 | 3.5 | 3.2 | 4.0 | - | +| 2024-05 | 5 ↓ | 11 | 119 | 3.0 | 4.0 | 3.5 | - 
| +| 2024-06 | 3 ↓ | 8 | 121 | 3.0 | 5.0 | 2.0 | - | +| 2024-07 | 13 ↑ | 10 | 136 | 2.8 | 4.2 | 3.8 | - | +| 2024-08 | 4 ↓ | 5 | 140 | 3.5 | 4.0 | 4.0 | - | +| 2024-09 | 13 ↑ | 34 | 153 | 3.0 | 3.4 | 4.0 | - | +| 2024-10 | 3 ↓ | 9 | 157 | 3.0 | 4.0 | 4.0 | - | +| 2024-11 | 6 ↑ | 12 | 164 | 2.8 | 4.2 | 3.6 | - | +| 2024-12 | 10 ↑ | 14 | 174 | 3.6 | 4.0 | 4.4 | - | +| 2025 | 1 ↓ | 0 | 175 | 4.0 | 2.0 | 4.0 | - | +| 2025-01 | 8 ↑ | 18 | 182 | 3.7 | 3.0 | 4.3 | - | +| 2025-02 | 3 ↓ | 4 | 185 | 2.7 | 4.7 | 3.7 | - | +| 2025-03 | 6 ↑ | 4 | 192 | 3.0 | 4.3 | 3.2 | - | +| 2025-04 | 13 ↑ | 31 | 206 | 3.0 | 3.7 | 4.2 | - | +| 2025-05 | 8 ↓ | 13 | 214 | 3.3 | 3.4 | 4.3 | - | +| 2025-06 | 5 ↓ | 15 | 222 | 3.0 | 3.7 | 4.0 | - | +| 2025-07 | 9 ↑ | 4 | 231 | 3.8 | 3.1 | 4.1 | - | +| 2025-08 | 10 ↑ | 15 | 242 | 3.3 | 3.7 | 3.7 | - | +| 2025-09 | 19 ↑ | 38 | 263 | 3.5 | 3.2 | 3.8 | - | +| 2025-10 | 65 ↑ | 90 | 330 | 3.5 | 3.0 | 4.1 | - | +| 2025-11 | 27 ↓ | 65 | 390 | 3.6 | 3.1 | 4.3 | - | + +## Category Growth Summary | Category | Total | Last 3mo | Prev 3mo | Growth | |----------|------:|---------:|---------:|-------:| -| A2A protocols | 120 | 58 | 54 | +7% | -| AI safety / guardrails / alignment | 1 | 1 | 0 | new | -| AI safety/alignment | 44 | 21 | 15 | +40% | -| Agent discovery / registration | 14 | 9 | 5 | +80% | -| Agent discovery/reg | 65 | 28 | 32 | -12% | -| Agent identity/auth | 108 | 46 | 51 | -10% | -| Agent-to-agent communication protocols | 16 | 11 | 5 | +120% | -| Autonomous netops | 93 | 38 | 40 | -5% | -| Autonomous network operations | 5 | 4 | 1 | +300% | -| Data formats / semantics for AI interop | 3 | 2 | 1 | +100% | -| Data formats/interop | 145 | 52 | 66 | -21% | -| Human-agent interaction | 30 | 9 | 17 | -47% | -| Identity / authentication for AI agents | 13 | 9 | 4 | +125% | -| ML traffic mgmt | 73 | 33 | 25 | +32% | -| ML-based traffic management / optimization | 1 | 1 | 0 | new | -| Model serving/inference | 42 | 22 | 14 | +57% | -| Other AI/agent | 26 | 13 | 9 | +44% | -| Policy / governance / ethical frameworks | 2 | 2 | 0 | new | -| Policy/governance | 91 | 42 | 31 | +35% | - -## Monthly Breakdown - -| Month | A2A protocols | AI safety / gua | AI safety/align | Agent discovery | Agent discovery | Agent identity/ | Agent-to-agent | Autonomous neto | Autonomous netw | Data formats / | Data formats/in | Human-agent int | Identity / auth | ML traffic mgmt | ML-based traffi | Model serving/i | Other AI/agent | Policy / govern | Policy/governan | Total | -|-------|---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | -----:| -| 2024-01 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 5 | -| 2024-02 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 2 | -| 2024-04 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | -| 2024-09 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 3 | -| 2024-10 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 2 | -| 2024-12 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | -| 2025-01 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 4 | 0 | 2 | 0 | 0 | 0 | 8 | -| 2025-04 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 2 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 3 | 10 | -| 2025-05 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 5 | -| 2025-06 | 1 | 0 | 0 | 0 | 0 | 2 | 0 | 3 | 0 | 0 | 1 | 0 
| 0 | 3 | 0 | 1 | 0 | 0 | 1 | 12 | -| 2025-07 | 2 | 0 | 2 | 0 | 1 | 2 | 0 | 0 | 0 | 0 | 2 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 11 | -| 2025-08 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 2 | 0 | 0 | 6 | 1 | 0 | 0 | 0 | 0 | 2 | 0 | 3 | 16 | -| 2025-09 | 4 | 0 | 4 | 0 | 3 | 3 | 0 | 6 | 0 | 0 | 11 | 1 | 0 | 4 | 0 | 1 | 0 | 0 | 6 | 43 | -| 2025-10 | 26 | 0 | 5 | 2 | 13 | 23 | 2 | 19 | 0 | 1 | 35 | 7 | 1 | 13 | 0 | 9 | 5 | 0 | 10 | 171 | -| 2025-11 | 26 | 0 | 7 | 2 | 16 | 21 | 3 | 17 | 1 | 0 | 25 | 9 | 2 | 9 | 0 | 4 | 2 | 0 | 15 | 159 | -| 2025-12 | 2 | 0 | 3 | 1 | 3 | 7 | 0 | 4 | 0 | 0 | 6 | 1 | 1 | 3 | 0 | 1 | 2 | 0 | 6 | 40 | -| 2026-01 | 13 | 0 | 8 | 7 | 8 | 14 | 9 | 12 | 4 | 2 | 15 | 2 | 4 | 7 | 1 | 5 | 4 | 1 | 13 | 129 | -| 2026-02 | 36 | 1 | 12 | 2 | 17 | 29 | 2 | 18 | 0 | 0 | 32 | 6 | 5 | 15 | 0 | 8 | 8 | 1 | 23 | 215 | -| 2026-03 | 9 | 0 | 1 | 0 | 3 | 3 | 0 | 8 | 0 | 0 | 5 | 1 | 0 | 11 | 0 | 9 | 1 | 0 | 6 | 57 | +| A2A protocols | 49 | 34 | 3 | +1033% | +| AI safety/alignment | 53 | 13 | 5 | +160% | +| Agent discovery/reg | 33 | 20 | 3 | +567% | +| Agent identity/auth | 58 | 27 | 7 | +286% | +| Autonomous netops | 52 | 27 | 2 | +1250% | +| Data formats/interop | 105 | 44 | 6 | +633% | +| Human-agent interaction | 33 | 11 | 3 | +267% | +| ML traffic mgmt | 29 | 17 | 2 | +750% | +| Model serving/inference | 25 | 9 | 2 | +350% | +| Other AI/agent | 33 | 3 | 0 | new | +| Policy/governance | 101 | 19 | 5 | +280% | ## Fastest Growing Categories (early vs late half) -- **AI safety / guardrails / alignment**: new (0 -> 1 drafts) -- **Agent discovery / registration**: new (0 -> 14 drafts) -- **Agent-to-agent communication protocols**: new (0 -> 16 drafts) -- **Autonomous network operations**: new (0 -> 5 drafts) -- **Data formats / semantics for AI interop**: new (0 -> 3 drafts) -- **Identity / authentication for AI agents**: new (0 -> 13 drafts) -- **ML-based traffic management / optimization**: new (0 -> 1 drafts) -- **Policy / governance / ethical frameworks**: new (0 -> 2 drafts) -- **A2A protocols**: +11800% (1 -> 119 drafts) -- **Agent discovery/reg**: +6300% (1 -> 64 drafts) -- **Agent identity/auth**: +5200% (2 -> 106 drafts) -- **Human-agent interaction**: +2800% (1 -> 29 drafts) -- **Autonomous netops**: +2125% (4 -> 89 drafts) -- **AI safety/alignment**: +2000% (2 -> 42 drafts) -- **Data formats/interop**: +1871% (7 -> 138 drafts) -- **Other AI/agent**: +1100% (2 -> 24 drafts) -- **Policy/governance**: +1100% (7 -> 84 drafts) -- **Model serving/inference**: +850% (4 -> 38 drafts) -- **ML traffic mgmt**: +712% (8 -> 65 drafts) \ No newline at end of file +- **Model serving/inference**: new (0 → 25 drafts) +- **A2A protocols**: +4700% (1 → 48 drafts) +- **Agent discovery/reg**: +3100% (1 → 32 drafts) +- **Agent identity/auth**: +2700% (2 → 56 drafts) +- **Other AI/agent**: +1450% (2 → 31 drafts) +- **Data formats/interop**: +1112% (8 → 97 drafts) +- **Autonomous netops**: +1100% (4 → 48 drafts) +- **ML traffic mgmt**: +767% (3 → 26 drafts) +- **Policy/governance**: +718% (11 → 90 drafts) +- **AI safety/alignment**: +683% (6 → 47 drafts) +- **Human-agent interaction**: +350% (6 → 27 drafts) + +## Rating Dimension Trends + +- **Novelty**: 2.94 → 3.33 (+0.38) ↑ +- **Maturity**: 4.39 → 3.52 (-0.87) ↓ +- **Overlap**: 2.72 → 2.54 (-0.18) ↓ +- **Momentum**: 3.11 → 3.50 (+0.39) ↑ +- **Relevance**: 3.44 → 4.00 (+0.56) ↑ \ No newline at end of file diff --git a/src/ietf_analyzer/cli.py b/src/ietf_analyzer/cli.py index 98633b1..17f47d6 100644 --- a/src/ietf_analyzer/cli.py +++ b/src/ietf_analyzer/cli.py @@ -835,6 
+835,51 @@ def wg_report(cfg, db):
     console.print(f"Report saved: [bold]{path}[/]")
 
 
+@report.command("sources")
+@pass_cfg_db
+def sources_report(cfg, db):
+    """Cross-source comparison report — ratings and categories by standards body."""
+    from .reports import Reporter
+    path = Reporter(cfg, db).sources_report()
+    console.print(f"Report saved: [bold]{path}[/]")
+
+
+@report.command("false-positives")
+@pass_cfg_db
+def false_positives_report(cfg, db):
+    """False positive profiling report — what makes drafts look AI-related without being so."""
+    from .reports import Reporter
+    path = Reporter(cfg, db).false_positives_report()
+    console.print(f"Report saved: [bold]{path}[/]")
+
+
+@report.command("citations")
+@pass_cfg_db
+def citations_report(cfg, db):
+    """Citation influence and BCP dependency analysis."""
+    from .reports import Reporter
+    path = Reporter(cfg, db).citations_report()
+    console.print(f"Report saved: [bold]{path}[/]")
+
+
+@report.command("complexity")
+@pass_cfg_db
+def complexity_report(cfg, db):
+    """Draft complexity matrix: correlations between structural complexity and ratings."""
+    from .reports import Reporter
+    path = Reporter(cfg, db).complexity_report()
+    console.print(f"Report saved: [bold]{path}[/]")
+
+
+@report.command("idea-analysis")
+@pass_cfg_db
+def idea_analysis_report(cfg, db):
+    """Idea novelty deep dive — distribution, types, top ideas, cross-draft patterns."""
+    from .reports import Reporter
+    path = Reporter(cfg, db).idea_analysis()
+    console.print(f"Report saved: [bold]{path}[/]")
+
+
 # ── wg (working group analysis) ─────────────────────────────────────────
diff --git a/src/ietf_analyzer/db.py b/src/ietf_analyzer/db.py
index b3ec938..776d5dd 100644
--- a/src/ietf_analyzer/db.py
+++ b/src/ietf_analyzer/db.py
@@ -761,16 +761,30 @@ class Database:
         ).fetchall()
         return [r["name"] for r in rows]
 
-    def all_ideas(self) -> list[dict]:
-        rows = self.conn.execute(
-            "SELECT * FROM ideas ORDER BY draft_name"
-        ).fetchall()
+    def all_ideas(self, include_false_positives: bool = False) -> list[dict]:
+        if include_false_positives:
+            rows = self.conn.execute(
+                "SELECT * FROM ideas ORDER BY draft_name"
+            ).fetchall()
+        else:
+            rows = self.conn.execute(
+                "SELECT i.* FROM ideas i "
+                "WHERE i.draft_name NOT IN "
+                "(SELECT draft_name FROM ratings WHERE false_positive = 1) "
+                "ORDER BY i.draft_name"
+            ).fetchall()
         return [{"title": r["title"], "description": r["description"],
                  "type": r["idea_type"], "draft_name": r["draft_name"],
                  "novelty_score": r["novelty_score"]} for r in rows]
 
-    def idea_count(self) -> int:
-        return self.conn.execute("SELECT COUNT(*) FROM ideas").fetchone()[0]
+    def idea_count(self, include_false_positives: bool = False) -> int:
+        if include_false_positives:
+            return self.conn.execute("SELECT COUNT(*) FROM ideas").fetchone()[0]
+        return self.conn.execute(
+            "SELECT COUNT(*) FROM ideas "
+            "WHERE draft_name NOT IN "
+            "(SELECT draft_name FROM ratings WHERE false_positive = 1)"
+        ).fetchone()[0]
 
     def ideas_with_drafts(self, unscored_only: bool = False, limit: int = 5000) -> list[dict]:
         """Return ideas joined with draft title, optionally only unscored ones."""
diff --git a/src/ietf_analyzer/reports.py b/src/ietf_analyzer/reports.py
index 921a3d2..6d598a3 100644
--- a/src/ietf_analyzer/reports.py
+++ b/src/ietf_analyzer/reports.py
@@ -1813,3 +1813,845 @@ class Reporter:
         path = self.output_dir / "wg-analysis.md"
         path.write_text(report)
         return str(path)
+
+    def idea_analysis(self) -> str:
+        """Generate an idea novelty deep-dive report with distribution, 
types, and top ideas.""" + from collections import Counter + from difflib import SequenceMatcher + + now = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC") + all_ideas = self.db.all_ideas() + total = len(all_ideas) + + # Rating lookup + pairs = self.db.drafts_with_ratings(limit=500) + rating_map: dict[str, Rating] = {} + draft_map: dict[str, Draft] = {} + for draft, rating in pairs: + rating_map[draft.name] = rating + draft_map[draft.name] = draft + + scored = [i for i in all_ideas if i.get("novelty_score") is not None] + avg_novelty = sum(i["novelty_score"] for i in scored) / len(scored) if scored else 0 + + # Embedding coverage + embed_count = self.db.conn.execute("SELECT COUNT(*) FROM idea_embeddings").fetchone()[0] + embed_pct = round(embed_count / total * 100, 1) if total > 0 else 0 + + lines = [ + "# Idea Novelty Deep Dive", + f"*Generated {now} — {total} ideas, {len(scored)} scored, avg novelty {avg_novelty:.2f}*\n", + f"**Embedding coverage**: {embed_count}/{total} ({embed_pct}%)\n", + ] + + # Novelty score distribution + novelty_dist = Counter(i["novelty_score"] for i in scored) + lines.extend([ + "## Novelty Score Distribution\n", + "| Score | Count | Bar |", + "|------:|------:|-----|", + ]) + for s in [1, 2, 3, 4, 5]: + count = novelty_dist.get(s, 0) + bar = _bar(s) * min(count, 40) + lines.append(f"| {s} | {count} | {bar} |") + + # Ideas by type with avg novelty + type_data: dict[str, dict] = defaultdict(lambda: {"count": 0, "n_sum": 0, "n_count": 0}) + for idea in all_ideas: + t = idea.get("type", "other") or "other" + type_data[t]["count"] += 1 + if idea.get("novelty_score") is not None: + type_data[t]["n_sum"] += idea["novelty_score"] + type_data[t]["n_count"] += 1 + + lines.extend([ + "\n## Ideas by Type\n", + "| Type | Count | Avg Novelty |", + "|------|------:|------------:|", + ]) + for t, d in sorted(type_data.items(), key=lambda x: x[1]["count"], reverse=True): + avg = d["n_sum"] / d["n_count"] if d["n_count"] > 0 else 0 + lines.append(f"| {t} | {d['count']} | {avg:.2f} |") + + # Top 20 most novel ideas + top_novel = sorted( + [i for i in all_ideas if i.get("novelty_score") and i["novelty_score"] >= 4], + key=lambda x: x["novelty_score"], + reverse=True, + )[:20] + + if top_novel: + lines.extend([ + "\n## Top 20 Most Novel Ideas\n", + "| # | Score | Idea | Type | Draft |", + "|--:|------:|------|------|-------|", + ]) + for idx, idea in enumerate(top_novel, 1): + title = idea["title"][:50] + draft_short = idea["draft_name"].replace("draft-", "")[:35] + lines.append( + f"| {idx} | {idea['novelty_score']} " + f"| {title} | {idea.get('type', 'other')} " + f"| [{draft_short}](https://datatracker.ietf.org/doc/{idea['draft_name']}/) |" + ) + + # Ideas per draft distribution + ideas_per_draft = Counter(i["draft_name"] for i in all_ideas) + ipd_dist = Counter(ideas_per_draft.values()) + lines.extend([ + "\n## Ideas per Draft\n", + "| Ideas/Draft | Drafts |", + "|------------:|-------:|", + ]) + for k in sorted(ipd_dist.keys()): + lines.append(f"| {k} | {ipd_dist[k]} |") + + # Most prolific drafts + lines.extend([ + "\n### Most Prolific Drafts\n", + "| Draft | Ideas | Score |", + "|-------|------:|------:|", + ]) + for name, count in ideas_per_draft.most_common(10): + r = rating_map.get(name) + score = f"{r.composite_score:.2f}" if r else "--" + short = name.replace("draft-", "")[:40] + lines.append(f"| {short} | {count} | {score} |") + + # Shared ideas + idea_groups: list[dict] = [] + for idea in all_ideas: + title_lower = idea["title"].lower().strip() + matched = 
False + for group in idea_groups: + ratio = SequenceMatcher(None, title_lower, group["canonical"]).ratio() + if ratio >= 0.75: + group["ideas"].append(idea) + group["drafts"].add(idea["draft_name"]) + matched = True + break + if not matched: + idea_groups.append({ + "canonical": title_lower, + "title": idea["title"], + "ideas": [idea], + "drafts": {idea["draft_name"]}, + }) + + shared = [g for g in idea_groups if len(g["drafts"]) >= 2] + shared.sort(key=lambda g: len(g["drafts"]), reverse=True) + + if shared: + lines.extend([ + f"\n## Shared Ideas ({len(shared)} ideas in 2+ drafts)\n", + "| Idea | Appearances | Drafts |", + "|------|------------:|--------|", + ]) + for g in shared[:30]: + draft_list = ", ".join(sorted(g["drafts"])[:5]) + if len(g["drafts"]) > 5: + draft_list += f" +{len(g['drafts']) - 5} more" + lines.append(f"| {g['title']} | {len(g['drafts'])} | {draft_list} |") + + # Cross-tab: type x source + type_source: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int)) + for idea in all_ideas: + t = idea.get("type", "other") or "other" + r = rating_map.get(idea["draft_name"]) + source = "ietf" # default + if idea["draft_name"] in draft_map: + source = getattr(draft_map[idea["draft_name"]], "source", "ietf") or "ietf" + type_source[t][source] += 1 + + sources = sorted(set(s for td in type_source.values() for s in td.keys())) + if len(sources) > 1: + header = "| Type | " + " | ".join(sources) + " |" + sep = "|------|" + "|".join(["-----:" for _ in sources]) + "|" + lines.extend(["\n## Ideas by Type x Source\n", header, sep]) + for t in sorted(type_source.keys(), key=lambda x: sum(type_source[x].values()), reverse=True): + row = f"| {t} | " + " | ".join(str(type_source[t].get(s, 0)) for s in sources) + " |" + lines.append(row) + + # Correlation hint + draft_idea_novelty: dict[str, list[int]] = defaultdict(list) + for idea in scored: + draft_idea_novelty[idea["draft_name"]].append(idea["novelty_score"]) + + corr_pairs = [] + for name, scores_list in draft_idea_novelty.items(): + r = rating_map.get(name) + if r and r.relevance: + corr_pairs.append((sum(scores_list) / len(scores_list), r.relevance)) + + if len(corr_pairs) > 10: + x_vals = [p[0] for p in corr_pairs] + y_vals = [p[1] for p in corr_pairs] + n = len(corr_pairs) + mean_x = sum(x_vals) / n + mean_y = sum(y_vals) / n + cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_vals, y_vals)) / n + std_x = (sum((x - mean_x) ** 2 for x in x_vals) / n) ** 0.5 + std_y = (sum((y - mean_y) ** 2 for y in y_vals) / n) ** 0.5 + corr = cov / (std_x * std_y) if std_x > 0 and std_y > 0 else 0 + lines.extend([ + f"\n## Correlation: Idea Novelty vs Draft Relevance\n", + f"Pearson r = **{corr:.3f}** (n={n} drafts with both scores)\n", + f"{'Positive' if corr > 0 else 'Negative'} correlation — " + f"{'drafts with more novel ideas tend to receive higher relevance ratings' if corr > 0.1 else 'weak or no linear relationship between idea novelty and draft relevance'}.", + ]) + + # Embedding note + lines.extend([ + f"\n## Embedding Status\n", + f"{embed_count} of {total} ideas ({embed_pct}%) have embeddings.", + f"To complete the remaining {total - embed_count} embeddings, run:\n", + "```", + "ietf embed-ideas", + "```\n", + "This requires Ollama running locally with the configured embedding model.", + ]) + + report = "\n".join(lines) + path = self.output_dir / "idea-analysis.md" + path.write_text(report) + return str(path) + + + def sources_report(self) -> str: + """Cross-source comparison report — rating dimensions and categories by 
standards body.""" + from collections import Counter as _Counter + + now = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC") + pairs = self.db.drafts_with_ratings(limit=2000) + all_drafts = self.db.list_drafts(limit=5000) + + # Draft counts by source + source_draft_counts: dict[str, int] = defaultdict(int) + for d in all_drafts: + src = getattr(d, "source", "ietf") or "ietf" + source_draft_counts[src] += 1 + + # Rating stats by source + source_ratings: dict[str, dict[str, list]] = defaultdict(lambda: { + "novelty": [], "maturity": [], "overlap": [], "momentum": [], + "relevance": [], "scores": [], + }) + source_categories: dict[str, _Counter] = defaultdict(_Counter) + + for draft, rating in pairs: + src = getattr(draft, "source", "ietf") or "ietf" + source_ratings[src]["novelty"].append(rating.novelty) + source_ratings[src]["maturity"].append(rating.maturity) + source_ratings[src]["overlap"].append(rating.overlap) + source_ratings[src]["momentum"].append(rating.momentum) + source_ratings[src]["relevance"].append(rating.relevance) + source_ratings[src]["scores"].append(rating.composite_score) + for cat in rating.categories: + source_categories[src][cat] += 1 + + # Author counts by source + source_author_counts: dict[str, int] = {} + try: + rows = self.db.conn.execute( + """SELECT d.source, COUNT(DISTINCT da.person_id) as cnt + FROM drafts d JOIN draft_authors da ON d.name = da.draft_name + GROUP BY d.source""" + ).fetchall() + for r in rows: + source_author_counts[r["source"] or "ietf"] = r["cnt"] + except Exception: + pass + + # Idea counts by source + source_idea_counts: dict[str, int] = {} + try: + rows = self.db.conn.execute( + """SELECT d.source, COUNT(*) as cnt + FROM ideas i JOIN drafts d ON i.draft_name = d.name + GROUP BY d.source""" + ).fetchall() + for r in rows: + source_idea_counts[r["source"] or "ietf"] = r["cnt"] + except Exception: + pass + + all_sources = sorted(set(source_draft_counts.keys()) | set(source_ratings.keys())) + + lines = [ + "# Cross-Source Comparison Report", + f"*Generated {now} — {len(all_drafts)} drafts across {len(all_sources)} sources*\n", + "## Summary\n", + "| Source | Drafts | Rated | Authors | Ideas | Avg Score | Top Category |", + "|--------|-------:|------:|--------:|------:|----------:|--------------|", + ] + + for src in all_sources: + rats = source_ratings.get(src, {"scores": []}) + cats = source_categories.get(src, _Counter()) + top_cat = cats.most_common(1)[0][0] if cats else "N/A" + avg = sum(rats["scores"]) / len(rats["scores"]) if rats["scores"] else 0.0 + lines.append( + f"| {src.upper()} | {source_draft_counts.get(src, 0)} " + f"| {len(rats['scores'])} " + f"| {source_author_counts.get(src, 0)} " + f"| {source_idea_counts.get(src, 0)} " + f"| {avg:.2f} | {top_cat} |" + ) + + # Dimension averages per source + dims = ["novelty", "maturity", "overlap", "momentum", "relevance"] + lines.extend([ + "\n## Rating Dimensions by Source\n", + "| Source | Novelty | Maturity | Overlap | Momentum | Relevance |", + "|--------|--------:|---------:|--------:|---------:|----------:|", + ]) + for src in all_sources: + rats = source_ratings.get(src, {d: [] for d in dims}) + vals = [] + for d in dims: + v = rats.get(d, []) + vals.append(f"{sum(v)/len(v):.2f}" if v else "-") + lines.append(f"| {src.upper()} | {' | '.join(vals)} |") + + # Category distribution per source + all_cats = sorted({cat for cats in source_categories.values() for cat in cats}) + if all_cats: + lines.extend([ + "\n## Category Distribution by Source\n", + "| Category | " + " | 
".join(s.upper() for s in all_sources) + " |", + "|----------|" + "|".join("------:" for _ in all_sources) + "|", + ]) + for cat in all_cats: + vals = [str(source_categories.get(src, {}).get(cat, 0)) for src in all_sources] + lines.append(f"| {cat} | {' | '.join(vals)} |") + + # Unique vs shared categories + source_cat_sets = {src: set(cats.keys()) for src, cats in source_categories.items()} + shared = set() + for s1, c1 in source_cat_sets.items(): + for s2, c2 in source_cat_sets.items(): + if s1 != s2: + shared |= (c1 & c2) + + lines.extend([ + "\n## Category Coverage Analysis\n", + f"**Shared categories** (covered by 2+ bodies): {', '.join(sorted(shared)) or 'none'}\n", + ]) + for src, cats in source_cat_sets.items(): + unique = cats - shared + if unique: + lines.append(f"**Unique to {src.upper()}**: {', '.join(sorted(unique))}") + + report = "\n".join(lines) + path = self.output_dir / "sources.md" + path.write_text(report) + return str(path) + + def false_positives_report(self) -> str: + """False positive profiling report — what makes drafts look AI-related but not be.""" + import json as _json + import re as _re + from collections import Counter as _Counter + + now = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC") + + # Get false positives + fp_rows = self.db.conn.execute( + """SELECT d.*, r.novelty, r.maturity, r.overlap, r.momentum, r.relevance, + r.summary, r.categories as r_categories, r.false_positive + FROM drafts d + JOIN ratings r ON d.name = r.draft_name + WHERE r.false_positive = 1 + ORDER BY d.name""" + ).fetchall() + + # Get non-FP rated drafts for comparison + nonfp_rows = self.db.conn.execute( + """SELECT r.novelty, r.maturity, r.overlap, r.momentum, r.relevance + FROM ratings r WHERE COALESCE(r.false_positive, 0) = 0""" + ).fetchall() + + total_rated = self.db.conn.execute("SELECT COUNT(*) FROM ratings").fetchone()[0] + total_drafts = self.db.count_drafts(include_false_positives=True) + + fp_count = len(fp_rows) + pct_total = round(100 * fp_count / total_drafts, 1) if total_drafts else 0 + pct_rated = round(100 * fp_count / total_rated, 1) if total_rated else 0 + + lines = [ + "# False Positive Profile Report", + f"*Generated {now}*\n", + "## Overview\n", + f"- **False positives**: {fp_count} ({pct_total}% of {total_drafts} total drafts)", + f"- **% of rated**: {pct_rated}% of {total_rated} rated drafts", + f"- These drafts matched AI/agent search keywords but were flagged as not genuinely about AI agent infrastructure.\n", + ] + + # Source distribution + fp_sources: _Counter = _Counter() + fp_categories: _Counter = _Counter() + fp_dims = {"novelty": [], "maturity": [], "overlap": [], "momentum": [], "relevance": []} + nonfp_dims = {"novelty": [], "maturity": [], "overlap": [], "momentum": [], "relevance": []} + + for row in fp_rows: + src = row["source"] or "ietf" + fp_sources[src] += 1 + cats = _json.loads(row["r_categories"]) if row["r_categories"] else [] + for cat in cats: + fp_categories[cat] += 1 + for d in ["novelty", "maturity", "overlap", "momentum", "relevance"]: + fp_dims[d].append(row[d]) + + for row in nonfp_rows: + for d in ["novelty", "maturity", "overlap", "momentum", "relevance"]: + nonfp_dims[d].append(row[d]) + + lines.extend([ + "## By Source\n", + "| Source | FP Count | % of FPs |", + "|--------|--------:|---------:|", + ]) + for src, cnt in fp_sources.most_common(): + pct = round(100 * cnt / fp_count, 1) if fp_count else 0 + lines.append(f"| {src.upper()} | {cnt} | {pct}% |") + + # Rating comparison + dims = ["novelty", "maturity", 
"overlap", "momentum", "relevance"] + lines.extend([ + "\n## Rating Comparison: FP vs Non-FP\n", + "| Dimension | FP Avg | Non-FP Avg | Delta |", + "|-----------|------:|-----------:|------:|", + ]) + for d in dims: + fp_avg = sum(fp_dims[d]) / len(fp_dims[d]) if fp_dims[d] else 0 + nfp_avg = sum(nonfp_dims[d]) / len(nonfp_dims[d]) if nonfp_dims[d] else 0 + delta = fp_avg - nfp_avg + lines.append(f"| {d.capitalize()} | {fp_avg:.2f} | {nfp_avg:.2f} | {delta:+.2f} |") + + # Categories + lines.extend([ + "\n## Categories Assigned to False Positives\n", + "| Category | Count |", + "|----------|------:|", + ]) + for cat, cnt in fp_categories.most_common(): + lines.append(f"| {cat} | {cnt} |") + + # Top terms + stop_words = { + "the", "a", "an", "and", "or", "but", "in", "on", "at", "to", "for", + "of", "with", "by", "from", "is", "it", "that", "this", "are", "was", + "be", "as", "can", "may", "will", "not", "has", "have", "been", "which", + "their", "its", "also", "such", "these", "would", "should", "could", + "more", "other", "than", "into", "about", "between", "over", "after", + "all", "one", "two", "new", "they", "we", "our", "each", "some", "any", + "there", "what", "when", "how", "where", "who", "does", "do", "did", + "no", "if", "so", "up", "out", "only", "used", "using", "use", "based", + "through", "both", "well", "within", "must", "while", "had", "were", + } + word_counter: _Counter = _Counter() + for row in fp_rows: + text = ((row["abstract"] or "") + " " + (row["title"] or "")).lower() + words = _re.findall(r'[a-z]{3,}', text) + for w in words: + if w not in stop_words: + word_counter[w] += 1 + + lines.extend([ + "\n## Top Terms in FP Abstracts\n", + "| Term | Occurrences |", + "|------|------------:|", + ]) + for term, cnt in word_counter.most_common(30): + lines.append(f"| {term} | {cnt} |") + + # Full list + lines.extend([ + "\n## All False Positives\n", + "| Draft | Title | Source | Relevance | Categories |", + "|-------|-------|--------|----------:|------------|", + ]) + for row in fp_rows: + cats = _json.loads(row["r_categories"]) if row["r_categories"] else [] + cat_str = ", ".join(cats[:2]) + src = (row["source"] or "ietf").upper() + title = (row["title"] or "")[:50] + lines.append(f"| {row['name']} | {title} | {src} | {row['relevance']} | {cat_str} |") + + report = "\n".join(lines) + path = self.output_dir / "false-positives.md" + path.write_text(report) + return str(path) + + def citations_report(self) -> str: + """Generate citation influence and BCP dependency analysis report.""" + now = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC") + stats = self.db.ref_stats() + total_drafts = self.db.count_drafts() + + # Build rating lookup for categories + pairs_data = self.db.drafts_with_ratings(limit=500) + rating_map: dict[str, Rating] = {} + draft_cats: dict[str, str] = {} + for draft, rating in pairs_data: + rating_map[draft.name] = rating + cats = rating.categories if rating.categories else [] + draft_cats[draft.name] = cats[0] if cats else "Other" + + # Well-known RFC names + rfc_names = { + "2119": "Key words (MUST/SHALL/MAY)", "8174": "Key words update", + "8259": "JSON", "7519": "JWT", "6749": "OAuth 2.0", + "7540": "HTTP/2", "9110": "HTTP Semantics", "7525": "TLS Recommendations", + "8446": "TLS 1.3", "3986": "URIs", "7230": "HTTP/1.1 Syntax", + "7231": "HTTP/1.1 Semantics", "8288": "Web Linking", + "7515": "JWS", "7516": "JWE", "7517": "JWK", "7518": "JWA", + "9449": "DPoP", "6750": "OAuth Bearer", "8725": "JWT Best Practices", + "9396": "Rich Authorization 
Requests", "9101": "JAR", + "8414": "OAuth Server Metadata", "7591": "Dynamic Client Registration", + "8705": "mTLS for OAuth", "9068": "JWT Access Tokens", + "6819": "OAuth Threat Model", "9200": "ACE-OAuth", "9052": "COSE", + "8392": "CWT", "7252": "CoAP", + } + + lines = [ + "# Citation Influence & BCP Dependency Analysis", + f"*Generated {now} — {stats['drafts_with_refs']} of {total_drafts} drafts analyzed, " + f"{stats['total_refs']} total references " + f"({stats['rfc_refs']} RFC, {stats['draft_refs']} draft, {stats['bcp_refs']} BCP)*\n", + ] + + # ── Section 1: Top Cited RFCs ── + top_rfcs = self.db.top_referenced(ref_type="rfc", limit=20) + if top_rfcs: + lines.extend([ + "## Top 20 Most-Cited RFCs\n", + "| # | RFC | Name | Cited By |", + "|--:|-----|------|--------:|", + ]) + for rank, (rid, cnt, drafts) in enumerate(top_rfcs, 1): + name = rfc_names.get(rid, "") + lines.append(f"| {rank} | RFC {rid} | {name} | {cnt} drafts |") + + # ── Section 2: Top Citing Drafts ── + ref_counts = self.db.ref_counts_by_draft() + if ref_counts: + lines.extend([ + "\n## Top 20 Most-Citing Drafts\n", + "Drafts with the highest outgoing reference count.\n", + "| # | Draft | Category | RFCs | Drafts | BCPs | Total |", + "|--:|-------|----------|-----:|-------:|-----:|------:|", + ]) + for rank, (name, rfcs, drafts, bcps) in enumerate(ref_counts[:20], 1): + cat = draft_cats.get(name, "Other") + total = rfcs + drafts + bcps + lines.append(f"| {rank} | {name} | {cat} | {rfcs} | {drafts} | {bcps} | {total} |") + + # ── Section 3: PageRank-style influence ── + # Compute simple PageRank: sum of citation counts for each RFC cited + all_refs = self.db.conn.execute( + "SELECT draft_name, ref_type, ref_id FROM draft_refs" + ).fetchall() + rfc_in_degree: dict[str, int] = defaultdict(int) + for r in all_refs: + if r["ref_type"] == "rfc": + rfc_in_degree[r["ref_id"]] += 1 + + draft_influence: dict[str, float] = defaultdict(float) + draft_out: dict[str, int] = defaultdict(int) + for r in all_refs: + draft_out[r["draft_name"]] += 1 + if r["ref_type"] == "rfc" and r["ref_id"] in rfc_in_degree: + draft_influence[r["draft_name"]] += rfc_in_degree[r["ref_id"]] + + influence_sorted = sorted(draft_influence.items(), key=lambda x: x[1], reverse=True) + if influence_sorted: + lines.extend([ + "\n## Influence Score (PageRank-style)\n", + "Drafts ranked by weighted sum of how often their cited RFCs are themselves cited.\n", + "| # | Draft | Category | Out-Degree | Influence Score |", + "|--:|-------|----------|----------:|---------:|", + ]) + for rank, (name, score) in enumerate(influence_sorted[:20], 1): + cat = draft_cats.get(name, "Other") + out = draft_out.get(name, 0) + lines.append(f"| {rank} | {name} | {cat} | {out} | {score:.0f} |") + + # ── Section 4: Citation density by category ── + cat_totals: dict[str, int] = defaultdict(int) + cat_counts: dict[str, int] = defaultdict(int) + for name, count in draft_out.items(): + cat = draft_cats.get(name, "Other") + cat_totals[cat] += count + cat_counts[cat] += 1 + + if cat_totals: + lines.extend([ + "\n## Citation Density by Category\n", + "| Category | Drafts | Total Refs | Avg Refs/Draft |", + "|:---------|-------:|-----------:|---------------:|", + ]) + cat_items = sorted(cat_totals.items(), + key=lambda x: x[1] / cat_counts[x[0]] if cat_counts[x[0]] else 0, + reverse=True) + for cat, total in cat_items: + cnt = cat_counts[cat] + avg = total / cnt if cnt > 0 else 0 + lines.append(f"| {cat} | {cnt} | {total} | {avg:.1f} |") + + # ── Section 5: Draft-to-Draft citations ── + 
top_draft_refs = self.db.top_referenced(ref_type="draft", limit=20) + if top_draft_refs: + lines.extend([ + "\n## Most-Referenced Drafts (Draft-to-Draft)\n", + "| # | Draft | Cited By |", + "|--:|-------|--------:|", + ]) + for rank, (rid, cnt, _) in enumerate(top_draft_refs, 1): + lines.append(f"| {rank} | {rid} | {cnt} drafts |") + + # ═══════════════════════════════════════════════════ + # BCP DEPENDENCY ANALYSIS + # ═══════════════════════════════════════════════════ + lines.extend([ + "\n---\n", + "## BCP Dependency Analysis\n", + ]) + + # BCP stats + bcp_refs = self.db.top_referenced(ref_type="bcp", limit=100) + total_bcp_citations = sum(cnt for _, cnt, _ in bcp_refs) + unique_bcps = len(bcp_refs) + + # BCP coverage + drafts_with_bcp_rows = self.db.conn.execute( + "SELECT COUNT(DISTINCT draft_name) as cnt FROM draft_refs WHERE ref_type = 'bcp'" + ).fetchone() + drafts_with_bcp = drafts_with_bcp_rows["cnt"] + coverage = (drafts_with_bcp / total_drafts * 100) if total_drafts > 0 else 0 + + lines.extend([ + f"- **{unique_bcps}** unique BCPs cited across the corpus", + f"- **{total_bcp_citations}** total BCP citations", + f"- **{drafts_with_bcp}** of {total_drafts} drafts ({coverage:.1f}%) cite at least one BCP\n", + ]) + + # All BCPs ranked + if bcp_refs: + lines.extend([ + "### All BCPs by Citation Count\n", + "| # | BCP | Cited By | Example Drafts |", + "|--:|-----|--------:|:---------------|", + ]) + for rank, (bid, cnt, drafts) in enumerate(bcp_refs, 1): + examples = ", ".join(drafts[:3]) + more = f" +{len(drafts) - 3} more" if len(drafts) > 3 else "" + lines.append(f"| {rank} | BCP {bid} | {cnt} | {examples}{more} |") + + # BCP by category + bcp_all_rows = self.db.conn.execute( + "SELECT draft_name, ref_id FROM draft_refs WHERE ref_type = 'bcp'" + ).fetchall() + cat_bcp: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int)) + for r in bcp_all_rows: + cat = draft_cats.get(r["draft_name"], "Other") + cat_bcp[cat][r["ref_id"]] += 1 + + if cat_bcp: + lines.extend([ + "\n### BCP Usage by Category\n", + "| Category | BCP Refs | Unique BCPs | Top BCPs |", + "|:---------|--------:|-----------:|:---------|", + ]) + for cat in sorted(cat_bcp.keys(), + key=lambda c: sum(cat_bcp[c].values()), reverse=True): + total = sum(cat_bcp[cat].values()) + unique = len(cat_bcp[cat]) + top3 = sorted(cat_bcp[cat].items(), key=lambda x: x[1], reverse=True)[:3] + top_str = ", ".join(f"BCP{bid}({c})" for bid, c in top3) + lines.append(f"| {cat} | {total} | {unique} | {top_str} |") + + # BCP co-citation + draft_bcps: dict[str, list[str]] = defaultdict(list) + for r in bcp_all_rows: + draft_bcps[r["draft_name"]].append(r["ref_id"]) + + co_cite: dict[tuple[str, str], int] = defaultdict(int) + for _, bcps_list in draft_bcps.items(): + bcps_sorted = sorted(set(bcps_list)) + for i in range(len(bcps_sorted)): + for j in range(i + 1, len(bcps_sorted)): + co_cite[(bcps_sorted[i], bcps_sorted[j])] += 1 + + top_co = sorted(co_cite.items(), key=lambda x: x[1], reverse=True)[:15] + if top_co: + lines.extend([ + "\n### Top BCP Co-Citations\n", + "BCP pairs most frequently cited together in the same draft.\n", + "| BCP A | BCP B | Co-cited in |", + "|:------|:------|----------:|", + ]) + for (a, b), cnt in top_co: + lines.append(f"| BCP {a} | BCP {b} | {cnt} drafts |") + + report = "\n".join(lines) + path = self.output_dir / "citations.md" + path.write_text(report) + return str(path) + + def complexity_report(self) -> str: + """Generate draft complexity matrix report with correlations and outliers.""" + now = 
datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC") + conn = self.db.conn + + # Gather per-draft complexity data + rows = conn.execute(""" + SELECT d.name, d.title, d.pages, d.source, + r.novelty, r.maturity, r.overlap, r.momentum, r.relevance, + r.categories, + (r.novelty + r.maturity + r.overlap + r.momentum + r.relevance) / 5.0 AS score + FROM drafts d + JOIN ratings r ON d.name = r.draft_name + WHERE r.false_positive = 0 + """).fetchall() + + author_counts = dict(conn.execute( + "SELECT draft_name, COUNT(*) FROM draft_authors GROUP BY draft_name" + ).fetchall()) + citation_counts = dict(conn.execute( + "SELECT draft_name, COUNT(*) FROM draft_refs GROUP BY draft_name" + ).fetchall()) + idea_counts = dict(conn.execute( + "SELECT draft_name, COUNT(*) FROM ideas GROUP BY draft_name" + ).fetchall()) + + drafts_data = [] + for r in rows: + pages = r["pages"] + try: + cats = json.loads(r["categories"]) if r["categories"] else [] + except (json.JSONDecodeError, TypeError): + cats = [] + drafts_data.append({ + "name": r["name"], + "title": r["title"], + "pages": pages, + "author_count": author_counts.get(r["name"], 0), + "citation_count": citation_counts.get(r["name"], 0), + "idea_count": idea_counts.get(r["name"], 0), + "category_count": len(cats), + "novelty": r["novelty"], + "maturity": r["maturity"], + "overlap": r["overlap"], + "momentum": r["momentum"], + "relevance": r["relevance"], + "score": r["score"], + }) + + total_with_pages = sum(1 for d in drafts_data if d["pages"] is not None) + pages_pct = round(total_with_pages / len(drafts_data) * 100, 1) if drafts_data else 0 + + # Composite complexity + max_pages = max((d["pages"] for d in drafts_data if d["pages"] is not None), default=1) or 1 + max_auth = max((d["author_count"] for d in drafts_data), default=1) or 1 + max_cite = max((d["citation_count"] for d in drafts_data), default=1) or 1 + max_idea = max((d["idea_count"] for d in drafts_data), default=1) or 1 + for d in drafts_data: + p = (d["pages"] / max_pages) if d["pages"] is not None else 0.3 + d["complexity"] = round((p + d["author_count"] / max_auth + + d["citation_count"] / max_cite + + d["idea_count"] / max_idea) / 4, 3) + d["efficiency"] = round(d["score"] / (d["complexity"] + 0.1), 2) + + # Pearson correlation + metrics = ["pages", "author_count", "citation_count", "idea_count", "category_count"] + dimensions = ["novelty", "maturity", "overlap", "momentum", "relevance"] + + def _pearson(xs, ys): + n = len(xs) + if n < 3: + return 0.0 + mx, my = sum(xs) / n, sum(ys) / n + cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) + sx = sum((x - mx) ** 2 for x in xs) ** 0.5 + sy = sum((y - my) ** 2 for y in ys) ** 0.5 + return round(cov / (sx * sy), 3) if sx and sy else 0.0 + + lines = [ + "# Draft Complexity Matrix", + f"*Generated {now} — {len(drafts_data)} rated drafts ({pages_pct}% have page data)*\n", + ] + + # Correlation matrix + lines.extend([ + "## Correlation Matrix\n", + "Pearson r between complexity metrics and rating dimensions.\n", + "| Metric |" + " | ".join(f" {d.capitalize()}" for d in dimensions) + " |", + "|--------|" + " | ".join("------:" for _ in dimensions) + " |", + ]) + + for metric in metrics: + vals = [] + for dim in dimensions: + if metric == "pages": + pairs_list = [(d[metric], d[dim]) for d in drafts_data if d[metric] is not None] + else: + pairs_list = [(d[metric], d[dim]) for d in drafts_data] + if len(pairs_list) >= 3: + xs, ys = zip(*pairs_list) + r_val = _pearson(list(xs), list(ys)) + else: + r_val = 0.0 + vals.append(f"{r_val:+.3f}") + 
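+            # Sign convention of _pearson above: _pearson([1, 2, 3], [2, 4, 6])
+            # returns 1.0 and _pearson([1, 2, 3], [6, 4, 2]) returns -1.0, i.e.
+            # positive r means the metric rises with the rating dimension.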
label = metric.replace("_", " ").title() + lines.append(f"| {label} | " + " | ".join(vals) + " |") + + # Top 10 most complex + sorted_complex = sorted(drafts_data, key=lambda d: d["complexity"], reverse=True) + lines.extend([ + "\n## Top 10 Most Complex Drafts\n", + "| # | Draft | Pages | Authors | Citations | Ideas | Score | Complexity |", + "|---|-------|------:|--------:|----------:|------:|------:|-----------:|", + ]) + for i, d in enumerate(sorted_complex[:10], 1): + pages_str = str(d["pages"]) if d["pages"] is not None else "-" + lines.append( + f"| {i} | {d['name']} | {pages_str} | {d['author_count']} | " + f"{d['citation_count']} | {d['idea_count']} | {d['score']:.2f} | {d['complexity']:.3f} |" + ) + + # Top 10 most efficient + sorted_efficient = sorted(drafts_data, key=lambda d: d["efficiency"], reverse=True) + lines.extend([ + "\n## Top 10 Most Efficient Drafts\n", + "*High ratings despite low structural complexity.*\n", + "| # | Draft | Pages | Authors | Score | Efficiency |", + "|---|-------|------:|--------:|------:|-----------:|", + ]) + for i, d in enumerate(sorted_efficient[:10], 1): + pages_str = str(d["pages"]) if d["pages"] is not None else "-" + lines.append( + f"| {i} | {d['name']} | {pages_str} | {d['author_count']} | " + f"{d['score']:.2f} | {d['efficiency']:.1f} |" + ) + + # Stats summary + pages_vals = [d["pages"] for d in drafts_data if d["pages"] is not None] + avg_pages = sum(pages_vals) / len(pages_vals) if pages_vals else 0 + avg_auth = sum(d["author_count"] for d in drafts_data) / len(drafts_data) if drafts_data else 0 + avg_cite = sum(d["citation_count"] for d in drafts_data) / len(drafts_data) if drafts_data else 0 + + lines.extend([ + "\n## Summary Statistics\n", + f"- **Average pages**: {avg_pages:.1f} ({pages_pct}% coverage)", + f"- **Average authors**: {avg_auth:.1f}", + f"- **Average citations**: {avg_cite:.1f}", + f"- **Total drafts analyzed**: {len(drafts_data)}", + ]) + + report = "\n".join(lines) + path = self.output_dir / "complexity.md" + path.write_text(report) + return str(path) + diff --git a/src/ietf_analyzer/sources/nist.py b/src/ietf_analyzer/sources/nist.py new file mode 100644 index 0000000..829c233 --- /dev/null +++ b/src/ietf_analyzer/sources/nist.py @@ -0,0 +1,227 @@ +"""Fetch AI-related publications from NIST CSRC. + +NIST has no formal API but the publications search returns structured HTML. +We scrape the search results and supplement with a curated catalog of key +AI publications (AI RMF, AI 100-series, etc.). 
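+
+Each catalog entry is a (nist_id, title, abstract, url, date, status) tuple;
+see NIST_AI_CATALOG below for the curated list.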
+""" + +from __future__ import annotations + +import re +import time as time_mod + +import httpx +from rich.console import Console + +from ..config import Config +from .base import SourceDocument + +console = Console() + +NIST_SEARCH_URL = "https://csrc.nist.gov/publications/search" +NIST_PUB_BASE = "https://csrc.nist.gov/publications/detail/" + +# Curated catalog of key NIST AI publications +NIST_AI_CATALOG = [ + # AI 100 series + ("AI 100-1", "Artificial Intelligence Risk Management Framework (AI RMF 1.0)", + "The AI RMF provides a framework for managing risks associated with AI systems throughout their lifecycle, " + "addressing characteristics of trustworthy AI including validity, reliability, safety, security, resilience, " + "accountability, transparency, explainability, interpretability, privacy, and fairness.", + "https://csrc.nist.gov/pubs/ai/100/1/final", "2023-01-26", "Final"), + ("AI 100-2 E2025", "Adversarial Machine Learning: A Taxonomy and Terminology of Attacks and Mitigations", + "Establishes a taxonomy of adversarial machine learning attacks and mitigations, covering evasion, poisoning, " + "and privacy attacks against predictive AI and generative AI systems including LLMs.", + "https://csrc.nist.gov/pubs/ai/100/2/e2025/final", "2025-03-24", "Final"), + ("AI 100-4", "Reducing Risks Posed by Synthetic Content", + "Provides guidance on approaches to manage risks from synthetic content generated by AI, " + "including authentication, provenance tracking, and content labeling.", + "https://csrc.nist.gov/pubs/ai/100/4/final", "2024-07-26", "Final"), + ("AI 600-1", "Artificial Intelligence Risk Management Framework: Generative AI Profile", + "Companion resource for the AI RMF focusing on risks unique to or exacerbated by generative AI, " + "covering hallucinations, data privacy, intellectual property, and CBRN information risks.", + "https://csrc.nist.gov/pubs/ai/600/1/final", "2024-07-26", "Final"), + # Special Publications + ("SP 800-218A", "Secure Software Development Practices for Generative AI and Dual-Use Foundation Models", + "An SSDF community profile providing secure software development practices specific to generative AI " + "and dual-use foundation models, addressing data management, model training, and deployment security.", + "https://csrc.nist.gov/pubs/sp/800/218/a/final", "2024-07-26", "Final"), + ("IR 8596", "Cybersecurity Framework Profile for Artificial Intelligence", + "A community profile mapping NIST Cybersecurity Framework functions to AI-specific risks and controls, " + "helping organizations manage cybersecurity risks in AI systems.", + "https://csrc.nist.gov/pubs/ir/8596/ipd", "2025-12-16", "Draft"), + ("CSWP 31", "Proxy Validation and Verification for Critical AI Systems", + "Describes a proxy design process for validation and verification of critical AI systems, " + "addressing challenges of testing AI systems that are complex and opaque.", + "https://csrc.nist.gov/pubs/cswp/31/final", "2024-09-26", "Final"), + ("IR 8579", "Developing the NCCoE Chatbot: Technical and Security Learnings", + "Documents technical and security lessons from implementing an LLM-based chatbot at the " + "National Cybersecurity Center of Excellence, including prompt injection mitigations.", + "https://csrc.nist.gov/pubs/ir/8579/ipd", "2025-07-31", "Draft"), + # Concept papers and other + ("NIST-AI-Agent-IAM", "Accelerating the Adoption of Software and AI Agent Identity and Authorization", + "Concept paper on identity and authorization for AI agents, addressing how software 
agents " + "authenticate, authorize, and maintain accountability in automated systems.", + "https://csrc.nist.gov/publications/detail/other/2026/02/05/accelerating-the-adoption-of-software-and-artificial-intelligence-agent-identity-and-authorization/draft", + "2026-02-05", "Draft"), + # Trustworthy AI + ("AI 100-6", "AI Measurement and Evaluation", + "Guidance on measurement and evaluation approaches for AI systems to assess trustworthiness " + "characteristics including accuracy, fairness, robustness, and security.", + "https://csrc.nist.gov/pubs/ai/100/6/final", "2024-04-29", "Final"), + # AI RMF playbook + ("AI 100-1 Playbook", "AI RMF Playbook", + "Practical companion to the AI RMF providing suggested actions for achieving AI risk management outcomes, " + "organized by the framework's functions: Govern, Map, Measure, and Manage.", + "https://airc.nist.gov/AI_RMF_Playbook", "2023-01-26", "Final"), +] + + +def _nist_id_to_name(nist_id: str) -> str: + """Convert NIST ID to slug. E.g. 'AI 100-1' -> 'nist-ai-100-1'.""" + slug = nist_id.lower().replace(" ", "-").replace("/", "-").replace(".", "-") + return f"nist-{slug}" + + +class NISTFetcher: + """Fetch AI-related publications from NIST CSRC. + + Combines a curated catalog with search scraping for discovery. + """ + + def __init__(self, config: Config | None = None): + self.config = config or Config.load() + self.client = httpx.Client(timeout=30, follow_redirects=True) + + def search( + self, keywords: list[str], since: str | None = None + ) -> list[SourceDocument]: + """Return AI-relevant NIST publications.""" + seen: dict[str, SourceDocument] = {} + + # Strategy 1: Curated catalog + console.print(" Loading NIST AI publication catalog...") + for nist_id, title, abstract, url, date, status in NIST_AI_CATALOG: + if since and date and date < since: + continue + name = _nist_id_to_name(nist_id) + seen[name] = SourceDocument( + name=name, + title=f"NIST {nist_id}: {title}", + abstract=abstract, + source="nist", + source_id=nist_id, + source_url=url, + time=date, + doc_status=status.lower(), + ) + + # Strategy 2: Search CSRC for additional AI publications + console.print(" Searching NIST CSRC for AI publications...") + search_terms = ["artificial intelligence", "machine learning", "large language model"] + for term in search_terms: + new_docs = self._search_csrc(term, since) + for doc in new_docs: + if doc.name not in seen: + seen[doc.name] = doc + time_mod.sleep(0.5) + + console.print(f" Found [bold green]{len(seen)}[/] NIST publications") + return list(seen.values()) + + def _search_csrc(self, keyword: str, since: str | None) -> list[SourceDocument]: + """Search NIST CSRC publications page.""" + docs = [] + try: + resp = self.client.get( + NIST_SEARCH_URL, + params={ + "keywords": keyword, + "status": "Final,Draft", + "sortBy": "relevance", + }, + ) + if resp.status_code != 200: + return docs + + # Parse search results HTML + # Results are in structured divs with title, series, date, abstract + # Pattern: TITLE + entries = re.findall( + r']*href="(/publications/detail/[^"]+)"[^>]*>([^<]+)', + resp.text, + ) + + for href, title in entries: + title = title.strip() + if not title or len(title) < 10: + continue + + # Extract a usable ID from the URL path + parts = href.rstrip("/").split("/") + # e.g. 
/publications/detail/sp/800/218/a/final -> sp-800-218-a + slug_parts = [p for p in parts[3:] if p not in ("final", "draft", "ipd", "fpd")] + nist_id = "-".join(slug_parts).upper() + name = f"nist-{'-'.join(slug_parts).lower()}" + + if not name or name == "nist-": + continue + + docs.append(SourceDocument( + name=name, + title=title, + abstract=title, # Will be enriched later if needed + source="nist", + source_id=nist_id, + source_url=f"https://csrc.nist.gov{href}", + time="", + doc_status="published", + )) + + except httpx.HTTPError as e: + console.print(f"[yellow]NIST search error: {e}[/]") + return docs + + def download_text(self, doc: SourceDocument) -> str | None: + """NIST publications are free PDFs — try to fetch and extract.""" + url = doc.source_url + if not url: + return None + try: + # First get the publication page to find the PDF link + resp = self.client.get(url) + if resp.status_code != 200: + return None + + # Look for PDF download link + pdf_match = re.search(r'href="([^"]+\.pdf)"', resp.text) + if not pdf_match: + # Extract abstract from page instead + abstract_match = re.search( + r'(?:Abstract|Summary|Description)[:\s]*]+>\s*<[^>]+>(.+?)]+>', '', abstract_match.group(1)).strip() + return text[:5000] if text else None + return None + + pdf_url = pdf_match.group(1) + if not pdf_url.startswith("http"): + pdf_url = f"https://csrc.nist.gov{pdf_url}" + + resp = self.client.get(pdf_url) + resp.raise_for_status() + try: + from io import BytesIO + from pdfminer.high_level import extract_text + text = extract_text(BytesIO(resp.content)) + return text[:100000] if text else None + except ImportError: + return f"[PDF document: {doc.title}. Install pdfminer.six to extract text.]" + except httpx.HTTPError as e: + console.print(f"[dim]Could not download {doc.name}: {e}[/]") + return None + + def close(self) -> None: + self.client.close() diff --git a/src/webui/app.py b/src/webui/app.py index da544f1..0117cac 100644 --- a/src/webui/app.py +++ b/src/webui/app.py @@ -55,6 +55,14 @@ from webui.data import ( get_ask_synthesize, get_category_summary, global_search, + get_architecture, + get_source_comparison, + get_false_positive_profile, + get_citation_influence, + get_bcp_analysis, + get_trends_data, + get_complexity_data, + get_idea_analysis, ) app = Flask( @@ -306,6 +314,17 @@ def idea_clusters(): return render_template("idea_clusters.html", clusters=data) +@app.route("/architecture") +def architecture(): + data = get_architecture(db()) + return render_template("architecture.html", arch=data) + + +@app.route("/api/architecture") +def api_architecture(): + return jsonify(get_architecture(db())) + + @app.route("/similarity") def similarity(): network = get_similarity_graph(db()) @@ -331,7 +350,9 @@ def authors(): @app.route("/citations") def citations(): graph = get_citation_graph(db()) - return render_template("citations.html", graph=graph) + influence = get_citation_influence(db()) + bcp = get_bcp_analysis(db()) + return render_template("citations.html", graph=graph, influence=influence, bcp=bcp) @app.route("/monitor") @@ -674,6 +695,88 @@ def create_app(dev: bool = False) -> Flask: return app +# ── Sources & False Positives ──────────────────────────────────────────── + + +@app.route("/sources") +def sources_page(): + data = get_source_comparison(db()) + return render_template("sources.html", data=data) + + +@app.route("/false-positives") +def false_positives_page(): + data = get_false_positive_profile(db()) + return render_template("false_positives.html", data=data) + + 
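+# Example: the /api/* routes below are JSON twins of the two pages above,
+# e.g. (assuming the Flask dev server on its default port 5000):
+#   curl -s http://localhost:5000/api/sources | jq '.summary'
+#   curl -s http://localhost:5000/api/false-positives | jq '.count'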
+@app.route("/api/sources") +def api_sources(): + data = get_source_comparison(db()) + return jsonify(data) + + +@app.route("/api/false-positives") +def api_false_positives(): + data = get_false_positive_profile(db()) + return jsonify(data) + + +# ── Citation Influence & BCP ───────────────────────────────────────────── + + +@app.route("/api/citations/influence") +def api_citation_influence(): + return jsonify(get_citation_influence(db())) + + +@app.route("/api/citations/bcp") +def api_bcp_analysis(): + return jsonify(get_bcp_analysis(db())) + + +# ── Trends & Complexity ────────────────────────────────────────────────── + + +@app.route("/trends") +def trends(): + data = get_trends_data(db()) + return render_template("trends_analysis.html", data=data) + + +@app.route("/complexity") +def complexity(): + data = get_complexity_data(db()) + return render_template("complexity.html", data=data) + + +@app.route("/api/trends") +def api_trends(): + data = get_trends_data(db()) + return jsonify(data) + + +@app.route("/api/complexity") +def api_complexity(): + data = get_complexity_data(db()) + return jsonify(data) + + +# ── Idea Analysis ──────────────────────────────────────────────────────── + + +@app.route("/idea-analysis") +def idea_analysis(): + data = get_idea_analysis(db()) + return render_template("idea_analysis.html", data=data) + + +@app.route("/api/idea-analysis") +def api_idea_analysis(): + data = get_idea_analysis(db()) + return jsonify(data) + + if __name__ == "__main__": import argparse diff --git a/src/webui/data.py b/src/webui/data.py index c764d16..1fe8724 100644 --- a/src/webui/data.py +++ b/src/webui/data.py @@ -915,9 +915,74 @@ def _compute_author_network_full(db: Database) -> AuthorNetwork: clusters.sort(key=lambda c: c["size"], reverse=True) + # Generate meaningful names for clusters + for cl in clusters: + cl["name"] = _author_cluster_name(cl) + return {"nodes": nodes, "edges": edges, "clusters": clusters} +def _normalize_org(name: str) -> str: + """Shorten verbose org names for display.""" + # Remove common suffixes + for suffix in (", Inc.", " Inc.", ", Ltd.", " Ltd.", " Co.", " Technologies", + " Corporation", " Corp.", " Limited", " GmbH", " AG", + " Europe Ltd", " Research", " Systems"): + name = name.replace(suffix, "") + return name.strip().rstrip(",").rstrip("&").rstrip() + + +def _author_cluster_name(cluster: dict) -> str: + """Derive a meaningful name for an author cluster from orgs and draft titles.""" + # Org part: top 1-2 orgs, normalized + raw_orgs = list(cluster.get("org_mix", {}).keys()) + orgs = [] + seen_short: set[str] = set() + for o in raw_orgs: + short = _normalize_org(o) + if short.lower() not in seen_short: + seen_short.add(short.lower()) + orgs.append(short) + if len(orgs) >= 2: + org_label = f"{orgs[0]} + {orgs[1]}" + elif orgs: + org_label = orgs[0] + else: + # Fall back to first member's last name + members = cluster.get("members", []) + org_label = members[0].split()[-1] if members else "Unknown" + + # Topic part: extract common keywords from draft titles + stopwords = { + "a", "an", "the", "of", "for", "in", "to", "and", "on", "with", + "using", "based", "draft", "internet", "ietf", "protocol", "framework", + "requirements", "architecture", "considerations", "use", "cases", "via", + "towards", "over", "from", "into", "between", "specification", "extension", + "extensions", "mechanisms", "mechanism", "version", "new", "general", + } + word_counts: Counter = Counter() + for d in cluster.get("drafts", []): + title = d.get("title", "") + words = 
re.findall(r"[A-Za-z]{3,}", title) + for w in words: + wl = w.lower() + if wl not in stopwords: + word_counts[wl] += 1 + + # Pick top keyword(s) that appear in multiple drafts + top_words = [w for w, c in word_counts.most_common(3) if c >= 2] + if not top_words: + top_words = [w for w, _ in word_counts.most_common(1)] + + if top_words: + topic = " ".join(w.capitalize() for w in top_words[:2]) + name = f"{org_label} — {topic}" + else: + name = org_label + # Truncate if too long for display + return name if len(name) <= 50 else name[:47] + "…" + + def get_idea_clusters(db: Database) -> dict: """Cluster ideas (cached for 5 min).""" return _cached("idea_clusters", lambda: _compute_idea_clusters(db)) @@ -936,16 +1001,24 @@ def _compute_idea_clusters(db: Database) -> dict: if not embeddings: return {"clusters": [], "scatter": [], "stats": {"total": 0, "clustered": 0, "num_clusters": 0}, "empty": True} + # Exclude ideas from false-positive drafts + fp_names = {r[0] for r in db.conn.execute( + "SELECT draft_name FROM ratings WHERE false_positive = 1").fetchall()} + # Fetch ideas with IDs for metadata lookup rows = db.conn.execute("SELECT id, title, description, idea_type, draft_name FROM ideas").fetchall() idea_map = {r["id"]: {"title": r["title"], "description": r["description"], - "type": r["idea_type"], "draft_name": r["draft_name"]} for r in rows} + "type": r["idea_type"], "draft_name": r["draft_name"]} + for r in rows if r["draft_name"] not in fp_names} + + # Remove FP ideas from embeddings too + embeddings = {k: v for k, v in embeddings.items() if k in idea_map} # Draft -> WG and category lookup draft_rows = db.conn.execute('SELECT name, "group", title FROM drafts').fetchall() draft_wg = {r["name"]: r["group"] or "none" for r in draft_rows} draft_title_map = {r["name"]: r["title"] for r in draft_rows} - rating_rows = db.conn.execute("SELECT draft_name, categories FROM ratings").fetchall() + rating_rows = db.conn.execute("SELECT draft_name, categories FROM ratings WHERE COALESCE(false_positive, 0) = 0").fetchall() draft_cats: dict[str, list[str]] = {} for r in rating_rows: try: @@ -1424,7 +1497,1216 @@ def global_search(db: Database, query: str) -> SearchResults: like = f"%{q}%" rows = db.conn.execute( """SELECT id, title, description, idea_type, draft_name FROM ideas - WHERE title LIKE ? OR description LIKE ? + WHERE (title LIKE ? OR description LIKE ?) + AND draft_name NOT IN (SELECT draft_name FROM ratings WHERE false_positive = 1) + ORDER BY id LIMIT 50""", + (like, like), + ).fetchall() + for r in rows: + results["ideas"].append({ + "id": r["id"], + "title": r["title"], + "description": (r["description"] or "")[:200], + "type": r["idea_type"], + "draft_name": r["draft_name"], + }) + + # 3. Authors via LIKE + rows = db.conn.execute( + """SELECT person_id, name, affiliation FROM authors + WHERE name LIKE ? OR affiliation LIKE ? + ORDER BY name LIMIT 50""", + (like, like), + ).fetchall() + for r in rows: + results["authors"].append({ + "person_id": r["person_id"], + "name": r["name"], + "affiliation": r["affiliation"] or "", + }) + + # 4. Gaps via LIKE + rows = db.conn.execute( + """SELECT id, topic, description, category, severity FROM gaps + WHERE topic LIKE ? OR description LIKE ? 
+ ORDER BY id LIMIT 50""", + (like, like), + ).fetchall() + for r in rows: + results["gaps"].append({ + "id": r["id"], + "topic": r["topic"], + "description": (r["description"] or "")[:200], + "category": r["category"], + "severity": r["severity"], + }) + + return results + + +def get_landscape_tsne(db: Database) -> list[dict]: + """Compute t-SNE (cached for 5 min).""" + return _cached("landscape_tsne", lambda: _compute_landscape_tsne(db)) + + +def _compute_landscape_tsne(db: Database) -> list[dict]: + """Compute t-SNE from embeddings, return [{name, title, x, y, category, score}].""" + + + embeddings = db.all_embeddings() + if len(embeddings) < 5: + return [] + + pairs = db.drafts_with_ratings(limit=1000) + rating_map = {d.name: r for d, r in pairs} + draft_map = {d.name: d for d, _ in pairs} + + # Filter to drafts that have both embeddings and ratings + names = [n for n in embeddings if n in rating_map] + if len(names) < 5: + return [] + + matrix = np.array([embeddings[n] for n in names]) + + try: + tsne = TSNE(n_components=2, perplexity=min(30, len(names) - 1), + random_state=42, max_iter=500) + coords = tsne.fit_transform(matrix) + except Exception: + return [] + + result = [] + for i, name in enumerate(names): + r = rating_map[name] + d = draft_map.get(name) + result.append({ + "name": name, + "title": d.title if d else name, + "x": round(float(coords[i, 0]), 3), + "y": round(float(coords[i, 1]), 3), + "category": r.categories[0] if r.categories else "Other", + "score": round(r.composite_score, 2), + }) + return result + + +def get_comparison_data(db: Database, names: list[str]) -> dict | None: + """Get comparison data for a list of drafts. + + Returns { + drafts: [{name, title, abstract, rating, ideas, refs, ...}], + shared_ideas: [{title, drafts: [name,...]}], + unique_ideas: {name: [{title, description}]}, + shared_refs: [{type, id, drafts: [name,...]}], + unique_refs: {name: [{type, id}]}, + similarities: [{a, b, similarity}], + comparison_text: str | None, + } + """ + + + drafts_data = [] + all_ideas: dict[str, list[dict]] = {} + all_refs: dict[str, list[tuple[str, str]]] = {} + + for name in names: + detail = get_draft_detail(db, name) + if not detail: + continue + drafts_data.append(detail) + all_ideas[name] = detail.get("ideas", []) + all_refs[name] = [(r["type"], r["id"]) for r in detail.get("refs", [])] + + if len(drafts_data) < 2: + return None + + # Find shared vs unique ideas (by title similarity) + idea_title_drafts: dict[str, list[str]] = {} + for name, ideas in all_ideas.items(): + for idea in ideas: + title_lower = idea["title"].lower().strip() + if title_lower not in idea_title_drafts: + idea_title_drafts[title_lower] = [] + idea_title_drafts[title_lower].append(name) + + shared_ideas = [ + {"title": title, "drafts": draft_list} + for title, draft_list in idea_title_drafts.items() + if len(set(draft_list)) > 1 + ] + unique_ideas: dict[str, list[dict]] = {} + for name, ideas in all_ideas.items(): + unique = [] + for idea in ideas: + title_lower = idea["title"].lower().strip() + if len(set(idea_title_drafts.get(title_lower, []))) <= 1: + unique.append({"title": idea["title"], "description": idea.get("description", "")}) + unique_ideas[name] = unique + + # Find shared vs unique references + ref_drafts: dict[tuple[str, str], list[str]] = {} + for name, refs in all_refs.items(): + for ref in refs: + if ref not in ref_drafts: + ref_drafts[ref] = [] + ref_drafts[ref].append(name) + + shared_refs = [ + {"type": ref[0], "id": ref[1], "drafts": draft_list} + for ref, 
draft_list in ref_drafts.items() + if len(set(draft_list)) > 1 + ] + unique_refs: dict[str, list[dict]] = {} + for name, refs in all_refs.items(): + unique = [] + for ref in refs: + if len(set(ref_drafts.get(ref, []))) <= 1: + unique.append({"type": ref[0], "id": ref[1]}) + unique_refs[name] = unique + + # Pairwise embedding similarities + embeddings = db.all_embeddings() + similarities = [] + valid_names = [d["name"] for d in drafts_data] + for i in range(len(valid_names)): + for j in range(i + 1, len(valid_names)): + a, b = valid_names[i], valid_names[j] + if a in embeddings and b in embeddings: + vec_a = embeddings[a] + vec_b = embeddings[b] + dot = np.dot(vec_a, vec_b) + norm = np.linalg.norm(vec_a) * np.linalg.norm(vec_b) + sim = float(dot / norm) if norm > 0 else 0.0 + similarities.append({"a": a, "b": b, "similarity": round(sim, 4)}) + + return { + "drafts": drafts_data, + "shared_ideas": shared_ideas, + "unique_ideas": unique_ideas, + "shared_refs": shared_refs, + "unique_refs": unique_refs, + "similarities": similarities, + "comparison_text": None, + } + + +# --------------------------------------------------------------------------- +# Architecture Designer — System-of-Systems view +# --------------------------------------------------------------------------- + +# Architectural layers (bottom-up stack) +_ARCH_LAYERS = [ + {"id": "transport", "label": "Transport & Networking", "order": 0, + "keywords": {"transport", "network", "routing", "tunnel", "packet", "flow", "traffic", "qos", "sdwan", "mpls", "bgp", "ospf", "segment", "srv6", "quic", "http", "grpc", "mqtt", "yang", "snmp", "netconf", "restconf"}}, + {"id": "identity", "label": "Identity & Trust", "order": 1, + "keywords": {"identity", "auth", "authentication", "authorization", "credential", "certificate", "trust", "attestation", "oauth", "token", "signing", "verification", "verifiable", "did", "vc", "pki", "spiffe", "acl"}}, + {"id": "discovery", "label": "Discovery & Registration", "order": 2, + "keywords": {"discovery", "registration", "registry", "catalog", "advertisement", "announce", "capability", "service", "lookup", "resolution", "dns", "directory"}}, + {"id": "communication", "label": "Agent Communication", "order": 3, + "keywords": {"a2a", "agent", "communication", "message", "messaging", "protocol", "exchange", "negotiation", "handshake", "session", "dialogue", "interaction", "mcp", "interop"}}, + {"id": "coordination", "label": "Task & Coordination", "order": 4, + "keywords": {"task", "delegation", "orchestration", "workflow", "planning", "coordination", "consensus", "collaboration", "multi-agent", "swarm", "composition", "scheduling"}}, + {"id": "intelligence", "label": "AI & Inference", "order": 5, + "keywords": {"model", "inference", "learning", "training", "ml", "neural", "llm", "embedding", "reasoning", "decision", "prediction", "classification", "generative", "rag", "fine-tuning"}}, + {"id": "safety", "label": "Safety & Governance", "order": 6, + "keywords": {"safety", "ethical", "governance", "policy", "audit", "explainability", "transparency", "accountability", "bias", "fairness", "compliance", "regulation", "risk", "shutdown", "alignment", "adversarial", "privacy", "consent"}}, + {"id": "application", "label": "Application Domains", "order": 7, + "keywords": {"healthcare", "autonomous", "vehicle", "robotics", "iot", "digital twin", "supply chain", "finance", "manufacturing", "energy", "smart", "edge", "cloud", "sensing"}}, +] + +_LAYER_KEYWORDS = {l["id"]: l["keywords"] for l in _ARCH_LAYERS} + + +def 
_classify_to_layer(text: str) -> str: + """Classify a piece of text to the best-matching architectural layer.""" + text_lower = text.lower() + words = set(re.findall(r"[a-z][a-z0-9-]+", text_lower)) + scores: dict[str, int] = {} + for layer_id, kws in _LAYER_KEYWORDS.items(): + scores[layer_id] = len(words & kws) + # Also check for multi-word keywords as substrings + for kw in kws: + if len(kw) > 4 and kw in text_lower: + scores[layer_id] += 1 + best = max(scores, key=lambda k: scores[k]) + return best if scores[best] > 0 else "communication" # default + + +def get_architecture(db: Database) -> dict: + """Build system-of-systems architecture from idea clusters, gaps, and source coverage.""" + return _cached("architecture", lambda: _compute_architecture(db), ttl=600) + + +def _compute_architecture(db: Database) -> dict: + """Compute the architecture view. + + Returns: + { + "components": [...], # architectural building blocks + "dependencies": [...], # edges between components + "gaps": [...], # gaps mapped to layers + "layers": [...], # layer definitions + "source_coverage": {...}, # per-layer source coverage + "stats": {...} + } + """ + # --- Gather raw data --- + cluster_data = get_idea_clusters(db) + clusters = cluster_data.get("clusters", []) + links = cluster_data.get("links", []) + all_gaps = db.all_gaps() + + # Source coverage: count drafts per source per layer + draft_rows = db.conn.execute( + "SELECT d.name, d.title, d.abstract, d.source, r.categories " + "FROM drafts d LEFT JOIN ratings r ON d.name = r.draft_name " + "WHERE COALESCE(r.false_positive, 0) = 0" + ).fetchall() + + # Build components from idea clusters + components = [] + cluster_to_component: dict[int, int] = {} # cluster_id -> component index + + for cl in clusters: + if cl["size"] < 3: + continue # skip tiny clusters + + # Determine layer from cluster theme + idea titles + text_blob = cl.get("theme", "") + for idea in cl.get("ideas", [])[:10]: + text_blob += " " + idea.get("title", "") + " " + idea.get("description", "") + layer = _classify_to_layer(text_blob) + + # Source coverage for this component's drafts + draft_names = set(cl.get("drafts", [])) + sources: Counter = Counter() + comp_drafts: list[dict] = [] + for dr in draft_rows: + if dr["name"] in draft_names: + sources[dr["source"] or "ietf"] += 1 + comp_drafts.append({"name": dr["name"], "title": (dr["title"] or dr["name"])[:80], "source": dr["source"] or "ietf"}) + + # Idea type breakdown + type_counts: Counter = Counter() + for idea in cl.get("ideas", []): + t = idea.get("type", "") + if t: + type_counts[t] += 1 + + # Maturity: rough proxy from idea count and source diversity + maturity = min(5, 1 + len(sources) + (1 if cl["size"] >= 10 else 0) + (1 if cl.get("cross_wg") else 0)) + + comp = { + "id": len(components), + "cluster_id": cl["id"], + "name": cl.get("theme", f"Component {cl['id']}"), + "layer": layer, + "size": cl["size"], + "draft_count": len(draft_names), + "drafts": comp_drafts[:20], + "sources": dict(sources.most_common()), + "type_breakdown": dict(type_counts.most_common(5)), + "maturity": maturity, + "wgs": cl.get("wgs", [])[:3], + "top_ideas": [{"title": i["title"], "type": i.get("type", ""), "draft_name": i.get("draft_name", "")} + for i in cl.get("ideas", [])[:5]], + "categories": cl.get("categories", []), + } + cluster_to_component[cl["id"]] = comp["id"] + components.append(comp) + + # Build dependencies from cross-cluster links + dependencies = [] + for link in links: + src_comp = cluster_to_component.get(link["source"]) + tgt_comp = 
cluster_to_component.get(link["target"]) + if src_comp is not None and tgt_comp is not None and src_comp != tgt_comp: + dependencies.append({ + "source": src_comp, + "target": tgt_comp, + "similarity": link.get("best_pair_sim", link.get("similarity", 0)), + "idea_a": link.get("idea_a", ""), + "idea_b": link.get("idea_b", ""), + }) + + # Map gaps to layers + gap_items = [] + for gap in all_gaps: + text = gap["topic"] + " " + gap.get("description", "") + " " + gap.get("category", "") + layer = _classify_to_layer(text) + gap_items.append({ + "id": gap["id"], + "topic": gap["topic"], + "description": gap["description"], + "evidence": gap.get("evidence", ""), + "severity": gap.get("severity", "medium"), + "category": gap.get("category", ""), + "layer": layer, + }) + + # Source coverage per layer + source_coverage: dict[str, dict[str, int]] = {l["id"]: Counter() for l in _ARCH_LAYERS} + for dr in draft_rows: + text = (dr["title"] or "") + " " + (dr["abstract"] or "")[:200] + layer = _classify_to_layer(text) + source_coverage[layer][dr["source"] or "ietf"] += 1 + # Convert Counters to dicts + source_coverage = {k: dict(v) for k, v in source_coverage.items()} + + # Layer summary stats + layer_info = [] + for l in _ARCH_LAYERS: + lid = l["id"] + comp_count = sum(1 for c in components if c["layer"] == lid) + idea_count = sum(c["size"] for c in components if c["layer"] == lid) + gap_count = sum(1 for g in gap_items if g["layer"] == lid) + layer_info.append({ + "id": l["id"], + "label": l["label"], + "order": l["order"], + "component_count": comp_count, + "idea_count": idea_count, + "gap_count": gap_count, + "coverage": source_coverage.get(lid, {}), + "total_drafts": sum(source_coverage.get(lid, {}).values()), + }) + + return { + "components": components, + "dependencies": dependencies, + "gaps": gap_items, + "layers": layer_info, + "stats": { + "total_components": len(components), + "total_dependencies": len(dependencies), + "total_gaps": len(gap_items), + "layers_with_gaps": len(set(g["layer"] for g in gap_items)), + }, + } + + +def get_ask_search(db: Database, question: str, top_k: int = 5) -> dict: + """Search-only (free) — returns sources + cached answer if available.""" + config = Config.load() + searcher = HybridSearch(config, db) + return searcher.search_only(question, top_k=top_k) + + +def get_ask_synthesize(db: Database, question: str, top_k: int = 5, cheap: bool = True) -> dict: + """Run Claude synthesis (costs tokens, result is cached permanently).""" + config = Config.load() + searcher = HybridSearch(config, db) + return searcher.ask(question, top_k=top_k, cheap=cheap) + + +# ── New Analysis Functions ────────────────────────────────────────────── + +def get_idea_analysis(db: Database) -> dict: + """Return comprehensive idea analysis data for the idea-analysis page. + + Includes novelty distribution, type breakdown with avg novelty, + top novel ideas, ideas-per-draft distribution, cross-tab of type x source, + shared ideas across drafts, and idea novelty vs draft rating correlation. 
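+
+    Returned keys include novelty_histogram, by_type, top_novel, cross_tab,
+    shared_ideas, scatter_data, and sunburst (see the return statement below).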
+ """ + from collections import Counter, defaultdict + from difflib import SequenceMatcher + + # Fetch raw data + all_ideas = db.conn.execute( + """SELECT i.id, i.draft_name, i.title, i.description, i.idea_type, + i.novelty_score + FROM ideas i ORDER BY i.novelty_score DESC NULLS LAST""" + ).fetchall() + all_ideas = [dict(r) for r in all_ideas] + + # Draft ratings lookup + ratings_rows = db.conn.execute( + """SELECT d.name, d.title as draft_title, d.source, + r.novelty AS r_novelty, r.maturity, r.overlap, r.momentum, r.relevance + FROM drafts d LEFT JOIN ratings r ON d.name = r.draft_name""" + ).fetchall() + draft_info = {} + for r in ratings_rows: + row = dict(r) + # Compute composite score (average of 5 dimensions) + dims = [row.get("r_novelty"), row.get("maturity"), row.get("overlap"), + row.get("momentum"), row.get("relevance")] + valid = [d for d in dims if d is not None] + row["composite_score"] = sum(valid) / len(valid) if valid else None + draft_info[row["name"]] = row + + total = len(all_ideas) + scored = [i for i in all_ideas if i.get("novelty_score") is not None] + unscored = total - len(scored) + avg_novelty = sum(i["novelty_score"] for i in scored) / len(scored) if scored else 0 + + # Embedding coverage + embed_count = db.conn.execute("SELECT COUNT(*) FROM idea_embeddings").fetchone()[0] + + # --- Novelty score distribution (histogram) --- + novelty_dist = Counter(i["novelty_score"] for i in scored) + novelty_histogram = { + "labels": [1, 2, 3, 4, 5], + "values": [novelty_dist.get(s, 0) for s in [1, 2, 3, 4, 5]], + } + + # --- Ideas by type with counts and avg novelty --- + type_data = defaultdict(lambda: {"count": 0, "novelty_sum": 0, "novelty_n": 0}) + for idea in all_ideas: + t = idea.get("idea_type") or "other" + type_data[t]["count"] += 1 + if idea.get("novelty_score") is not None: + type_data[t]["novelty_sum"] += idea["novelty_score"] + type_data[t]["novelty_n"] += 1 + + by_type = [] + for t, d in sorted(type_data.items(), key=lambda x: x[1]["count"], reverse=True): + avg = d["novelty_sum"] / d["novelty_n"] if d["novelty_n"] > 0 else 0 + by_type.append({"type": t, "count": d["count"], "avg_novelty": round(avg, 2)}) + + type_names = [t["type"] for t in by_type] + + # --- Top 20 most novel ideas (score 4-5) --- + top_novel = [] + for idea in all_ideas: + if idea.get("novelty_score") and idea["novelty_score"] >= 4: + di = draft_info.get(idea["draft_name"], {}) + top_novel.append({ + "title": idea["title"], + "description": idea["description"], + "type": idea.get("idea_type", "other"), + "novelty_score": idea["novelty_score"], + "draft_name": idea["draft_name"], + "draft_title": di.get("draft_title", ""), + "draft_score": di.get("composite_score"), + }) + top_novel.sort(key=lambda x: (x["novelty_score"], x.get("draft_score") or 0), reverse=True) + top_novel = top_novel[:20] + + # --- Ideas per draft distribution --- + ideas_per_draft = Counter(i["draft_name"] for i in all_ideas) + ipd_dist = Counter(ideas_per_draft.values()) + ideas_per_draft_hist = { + "labels": sorted(ipd_dist.keys()), + "values": [ipd_dist[k] for k in sorted(ipd_dist.keys())], + } + # Also top drafts by idea count + top_idea_drafts = [] + for name, count in ideas_per_draft.most_common(10): + di = draft_info.get(name, {}) + top_idea_drafts.append({ + "name": name, + "draft_title": di.get("draft_title", ""), + "idea_count": count, + "score": di.get("composite_score"), + }) + + # --- Cross-tabulation: idea_type x source --- + type_source = defaultdict(lambda: defaultdict(int)) + for idea in all_ideas: + t = 
idea.get("idea_type") or "other" + di = draft_info.get(idea["draft_name"], {}) + source = di.get("source", "ietf") or "ietf" + type_source[t][source] += 1 + + sources = sorted(set( + di.get("source", "ietf") or "ietf" for di in draft_info.values() + )) + cross_tab = [] + for t in type_names: + row = {"type": t} + for s in sources: + row[s] = type_source[t].get(s, 0) + cross_tab.append(row) + + # --- Shared ideas across drafts --- + idea_groups: list[dict] = [] + for idea in all_ideas: + title_lower = idea["title"].lower().strip() + matched = False + for group in idea_groups: + ratio = SequenceMatcher(None, title_lower, group["canonical"]).ratio() + if ratio >= 0.75: + group["ideas"].append(idea) + group["drafts"].add(idea["draft_name"]) + matched = True + break + if not matched: + idea_groups.append({ + "canonical": title_lower, + "title": idea["title"], + "ideas": [idea], + "drafts": {idea["draft_name"]}, + }) + + shared_ideas = [] + for g in sorted(idea_groups, key=lambda x: len(x["drafts"]), reverse=True): + if len(g["drafts"]) < 2: + break + shared_ideas.append({ + "title": g["title"], + "appearances": len(g["drafts"]), + "drafts": sorted(g["drafts"])[:8], + "types": list(set(i.get("idea_type", "other") for i in g["ideas"])), + }) + + # --- Scatter: draft avg idea novelty vs draft relevance --- + draft_idea_novelty = defaultdict(list) + for idea in scored: + draft_idea_novelty[idea["draft_name"]].append(idea["novelty_score"]) + + scatter_data = [] + for name, scores in draft_idea_novelty.items(): + di = draft_info.get(name, {}) + if di.get("relevance") is not None and di.get("composite_score") is not None: + scatter_data.append({ + "name": name, + "avg_idea_novelty": round(sum(scores) / len(scores), 2), + "relevance": di["relevance"], + "score": di["composite_score"], + "idea_count": len(scores), + "source": di.get("source", "ietf") or "ietf", + }) + + # --- Sunburst data: type -> novelty band --- + sunburst_labels = [] + sunburst_parents = [] + sunburst_values = [] + # Root + sunburst_labels.append("All Ideas") + sunburst_parents.append("") + sunburst_values.append(total) + + novelty_bands = {"High (4-5)": lambda s: s is not None and s >= 4, + "Medium (3)": lambda s: s is not None and s == 3, + "Low (1-2)": lambda s: s is not None and s <= 2, + "Unscored": lambda s: s is None} + + for t_info in by_type: + t = t_info["type"] + sunburst_labels.append(t) + sunburst_parents.append("All Ideas") + sunburst_values.append(t_info["count"]) + # Sub-bands + type_ideas = [i for i in all_ideas if (i.get("idea_type") or "other") == t] + for band, fn in novelty_bands.items(): + cnt = sum(1 for i in type_ideas if fn(i.get("novelty_score"))) + if cnt > 0: + sunburst_labels.append(f"{t} - {band}") + sunburst_parents.append(t) + sunburst_values.append(cnt) + + return { + "total": total, + "scored": len(scored), + "unscored": unscored, + "avg_novelty": round(avg_novelty, 2), + "embed_count": embed_count, + "embed_pct": round(embed_count / total * 100, 1) if total > 0 else 0, + "type_count": len(by_type), + "novelty_histogram": novelty_histogram, + "by_type": by_type, + "top_novel": top_novel, + "ideas_per_draft_hist": ideas_per_draft_hist, + "top_idea_drafts": top_idea_drafts, + "cross_tab": cross_tab, + "sources": sources, + "shared_ideas": shared_ideas, + "scatter_data": scatter_data, + "sunburst": { + "labels": sunburst_labels, + "parents": sunburst_parents, + "values": sunburst_values, + }, + } + + + + +def get_source_comparison(db: Database) -> dict: + """Cross-source comparison: ratings, 
categories, counts by standards body.""" + pairs_all = db.drafts_with_ratings(limit=2000) + # Also include false positives for completeness of source counts + pairs_fp = db.drafts_with_ratings(limit=2000, include_false_positives=True) + + # Build per-source data + source_stats: dict[str, dict] = {} + source_categories: dict[str, Counter] = defaultdict(Counter) + source_ratings: dict[str, dict[str, list]] = defaultdict(lambda: { + "novelty": [], "maturity": [], "overlap": [], "momentum": [], "relevance": [], "scores": [], + }) + # Collect author counts per source + all_authors_by_source: dict[str, set] = defaultdict(set) + + for draft, rating in pairs_all: + src = getattr(draft, "source", "ietf") or "ietf" + source_ratings[src]["novelty"].append(rating.novelty) + source_ratings[src]["maturity"].append(rating.maturity) + source_ratings[src]["overlap"].append(rating.overlap) + source_ratings[src]["momentum"].append(rating.momentum) + source_ratings[src]["relevance"].append(rating.relevance) + source_ratings[src]["scores"].append(round(rating.composite_score, 2)) + for cat in rating.categories: + source_categories[src][cat] += 1 + + # Get all drafts (including unrated) for draft counts + all_drafts = db.list_drafts(limit=5000) + source_draft_counts: Counter = Counter() + for d in all_drafts: + src = getattr(d, "source", "ietf") or "ietf" + source_draft_counts[src] += 1 + + # Author counts by source + try: + rows = db.conn.execute( + """SELECT d.source, COUNT(DISTINCT da.person_id) as author_count + FROM drafts d + JOIN draft_authors da ON d.name = da.draft_name + GROUP BY d.source""" + ).fetchall() + for r in rows: + src = r["source"] or "ietf" + all_authors_by_source[src] = r["author_count"] + except Exception: + pass + + # Idea counts by source + source_idea_counts: Counter = Counter() + try: + rows = db.conn.execute( + """SELECT d.source, COUNT(*) as idea_count + FROM ideas i + JOIN drafts d ON i.draft_name = d.name + GROUP BY d.source""" + ).fetchall() + for r in rows: + src = r["source"] or "ietf" + source_idea_counts[src] = r["idea_count"] + except Exception: + pass + + # Build summary table + all_sources = sorted(set(source_draft_counts.keys()) | set(source_ratings.keys())) + summary = [] + for src in all_sources: + rats = source_ratings.get(src, {"scores": []}) + cats = source_categories.get(src, Counter()) + top_cat = cats.most_common(1)[0][0] if cats else "N/A" + avg_score = round(sum(rats["scores"]) / len(rats["scores"]), 2) if rats["scores"] else 0.0 + summary.append({ + "source": src, + "drafts": source_draft_counts.get(src, 0), + "rated": len(rats["scores"]), + "authors": all_authors_by_source.get(src, 0), + "ideas": source_idea_counts.get(src, 0), + "avg_score": avg_score, + "top_category": top_cat, + }) + + # Radar data: average of each dimension per source + radar = {} + for src, rats in source_ratings.items(): + if not rats["scores"]: + continue + n = len(rats["scores"]) + radar[src] = { + "novelty": round(sum(rats["novelty"]) / n, 2), + "maturity": round(sum(rats["maturity"]) / n, 2), + "overlap": round(sum(rats["overlap"]) / n, 2), + "momentum": round(sum(rats["momentum"]) / n, 2), + "relevance": round(sum(rats["relevance"]) / n, 2), + "count": n, + } + + # Category distribution by source (for stacked bar / heatmap) + all_cats = sorted({cat for cats in source_categories.values() for cat in cats}) + heatmap = { + "sources": list(source_categories.keys()), + "categories": all_cats, + "values": [], + } + for src in heatmap["sources"]: + row = [source_categories[src].get(cat, 
0) for cat in all_cats] + heatmap["values"].append(row) + + # Unique/shared categories analysis + source_cat_sets = {src: set(cats.keys()) for src, cats in source_categories.items()} + unique_cats = {} + for src, cats in source_cat_sets.items(): + others = set() + for s2, c2 in source_cat_sets.items(): + if s2 != src: + others |= c2 + unique_cats[src] = sorted(cats - others) + + shared_cats = set() + for src, cats in source_cat_sets.items(): + for s2, c2 in source_cat_sets.items(): + if s2 != src: + shared_cats |= (cats & c2) + shared_cats = sorted(shared_cats) + + return { + "summary": summary, + "radar": radar, + "heatmap": heatmap, + "unique_categories": unique_cats, + "shared_categories": shared_cats, + } + + +def get_false_positive_profile(db: Database) -> dict: + """Profile drafts flagged as false positives.""" + # Get false positives + fp_rows = db.conn.execute( + """SELECT d.*, r.novelty, r.maturity, r.overlap, r.momentum, r.relevance, + r.summary, r.categories as r_categories, r.false_positive + FROM drafts d + JOIN ratings r ON d.name = r.draft_name + WHERE r.false_positive = 1 + ORDER BY d.name""" + ).fetchall() + + # Get non-FP rated drafts for comparison + nonfp_rows = db.conn.execute( + """SELECT r.novelty, r.maturity, r.overlap, r.momentum, r.relevance, + r.categories as r_categories + FROM ratings r + WHERE COALESCE(r.false_positive, 0) = 0""" + ).fetchall() + + total_rated = db.conn.execute("SELECT COUNT(*) FROM ratings").fetchone()[0] + total_drafts = db.count_drafts(include_false_positives=True) + + # Build FP list + fp_list = [] + fp_categories: Counter = Counter() + fp_sources: Counter = Counter() + fp_dims = {"novelty": [], "maturity": [], "overlap": [], "momentum": [], "relevance": []} + + for row in fp_rows: + cats = json.loads(row["r_categories"]) if row["r_categories"] else [] + src = row["source"] or "ietf" + fp_list.append({ + "name": row["name"], + "title": row["title"], + "source": src, + "categories": cats, + "relevance": row["relevance"], + "novelty": row["novelty"], + "maturity": row["maturity"], + "overlap": row["overlap"], + "momentum": row["momentum"], + "summary": row["summary"] or "", + }) + for cat in cats: + fp_categories[cat] += 1 + fp_sources[src] += 1 + fp_dims["novelty"].append(row["novelty"]) + fp_dims["maturity"].append(row["maturity"]) + fp_dims["overlap"].append(row["overlap"]) + fp_dims["momentum"].append(row["momentum"]) + fp_dims["relevance"].append(row["relevance"]) + + # Non-FP dimensions for comparison + nonfp_dims = {"novelty": [], "maturity": [], "overlap": [], "momentum": [], "relevance": []} + nonfp_categories: Counter = Counter() + for row in nonfp_rows: + nonfp_dims["novelty"].append(row["novelty"]) + nonfp_dims["maturity"].append(row["maturity"]) + nonfp_dims["overlap"].append(row["overlap"]) + nonfp_dims["momentum"].append(row["momentum"]) + nonfp_dims["relevance"].append(row["relevance"]) + cats = json.loads(row["r_categories"]) if row["r_categories"] else [] + for cat in cats: + nonfp_categories[cat] += 1 + + # Top terms from FP abstracts + from collections import Counter as _Counter + stop_words = { + "the", "a", "an", "and", "or", "but", "in", "on", "at", "to", "for", + "of", "with", "by", "from", "is", "it", "that", "this", "are", "was", + "be", "as", "can", "may", "will", "not", "has", "have", "been", "which", + "their", "its", "also", "such", "these", "would", "should", "could", + "more", "other", "than", "into", "about", "between", "over", "after", + "all", "one", "two", "new", "they", "we", "our", "each", "some", "any", 
+ "there", "what", "when", "how", "where", "who", "does", "do", "did", + "no", "if", "so", "up", "out", "only", "used", "using", "use", "based", + "through", "both", "well", "within", "must", "while", "had", "were", + } + word_counter: Counter = Counter() + for row in fp_rows: + abstract = (row["abstract"] or "").lower() + title = (row["title"] or "").lower() + text = abstract + " " + title + words = re.findall(r'[a-z]{3,}', text) + for w in words: + if w not in stop_words: + word_counter[w] += 1 + top_terms = word_counter.most_common(30) + + return { + "count": len(fp_list), + "total_rated": total_rated, + "total_drafts": total_drafts, + "pct_of_total": round(100 * len(fp_list) / total_drafts, 1) if total_drafts else 0, + "pct_of_rated": round(100 * len(fp_list) / total_rated, 1) if total_rated else 0, + "fp_list": fp_list, + "fp_categories": dict(fp_categories.most_common()), + "fp_sources": dict(fp_sources.most_common()), + "fp_dims": fp_dims, + "nonfp_dims": nonfp_dims, + "top_terms": top_terms, + "nonfp_categories": dict(nonfp_categories.most_common(20)), + } + + +def get_ask_search(db: Database, question: str, top_k: int = 5) -> dict: + """Search-only (free) — returns sources + cached answer if available.""" + config = Config.load() + searcher = HybridSearch(config, db) + return searcher.search_only(question, top_k=top_k) + + +def get_ask_synthesize(db: Database, question: str, top_k: int = 5, cheap: bool = True) -> dict: + """Run Claude synthesis (costs tokens, result is cached permanently).""" + config = Config.load() + searcher = HybridSearch(config, db) + return searcher.ask(question, top_k=top_k, cheap=cheap) + + +def get_citation_influence(db: Database) -> dict: + """Return citation influence analysis data (cached for 5 min).""" + return _cached("citation_influence", lambda: _compute_citation_influence(db)) + + +def _compute_citation_influence(db: Database) -> dict: + """Compute citation influence metrics from the draft_refs table. 
+ + Returns dict with: + - top_cited_rfcs: top 20 most-cited RFCs with citation counts and citing drafts + - top_citing_drafts: top 20 drafts that cite the most references + - citations_by_category: average citations per category + - stats: total citations, unique RFCs, avg refs per draft + - draft_network: draft-to-draft citation edges for visualization + """ + # Get all references + rows = db.conn.execute( + "SELECT draft_name, ref_type, ref_id FROM draft_refs" + ).fetchall() + + # Get draft titles and categories + draft_rows = db.conn.execute("SELECT name, title FROM drafts").fetchall() + draft_titles = {r["name"]: r["title"] for r in draft_rows} + + rating_rows = db.conn.execute("SELECT draft_name, categories FROM ratings").fetchall() + draft_cats: dict[str, str] = {} + for r in rating_rows: + try: + cats = json.loads(r["categories"]) if r["categories"] else [] + draft_cats[r["draft_name"]] = cats[0] if cats else "Other" + except Exception: + draft_cats[r["draft_name"]] = "Other" + + # Well-known RFC names + rfc_names = { + "2119": "Key words (MUST/SHALL/MAY)", "8174": "Key words update", + "8259": "JSON", "7519": "JWT", "6749": "OAuth 2.0", + "7540": "HTTP/2", "9110": "HTTP Semantics", "7525": "TLS Recommendations", + "8446": "TLS 1.3", "3986": "URIs", "7230": "HTTP/1.1 Syntax", + "7231": "HTTP/1.1 Semantics", "8288": "Web Linking", "6125": "TLS Server Identity", + "7515": "JWS", "7516": "JWE", "7517": "JWK", "7518": "JWA", + "9449": "DPoP", "6750": "OAuth Bearer", "8725": "JWT Best Practices", + "9396": "Rich Authorization Requests", "9101": "JAR", + "8414": "OAuth Server Metadata", "7591": "Dynamic Client Registration", + "8705": "mTLS for OAuth", "9068": "JWT Access Tokens", + "6819": "OAuth Threat Model", "9200": "ACE-OAuth", "9052": "COSE", + "8392": "CWT", "7252": "CoAP", + } + + # In-degree: how many times each RFC is cited + rfc_citations: dict[str, list[str]] = defaultdict(list) + draft_out_count: dict[str, int] = Counter() + draft_to_draft_edges = [] + total_citations = 0 + + for r in rows: + draft_name = r["draft_name"] + ref_type = r["ref_type"] + ref_id = r["ref_id"] + total_citations += 1 + draft_out_count[draft_name] += 1 + + if ref_type == "rfc": + rfc_citations[ref_id].append(draft_name) + elif ref_type == "draft": + draft_to_draft_edges.append({ + "source": draft_name, + "target": ref_id, + "source_title": draft_titles.get(draft_name, draft_name), + "target_title": draft_titles.get(ref_id, ref_id), + }) + + # Top 20 most-cited RFCs + rfc_sorted = sorted(rfc_citations.items(), key=lambda x: len(x[1]), reverse=True) + top_cited_rfcs = [] + for ref_id, citing_drafts in rfc_sorted[:20]: + top_cited_rfcs.append({ + "rfc_id": ref_id, + "name": rfc_names.get(ref_id, ""), + "count": len(citing_drafts), + "drafts": citing_drafts[:10], # Limit to first 10 for display + "total_drafts": len(citing_drafts), + }) + + # Top 20 most-citing drafts (out-degree) + draft_sorted = sorted(draft_out_count.items(), key=lambda x: x[1], reverse=True) + top_citing_drafts = [] + for draft_name, count in draft_sorted[:20]: + top_citing_drafts.append({ + "name": draft_name, + "title": draft_titles.get(draft_name, draft_name), + "count": count, + "category": draft_cats.get(draft_name, "Other"), + }) + + # Citation density by category + cat_totals: dict[str, int] = Counter() + cat_counts: dict[str, int] = Counter() + for draft_name, count in draft_out_count.items(): + cat = draft_cats.get(draft_name, "Other") + cat_totals[cat] += count + cat_counts[cat] += 1 + + citations_by_category = [] + for cat 
in sorted(cat_totals.keys()):
+        avg = cat_totals[cat] / cat_counts[cat] if cat_counts[cat] > 0 else 0
+        citations_by_category.append({
+            "category": cat,
+            "total_citations": cat_totals[cat],
+            "draft_count": cat_counts[cat],
+            "avg_citations": round(avg, 1),
+        })
+    citations_by_category.sort(key=lambda x: x["avg_citations"], reverse=True)
+
+    # PageRank-style influence: drafts that cite highly-cited RFCs
+    # Simple approximation: each cited RFC contributes its own citation count
+    rfc_influence = {rid: len(drafts) for rid, drafts in rfc_citations.items()}
+    draft_pagerank: dict[str, float] = Counter()
+    for r in rows:
+        if r["ref_type"] == "rfc" and r["ref_id"] in rfc_influence:
+            # Higher score for citing highly-cited RFCs
+            draft_pagerank[r["draft_name"]] += rfc_influence[r["ref_id"]]
+
+    pagerank_sorted = sorted(draft_pagerank.items(), key=lambda x: x[1], reverse=True)
+    top_pagerank = []
+    for draft_name, score in pagerank_sorted[:20]:
+        top_pagerank.append({
+            "name": draft_name,
+            "title": draft_titles.get(draft_name, draft_name),
+            "score": round(score, 1),
+            "category": draft_cats.get(draft_name, "Other"),
+            "out_degree": draft_out_count.get(draft_name, 0),
+        })
+
+    # Stats
+    unique_rfcs = len(rfc_citations)
+    drafts_with_refs = len(draft_out_count)
+    avg_refs = total_citations / drafts_with_refs if drafts_with_refs > 0 else 0
+
+    return {
+        "top_cited_rfcs": top_cited_rfcs,
+        "top_citing_drafts": top_citing_drafts,
+        "top_pagerank": top_pagerank,
+        "citations_by_category": citations_by_category,
+        "draft_network": draft_to_draft_edges[:200],  # Limit for perf
+        "stats": {
+            "total_citations": total_citations,
+            "unique_rfcs": unique_rfcs,
+            "drafts_with_refs": drafts_with_refs,
+            "avg_refs_per_draft": round(avg_refs, 1),
+        },
+    }
+
+
+def get_bcp_analysis(db: Database) -> dict:
+    """Return BCP dependency analysis data (cached for 5 min)."""
+    return _cached("bcp_analysis", lambda: _compute_bcp_analysis(db))
+
+
+def _compute_bcp_analysis(db: Database) -> dict:
+    """Compute BCP dependency analysis.
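+
+    Co-citation uses set semantics per draft: a BCP pair counts once for a
+    draft no matter how many times either BCP is referenced in it.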
+ + Returns dict with: + - bcps: all BCPs with citation counts and citing drafts + - co_citation: which BCPs tend to be co-cited + - by_category: BCP citation patterns by category + - coverage: what % of drafts cite at least one BCP + """ + # Get all BCP references + bcp_rows = db.conn.execute( + "SELECT draft_name, ref_id FROM draft_refs WHERE ref_type = 'bcp'" + ).fetchall() + + # Get draft titles and categories + draft_rows = db.conn.execute("SELECT name, title FROM drafts").fetchall() + draft_titles = {r["name"]: r["title"] for r in draft_rows} + total_drafts = len(draft_titles) + + rating_rows = db.conn.execute("SELECT draft_name, categories FROM ratings").fetchall() + draft_cats: dict[str, str] = {} + for r in rating_rows: + try: + cats = json.loads(r["categories"]) if r["categories"] else [] + draft_cats[r["draft_name"]] = cats[0] if cats else "Other" + except Exception: + draft_cats[r["draft_name"]] = "Other" + + # BCP citation counts + bcp_citations: dict[str, list[str]] = defaultdict(list) + draft_bcps: dict[str, list[str]] = defaultdict(list) + + for r in bcp_rows: + bcp_citations[r["ref_id"]].append(r["draft_name"]) + draft_bcps[r["draft_name"]].append(r["ref_id"]) + + # All BCPs with counts + bcps = [] + for bcp_id, citing_drafts in sorted(bcp_citations.items(), + key=lambda x: len(x[1]), reverse=True): + bcps.append({ + "bcp_id": bcp_id, + "count": len(citing_drafts), + "drafts": citing_drafts[:10], + "total_drafts": len(citing_drafts), + }) + + # Co-citation matrix: which BCPs appear together in the same draft + bcp_ids = sorted(bcp_citations.keys()) + co_citation = [] + for i, bcp_a in enumerate(bcp_ids): + drafts_a = set(bcp_citations[bcp_a]) + for j, bcp_b in enumerate(bcp_ids): + if j <= i: + continue + drafts_b = set(bcp_citations[bcp_b]) + shared = len(drafts_a & drafts_b) + if shared > 0: + co_citation.append({ + "bcp_a": bcp_a, + "bcp_b": bcp_b, + "count": shared, + }) + + # Heatmap data: full matrix for all BCPs (top 20 by citation count) + top_bcp_ids = [b["bcp_id"] for b in bcps[:20]] + heatmap_matrix = [] + for bcp_a in top_bcp_ids: + row = [] + drafts_a = set(bcp_citations.get(bcp_a, [])) + for bcp_b in top_bcp_ids: + drafts_b = set(bcp_citations.get(bcp_b, [])) + shared = len(drafts_a & drafts_b) + row.append(shared) + heatmap_matrix.append(row) + + # BCP citations by category + cat_bcp_count: dict[str, Counter] = defaultdict(Counter) + for draft_name, bcp_list in draft_bcps.items(): + cat = draft_cats.get(draft_name, "Other") + for bcp_id in bcp_list: + cat_bcp_count[cat][bcp_id] += 1 + + by_category = [] + for cat in sorted(cat_bcp_count.keys()): + top_bcps = cat_bcp_count[cat].most_common(5) + by_category.append({ + "category": cat, + "total_bcp_refs": sum(cat_bcp_count[cat].values()), + "unique_bcps": len(cat_bcp_count[cat]), + "top_bcps": [{"bcp_id": bid, "count": c} for bid, c in top_bcps], + }) + by_category.sort(key=lambda x: x["total_bcp_refs"], reverse=True) + + # Coverage + drafts_with_bcp = len(draft_bcps) + coverage_pct = (drafts_with_bcp / total_drafts * 100) if total_drafts > 0 else 0 + + return { + "bcps": bcps, + "co_citation": co_citation, + "heatmap_labels": top_bcp_ids, + "heatmap_matrix": heatmap_matrix, + "by_category": by_category, + "coverage": { + "total_drafts": total_drafts, + "drafts_with_bcp": drafts_with_bcp, + "coverage_pct": round(coverage_pct, 1), + "unique_bcps": len(bcp_citations), + "total_bcp_refs": len(bcp_rows), + }, + } + + +def global_search(db: Database, query: str) -> SearchResults: + """Search across drafts (FTS5), 
ideas, authors, and gaps. + + Returns {drafts: [...], ideas: [...], authors: [...], gaps: [...]}. + """ + results: dict = {"drafts": [], "ideas": [], "authors": [], "gaps": []} + if not query or not query.strip(): + return results + + q = query.strip() + + # 1. Drafts via FTS5 + try: + fts_query = re.sub(r'[^\w\s]', '', q) + fts_query = re.sub(r'\b(NEAR|OR|AND|NOT)\b', '', fts_query, flags=re.IGNORECASE) + fts_query = re.sub(r'\s+', ' ', fts_query).strip() + if not fts_query: + raise ValueError("empty query after sanitization") + rows = db.conn.execute( + """SELECT d.name, d.title, d.abstract, d.time, d."group" + FROM drafts d + JOIN drafts_fts f ON d.rowid = f.rowid + WHERE drafts_fts MATCH ? + ORDER BY rank + LIMIT 50""", + (fts_query,), + ).fetchall() + for r in rows: + results["drafts"].append({ + "name": r["name"], + "title": r["title"], + "abstract": (r["abstract"] or "")[:200], + "date": r["time"], + "group": r["group"] or "individual", + }) + except Exception: + # FTS5 match can fail on certain query syntax; fall back to LIKE + like = f"%{q}%" + rows = db.conn.execute( + """SELECT name, title, abstract, time, "group" FROM drafts + WHERE title LIKE ? OR name LIKE ? OR abstract LIKE ? + LIMIT 50""", + (like, like, like), + ).fetchall() + for r in rows: + results["drafts"].append({ + "name": r["name"], + "title": r["title"], + "abstract": (r["abstract"] or "")[:200], + "date": r["time"], + "group": r["group"] or "individual", + }) + + # 2. Ideas via LIKE + like = f"%{q}%" + rows = db.conn.execute( + """SELECT id, title, description, idea_type, draft_name FROM ideas + WHERE (title LIKE ? OR description LIKE ?) + AND draft_name NOT IN (SELECT draft_name FROM ratings WHERE false_positive = 1) ORDER BY id LIMIT 50""", (like, like), ).fetchall() @@ -1628,3 +2910,408 @@ def get_ask_synthesize(db: Database, question: str, top_k: int = 5, cheap: bool config = Config.load() searcher = HybridSearch(config, db) return searcher.ask(question, top_k=top_k, cheap=cheap) + + +def get_trends_data(db: Database) -> dict: + """Return temporal evolution data for the /trends page. + + Returns dict with: + - monthly_submissions: [{month, source, count}, ...] + - monthly_ratings: [{month, novelty, maturity, overlap, momentum, relevance}, ...] + - monthly_categories: [{month, category, count}, ...] + - safety_ratio: [{month, safety, capability, ratio}, ...] + - cumulative_ideas: [{month, total}, ...] + - monthly_new_authors: [{month, count}, ...] + - stats: {fastest_growing, newest_active} + - monthly_table: [{month, total, sources: {}, avg_score}, ...] + """ + conn = db.conn + + # 1. Monthly submissions by source + rows = conn.execute(""" + SELECT substr(time, 1, 7) AS month, source, COUNT(*) AS cnt + FROM drafts + WHERE time IS NOT NULL AND time != '' + GROUP BY month, source + ORDER BY month + """).fetchall() + monthly_submissions = [{"month": r["month"], "source": r["source"], "count": r["cnt"]} for r in rows] + + # 2. 
Monthly average ratings (all 5 dimensions) + rows = conn.execute(""" + SELECT substr(d.time, 1, 7) AS month, + AVG(r.novelty) AS novelty, AVG(r.maturity) AS maturity, + AVG(r.overlap) AS overlap, AVG(r.momentum) AS momentum, + AVG(r.relevance) AS relevance, + COUNT(*) AS cnt + FROM drafts d + JOIN ratings r ON d.name = r.draft_name + WHERE d.time IS NOT NULL AND d.time != '' AND r.false_positive = 0 + GROUP BY month + ORDER BY month + """).fetchall() + monthly_ratings = [{ + "month": r["month"], + "novelty": round(r["novelty"], 2), + "maturity": round(r["maturity"], 2), + "overlap": round(r["overlap"], 2), + "momentum": round(r["momentum"], 2), + "relevance": round(r["relevance"], 2), + "count": r["cnt"], + } for r in rows] + + # 3. Monthly category distribution + rows = conn.execute(""" + SELECT substr(d.time, 1, 7) AS month, r.categories + FROM drafts d + JOIN ratings r ON d.name = r.draft_name + WHERE d.time IS NOT NULL AND d.time != '' AND r.false_positive = 0 + """).fetchall() + cat_monthly: dict[str, Counter] = defaultdict(Counter) + all_cats: Counter = Counter() + for r in rows: + month = r["month"] + try: + cats = json.loads(r["categories"]) if r["categories"] else [] + except (json.JSONDecodeError, TypeError): + cats = [] + for c in cats: + cat_monthly[month][c] += 1 + all_cats[c] += 1 + + # Top 8 categories + top_cats = [c for c, _ in all_cats.most_common(8)] + months_sorted = sorted(cat_monthly.keys()) + monthly_categories = [] + for month in months_sorted: + for cat in top_cats: + monthly_categories.append({ + "month": month, + "category": cat, + "count": cat_monthly[month].get(cat, 0), + }) + + # 4. Safety ratio over time + safety_ratio = [] + for month in months_sorted: + safety = sum(cat_monthly[month].get(c, 0) for c in SAFETY_CATEGORIES) + capability = sum(cat_monthly[month].get(c, 0) for c in CAPABILITY_CATEGORIES) + ratio = round(safety / capability, 2) if capability > 0 else 0 + safety_ratio.append({ + "month": month, + "safety": safety, + "capability": capability, + "ratio": ratio, + }) + + # 5. Cumulative idea count over time + rows = conn.execute(""" + SELECT substr(d.time, 1, 7) AS month, COUNT(i.id) AS cnt + FROM ideas i + JOIN drafts d ON i.draft_name = d.name + WHERE d.time IS NOT NULL AND d.time != '' + GROUP BY month + ORDER BY month + """).fetchall() + cumulative = 0 + cumulative_ideas = [] + for r in rows: + cumulative += r["cnt"] + cumulative_ideas.append({"month": r["month"], "total": cumulative}) + + # 6. Monthly new author count (first-time contributors) + rows = conn.execute(""" + SELECT da.person_id, MIN(substr(d.time, 1, 7)) AS first_month + FROM draft_authors da + JOIN drafts d ON da.draft_name = d.name + WHERE d.time IS NOT NULL AND d.time != '' + GROUP BY da.person_id + """).fetchall() + new_author_monthly: Counter = Counter() + for r in rows: + if r["first_month"]: + new_author_monthly[r["first_month"]] += 1 + monthly_new_authors = [ + {"month": m, "count": new_author_monthly.get(m, 0)} + for m in months_sorted + ] + + # 7. 
Stats: fastest growing category, newest active category + fastest_growing = "" + newest_active = "" + if len(months_sorted) >= 4: + mid = len(months_sorted) // 2 + early_months = months_sorted[:mid] + late_months = months_sorted[mid:] + best_growth = -999 + for cat in top_cats: + early = sum(cat_monthly[m].get(cat, 0) for m in early_months) + late = sum(cat_monthly[m].get(cat, 0) for m in late_months) + if early > 0: + growth = (late - early) / early + elif late > 0: + growth = float("inf") + else: + growth = 0 + if growth > best_growth: + best_growth = growth + fastest_growing = cat + + # Newest active: category with latest first appearance + cat_first_month: dict[str, str] = {} + for month in months_sorted: + for cat in all_cats: + if cat not in cat_first_month and cat_monthly[month].get(cat, 0) > 0: + cat_first_month[cat] = month + if cat_first_month: + newest_active = max(cat_first_month, key=lambda c: cat_first_month[c]) + + # 8. Monthly breakdown table + monthly_table = [] + for month in months_sorted: + # Get per-source counts + sources: dict[str, int] = {} + total = 0 + for s in monthly_submissions: + if s["month"] == month: + sources[s["source"]] = s["count"] + total += s["count"] + # Get avg score + avg_row = conn.execute(""" + SELECT AVG((r.novelty + r.maturity + r.overlap + r.momentum + r.relevance) / 5.0) AS avg_score + FROM drafts d JOIN ratings r ON d.name = r.draft_name + WHERE substr(d.time, 1, 7) = ? AND r.false_positive = 0 + """, (month,)).fetchone() + avg_score = round(avg_row["avg_score"], 2) if avg_row and avg_row["avg_score"] else 0 + monthly_table.append({ + "month": month, + "total": total, + "sources": sources, + "avg_score": avg_score, + }) + + return { + "monthly_submissions": monthly_submissions, + "monthly_ratings": monthly_ratings, + "monthly_categories": monthly_categories, + "safety_ratio": safety_ratio, + "cumulative_ideas": cumulative_ideas, + "monthly_new_authors": monthly_new_authors, + "top_categories": top_cats, + "months": months_sorted, + "stats": { + "fastest_growing": fastest_growing, + "newest_active": newest_active, + }, + "monthly_table": monthly_table, + } + + +# --------------------------------------------------------------------------- +# Draft Complexity Matrix +# --------------------------------------------------------------------------- + + +def get_complexity_data(db: Database) -> dict: + """Return draft complexity analysis data for the /complexity page. + + For each rated draft, compute structural complexity metrics and + correlate with rating dimensions. + + Returns dict with: + - drafts: [{name, title, pages, author_count, citation_count, idea_count, + category_count, novelty, maturity, overlap, momentum, relevance, + score, composite_complexity}, ...] + - correlations: {metric: {dimension: r_value}} + - top_complex: top 10 most complex drafts + - top_efficient: top 10 high-rating low-complexity drafts + - stats: {avg_pages, avg_authors, avg_citations, pages_coverage_pct} + - category_complexity: [{category, avg_pages, avg_authors, avg_citations, count}, ...] + - source_complexity: [{source, avg_pages, avg_authors, avg_citations, count}, ...] 
+ """ + conn = db.conn + + # Build per-draft complexity data + rows = conn.execute(""" + SELECT d.name, d.title, d.pages, d.source, + r.novelty, r.maturity, r.overlap, r.momentum, r.relevance, + r.categories, + (r.novelty + r.maturity + r.overlap + r.momentum + r.relevance) / 5.0 AS score + FROM drafts d + JOIN ratings r ON d.name = r.draft_name + WHERE r.false_positive = 0 + """).fetchall() + + # Author counts + author_counts = {} + for row in conn.execute(""" + SELECT draft_name, COUNT(*) AS cnt FROM draft_authors GROUP BY draft_name + """).fetchall(): + author_counts[row["draft_name"]] = row["cnt"] + + # Citation counts (outgoing refs) + citation_counts = {} + for row in conn.execute(""" + SELECT draft_name, COUNT(*) AS cnt FROM draft_refs GROUP BY draft_name + """).fetchall(): + citation_counts[row["draft_name"]] = row["cnt"] + + # Idea counts + idea_counts = {} + for row in conn.execute(""" + SELECT draft_name, COUNT(*) AS cnt FROM ideas GROUP BY draft_name + """).fetchall(): + idea_counts[row["draft_name"]] = row["cnt"] + + drafts_data = [] + total_with_pages = 0 + total_drafts = 0 + for r in rows: + total_drafts += 1 + pages = r["pages"] + if pages is not None: + total_with_pages += 1 + try: + cats = json.loads(r["categories"]) if r["categories"] else [] + except (json.JSONDecodeError, TypeError): + cats = [] + ac = author_counts.get(r["name"], 0) + cc = citation_counts.get(r["name"], 0) + ic = idea_counts.get(r["name"], 0) + cat_count = len(cats) + # Composite complexity: normalize each metric to 0-1 scale and average + # (raw values stored; composite calculated after we know max values) + drafts_data.append({ + "name": r["name"], + "title": r["title"], + "pages": pages, + "source": r["source"] or "ietf", + "author_count": ac, + "citation_count": cc, + "idea_count": ic, + "category_count": cat_count, + "categories": cats, + "novelty": r["novelty"], + "maturity": r["maturity"], + "overlap": r["overlap"], + "momentum": r["momentum"], + "relevance": r["relevance"], + "score": round(r["score"], 2), + }) + + # Compute composite complexity score (normalized 0-1 each, then averaged) + max_pages = max((d["pages"] for d in drafts_data if d["pages"] is not None), default=1) or 1 + max_authors = max((d["author_count"] for d in drafts_data), default=1) or 1 + max_citations = max((d["citation_count"] for d in drafts_data), default=1) or 1 + max_ideas = max((d["idea_count"] for d in drafts_data), default=1) or 1 + + for d in drafts_data: + p = (d["pages"] / max_pages) if d["pages"] is not None else 0.3 # default to median-ish + a = d["author_count"] / max_authors + c = d["citation_count"] / max_citations + i = d["idea_count"] / max_ideas + d["composite_complexity"] = round((p + a + c + i) / 4, 3) + + # Correlation matrix: complexity metrics vs rating dimensions + metrics = ["pages", "author_count", "citation_count", "idea_count", "category_count"] + dimensions = ["novelty", "maturity", "overlap", "momentum", "relevance"] + + def _pearson(xs: list[float], ys: list[float]) -> float: + """Compute Pearson correlation coefficient.""" + n = len(xs) + if n < 3: + return 0.0 + mean_x = sum(xs) / n + mean_y = sum(ys) / n + cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) + std_x = (sum((x - mean_x) ** 2 for x in xs)) ** 0.5 + std_y = (sum((y - mean_y) ** 2 for y in ys)) ** 0.5 + if std_x == 0 or std_y == 0: + return 0.0 + return round(cov / (std_x * std_y), 3) + + correlations: dict[str, dict[str, float]] = {} + for metric in metrics: + correlations[metric] = {} + for dim in dimensions: + 
if metric == "pages": + # Filter to drafts with pages data + pairs = [(d[metric], d[dim]) for d in drafts_data if d[metric] is not None] + else: + pairs = [(d[metric], d[dim]) for d in drafts_data] + if len(pairs) >= 3: + xs, ys = zip(*pairs) + correlations[metric][dim] = _pearson(list(xs), list(ys)) + else: + correlations[metric][dim] = 0.0 + + # Top 10 most complex + sorted_by_complexity = sorted(drafts_data, key=lambda d: d["composite_complexity"], reverse=True) + top_complex = sorted_by_complexity[:10] + + # Top 10 efficient: high score but low complexity + # Efficiency = score / (composite_complexity + 0.1) (avoid div by zero) + for d in drafts_data: + d["efficiency"] = round(d["score"] / (d["composite_complexity"] + 0.1), 2) + sorted_by_efficiency = sorted(drafts_data, key=lambda d: d["efficiency"], reverse=True) + top_efficient = sorted_by_efficiency[:10] + + # Stats + pages_vals = [d["pages"] for d in drafts_data if d["pages"] is not None] + avg_pages = round(sum(pages_vals) / len(pages_vals), 1) if pages_vals else 0 + avg_authors = round(sum(d["author_count"] for d in drafts_data) / len(drafts_data), 1) if drafts_data else 0 + avg_citations = round(sum(d["citation_count"] for d in drafts_data) / len(drafts_data), 1) if drafts_data else 0 + pages_coverage = round(total_with_pages / total_drafts * 100, 1) if total_drafts else 0 + + # Category complexity averages + cat_data: dict[str, list[dict]] = defaultdict(list) + for d in drafts_data: + for cat in d.get("categories", []): + cat_data[cat].append(d) + + category_complexity = [] + for cat, ds in sorted(cat_data.items(), key=lambda x: -len(x[1])): + p_vals = [d["pages"] for d in ds if d["pages"] is not None] + category_complexity.append({ + "category": cat, + "avg_pages": round(sum(p_vals) / len(p_vals), 1) if p_vals else 0, + "avg_authors": round(sum(d["author_count"] for d in ds) / len(ds), 1), + "avg_citations": round(sum(d["citation_count"] for d in ds) / len(ds), 1), + "avg_score": round(sum(d["score"] for d in ds) / len(ds), 2), + "count": len(ds), + }) + + # Source complexity + source_data: dict[str, list[dict]] = defaultdict(list) + for d in drafts_data: + source_data[d["source"]].append(d) + + source_complexity = [] + for src, ds in sorted(source_data.items(), key=lambda x: -len(x[1])): + p_vals = [d["pages"] for d in ds if d["pages"] is not None] + source_complexity.append({ + "source": src, + "avg_pages": round(sum(p_vals) / len(p_vals), 1) if p_vals else 0, + "avg_authors": round(sum(d["author_count"] for d in ds) / len(ds), 1), + "avg_citations": round(sum(d["citation_count"] for d in ds) / len(ds), 1), + "avg_score": round(sum(d["score"] for d in ds) / len(ds), 2), + "count": len(ds), + }) + + return { + "drafts": drafts_data, + "correlations": correlations, + "metrics": metrics, + "dimensions": dimensions, + "top_complex": top_complex, + "top_efficient": top_efficient, + "stats": { + "avg_pages": avg_pages, + "avg_authors": avg_authors, + "avg_citations": avg_citations, + "pages_coverage_pct": pages_coverage, + "total_drafts": total_drafts, + }, + "category_complexity": category_complexity, + "source_complexity": source_complexity, + } diff --git a/src/webui/templates/architecture.html b/src/webui/templates/architecture.html new file mode 100644 index 0000000..83058ed --- /dev/null +++ b/src/webui/templates/architecture.html @@ -0,0 +1,465 @@ +{% extends "base.html" %} +{% set active_page = "architecture" %} + +{% block title %}Architecture — IETF Draft Analyzer{% endblock %} + +{% block extra_head %} + +{% endblock 
%} + +{% block content %} + +
+

System-of-Systems Architecture
+ Holistic view of the AI agent standards landscape — {{ arch.stats.total_components }} components across {{ arch.layers|length }} architectural layers, with {{ arch.stats.total_gaps }} identified gaps. Built from {{ arch.stats.total_dependencies }} cross-component relationships.

+
+ + +
+ {% set severity_colors = {"critical": "red", "high": "amber", "medium": "blue", "low": "slate"} %} +
+
{{ arch.stats.total_components }} Components
+ {{ arch.layers|length }} Layers
+ {{ arch.stats.total_dependencies }} Dependencies
+ {{ arch.stats.total_gaps }} Gaps
+
+
+ + +
+
+

Coverage by Standards Body

+
+ {% for src in ['ietf', 'w3c', 'etsi', 'itu', 'iso', 'nist'] %} +
+ + {{ src|upper }} +
+ {% endfor %} +
+ + Gaps +
+
+
+
+
+ + +
+
+ {% for layer in arch.layers | sort(attribute='order', reverse=True) %} +
+
+
+

{{ layer.label }}

+
+ {{ layer.component_count }} components + {{ layer.idea_count }} ideas + {{ layer.total_drafts }} drafts + {% if layer.gap_count > 0 %} + {{ layer.gap_count }} gap{{ 's' if layer.gap_count > 1 }} + {% endif %} +
+
+ +
+ {% for src, count in layer.coverage.items() %} +
+ + {{ count }} +
+ {% endfor %} +
+
+ + +
+ {% for comp in arch.components if comp.layer == layer.id %} +
+
+ {{ comp.name }} + {{ comp.size }}i / {{ comp.draft_count }}d +
+ +
+
+
+ +
+ {% for src, cnt in comp.sources.items() %} + + {% endfor %} + {% if comp.type_breakdown %} + + {{ comp.type_breakdown.keys() | list | first }} + + {% endif %} +
+
+ {% endfor %} + + + {% for gap in arch.gaps if gap.layer == layer.id %} +
+
+ + + + GAP + {{ gap.severity }} +
+
{{ gap.topic }}
+
{{ gap.description[:120] }}{% if gap.description|length > 120 %}...{% endif %}
+
+ {% endfor %} +
+
+ {% endfor %} +
+ + + +
+ +{% endblock %} + +{% block extra_scripts %} + + +{% endblock %} diff --git a/src/webui/templates/authors.html b/src/webui/templates/authors.html index 31cc1de..9af62b8 100644 --- a/src/webui/templates/authors.html +++ b/src/webui/templates/authors.html @@ -119,7 +119,7 @@
- Cluster #{{ c.id + 1 }} + {{ c.name }}
{{ c.size }} authors {{ c.draft_count }} drafts diff --git a/src/webui/templates/base.html b/src/webui/templates/base.html index c15cc77..8448252 100644 --- a/src/webui/templates/base.html +++ b/src/webui/templates/base.html @@ -117,6 +117,14 @@ Idea Clusters + + + Idea Analysis + + + + Architecture + {% if is_admin %} @@ -127,6 +135,14 @@ Timeline + + + Trends + + + + Complexity + Landscape @@ -139,6 +155,14 @@ Citations + + + Sources + + + + False Positives + Authors diff --git a/src/webui/templates/citations.html b/src/webui/templates/citations.html index 332151b..d42dea9 100644 --- a/src/webui/templates/citations.html +++ b/src/webui/templates/citations.html @@ -1,10 +1,11 @@ {% extends "base.html" %} {% set active_page = "citations" %} -{% block title %}Citation Graph — IETF Draft Analyzer{% endblock %} +{% block title %}Citations & Influence — IETF Draft Analyzer{% endblock %} {% block extra_head %} + {% endblock %} {% block content %}
-

Citation Graph

-

Cross-reference network: {{ graph.stats.draft_count }} drafts referencing {{ graph.stats.rfc_count }} RFCs. References are extracted from each draft's text (RFC mentions, draft citations, BCP references). Node size reflects influence — how many other documents cite it. Highly-cited RFCs represent foundational standards that AI/agent drafts build upon.

+

Citations & Influence
+ Cross-reference network, citation influence metrics, and BCP dependency analysis across {{ influence.stats.drafts_with_refs }} drafts and {{ influence.stats.total_citations }} total citations.

-
+
-
Drafts
-
{{ graph.stats.draft_count }}
+
Total Citations
+
{{ influence.stats.total_citations }}
-
Referenced RFCs
-
{{ graph.stats.rfc_count }}
+
Unique RFCs Cited
+
{{ influence.stats.unique_rfcs }}
-
Total Nodes
-
{{ graph.stats.node_count }}
+
Drafts with Refs
+
{{ influence.stats.drafts_with_refs }}
-
Citation Links
-
{{ graph.stats.edge_count }}
+
Avg Refs/Draft
+
{{ influence.stats.avg_refs_per_draft }}
+
+
+
+
BCP Coverage
+
{{ bcp.coverage.coverage_pct }}%
- -
-
-
-

Cross-Reference Network

-

- Drafts - RFCs - — Node size = influence (in-degree). Drag to rearrange. Scroll to zoom. -

+ +
+ +
+ + +
+ +
+
+
+

Cross-Reference Network

+

+ Drafts + RFCs + — Node size = influence. Drag to rearrange. Scroll to zoom. +

+
+
+ + + + + 2 +
-
- - - - - 2 +
+ +
-
- -
+ + +
+
+

Most Referenced RFCs

+

RFCs cited by the most drafts in the corpus

+
+
+ + + + + + + + + + +
#RFCCited By
+
- -
-
-

Most Referenced RFCs

-

RFCs cited by the most drafts in the corpus

+ +
+
+ +
+
+

Top 20 Most-Cited RFCs

+

Foundational standards the AI/agent ecosystem builds upon

+
+
+ + + + + + + + + + + {% for rfc in influence.top_cited_rfcs %} + + + + + + + {% endfor %} + +
#RFCNameCited By
{{ loop.index }} + RFC {{ rfc.rfc_id }} + {{ rfc.name }} + + {{ rfc.count }} drafts + +
+
+
+ + +
+
+

Top 20 Most-Citing Drafts

+

Drafts with the highest outgoing reference count

+
+
+ + + + + + + + + + + {% for d in influence.top_citing_drafts %} + + + + + + + {% endfor %} + +
#DraftCategoryRefs
{{ loop.index }} + + {{ d.title[:60] }}{% if d.title|length > 60 %}...{% endif %} + + + {{ d.category }} + + {{ d.count }} +
+
+
-
- - - - - - - - - - -
#RFCCited By
+ + +
+
+

Influence Score (PageRank-style)

+

Drafts ranked by the weighted sum of how often their cited RFCs are themselves cited; a higher score means the draft builds on more foundational standards. This is a simple approximation rather than true PageRank.
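For readers tracing the math, the score reduces to a few lines. A minimal sketch, condensed from the _compute_citation_influence helper added earlier in this patch, using toy data rather than corpus values:

    from collections import Counter

    # Each draft's influence score is the sum of the in-degree (times cited)
    # of every RFC it references.
    refs = [("draft-a", "8446"), ("draft-b", "8446"), ("draft-b", "2119")]

    rfc_in_degree = Counter(rfc for _, rfc in refs)   # 8446 -> 2, 2119 -> 1
    influence: Counter = Counter()
    for draft, rfc in refs:
        influence[draft] += rfc_in_degree[rfc]

    print(influence.most_common())  # [('draft-b', 3), ('draft-a', 2)]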

+
+
+ + + + + + + + + + + + {% for d in influence.top_pagerank %} + + + + + + + + {% endfor %} + +
#DraftCategoryOut-DegreeInfluence Score
{{ loop.index }} + + {{ d.title[:60] }}{% if d.title|length > 60 %}...{% endif %} + + + {{ d.category }} + {{ d.out_degree }} + {{ d.score }} +
+
+
+ + +
+

Average Citations per Category

+

Which categories reference the most external standards

+
+
+ + +
+

Draft-to-Draft Citation Network

+

{{ influence.draft_network|length }} cross-citations between drafts in the corpus (capped at 200 edges for rendering performance)

+
+ + +
+ +
+
+
+
Unique BCPs
+
{{ bcp.coverage.unique_bcps }}
+
+
+
+
Total BCP Refs
+
{{ bcp.coverage.total_bcp_refs }}
+
+
+
+
Drafts with BCPs
+
{{ bcp.coverage.drafts_with_bcp }}
+
+
+
+
BCP Coverage
+
{{ bcp.coverage.coverage_pct }}%
+
+
+ +
+ +
+
+

All BCPs by Citation Count

+

{{ bcp.coverage.unique_bcps }} unique BCPs cited across the corpus

+
+
+ + + + + + + + + + + {% for b in bcp.bcps %} + + + + + + + {% endfor %} + +
#BCPCited ByExample Drafts
{{ loop.index }}BCP {{ b.bcp_id }} + + {{ b.count }} + + + {{ b.drafts[:3]|join(', ') }}{% if b.total_drafts > 3 %} +{{ b.total_drafts - 3 }} more{% endif %} +
+
+
+ + +
+
+

BCP Usage by Category

+

Which categories rely most heavily on BCPs

+
+
+ + + + + + + + + + + {% for cat in bcp.by_category %} + + + + + + + {% endfor %} + +
CategoryBCP RefsUnique BCPsTop BCPs
{{ cat.category }}{{ cat.total_bcp_refs }}{{ cat.unique_bcps }} + {% for tb in cat.top_bcps[:3] %} + BCP{{ tb.bcp_id }}({{ tb.count }}) + {% endfor %} +
+
+
+
+ + +
+

BCP Co-Citation Heatmap

+

How often pairs of BCPs are cited together in the same draft. Darker = more co-citations.
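The pairwise count behind the heatmap is a set intersection per BCP pair. A condensed sketch of the loop in _compute_bcp_analysis, with toy data:

    # Two BCPs co-occur when the same draft cites both.
    bcp_citations = {
        "14": {"draft-a", "draft-b", "draft-c"},
        "195": {"draft-b", "draft-c"},
        "56": {"draft-a"},
    }

    bcp_ids = sorted(bcp_citations)
    for i, a in enumerate(bcp_ids):
        for b in bcp_ids[i + 1:]:
            shared = len(bcp_citations[a] & bcp_citations[b])
            if shared:
                print(f"BCP {a} + BCP {b}: co-cited in {shared} draft(s)")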

+
+
+
+ {% endblock %} {% block extra_scripts %} {% endblock %} diff --git a/src/webui/templates/complexity.html b/src/webui/templates/complexity.html new file mode 100644 index 0000000..8cdde51 --- /dev/null +++ b/src/webui/templates/complexity.html @@ -0,0 +1,332 @@ +{% extends "base.html" %} +{% set active_page = "complexity" %} + +{% block title %}Complexity — IETF Draft Analyzer{% endblock %} + +{% block extra_head %}{% endblock %} + +{% block content %} +
+

Draft Complexity Matrix
+ Correlating structural complexity (pages, authors, citations, ideas) with quality ratings. Does more complexity mean better drafts?

+
+ + +
+
+
Avg Pages
+
{{ data.stats.avg_pages }}
+
{{ data.stats.pages_coverage_pct }}% have page data
+
+
+
Avg Authors
+
{{ data.stats.avg_authors }}
+
+
+
Avg Citations
+
{{ data.stats.avg_citations }}
+
+
+
Drafts Analyzed
+
{{ data.stats.total_drafts }}
+
+
+
Metrics
+
{{ data.metrics | length }} x {{ data.dimensions | length }}
+
complexity x rating
+
+
+ + +
+

Complexity vs Rating Scatter Plots

+

How structural complexity metrics relate to rating dimensions. Each dot is a rated draft; click a dot to open it.

+
+
+ + +
+
+

Correlation Matrix

+

Pearson correlation between complexity metrics (rows) and rating dimensions (columns). Green = positive, red = negative. Values range from -1 to +1.
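A worked example of the coefficient behind each cell; this mirrors the pure-Python _pearson helper added to src/webui/data.py in this patch:

    def pearson(xs: list[float], ys: list[float]) -> float:
        n = len(xs)
        if n < 3:
            return 0.0
        mx, my = sum(xs) / n, sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sx = sum((x - mx) ** 2 for x in xs) ** 0.5
        sy = sum((y - my) ** 2 for y in ys) ** 0.5
        return round(cov / (sx * sy), 3) if sx and sy else 0.0

    # Four toy drafts: page counts vs. maturity ratings
    print(pearson([10, 40, 90, 120], [2.0, 2.8, 3.4, 3.9]))  # 0.987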

+
+
+ + + + + {% for dim in data.dimensions %} + + {% endfor %} + + + + +
Metric{{ dim | capitalize }}
+
+
+ +
+ +
+
+

Top 10 Most Complex Drafts

+

Ranked by composite complexity (pages + authors + citations + ideas, normalized).
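The composite score divides each metric by its corpus-wide maximum and averages the four; drafts without page data get a 0.3 stand-in. A sketch condensed from get_complexity_data, with illustrative maxima rather than the real corpus values:

    def composite(pages, authors, cites, ideas,
                  max_pages=127, max_authors=15, max_cites=83, max_ideas=8):
        p = pages / max_pages if pages is not None else 0.3
        return round((p + authors / max_authors + cites / max_cites
                      + ideas / max_ideas) / 4, 3)

    # 113 pages, 1 author, 83 citations, 1 idea
    print(composite(113, 1, 83, 1))  # 0.52 under these assumed maxima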

+
+
+ + + + + + + + + + + + + +
#DraftPagesAuthorsCitesScore
+
+
+ + +
+
+

Top 10 Most Efficient Drafts

+

High ratings relative to low structural complexity. Efficiency = score / (complexity + 0.1); the 0.1 keeps drafts with near-zero complexity from dividing by zero.

+
+
+ + + + + + + + + + + + + +
#DraftPagesAuthorsScoreEfficiency
+
+
+
+ + +
+

Complexity by Category

+

Average complexity metrics per category. Wider bars = more complex category.

+
+
+ + +
+
+

Complexity by Source

+
+
+ + + + + + + + + + + + + {% for s in data.source_complexity %} + + + + + + + + + {% endfor %} + +
SourceCountAvg PagesAvg AuthorsAvg CitationsAvg Score
{{ s.source | upper }}{{ s.count }}{{ s.avg_pages }}{{ s.avg_authors }}{{ s.avg_citations }} + {% if s.avg_score >= 3.0 %} + {{ s.avg_score }} + {% elif s.avg_score >= 2.0 %} + {{ s.avg_score }} + {% else %} + {{ s.avg_score }} + {% endif %} +
+
+
+{% endblock %} + +{% block extra_scripts %} + +{% endblock %} diff --git a/src/webui/templates/false_positives.html b/src/webui/templates/false_positives.html new file mode 100644 index 0000000..5e6bcb8 --- /dev/null +++ b/src/webui/templates/false_positives.html @@ -0,0 +1,215 @@ +{% extends "base.html" %} +{% set active_page = "false_positives" %} + +{% block title %}False Positive Profile — IETF Draft Analyzer{% endblock %} + +{% block extra_head %}{% endblock %} + +{% block content %} +
+

False Positive Profile
+ Analysis of {{ data.count }} drafts flagged as false positives — documents that matched AI/agent search keywords but were determined not to be genuinely about AI agent infrastructure.
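Downstream pages exclude these drafts with a NOT IN subquery against ratings.false_positive; the pattern below is taken from the ideas search query in src/webui/data.py:

    import sqlite3

    def non_fp_ideas(conn: sqlite3.Connection) -> list[sqlite3.Row]:
        # Ideas from drafts that are NOT flagged as false positives.
        return conn.execute(
            """SELECT id, title FROM ideas
               WHERE draft_name NOT IN (
                   SELECT draft_name FROM ratings WHERE false_positive = 1
               )"""
        ).fetchall()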

+
+ + +
+
+
+
False Positives
+
{{ data.count }}
+
+
+
+
% of All Drafts
+
{{ data.pct_of_total }}%
+
+
+
+
% of Rated
+
{{ data.pct_of_rated }}%
+
+
+
+
Total Rated
+
{{ data.total_rated }}
+
+
+
+
Total Drafts
+
{{ data.total_drafts }}
+
+
+ +
+ +
+

Rating Distributions: FP vs Non-FP

+

Box plots comparing each rating dimension between false positives (red) and genuine AI/agent drafts (blue). Shows what rating patterns distinguish false positives.

+
+
+ + +
+

False Positives by Source

+

Which standards bodies produce the most false positives in our search results.

+
+
+
+ +
+ +
+

Categories Assigned to False Positives

+

Categories that the classifier assigned to false positive drafts before they were flagged. Shows which categories are most prone to false matches.

+
+
+ + +
+

Top Terms in FP Abstracts

+

Most frequent words in false positive titles and abstracts (stop words excluded). These terms trigger AI/agent keyword matches but appear in unrelated contexts.
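The counting itself is not shown in this hunk; a minimal sketch under the assumption of a simple stop-word-filtered Counter (names here are illustrative):

    import re
    from collections import Counter

    STOP = {"the", "and", "for", "with", "this", "that", "from", "are"}

    def top_terms(texts: list[str], n: int = 20) -> list[tuple[str, int]]:
        counts: Counter = Counter()
        for text in texts:
            for word in re.findall(r"[a-z]{3,}", text.lower()):
                if word not in STOP:
                    counts[word] += 1
        return counts.most_common(n)

    print(top_terms(["Intelligent transport systems", "Transport layer agents"]))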

+
+
+
+ + +
+
+

All False Positives

+

Complete list of flagged drafts. Click a name to view details.

+
+
+ + + + + + + + + + + + + {% for fp in data.fp_list %} + + + + + + + + + {% endfor %} + +
#DraftTitleSourceRelevanceCategories
{{ loop.index }} + {{ fp.name | replace('draft-', '') | truncate(40) }} + {{ fp.title }} + {{ fp.source }} + + {{ fp.relevance }} + + {% for cat in fp.categories[:2] %} + {{ cat }} + {% endfor %} +
+
+
+{% endblock %} + +{% block extra_scripts %} + +{% endblock %} diff --git a/src/webui/templates/idea_analysis.html b/src/webui/templates/idea_analysis.html new file mode 100644 index 0000000..044b56f --- /dev/null +++ b/src/webui/templates/idea_analysis.html @@ -0,0 +1,330 @@ +{% extends "base.html" %} +{% set active_page = "idea_analysis" %} + +{% block title %}Idea Novelty Analysis — IETF Draft Analyzer{% endblock %} + +{% block extra_head %}{% endblock %} + +{% block content %} +
+

Idea Novelty Deep Dive
+ Comprehensive analysis of {{ data.total }} technical ideas extracted from IETF AI/agent drafts. Explores novelty distribution, type breakdowns, cross-draft patterns, and correlations with draft ratings.

+
+ + +
+
+
{{ data.total }}
+
Total Ideas
+
+
+
{{ data.type_count }}
+
Idea Types
+
+
+
{{ data.avg_novelty }}
+
Avg Novelty Score
+
+
+
{{ data.scored }}
+
Scored Ideas
+
+
+
{{ data.embed_pct }}%
+
Embeddings ({{ data.embed_count }}/{{ data.total }})
+
+
+
{{ data.shared_ideas | length }}
+
Shared Ideas (2+ drafts)
+
+
+ + +
+
+

Novelty Score Distribution

+

How many ideas at each novelty level (1=incremental, 5=groundbreaking). {{ data.unscored }} ideas have no novelty score yet.

+
+
+
+

Ideas by Type (avg novelty color)

+

Count of ideas per type. Bar color intensity reflects average novelty score — brighter = more novel.

+
+
+
+ + +
+
+

Draft Avg Idea Novelty vs Relevance

+

Each dot is a draft. X-axis = average novelty of its ideas, Y-axis = relevance score. Bubble size = number of ideas. Click to view draft.

+
+
+
+

Idea Type Breakdown (Sunburst)

+

Hierarchical view: outer ring shows novelty bands (High/Medium/Low) within each type.

+
+
+
+ + +
+

Ideas per Draft Distribution

+

How many ideas does each draft contribute? Most drafts have 2-4 ideas; some prolific drafts generate 8+.
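The aggregation lives in the data layer and is not shown in this hunk; a sketch of one way to bucket it, assuming the ideas table seen elsewhere in this patch:

    import sqlite3
    from collections import Counter

    def ideas_histogram(conn: sqlite3.Connection) -> Counter:
        # Map ideas-per-draft -> number of drafts with that many ideas.
        per_draft = Counter(
            row[0] for row in conn.execute("SELECT draft_name FROM ideas")
        )
        return Counter(per_draft.values())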

+
+
+ + +
+
+

Top 20 Most Novel Ideas

+

Ideas with novelty score of 4 or 5, sorted by novelty then draft composite score.

+
+
+ + + + + + + + + + + + + {% for idea in data.top_novel %} + + + + + + + + + {% endfor %} + +
#IdeaNoveltyTypeDraftDescription
{{ loop.index }}{{ idea.title }} + {{ idea.novelty_score }} + + {{ idea.type }} + + {{ idea.draft_name | replace('draft-', '') | truncate(35) }} + {{ idea.description | truncate(120) }}
+
+
+ + +{% if data.shared_ideas %} +
+
+

Ideas Shared Across Multiple Drafts

+

{{ data.shared_ideas | length }} ideas appear in 2 or more drafts, indicating convergent thinking or common building blocks.

+
+
+ {% for idea in data.shared_ideas[:30] %} +
+
+ {{ idea.title }} + {{ idea.appearances }}x + {% for t in idea.types %} + {{ t }} + {% endfor %} +
+
+ {% for d in idea.drafts %} + {{ d | replace('draft-', '') | truncate(30) }} + {% if not loop.last %}|{% endif %} + {% endfor %} +
+
+ {% endfor %} +
+
+{% endif %} + + +
+
+

Most Prolific Drafts (by idea count)

+
+
+ + + + + + + + + + + {% for d in data.top_idea_drafts %} + + + + + + + {% endfor %} + +
#DraftIdeasScore
{{ loop.index }} + {{ d.name | replace('draft-', '') | truncate(45) }} + {% if d.draft_title %} +
{{ d.draft_title | truncate(60) }}
+ {% endif %} +
{{ d.idea_count }} + {% if d.score %} + {{ d.score | round(2) }} + {% else %} + -- + {% endif %} +
+
+
+ + +
+

Embedding Coverage

+
+
+
+
+
+
+ {{ data.embed_count }} / {{ data.total }} ({{ data.embed_pct }}%) +
+

To complete missing embeddings, run: ietf embed-ideas. This requires Ollama running locally. Embeddings enable idea similarity search and clustering.

+
+{% endblock %} + +{% block extra_scripts %} + +{% endblock %} diff --git a/src/webui/templates/sources.html b/src/webui/templates/sources.html new file mode 100644 index 0000000..58c4e06 --- /dev/null +++ b/src/webui/templates/sources.html @@ -0,0 +1,198 @@ +{% extends "base.html" %} +{% set active_page = "sources" %} + +{% block title %}Cross-Source Comparison — IETF Draft Analyzer{% endblock %} + +{% block extra_head %}{% endblock %} + +{% block content %} +
+

Cross-Source Comparison
+ Comparing drafts across {{ data.summary | length }} standards bodies on rating dimensions, category focus, and output volume.

+
+ + +
+
+

Standards Body Summary

+

Overview of each source's contribution to the AI/agent standards landscape.

+
+
+ + + + + + + + + + + + + + {% for row in data.summary | sort(attribute='drafts', reverse=True) %} + + + + + + + + + + {% endfor %} + +
SourceDraftsRatedAuthorsIdeasAvg ScoreTop Category
+ {{ row.source }} + {{ row.drafts }}{{ row.rated }}{{ row.authors }}{{ row.ideas }} + {% if row.avg_score >= 3.5 %} + {{ row.avg_score }} + {% elif row.avg_score >= 2.5 %} + {{ row.avg_score }} + {% else %} + {{ row.avg_score }} + {% endif %} + + {{ row.top_category }} +
+
+
+ +
+ +
+

Rating Dimensions by Source

+

Average rating across five dimensions for each standards body. Larger shapes indicate higher average ratings.

+
+
+ + +
+

Category Distribution by Source

+

What topics each standards body focuses on. Stacked bars show the relative emphasis per source.

+
+
+
+ + +
+

Sources x Categories Heatmap

+

Draft counts per source-category pair. Darker cells = more drafts. Shows where each body concentrates its work.

+
+
+ + +
+ +
+

Unique Categories by Source

+

Categories only covered by a single standards body.

+ {% for src, cats in data.unique_categories.items() %} + {% if cats %} +
+ {{ src }} +
+ {% for cat in cats %} + {{ cat }} + {% endfor %} +
+
+ {% endif %} + {% endfor %} +
+
+{% endblock %} + +{% block extra_scripts %} + +{% endblock %} diff --git a/src/webui/templates/trends_analysis.html b/src/webui/templates/trends_analysis.html new file mode 100644 index 0000000..891721a --- /dev/null +++ b/src/webui/templates/trends_analysis.html @@ -0,0 +1,284 @@ +{% extends "base.html" %} +{% set active_page = "trends" %} + +{% block title %}Trends — IETF Draft Analyzer{% endblock %} + +{% block extra_head %}{% endblock %} + +{% block content %} +
+

Temporal Evolution
+ How the AI standards landscape is changing over time: submission volume, rating trends, category shifts, and the safety-vs-capability balance.

+
+ + +
+
+
Months Tracked
+
{{ data.months | length }}
+
+
+
Total Submissions
+
{{ data.monthly_table | sum(attribute='total') }}
+
+
+
Fastest Growing
+
{{ data.stats.fastest_growing or 'N/A' }}
+
+
+
Newest Category
+
{{ data.stats.newest_active or 'N/A' }}
+
+
+ + +
+

Monthly Draft Submissions by Source

+

Stacked area chart showing submission volume over time, broken down by source (IETF, W3C, etc.).

+
+
+ +
+ +
+

Monthly Average Ratings

+

Are drafts getting more mature? More novel? Track the five rating dimensions over time.

+
+
+ +
+

Safety vs Capability Ratio

+

Ratio of safety-related drafts (Security, Privacy, Trust, Safety, Governance) to capability drafts (Agents, Infrastructure, MCP, etc.). Higher = more safety focus.
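A condensed sketch of the monthly ratio from get_trends_data; the exact membership of the two category sets is assumed from the description above:

    SAFETY = {"Security", "Privacy", "Trust", "Safety", "Governance"}
    CAPABILITY = {"Agents", "Infrastructure", "MCP"}

    month_counts = {"Security": 4, "Privacy": 2, "Agents": 9, "MCP": 3}  # toy data

    safety = sum(n for cat, n in month_counts.items() if cat in SAFETY)
    capability = sum(n for cat, n in month_counts.items() if cat in CAPABILITY)
    ratio = round(safety / capability, 2) if capability > 0 else 0
    print(ratio)  # 0.5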

+
+
+
+ + +
+

Category Distribution Over Time

+

Stacked area showing which topics are growing or shrinking. Top 8 categories shown.

+
+
+ +
+ +
+

Cumulative Idea Count

+

Total number of unique technical ideas extracted from drafts over time.
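The running total is a simple accumulation over month-ordered counts, condensed from step 5 of get_trends_data:

    monthly_counts = [("2025-11", 40), ("2025-12", 65), ("2026-01", 82)]  # toy data

    cumulative, series = 0, []
    for month, count in monthly_counts:
        cumulative += count
        series.append({"month": month, "total": cumulative})

    print(series[-1])  # {'month': '2026-01', 'total': 187}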

+
+
+ +
+

Monthly New Authors

+

First-time contributors entering the AI standards space each month.
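The query behind this chart, taken from step 6 of get_trends_data: an author's first month is the earliest month across all of their drafts.

    import sqlite3

    def first_months(conn: sqlite3.Connection) -> list[sqlite3.Row]:
        return conn.execute("""
            SELECT da.person_id, MIN(substr(d.time, 1, 7)) AS first_month
            FROM draft_authors da
            JOIN drafts d ON da.draft_name = d.name
            WHERE d.time IS NOT NULL AND d.time != ''
            GROUP BY da.person_id
        """).fetchall()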

+
+
+
+ + +
+
+

Monthly Breakdown

+
+
+ + + + + + + + + + + {% for row in data.monthly_table %} + + + + + + + {% endfor %} + +
MonthTotalAvg ScoreTrend
{{ row.month }}{{ row.total }} + {% if row.avg_score >= 3.0 %} + {{ row.avg_score }} + {% elif row.avg_score >= 2.0 %} + {{ row.avg_score }} + {% else %} + {{ row.avg_score }} + {% endif %} + + {% if loop.index > 1 %} + {% set prev = data.monthly_table[loop.index0 - 1].total %} + {% if row.total > prev %} + ↑ +{{ row.total - prev }} + {% elif row.total < prev %} + ↓ {{ row.total - prev }} + {% else %} + → 0 + {% endif %} + {% endif %} +
+
+
+{% endblock %} + +{% block extra_scripts %} + +{% endblock %}