ietf-draft-analyzer/data/reports/holistic-agent-ecosystem-draft-outlines.md
Christian Nennemann d6beb9c0a0 v0.3.0: Gap-to-Draft pipeline, Living Standards Observatory, blog series
Gap-to-Draft Pipeline (ietf pipeline):
- Context builder assembles ideas, RFC foundations, similar drafts, ecosystem vision
- Generator produces outlines + sections using rich context with Claude
- Quality gates: novelty (embedding similarity), references, format, self-rating
- Family coordinator generates 5-draft ecosystem (AEM/ATD/HITL/AEPB/APAE)
- I-D formatter with proper headers, references, 72-char wrapping

Living Standards Observatory (ietf observatory):
- Source abstraction with IETF + W3C fetchers
- 7-step update pipeline: snapshot, fetch, analyze, embed, ideas, gaps, record
- Static GitHub Pages dashboard (explorer, gap tracker, timeline)
- Weekly CI/CD automation via GitHub Actions

Also includes:
- 361 drafts (expanded from 260 with 6 new keywords), 403 authors, 1,262 ideas, 12 gaps
- Blog series (8 posts planned), reports, arXiv paper figures
- Agent team infrastructure (CLAUDE.md, scripts, dev journal)
- 5 new DB tables, schema migration, ~15 new query methods

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-04 00:48:57 +01:00

# Holistic Agent Ecosystem: Analysis and Draft Outlines
*Derived from IETF draft analyzer gaps (12), overlap matrix (260 drafts), and 1,262 extracted ideas. Goal: a unified agent ecosystem with DAG orchestration and HITL built in, agnostic and extensible, applicable to both fast/relaxed (e.g. K8s) and regulated/proof-heavy environments.*
---
## 1. Vision Summary
| Pillar | Meaning | Gap / Overlap Evidence |
|--------|---------|------------------------|
| **DAG** | Task/workflow as directed acyclic graph: dependencies, execution order, checkpoints, rollback along the graph. | **Gap 3** (Error Recovery/Rollback), **Gap 1** (Resource Management — scheduling/quotas). Ideas: "Task-Oriented Multi-Agent Recovery", "Working Memory", "Execution Context Token (ECT)". No single draft defines *agent task DAG* as a first-class construct. |
| **HITL** | Human-in-the-loop as a first-class primitive: approval gates, escalation, emergency override, explainability. | **Gap 7** (Human Override); only 22 human-agent drafts vs 60 autonomous netops. Ideas: CHEQ (confirmation), "Human Oversight Requirements", "Level 4 Autonomous Network Architecture". |
| **Agnostic + extensible** | Protocol- and transport-agnostic; works over any A2A protocol; extensible via profiles. | **Gap 4** (Cross-Protocol Translation) — 92 A2A drafts, no universal translation/negotiation. Overlap matrix: high within-category similarity (0.75+) but no interoperability layer. |
| **Dual regime** | Same model works in "fast/relaxed" (K8s, dev) and "regulated" (proofs, attestation, audit). | **Gaps 2, 8, 9, 12**: Behavior verification, cross-domain security, dynamic trust, data provenance. Ideas: STAMP, DAAP, verifiable conversations, ECT — all additive assurance. |
---
## 2. How This Fits With WIMSE and ECT (Differentiation)
**SPIFFE** (CNCF) defines the *identifier* for a workload (`spiffe://trust-domain/path`) and its ecosystem (SVIDs, etc.). So **"who" at the identifier level is already SPIFFE** (or similar URI schemes).
**WIMSE** (Workload Identity in a *Multi System Environment* — [draft-ietf-wimse-arch](https://datatracker.ietf.org/doc/draft-ietf-wimse-arch/)) is the **IETF architecture** for how workload identity and **security context** are conveyed and used across systems. It is **not only "who"**. It covers:
- **Workload identity**: identifier (WIMSE uses a URI; SPIFFE ID is one conforming example), credentials (WIT, X.509), trust domains.
- **Security context**: "information needed for a workload to perform its function" — authorization, accounting, auditing, user info, what processing has already happened, propagation along the call chain.
- **Identity proxy**: inspect, replace, or augment identity and context (e.g. at gateways).
- **Use cases**: bootstrapping, service auth, **authorization**, **audit trails**, **security context establishment and propagation**, delegation, cross-boundary, **AI/ML intermediaries**.
So: **SPIFFE = identifier (who). WIMSE = architecture for conveying identity + security context in protocols (who + context + propagation + authz + audit).**
**ECT** (Execution Context Tokens), in [draft-nennemann-wimse-ect](https://datatracker.ietf.org/doc/draft-nennemann-wimse-ect/), is a **JWT-based extension** that records *what* each task did: each ECT is a signed record of one task, linked to predecessors via a DAG (`par` / `jti`). ECT reuses the WIMSE signing model (same key as WIT) and adds: token format, HTTP transport (Execution-Context header), DAG validation, audit-ledger interface. So **ECT = execution evidence (what happened)** built on WIMSE identity/signing.
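The `par` / `jti` linkage can be illustrated with a small validation sketch. This is not the normative algorithm from the ECT draft. It assumes each token has already been signature-checked and decoded into a dict with a unique `jti` and a `par` list of parent `jti` values; `validate_ect_dag` is a hypothetical helper name.

```python
from graphlib import CycleError, TopologicalSorter


def validate_ect_dag(tokens):
    """Return jti values in a valid execution order, or raise ValueError.

    Assumption: `tokens` are already-verified ECT payloads decoded to dicts
    with a unique `jti` and a `par` list of parent jti values.
    """
    by_id = {}
    for tok in tokens:
        if tok["jti"] in by_id:
            raise ValueError(f"duplicate jti: {tok['jti']}")
        by_id[tok["jti"]] = tok

    graph = {}
    for jti, tok in by_id.items():
        parents = tok.get("par", [])
        missing = [p for p in parents if p not in by_id]
        if missing:
            raise ValueError(f"{jti} references unknown parents: {missing}")
        graph[jti] = set(parents)  # node -> predecessors

    try:
        # static_order() yields every parent before its children
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        raise ValueError(f"cycle detected: {exc.args[1]}") from exc


tokens = [
    {"jti": "t1", "par": []},
    {"jti": "t2", "par": ["t1"]},
    {"jti": "t3", "par": ["t1", "t2"]},
]
order = validate_ect_dag(tokens)
print(order)
```

A real validator would also check signatures, expiry, and issuer against the WIMSE credentials; those steps are left out here.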
### Fit in (recommended)
Our ecosystem drafts **do not compete with SPIFFE, WIMSE, or ECT**. They **build on** them:
| Layer | SPIFFE / WIMSE / ECT | Our drafts (AEM, AERR, HEOP, DATS, CPAT, etc.) |
|-------|----------------------|-----------------------------------------------|
| **Identifier** | SPIFFE (or URI): "who" | We use existing identifiers (e.g. `iss` / SPIFFE ID in ECT). |
| **Identity + context** | WIMSE: credentials, security context, propagation, authz, audit | We assume WIMSE (or equivalent) for auth and context; we do not redefine it. |
| **Evidence** | ECT: token format, DAG linkage, signing, audit | We **use ECT as the carrier**: checkpoints, errors, overrides, trust assertions, translation hops are **new ECT node types** (new `exec_act` values and optional claims). |
| **Semantics** | ECT: “a task happened, here are parents” | We define **orchestration and operations**: dependencies, checkpoints, rollback protocol, HITL points, resource hints, assurance profiles, protocol binding. |
Concretely:
- **AERR** (Error Recovery/Rollback): Checkpoints, errors, and rollback results are **ECTs** with specific `exec_act` and extension claims. Rollback walks the **ECT DAG**; no second DAG format.
- **HEOP** (Human Emergency Override): Override and acknowledgment are **ECTs** that link into the same ECT DAG for audit.
- **DATS** (Dynamic Trust): Trust events are derived from **ECT outcomes**; trust assertions are **ECTs**.
- **CPAT** (Cross-Protocol Translation): Each translation hop produces an **ECT**, so the cross-protocol path is one continuous ECT DAG.
- **Agent DAG HITL** ([draft-nennemann-agent-dag-hitl-safety](https://datatracker.ietf.org/doc/draft-nennemann-agent-dag-hitl-safety/)): Policy for when HITL is required; decisions and overrides still record as ECTs.
So the **execution model** (DAG of tasks, checkpoints, rollback, HITL) is **implemented using ECT** as the token and DAG format. We add **semantics and protocols** on top, not a new token or DAG structure.
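As a rough sketch of "new node types, same DAG", the following builds unsigned ECT-like payloads whose only distinguishing feature is the `exec_act` value and a few extension claims. The `exec_act` strings, the `ext` claim names, and both helper functions are illustrative assumptions, not values taken from any registry or from the ECT draft.

```python
import uuid


def new_ect(exec_act, parents=(), **ext):
    """Build an unsigned ECT-like payload (signing is out of scope here)."""
    return {
        "jti": str(uuid.uuid4()),
        "par": [p["jti"] for p in parents],
        "exec_act": exec_act,  # hypothetical values, not a registered set
        "ext": dict(ext),      # hypothetical extension claims
    }


def audit_trail(node, ledger):
    """Collect `node` and every ancestor by following `par` links."""
    seen, stack = {}, [node["jti"]]
    while stack:
        jti = stack.pop()
        if jti in seen:
            continue
        seen[jti] = ledger[jti]
        stack.extend(ledger[jti]["par"])
    return list(seen.values())


task = new_ect("task", agent="planner")
checkpoint = new_ect("checkpoint", [task], state_ref="snap-1")
error = new_ect("error", [checkpoint], reason="timeout")
override = new_ect("override", [error], approver="operator-1")

ledger = {n["jti"]: n for n in (task, checkpoint, error, override)}
trail = audit_trail(override, ledger)
print([n["exec_act"] for n in trail])
```

Because the checkpoint, error, and override records share one linkage scheme with ordinary task records, a single traversal is enough to reconstruct the audit path.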
### Differentiate (what we add)
| Concern | WIMSE/ECT | Our ecosystem |
|---------|-----------|----------------|
| **DAG** | ECT defines *how* nodes link (`par`, `jti`, validation). | We define *what* a node means (task, checkpoint, error, override, trust, translation), *when* to create them, and *how* to act on them (rollback, circuit breaker, HITL flow). |
| **Orchestration** | Out of scope for ECT. | Execution order, resource hints, scheduling, lifecycle; can be described in a **declarative workflow** (e.g. JSON) that is *realized* as ECTs at runtime. |
| **Recovery** | Not in ECT. | AERR: checkpoint placement, error propagation, rollback protocol, circuit breaker. |
| **HITL** | Not in ECT. | HEOP + Agent DAG HITL: approval gates, override, escalation; all recorded as ECTs. |
| **Trust / assurance** | ECT provides signed, linked evidence. | DATS, APAE: how to *derive* trust from ECT outcomes; assurance levels and profiles. |
| **Interop** | Single token format (ECT over HTTP). | CPAT, AEPB: translation between *protocols*; each hop still emits ECTs so the cross-protocol run is one ECT DAG. |
In other words: **SPIFFE = who (identifier). WIMSE = identity + security context + propagation + authz + audit. ECT = execution evidence (DAG of signed records). Our work = orchestration, recovery, HITL, trust, and interop that consume and produce that evidence.**
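One way to picture "deriving trust from ECT outcomes" is an exponentially weighted score that moves toward 1.0 on success and toward 0.0 on failure. This is purely an illustrative assumption: neither the scoring rule, the `alpha` weight, nor the function names come from DATS or any cited draft.

```python
def update_trust(score, succeeded, alpha=0.2):
    """Exponentially weighted update: recent outcomes count more."""
    target = 1.0 if succeeded else 0.0
    return (1 - alpha) * score + alpha * target


def trust_from_outcomes(outcomes, initial=0.5, alpha=0.2):
    """Fold a sequence of task outcomes (True = success) into one score."""
    score = initial
    for succeeded in outcomes:
        score = update_trust(score, succeeded, alpha)
    return score


# Hypothetical outcome history read from an agent's ECT records
print(trust_from_outcomes([True, True, False, True]))
```

In the relaxed regime such a score could be advisory only; a regulated profile would presumably require the underlying ECTs to be retained so the score can be recomputed and audited.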
### Implications for the draft family
- **Draft A (AEM)**: Should state that the **reference implementation of the execution model** is ECT: “The ecosystem uses Execution Context Tokens (ECT) [I-D.nennemann-wimse-ect] as the standard representation of task execution and DAG linkage; extensions (e.g. AERR, HEOP) define additional ECT node types and procedures.”
- **Draft B (ATD)**: Either (1) **ATD = abstract model** (nodes, edges, checkpoints, rollback set) with “ECT is one binding” and optional JSON/CBOR for *declarative* workflow definition, or (2) **ATD = semantics of ECT usage** (when to emit which ECT types, execution semantics, resource hints) without a second wire format. Prefer (2) to avoid overlap with ECT and keep one DAG format.
- **Drafts C, D, E (HITL, AEPB, APAE)**: Same pattern: define procedures and semantics; **record all significant events as ECTs** so the full run is one auditable ECT DAG. Reference SPIFFE/WIMSE for identity and context, and ECT for format and validation.
### One-sentence positioning
**“SPIFFE gives the identifier; WIMSE gives the architecture for identity and security context; ECT gives the DAG evidence format. Our drafts specify how to use that format for orchestration, recovery, human oversight, trust, and cross-protocol interop, so the same stack works from relaxed to fully regulated.”**
---
## 3. How the Data Supports This
### From gap analysis
- **Critical**: Resource management (scheduling/quotas for DAG nodes), behavior verification (runtime proofs in regulated mode), error recovery/rollback (DAG-based undo).
- **High**: Cross-protocol translation (agnostic layer), human override (HITL), lifecycle (versioning/retirement of workflows), multi-agent consensus (coordination in DAG), cross-domain security and dynamic trust (regulated regime).
- **Medium**: Monitoring, explainability (HITL), provenance (regulated regime).
### From overlap
- **A2A protocols** (92 drafts, avg pairwise sim 0.76): Heavy duplication; a thin *ecosystem layer* on top of "any A2A" would reduce friction.
- **Agent discovery/reg** (57) and **identity/auth** (98): Discovery and identity are shared concerns; the ecosystem draft can reference ANS, ADL, OAuth RAR, etc., without mandating one.
- **Human-agent** (22) is underweight; HITL should be a first-class extension point in the ecosystem document.
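The within-category similarity figures above are averages over pairwise embedding similarities. The sketch below shows that calculation on toy vectors, assuming cosine similarity; the document names, the vectors, and the helper names are placeholders, and the analyzer's actual implementation may differ.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def category_overlap(embeddings, threshold=0.85):
    """Return (average pairwise similarity, pairs at or above threshold)."""
    sims = {
        (a, b): cosine(embeddings[a], embeddings[b])
        for a, b in combinations(sorted(embeddings), 2)
    }
    avg = sum(sims.values()) / len(sims) if sims else 0.0
    near_dupes = [pair for pair, s in sims.items() if s >= threshold]
    return avg, near_dupes


# Toy two-dimensional "embeddings"; real ones have hundreds of dimensions
embeddings = {
    "doc-1": [1.0, 0.0],
    "doc-2": [1.0, 0.1],
    "doc-3": [0.0, 1.0],
}
avg, near_dupes = category_overlap(embeddings)
print(round(avg, 3), near_dupes)
```

A category with a high average and many pairs above the threshold is a candidate for a shared layer rather than yet another standalone protocol.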
### From ideas (sample)
- **DAG/context**: "Working Memory", "Execution Context Token", "Task-Oriented Multi-Agent Recovery Framework", "State Consistency Management", "Checkpoint".
- **HITL**: "CHEQ protocol", "Human Oversight Requirements", "Human-in-the-Loop", "Emergency Override".
- **Agnostic**: "Automated Protocol Adaptation", "Semantic Routing", "Protocol Adapter Layer", "Cross-Protocol Translation".
- **Dual regime**: "Cryptographic Proof-Based Autonomy", "Verifiable Agent Behavior Attestation", "Trust Scoring", "Behavioral Monitoring", "Data Provenance".
---
## 4. Proposed Draft Family
Five drafts that together define the holistic ecosystem. Each fills gaps and references existing work (see landscape) to avoid duplication.
---
### Draft A: **Agent Ecosystem Model (AEM) — Architecture and Terminology**
**Role**: Informational. Single source of concepts (DAG, HITL, assurance levels, protocol agnosticism) so other drafts and WGs share vocabulary.
**Gaps addressed**: Cross-cutting; reduces overlap by establishing a common model.
**Outline**:
1. **Introduction** — Need for a unified model across 260+ drafts; scope (orchestration, control, assurance); non-goals (no new wire protocol).
2. **Terminology** — Agent, task, workflow, DAG, node, checkpoint, HITL point, assurance level, ecosystem layer, protocol binding.
3. **Architectural Model** — Ecosystem layer above protocol bindings (A2A, MCP, etc.); DAG as the execution model; HITL as optional nodes; assurance as an orthogonal axis.
4. **Assurance Levels** — Level 0 (best-effort, no proofs), Level 1 (audit trail), Level 2 (attestation/verification), Level 3 (full provenance and compliance). Same DAG/HITL model at every level.
5. **Protocol Agnosticism** — How the ecosystem layer binds to existing protocols (reference ANS, ADL, MCP, STAMP, DAAP, etc.); extension points.
6. **Security Considerations** — Trust boundaries; what each assurance level guarantees.
7. **IANA Considerations** — None or registry for assurance level identifiers.
**Target**: Individual or NMOP; **Status**: Informational.
**Related drafts**: draft-rosenberg-aiproto-framework, draft-zyyhl-agent-networks-framework, draft-nennemann-wimse-ect, draft-aylward-daap-v2.
---
### Draft B: **Agent Task DAG (ATD) — Execution Model and Checkpoints**
**Role**: Define the *semantics* of the DAG execution model (when to emit which ECT types, execution order, checkpoints, rollback) **using ECT as the token and DAG format**. Does not define a second wire format; avoids overlap with [draft-nennemann-wimse-ect](https://datatracker.ietf.org/doc/draft-nennemann-wimse-ect/).
**Gaps addressed**: **Gap 1** (Resource Management — scheduling/quotas per node), **Gap 3** (Error Recovery and Rollback).
**Outline**:
1. **Introduction** — Why DAG (ordering, dependencies, safe rollback); relationship to AEM; dependency on ECT for token format and DAG structure.
2. **Terminology** — Node, edge, root, leaf, checkpoint, rollback set, blast radius (aligned with ECT and AERR).
3. **Execution Semantics** — Topological order; node states (pending, running, done, failed, rolled back); when agents MUST/SHOULD emit ECTs; checkpoint placement (per ECT or per subgraph, per AERR).
4. **Resource Hints and Quotas** — Optional resource claims (e.g. in ECT `ext` or workflow descriptor); integration with scheduling (K8s, etc.); fair allocation when agents compete (gap 1).
5. **Checkpoint and Rollback** — Reference AERR for checkpoint/error/rollback ECT types; rollback protocol: walk ECT DAG backwards; circuit-breaker to contain cascading failure.
6. **Optional: Declarative Workflow** — JSON/CBOR for *pre-run* workflow definition (nodes, dependencies, HITL slots, resource hints); at runtime this is realized as ECTs. Enables tooling and portability without replacing ECT.
7. **Security Considerations** — Integrity of ECT DAG and checkpoints; who may trigger rollback.
8. **IANA Considerations** — None if ECT registry is used for node/action types; or registry for workflow descriptor media type.
**Target**: NMOP or individual; **Status**: Standards-track or Experimental.
**Related drafts**: draft-nennemann-wimse-ect, draft-aerr-agent-error-recovery-rollback, draft-yue-anima-agent-recovery-networks, draft-li-dmsc-macp.
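The execution semantics in this outline (topological order, node states, rollback to a checkpoint) can be sketched in a few lines. The workflow below stands in for a pre-run declarative definition; the node names, the `rollback_set` rule (undo everything between the failed node and the nearest checkpoint on each path), and the helper names are assumptions for illustration, not text from AERR or ECT.

```python
from enum import Enum
from graphlib import TopologicalSorter


class NodeState(Enum):
    PENDING = "pending"
    RUNNING = "running"
    DONE = "done"
    FAILED = "failed"
    ROLLED_BACK = "rolled back"


def execution_order(deps):
    """`deps` maps each node to the set of nodes it depends on."""
    return list(TopologicalSorter(deps).static_order())


def rollback_set(failed, deps, checkpoints):
    """Walk backwards from `failed` until a checkpoint is hit on each path.

    Returns the nodes to undo; the checkpoints themselves are kept.
    """
    to_undo, stack = set(), [failed]
    while stack:
        node = stack.pop()
        if node in to_undo or node in checkpoints:
            continue
        to_undo.add(node)
        stack.extend(deps.get(node, ()))
    return to_undo


# Hypothetical workflow: a linear chain with one checkpoint
deps = {
    "fetch": set(),
    "ckpt": {"fetch"},
    "transform": {"ckpt"},
    "validate": {"transform"},
    "publish": {"validate"},
}
states = {node: NodeState.PENDING for node in deps}
states["validate"] = NodeState.FAILED

undo = rollback_set("validate", deps, checkpoints={"ckpt"})
for node in undo:
    states[node] = NodeState.ROLLED_BACK
print(execution_order(deps), sorted(undo))
```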
---
### Draft C: **Human-in-the-Loop (HITL) Primitives for Agent Ecosystems**
**Role**: Standardize HITL as first-class: approval gates, escalation, emergency override, and explainability hooks.
**Gaps addressed**: **Gap 7** (Human Override and Intervention), **Gap 11** (Agent Explainability).
**Outline**:
1. **Introduction** — Need for HITL in autonomous systems; design goals (minimal friction in relaxed mode, mandatory in regulated).
2. **Terminology** — HITL point, approval gate, escalation, override, explainability token.
3. **HITL Point Model** — Placement in DAG (before/after node, or as a node); types: approval required, notification only, override-only (emergency).
4. **Approval and Escalation** — Request/response format (reference OAuth, CHEQ); timeouts and escalation paths; revocation of approval.
5. **Emergency Override** — Signal to halt or rollback; scope (single node, subgraph, full DAG); authentication and audit (reference DAAP, STAMP).
6. **Explainability** — Optional explainability token (summary, evidence, link to verifiable conversation); required at higher assurance levels.
7. **Binding to AEM** — How HITL integrates with Draft A assurance levels; optional in Level 0, required in Level 2+ for critical paths.
8. **Security Considerations** — Who can approve/override; replay and revocation.
9. **IANA Considerations** — None or registry for HITL point types.
**Target**: NMOP or OPS; **Status**: Standards-track or Experimental.
**Related drafts**: draft-rosenberg-aiproto-cheq, draft-rosenberg-cheq, draft-cui-nmrg-llm-nm, draft-irtf-nmrg-llm-nm, draft-aap-oauth-profile, draft-birkholz-verifiable-agent-conversations.
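A minimal sketch of the approval and escalation flow outlined above follows. It models a timeout as an approver returning `None` and escalates to the next approver in the list. The fail-closed default when nobody answers, the handling of Level 1, and every name in the block are assumptions made for illustration; the outline itself only states that HITL is optional at Level 0 and required at Level 2+ for critical paths.

```python
from enum import Enum


class HitlType(Enum):
    APPROVAL_REQUIRED = "approval-required"
    NOTIFICATION_ONLY = "notification-only"
    OVERRIDE_ONLY = "override-only"


def hitl_required(assurance_level, critical_path):
    """Required at Level 2+ on critical paths; lower levels are left to
    local policy in this sketch."""
    return assurance_level >= 2 and critical_path


def run_gate(point_type, approvers, ask):
    """Ask each approver in escalation order.

    `ask(approver)` returns True (approve), False (reject), or None
    (no answer before the timeout).
    """
    if point_type is not HitlType.APPROVAL_REQUIRED:
        return "proceed"  # notification/override-only points do not block
    for approver in approvers:
        answer = ask(approver)
        if answer is None:
            continue      # timed out: escalate to the next approver
        return "proceed" if answer else "halt"
    return "halt"         # assumption: fail closed if nobody answers


# Hypothetical responders: the on-call engineer times out, the lead approves
answers = {"on-call": None, "team-lead": True}
decision = run_gate(HitlType.APPROVAL_REQUIRED, ["on-call", "team-lead"], answers.get)
print(decision)
```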
---
### Draft D: **Agent Ecosystem Protocol Binding (AEPB) — Agnostic Interop Layer**
**Role**: Define how the ecosystem layer (AEM + ATD + HITL) binds to existing A2A and discovery protocols; translation/negotiation so different protocols interoperate.
**Gaps addressed**: **Gap 4** (Cross-Protocol Translation), **Gap 5** (Lifecycle — versioning/retirement).
**Outline**:
1. **Introduction** — Proliferation of 92 A2A protocols; need for a binding layer that preserves ecosystem semantics (DAG, HITL) over any protocol.
2. **Terminology** — Protocol binding, translation, negotiation, capability advertisement.
3. **Capability Advertisement** — How agents/gateways advertise support for AEM/ATD/HITL and assurance levels (reference ADL, ANS, DNS-SD).
4. **Binding Requirements** — What a protocol must provide: task invocation, dependency ordering, checkpoint/rollback signals, HITL callbacks. Mapping to MCP, A2A over HTTP, A2A over MOQT, etc.
5. **Translation and Negotiation** — When two agents speak different protocols: gateway or negotiation (e.g. common subset); minimal translation schema (intent, result, error).
6. **Lifecycle** — Versioning of DAG and agents; graceful shutdown and drain; retirement without breaking dependents (gap 5).
7. **Security Considerations** — Trust of translators; integrity across protocol boundaries.
8. **IANA Considerations** — Registry for protocol binding identifiers.
**Target**: NMOP or individual; **Status**: Experimental.
**Related drafts**: draft-agent-gw, draft-narajala-ans, draft-nederveld-adl, draft-ainp-protocol, draft-mallick-muacp, draft-a2a-moqt-transport.
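The "minimal translation schema (intent, result, error)" can be pictured as a small envelope that a gateway maps into and out of. Both wire formats below ("protocol A" and "protocol B") are invented for this sketch, as are the field names and the shape of the per-hop record; no existing A2A protocol is being described.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Envelope:
    """Minimal common form: intent, result, error."""
    intent: str
    result: Optional[dict] = None
    error: Optional[str] = None
    hops: list = field(default_factory=list)  # one audit record per hop


def from_proto_a(msg):
    # Hypothetical protocol A: {"action": ..., "output": ..., "fault": ...}
    return Envelope(msg["action"], msg.get("output"), msg.get("fault"))


def to_proto_b(env):
    # Hypothetical protocol B: {"task": ..., "status": ..., "data" | "message": ...}
    if env.error is not None:
        return {"task": env.intent, "status": "error", "message": env.error}
    return {"task": env.intent, "status": "ok", "data": env.result}


def translate(msg, gateway_id):
    """Translate A -> B and record the hop so it can be emitted as an ECT."""
    env = from_proto_a(msg)
    env.hops.append({"exec_act": "translation", "by": gateway_id})
    return to_proto_b(env), env.hops


out, hops = translate({"action": "summarize", "output": {"text": "ok"}}, "gw-1")
print(out, hops)
```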
---
### Draft E: **Assurance Profiles for Agent Ecosystems (APAE) — Dual Regime**
**Role**: Define how the same ecosystem runs in "relaxed" vs "regulated" mode: which proofs, attestations, and provenance are required.
**Gaps addressed**: **Gap 2** (Behavior Verification), **Gap 8** (Cross-Domain Security), **Gap 9** (Dynamic Trust), **Gap 12** (Data Provenance).
**Outline**:
1. **Introduction** — Same DAG/HITL model in dev (K8s) vs regulated (finance, healthcare); assurance profile = dial for proof level.
2. **Terminology** — Assurance profile, proof, attestation, provenance, trust boundary.
3. **Profiles** — **Relaxed**: best-effort, optional audit; **Standard**: audit trail + optional attestation; **Regulated**: attestation per critical node, provenance chain, behavior verification (reference STAMP, DAAP, verifiable conversations).
4. **Behavior Verification** — How runtime behavior is checked against declared policy (reference DAAP, RATS, EAT); evidence format.
5. **Cross-Domain and Trust** — Trust boundaries between domains; dynamic trust scoring (reference Cosmos, SEAT); revocation.
6. **Provenance** — Data lineage along the DAG; format for provenance records (reference verifiable conversation format); retention.
7. **Security Considerations** — What each profile guarantees and does not guarantee.
8. **IANA Considerations** — Registry for assurance profile identifiers.
**Target**: NMOP or LAKE/RATS; **Status**: Informational or Experimental.
**Related drafts**: draft-guy-bary-stamp-protocol, draft-aylward-daap-v2, draft-birkholz-verifiable-agent-conversations, draft-jiang-seat-dynamic-attestation, draft-cosmos-protocol-specification.
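The three profiles in this outline can be expressed as data, with a check for which evidence a node still owes. The field names and the `missing_evidence` helper are hypothetical. Treating the audit trail as required under the Regulated profile is also an assumption, since the outline lists only attestation, provenance, and behavior verification for it.

```python
PROFILES = {
    "relaxed": {
        "audit": "optional", "attestation": "none",
        "provenance": False, "behavior_verification": False,
    },
    "standard": {
        "audit": "required", "attestation": "optional",
        "provenance": False, "behavior_verification": False,
    },
    "regulated": {
        "audit": "required",  # assumption: regulated implies an audit trail
        "attestation": "critical-nodes",
        "provenance": True, "behavior_verification": True,
    },
}


def required_evidence(profile, critical):
    """Evidence kinds a node must carry under `profile`."""
    p = PROFILES[profile]
    required = set()
    if p["audit"] == "required":
        required.add("audit")
    if p["attestation"] == "critical-nodes" and critical:
        required.add("attestation")
    if p["provenance"]:
        required.add("provenance")
    if p["behavior_verification"]:
        required.add("behavior_verification")
    return required


def missing_evidence(profile, node):
    """Required evidence that `node` has not yet supplied."""
    have = set(node.get("evidence", ()))
    return required_evidence(profile, node.get("critical", False)) - have


node = {"critical": True, "evidence": ["audit"]}
print(sorted(missing_evidence("regulated", node)))
```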
---
## 5. Dependency Graph of Drafts
```
              [A: AEM - Architecture & Terminology]
                                |
        +---------------+-------+-------+---------------+
        |               |               |               |
        v               v               v               v
    [B: ATD]        [C: HITL]       [D: AEPB]       [E: APAE]
       DAG            Human          Binding        Assurance
        |               |               |               |
        +---------------+-------+-------+---------------+
                                |
Implementations: same ecosystem in K8s (relaxed) or regulated (proofs)
```
- **A** is the foundation; B, C, D, E reference A and each other where needed.
- **B** (ATD) and **C** (HITL) are the core execution model; **D** (AEPB) makes it agnostic; **E** (APAE) makes it dual-regime.
---
## 6. How to Use This With Your Analyzer
- **Generate outlines via CLI**: Use `ietf draft-gen <topic>` with topics: "Agent Ecosystem Model", "Agent Task DAG", "Human-in-the-Loop primitives", "Agent Ecosystem Protocol Binding", "Assurance Profiles for Agent Ecosystems". Feed the gap context from this file or from `data/reports/gaps.md`.
- **Cross-reference**: When writing each draft, cite the related drafts listed above; the analyzer's `ietf similar` and `ietf compare` commands can find more.
- **Ideas database**: Search `ideas` for mechanism/architecture/protocol types (e.g. "checkpoint", "override", "translation") to pull concrete mechanisms into sections.
---
## 7. Deeper Datatracker Analysis — What to Run
To keep refining these drafts from the data, use your analyzer as follows:
| Goal | Commands / data |
|------|------------------|
| **Find drafts that touch DAG/workflow** | `ietf search "task graph"`, `ietf search "workflow"`, `ietf search "checkpoint"`, `ietf search "rollback"`; then `ietf similar <name>` for each. |
| **Find HITL-related work** | `ietf search "human-in-the-loop"`, `ietf search "override"`, `ietf search "approval"`; check `data/reports/ideas.md` for "CHEQ", "human", "override". |
| **Cross-protocol / interop** | `ietf report overlap-matrix`; focus on A2A vs other categories; `ietf similar draft-agent-gw` (protocol adaptation). |
| **Assurance / proofs** | Search ideas for "attestation", "verifiable", "provenance", "trust"; list drafts in AI safety/alignment and identity/auth. |
| **Gap → draft outline** | For each gap in `gaps.md`, run `ietf draft-gen "<gap topic>"` (e.g. "Agent Error Recovery and Rollback") and merge with the outlines above. |
| **Cluster overlap** | `ietf clusters --threshold 0.85` to see near-duplicates; use one as canonical and reference others to reduce fragmentation. |
Refreshing the pipeline periodically (`ietf fetch`, `ietf analyze --all`, `ietf ideas --all`, `ietf gaps`) keeps gaps and ideas aligned with the latest datatracker activity.
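For a scheduled refresh, the four commands named above can be wrapped in a small script. This is a sketch that assumes the `ietf` CLI is on `PATH` and that each command exits non-zero on failure; `refresh` and `REFRESH_STEPS` are hypothetical names, and the `run` parameter exists only so the function can be exercised without the CLI installed.

```python
import subprocess

# Commands exactly as listed in the text, in the order given there
REFRESH_STEPS = [
    ["ietf", "fetch"],
    ["ietf", "analyze", "--all"],
    ["ietf", "ideas", "--all"],
    ["ietf", "gaps"],
]


def refresh(run=subprocess.run):
    """Run each step in order and stop at the first failure."""
    completed = []
    for cmd in REFRESH_STEPS:
        run(cmd, check=True)  # subprocess.run raises CalledProcessError on failure
        completed.append(" ".join(cmd))
    return completed


# Call refresh() from a scheduler (cron, CI job) to execute the real commands.
```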
---
## 8. One-Page Pitch (Elevator Version)
**Problem**: 260+ IETF drafts on AI agents; no single model for orchestration (DAG), human oversight (HITL), or for running the same system in both fast and regulated environments.
**Proposal**: Five coordinated drafts — (A) shared architecture and terminology, (B) DAG execution with checkpoints and rollback, (C) HITL primitives (approval, override, explainability), (D) protocol-agnostic binding so any A2A protocol can participate, (E) assurance profiles so the same stack works in K8s or in a proof-heavy regulated regime.
**Outcome**: One holistic agent ecosystem: DAG + HITL built in, agnostic and extensible, applicable everywhere from relaxed to fully proven.