Christian Nennemann d6beb9c0a0 v0.3.0: Gap-to-Draft pipeline, Living Standards Observatory, blog series

Gap-to-Draft Pipeline (ietf pipeline):
- Context builder assembles ideas, RFC foundations, similar drafts, ecosystem vision
- Generator produces outlines + sections using rich context with Claude
- Quality gates: novelty (embedding similarity), references, format, self-rating
- Family coordinator generates 5-draft ecosystem (AEM/ATD/HITL/AEPB/APAE)
- I-D formatter with proper headers, references, 72-char wrapping

Living Standards Observatory (ietf observatory):
- Source abstraction with IETF + W3C fetchers
- 7-step update pipeline: snapshot, fetch, analyze, embed, ideas, gaps, record
- Static GitHub Pages dashboard (explorer, gap tracker, timeline)
- Weekly CI/CD automation via GitHub Actions

Also includes:
- 361 drafts (expanded from 260 with 6 new keywords), 403 authors, 1,262 ideas, 12 gaps
- Blog series (8 posts planned), reports, arXiv paper figures
- Agent team infrastructure (CLAUDE.md, scripts, dev journal)
- 5 new DB tables, schema migration, ~15 new query methods

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-03-04 00:48:57 +01:00


Holistic Agent Ecosystem: Analysis and Draft Outlines

Derived from IETF draft analyzer gaps (12), overlap matrix (260 drafts), and 1,262 extracted ideas. Goal: a unified agent ecosystem with DAG orchestration and HITL built in, agnostic and extensible, applicable to both fast/relaxed (e.g. K8s) and regulated/proof-heavy environments.

1. Vision Summary

| Pillar | Meaning | Gap / Overlap Evidence |
| --- | --- | --- |
| DAG | Task/workflow as directed acyclic graph: dependencies, execution order, checkpoints, rollback along the graph. | Gap 3 (Error Recovery/Rollback), Gap 1 (Resource Management: scheduling/quotas). Ideas: "Task-Oriented Multi-Agent Recovery", "Working Memory", "Execution Context Token (ECT)". No single draft defines the agent task DAG as a first-class construct. |
| HITL | Human-in-the-loop as a first-class primitive: approval gates, escalation, emergency override, explainability. | Gap 7 (Human Override); only 22 human-agent drafts vs 60 autonomous netops. Ideas: CHEQ (confirmation), "Human Oversight Requirements", "Level 4 Autonomous Network Architecture". |
| Agnostic + extensible | Protocol- and transport-agnostic; works over any A2A protocol; extensible via profiles. | Gap 4 (Cross-Protocol Translation): 92 A2A drafts, no universal translation/negotiation. Overlap matrix: high within-category similarity (0.75+) but no interoperability layer. |
| Dual regime | Same model works in "fast/relaxed" (K8s, dev) and "regulated" (proofs, attestation, audit). | Gaps 2, 8, 9, 12: behavior verification, cross-domain security, dynamic trust, data provenance. Ideas: STAMP, DAAP, verifiable conversations, ECT; all are additive assurance. |

2. How This Fits With WIMSE and ECT (Differentiation)

SPIFFE (CNCF) defines the identifier for a workload (spiffe://trust-domain/path) and its ecosystem (SVIDs, etc.). So the "who" at the identifier level is already covered by SPIFFE (or a similar URI scheme).

WIMSE (Workload Identity in Multi System Environments; draft-ietf-wimse-arch) is the IETF architecture for how workload identity and security context are conveyed and used across systems. It covers more than the "who":

Workload identity: identifier (WIMSE uses a URI; SPIFFE ID is one conforming example), credentials (WIT, X.509), trust domains.
Security context: "information needed for a workload to perform its function" — authorization, accounting, auditing, user info, what processing has already happened, propagation along the call chain.
Identity proxy: inspect, replace, or augment identity and context (e.g. at gateways).
Use cases: bootstrapping, service auth, authorization, audit trails, security context establishment and propagation, delegation, cross-boundary, AI/ML intermediaries.

So: SPIFFE = identifier (who). WIMSE = architecture for conveying identity + security context in protocols (who + context + propagation + authz + audit).

ECT (Execution Context Tokens), in draft-nennemann-wimse-ect, is a JWT-based extension that records what each task did: each ECT is a signed record of one task, linked to predecessors via a DAG (par / jti). ECT reuses the WIMSE signing model (same key as WIT) and adds: token format, HTTP transport (Execution-Context header), DAG validation, audit-ledger interface. So ECT = execution evidence (what happened) built on WIMSE identity/signing.
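
To make the DAG linkage concrete, here is a minimal sketch of structural validation over already-decoded ECT payloads. It assumes that `par` lists the `jti` values of the predecessor records; signature verification and the real token format are out of scope, and the `EctRecord` class and `validate_dag` function are hypothetical names, not part of draft-nennemann-wimse-ect.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EctRecord:
    """Decoded payload of one ECT (hypothetical container, no signature check)."""
    jti: str                    # unique ID of this task record
    iss: str                    # issuer, e.g. a SPIFFE ID
    exec_act: str               # what the task did
    par: tuple[str, ...] = ()   # assumed: jti values of predecessor records


def validate_dag(records: list[EctRecord]) -> list[str]:
    """Return a list of problems; an empty list means the linkage is a DAG."""
    by_id = {r.jti: r for r in records}
    problems = []
    if len(by_id) != len(records):
        problems.append("duplicate jti")
    for r in records:
        for p in r.par:
            if p not in by_id:
                problems.append(f"{r.jti}: unknown parent {p}")

    # Depth-first search with three colours to detect cycles.
    WHITE, GREY, BLACK = 0, 1, 2
    colour = dict.fromkeys(by_id, WHITE)

    def has_cycle(jti: str) -> bool:
        colour[jti] = GREY
        for p in by_id[jti].par:
            if p not in by_id:
                continue  # already reported as an unknown parent
            if colour[p] == GREY:
                return True
            if colour[p] == WHITE and has_cycle(p):
                return True
        colour[jti] = BLACK
        return False

    for jti in by_id:
        if colour[jti] == WHITE and has_cycle(jti):
            problems.append("cycle detected")
            break
    return problems
```

A verifier along these lines would run after signature checks, so that only authenticated records contribute edges to the graph.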

Fit in (recommended)

Our ecosystem drafts do not compete with SPIFFE, WIMSE, or ECT. They build on them:

| Layer | SPIFFE / WIMSE / ECT | Our drafts (AEM, AERR, HEOP, DATS, CPAT, etc.) |
| --- | --- | --- |
| Identifier | SPIFFE (or URI): "who" | We use existing identifiers (e.g. `iss` / SPIFFE ID in ECT). |
| Identity + context | WIMSE: credentials, security context, propagation, authz, audit | We assume WIMSE (or equivalent) for auth and context; we do not redefine it. |
| Evidence | ECT: token format, DAG linkage, signing, audit | We use ECT as the carrier: checkpoints, errors, overrides, trust assertions, translation hops are new ECT node types (new `exec_act` values and optional claims). |
| Semantics | ECT: “a task happened, here are parents” | We define orchestration and operations: dependencies, checkpoints, rollback protocol, HITL points, resource hints, assurance profiles, protocol binding. |

Concretely:

AERR (Error Recovery/Rollback): Checkpoints, errors, and rollback results are ECTs with specific exec_act and extension claims. Rollback walks the ECT DAG; no second DAG format.
HEOP (Human Emergency Override): Override and acknowledgment are ECTs that link into the same ECT DAG for audit.
DATS (Dynamic Trust): Trust events are derived from ECT outcomes; trust assertions are ECTs.
CPAT (Cross-Protocol Translation): Each translation hop produces an ECT, so the cross-protocol path is one continuous ECT DAG.
Agent DAG HITL (draft-nennemann-agent-dag-hitl-safety): Policy for when HITL is required; decisions and overrides still record as ECTs.

So the execution model (DAG of tasks, checkpoints, rollback, HITL) is implemented using ECT as the token and DAG format. We add semantics and protocols on top, not a new token or DAG structure.
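
As an illustration of "rollback walks the ECT DAG", the following sketch follows `par` links backwards from a failed record and stops at checkpoint records. It is one possible reading, not the AERR procedure: the `"checkpoint"` value for `exec_act`, the dict layout, and the `rollback_set` function are all assumptions made for the example.

```python
def rollback_set(records: list[dict], failed_jti: str) -> tuple[list[str], list[str]]:
    """Walk `par` links backwards from the failed record.

    Returns (jti values to undo, checkpoint jti values where the walk stopped).
    Each record is a decoded ECT payload given as a dict with "jti",
    "exec_act" and an optional "par" list (assumed layout).
    """
    by_id = {r["jti"]: r for r in records}
    to_undo: list[str] = []
    checkpoints: list[str] = []
    stack = [failed_jti]
    seen: set[str] = set()
    while stack:
        jti = stack.pop()
        if jti in seen:
            continue
        seen.add(jti)
        record = by_id[jti]
        if record["exec_act"] == "checkpoint":  # hypothetical exec_act value
            checkpoints.append(jti)
            continue  # state at a checkpoint is kept, so stop walking here
        to_undo.append(jti)
        stack.extend(record.get("par", []))
    return to_undo, checkpoints
```

Because the walk reuses the ECT parent links, no second graph structure is needed; the same records serve for audit and for recovery.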

Differentiate (what we add)

| Concern | WIMSE/ECT | Our ecosystem |
| --- | --- | --- |
| DAG | ECT defines how nodes link (`par`, `jti`, validation). | We define what a node means (task, checkpoint, error, override, trust, translation), when to create them, and how to act on them (rollback, circuit breaker, HITL flow). |
| Orchestration | Out of scope for ECT. | Execution order, resource hints, scheduling, lifecycle; can be described in a declarative workflow (e.g. JSON) that is realized as ECTs at runtime. |
| Recovery | Not in ECT. | AERR: checkpoint placement, error propagation, rollback protocol, circuit breaker. |
| HITL | Not in ECT. | HEOP + Agent DAG HITL: approval gates, override, escalation; all recorded as ECTs. |
| Trust / assurance | ECT provides signed, linked evidence. | DATS, APAE: how to derive trust from ECT outcomes; assurance levels and profiles. |
| Interop | Single token format (ECT over HTTP). | CPAT, AEPB: translation between protocols; each hop still emits ECTs so the cross-protocol run is one ECT DAG. |

In other words: SPIFFE = who (identifier). WIMSE = identity + security context + propagation + authz + audit. ECT = execution evidence (DAG of signed records). Our work = orchestration, recovery, HITL, trust, and interop that consume and produce that evidence.

Implications for the draft family

Draft A (AEM): Should state that the reference implementation of the execution model is ECT: “The ecosystem uses Execution Context Tokens (ECT) [I-D.nennemann-wimse-ect] as the standard representation of task execution and DAG linkage; extensions (e.g. AERR, HEOP) define additional ECT node types and procedures.”
Draft B (ATD): Either (1) ATD = abstract model (nodes, edges, checkpoints, rollback set) with “ECT is one binding” and optional JSON/CBOR for declarative workflow definition, or (2) ATD = semantics of ECT usage (when to emit which ECT types, execution semantics, resource hints) without a second wire format. Prefer (2) to avoid overlap with ECT and keep one DAG format.
Drafts C, D, E (HITL, AEPB, APAE): Same pattern: define procedures and semantics; record all significant events as ECTs so the full run is one auditable ECT DAG. Reference SPIFFE/WIMSE for identity and context, and ECT for format and validation.

One-sentence positioning

“SPIFFE gives the identifier; WIMSE gives the architecture for identity and security context; ECT gives the DAG evidence format. Our drafts specify how to use that format for orchestration, recovery, human oversight, trust, and cross-protocol interop, so the same stack works from relaxed to fully regulated.”

3. How the Data Supports This

From gap analysis

Critical: Resource management (scheduling/quotas for DAG nodes), behavior verification (runtime proofs in regulated mode), error recovery/rollback (DAG-based undo).
High: Cross-protocol translation (agnostic layer), human override (HITL), lifecycle (versioning/retirement of workflows), multi-agent consensus (coordination in DAG), cross-domain security and dynamic trust (regulated regime).
Medium: Monitoring, explainability (HITL), provenance (regulated regime).

From overlap

A2A protocols (92 drafts, avg pairwise sim 0.76): Heavy duplication; a thin ecosystem layer on top of "any A2A" would reduce friction.
Agent discovery/reg (57) and identity/auth (98): Discovery and identity are shared concerns; the ecosystem draft can reference ANS, ADL, OAuth RAR, etc., without mandating one.
Human-agent (22 drafts) is underrepresented; HITL should be a first-class extension point in the ecosystem document.
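
For readers who want to reproduce a figure like the average pairwise similarity quoted above, here is a small sketch. It assumes the similarity is cosine similarity between per-draft embedding vectors, which this document does not state explicitly; the function names are hypothetical and the toy vectors are not real data.

```python
import math
from itertools import combinations


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def avg_pairwise_similarity(vectors: list[list[float]]) -> float:
    """Mean cosine similarity over all unordered pairs (needs at least 2 vectors)."""
    pairs = list(combinations(vectors, 2))
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


# Toy example: two identical vectors and one orthogonal vector.
toy = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
average = avg_pairwise_similarity(toy)
```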

From ideas (sample)

DAG/context: "Working Memory", "Execution Context Token", "Task-Oriented Multi-Agent Recovery Framework", "State Consistency Management", "Checkpoint".
HITL: "CHEQ protocol", "Human Oversight Requirements", "Human-in-the-Loop", "Emergency Override".
Agnostic: "Automated Protocol Adaptation", "Semantic Routing", "Protocol Adapter Layer", "Cross-Protocol Translation".
Dual regime: "Cryptographic Proof-Based Autonomy", "Verifiable Agent Behavior Attestation", "Trust Scoring", "Behavioral Monitoring", "Data Provenance".

4. Proposed Draft Family

Five drafts that together define the holistic ecosystem. Each fills gaps and references existing work (see landscape) to avoid duplication.

Draft A: Agent Ecosystem Model (AEM) — Architecture and Terminology

Role: Informational. Single source of concepts (DAG, HITL, assurance levels, protocol agnosticism) so other drafts and WGs share vocabulary.

Gaps addressed: Cross-cutting; reduces overlap by establishing a common model.

Outline:

Introduction — Need for a unified model across 260+ drafts; scope (orchestration, control, assurance); non-goals (no new wire protocol).
Terminology — Agent, task, workflow, DAG, node, checkpoint, HITL point, assurance level, ecosystem layer, protocol binding.
Architectural Model — Ecosystem layer above protocol bindings (A2A, MCP, etc.); DAG as the execution model; HITL as optional nodes; assurance as an orthogonal axis.
Assurance Levels — Level 0 (best-effort, no proofs), Level 1 (audit trail), Level 2 (attestation/verification), Level 3 (full provenance and compliance). Same DAG/HITL model at every level.
Protocol Agnosticism — How the ecosystem layer binds to existing protocols (reference ANS, ADL, MCP, STAMP, DAAP, etc.); extension points.
Security Considerations — Trust boundaries; what each assurance level guarantees.
IANA Considerations — None or registry for assurance level identifiers.

Target: Individual or NMOP; Status: Informational.
Related drafts: draft-rosenberg-aiproto-framework, draft-zyyhl-agent-networks-framework, draft-nennemann-wimse-ect, draft-aylward-daap-v2.
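
The four assurance levels in the outline could be modelled as an ordered enumeration. This sketch assumes the levels are cumulative, so a deployment offering Level 2 also satisfies a Level 1 requirement; the outline does not state that, and the member names and `satisfies` helper are hypothetical.

```python
from enum import IntEnum


class AssuranceLevel(IntEnum):
    """Assurance levels from the AEM outline (member names are illustrative)."""
    BEST_EFFORT = 0       # Level 0: best-effort, no proofs
    AUDIT_TRAIL = 1       # Level 1: audit trail
    ATTESTATION = 2       # Level 2: attestation/verification
    FULL_PROVENANCE = 3   # Level 3: full provenance and compliance


def satisfies(offered: AssuranceLevel, required: AssuranceLevel) -> bool:
    """Assumption: levels are cumulative, so a higher level covers lower ones."""
    return offered >= required
```

If the levels turn out not to be strictly cumulative, a set of required guarantees per level would be a better fit than an integer ordering.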

Draft B: Agent Task DAG (ATD) — Execution Model and Checkpoints

Role: Define the semantics of the DAG execution model (when to emit which ECT types, execution order, checkpoints, rollback) using ECT as the token and DAG format. Does not define a second wire format; avoids overlap with draft-nennemann-wimse-ect.

Gaps addressed: Gap 1 (Resource Management — scheduling/quotas per node), Gap 3 (Error Recovery and Rollback).

Outline:

Introduction — Why DAG (ordering, dependencies, safe rollback); relationship to AEM; dependency on ECT for token format and DAG structure.
Terminology — Node, edge, root, leaf, checkpoint, rollback set, blast radius (aligned with ECT and AERR).
Execution Semantics — Topological order; node states (pending, running, done, failed, rolled back); when agents MUST/SHOULD emit ECTs; checkpoint placement (per ECT or per subgraph, per AERR).
Resource Hints and Quotas — Optional resource claims (e.g. in ECT ext or workflow descriptor); integration with scheduling (K8s, etc.); fair allocation when agents compete (gap 1).
Checkpoint and Rollback — Reference AERR for checkpoint/error/rollback ECT types; rollback protocol: walk ECT DAG backwards; circuit-breaker to contain cascading failure.
Optional: Declarative Workflow — JSON/CBOR for pre-run workflow definition (nodes, dependencies, HITL slots, resource hints); at runtime this is realized as ECTs. Enables tooling and portability without replacing ECT.
Security Considerations — Integrity of ECT DAG and checkpoints; who may trigger rollback.
IANA Considerations — None if ECT registry is used for node/action types; or registry for workflow descriptor media type.

Target: NMOP or individual; Status: Standards-track or Experimental.
Related drafts: draft-nennemann-wimse-ect, draft-aerr-agent-error-recovery-rollback, draft-yue-anima-agent-recovery-networks, draft-li-dmsc-macp.
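
The execution semantics and the optional declarative workflow can be sketched together. The JSON layout below (`nodes`, `needs`, `hitl`) is a hypothetical descriptor invented for this example, not a proposed format; the runner uses Python's standard `graphlib.TopologicalSorter` and the node states named in the outline. Emitting ECTs is left out.

```python
import json
from graphlib import TopologicalSorter

# Hypothetical workflow descriptor; field names are assumptions.
WORKFLOW_JSON = """
{
  "nodes": {
    "fetch":   {"needs": []},
    "analyze": {"needs": ["fetch"]},
    "review":  {"needs": ["analyze"], "hitl": "approval"},
    "publish": {"needs": ["review"]}
  }
}
"""


def run(workflow: dict, execute) -> dict:
    """Run nodes in topological order; `execute(name)` returns True on success.

    Nodes whose predecessors did not finish stay "pending".
    TopologicalSorter raises CycleError if the descriptor is not a DAG.
    """
    nodes = workflow["nodes"]
    graph = {name: set(spec["needs"]) for name, spec in nodes.items()}
    state = dict.fromkeys(nodes, "pending")
    for name in TopologicalSorter(graph).static_order():
        if any(state[dep] != "done" for dep in graph[name]):
            continue  # a predecessor failed or never ran
        state[name] = "running"
        state[name] = "done" if execute(name) else "failed"
    return state


workflow = json.loads(WORKFLOW_JSON)
states = run(workflow, lambda name: name != "analyze")  # simulate one failure
```

In this run `fetch` completes, `analyze` fails, and the downstream nodes never start, which is the containment behaviour a circuit breaker would build on.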

Draft C: Human-in-the-Loop (HITL) Primitives for Agent Ecosystems

Role: Standardize HITL as first-class: approval gates, escalation, emergency override, and explainability hooks.

Gaps addressed: Gap 7 (Human Override and Intervention), Gap 11 (Agent Explainability).

Outline:

Introduction — Need for HITL in autonomous systems; design goals (minimal friction in relaxed mode, mandatory in regulated).
Terminology — HITL point, approval gate, escalation, override, explainability token.
HITL Point Model — Placement in DAG (before/after node, or as a node); types: approval required, notification only, override-only (emergency).
Approval and Escalation — Request/response format (reference OAuth, CHEQ); timeouts and escalation paths; revocation of approval.
Emergency Override — Signal to halt or rollback; scope (single node, subgraph, full DAG); authentication and audit (reference DAAP, STAMP).
Explainability — Optional explainability token (summary, evidence, link to verifiable conversation); required at higher assurance levels.
Binding to AEM — How HITL integrates with Draft A assurance levels; optional in Level 0, required in Level 2+ for critical paths.
Security Considerations — Who can approve/override; replay and revocation.
IANA Considerations — None or registry for HITL point types.

Target: NMOP or OPS; Status: Standards-track or Experimental.
Related drafts: draft-rosenberg-aiproto-cheq, draft-rosenberg-cheq, draft-cui-nmrg-llm-nm, draft-irtf-nmrg-llm-nm, draft-aap-oauth-profile, draft-birkholz-verifiable-agent-conversations.
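
The approval and escalation flow might look like the sketch below. It assumes an ordered escalation chain, treats a `None` answer as a timeout, and fails closed when nobody answers; those choices, and the `Approver` and `approval_gate` names, are illustrative assumptions rather than anything defined by CHEQ or the outline.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Approver:
    """One human (or group) in the escalation chain."""
    name: str
    # Returns True (approve), False (reject), or None when the request timed out.
    decide: Callable[[str], Optional[bool]]


def approval_gate(node: str, chain: list[Approver]) -> tuple[bool, list[tuple[str, str]]]:
    """Ask each approver in turn and escalate on timeout.

    Returns (approved, audit trail). The trail entries are what an
    implementation would record as evidence, e.g. as ECTs.
    """
    trail: list[tuple[str, str]] = []
    for approver in chain:
        decision = approver.decide(node)
        if decision is None:
            trail.append((approver.name, "timeout"))
            continue  # escalate to the next approver
        trail.append((approver.name, "approved" if decision else "rejected"))
        return decision, trail
    return False, trail  # assumption: fail closed if no one responds


chain = [
    Approver("on-call", lambda node: None),  # simulated timeout
    Approver("lead", lambda node: True),
]
approved, trail = approval_gate("publish", chain)
```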

Draft D: Agent Ecosystem Protocol Binding (AEPB) — Agnostic Interop Layer

Role: Define how the ecosystem layer (AEM + ATD + HITL) binds to existing A2A and discovery protocols; translation/negotiation so different protocols interoperate.

Gaps addressed: Gap 4 (Cross-Protocol Translation), Gap 5 (Lifecycle — versioning/retirement).

Outline:

Introduction — Proliferation of A2A protocols (92 drafts); need for a binding layer that preserves ecosystem semantics (DAG, HITL) over any protocol.
Terminology — Protocol binding, translation, negotiation, capability advertisement.
Capability Advertisement — How agents/gateways advertise support for AEM/ATD/HITL and assurance levels (reference ADL, ANS, DNS-SD).
Binding Requirements — What a protocol must provide: task invocation, dependency ordering, checkpoint/rollback signals, HITL callbacks. Mapping to MCP, A2A over HTTP, A2A over MOQT, etc.
Translation and Negotiation — When two agents speak different protocols: gateway or negotiation (e.g. common subset); minimal translation schema (intent, result, error).
Lifecycle — Versioning of DAG and agents; graceful shutdown and drain; retirement without breaking dependents (gap 5).
Security Considerations — Trust of translators; integrity across protocol boundaries.
IANA Considerations — Registry for protocol binding identifiers.

Target: NMOP or individual; Status: Experimental.
Related drafts: draft-agent-gw, draft-narajala-ans, draft-nederveld-adl, draft-ainp-protocol, draft-mallick-muacp, draft-a2a-moqt-transport.
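
A rough sketch of the negotiation and minimal translation schema follows. The three envelope fields come from the outline (intent, result, error); everything else is assumed for illustration, including the protocol identifier strings, the per-protocol field map, and the function names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Envelope:
    """Minimal translation schema: intent, result, error."""
    intent: str
    result: Optional[dict] = None
    error: Optional[str] = None


def negotiate(ours: list[str], theirs: set[str]) -> Optional[str]:
    """Pick the first protocol in our preference order that the peer supports.

    Returns None when there is no common protocol, in which case a
    translating gateway would be needed.
    """
    for proto in ours:
        if proto in theirs:
            return proto
    return None


def to_envelope(message: dict, field_map: dict) -> Envelope:
    """Translate a protocol-specific message using a per-protocol field map."""
    return Envelope(
        intent=message[field_map["intent"]],
        result=message.get(field_map.get("result", "result")),
        error=message.get(field_map.get("error", "error")),
    )


# Hypothetical protocol identifiers and message shape.
chosen = negotiate(["mcp", "a2a-http"], {"a2a-http", "a2a-moqt"})
envelope = to_envelope(
    {"method": "summarize", "output": {"text": "ok"}},
    {"intent": "method", "result": "output"},
)
```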

Draft E: Assurance Profiles for Agent Ecosystems (APAE) — Dual Regime

Role: Define how the same ecosystem runs in "relaxed" vs "regulated" mode: which proofs, attestations, and provenance are required.

Gaps addressed: Gap 2 (Behavior Verification), Gap 8 (Cross-Domain Security), Gap 9 (Dynamic Trust), Gap 12 (Data Provenance).

Outline:

Introduction — Same DAG/HITL model in dev (K8s) vs regulated (finance, healthcare); assurance profile = dial for proof level.
Terminology — Assurance profile, proof, attestation, provenance, trust boundary.
Profiles — Relaxed: best-effort, optional audit; Standard: audit trail + optional attestation; Regulated: attestation per critical node, provenance chain, behavior verification (reference STAMP, DAAP, verifiable conversations).
Behavior Verification — How runtime behavior is checked against declared policy (reference DAAP, RATS, EAT); evidence format.
Cross-Domain and Trust — Trust boundaries between domains; dynamic trust scoring (reference Cosmos, SEAT); revocation.
Provenance — Data lineage along the DAG; format for provenance records (reference verifiable conversation format); retention.
Security Considerations — What each profile guarantees and does not guarantee.
IANA Considerations — Registry for assurance profile identifiers.

Target: NMOP or LAKE/RATS; Status: Informational or Experimental.
Related drafts: draft-guy-bary-stamp-protocol, draft-aylward-daap-v2, draft-birkholz-verifiable-agent-conversations, draft-jiang-seat-dynamic-attestation, draft-cosmos-protocol-specification.
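
The three profiles can be read as a table of required and optional evidence. The sketch below encodes only what the outline lists; the evidence labels, the `PROFILES` structure, and the `missing_evidence` helper are hypothetical, and whether the regulated profile also implies an audit trail is left open because the outline does not say.

```python
# Evidence requirements per profile, taken from the Profiles item in the outline.
# Labels such as "behavior-verification" are illustrative identifiers.
PROFILES = {
    "relaxed": {"required": set(), "optional": {"audit"}},
    "standard": {"required": {"audit"}, "optional": {"attestation"}},
    "regulated": {
        "required": {"attestation", "provenance", "behavior-verification"},
        "optional": set(),
    },
}


def missing_evidence(profile: str, provided: list[str]) -> list[str]:
    """Return the required evidence types that a run did not provide, sorted."""
    return sorted(PROFILES[profile]["required"] - set(provided))
```

A deployment could call `missing_evidence` before accepting a completed DAG run, for example rejecting a regulated run that carries attestations but no provenance chain.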

5. Dependency Graph of Drafts

        [A: AEM - Architecture & Terminology]
                          |
        +-----------+-----+-----+-----------+
        |           |           |           |
        v           v           v           v
     [B: ATD]   [C: HITL]   [D: AEPB]   [E: APAE]
       DAG        Human      Binding    Assurance
        |           |           |           |
        +-----------+-----+-----+-----------+
                          |
    Implementations: same ecosystem in K8s (relaxed) or regulated (proofs)

A is the foundation; B, C, D, E reference A and each other where needed.
B (ATD) and C (HITL) are the core execution model; D (AEPB) makes it agnostic; E (APAE) makes it dual-regime.

6. How to Use This With Your Analyzer

Generate outlines via CLI: Use `ietf draft-gen <topic>` with topics: "Agent Ecosystem Model", "Agent Task DAG", "Human-in-the-Loop primitives", "Agent Ecosystem Protocol Binding", "Assurance Profiles for Agent Ecosystems". Feed the gap context from this file or from `data/reports/gaps.md`.
Cross-reference: When writing each draft, cite the related drafts listed above; the analyzer’s `ietf similar` and `ietf compare` can find more.
Ideas database: Search ideas for mechanism/architecture/protocol types (e.g. "checkpoint", "override", "translation") to pull concrete mechanisms into sections.

7. Deeper Datatracker Analysis — What to Run

To keep refining these drafts from the data, use your analyzer as follows:

| Goal | Commands / data |
| --- | --- |
| Find drafts that touch DAG/workflow | `ietf search "task graph"`, `ietf search "workflow"`, `ietf search "checkpoint"`, `ietf search "rollback"`; then `ietf similar <name>` for each. |
| Find HITL-related work | `ietf search "human-in-the-loop"`, `ietf search "override"`, `ietf search "approval"`; check `data/reports/ideas.md` for "CHEQ", "human", "override". |
| Cross-protocol / interop | `ietf report overlap-matrix`; focus on A2A vs other categories; `ietf similar draft-agent-gw` (protocol adaptation). |
| Assurance / proofs | Search ideas for "attestation", "verifiable", "provenance", "trust"; list drafts in AI safety/alignment and identity/auth. |
| Gap → draft outline | For each gap in `gaps.md`, run `ietf draft-gen "<gap topic>"` (e.g. "Agent Error Recovery and Rollback") and merge with the outlines above. |
| Cluster overlap | `ietf clusters --threshold 0.85` to see near-duplicates; use one as canonical and reference others to reduce fragmentation. |

Refreshing the pipeline periodically (`ietf fetch`, `ietf analyze --all`, `ietf ideas --all`, `ietf gaps`) keeps gaps and ideas aligned with the latest datatracker activity.
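
The periodic refresh could be wrapped in a small script. This sketch only chains the four commands named above in the order they are listed; the `refresh` function and its `dry_run` flag are hypothetical, and it assumes the `ietf` CLI is on the PATH when run for real.

```python
import subprocess

# The refresh commands as listed in this section, in order.
REFRESH_STEPS = [
    ["ietf", "fetch"],
    ["ietf", "analyze", "--all"],
    ["ietf", "ideas", "--all"],
    ["ietf", "gaps"],
]


def refresh(dry_run: bool = True) -> list[list[str]]:
    """Run (or, with dry_run, just print) each refresh step in order."""
    for cmd in REFRESH_STEPS:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops the refresh at the first failing step.
            subprocess.run(cmd, check=True)
    return REFRESH_STEPS


steps = refresh(dry_run=True)
```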

8. One-Page Pitch (Elevator Version)

Problem: 260+ IETF drafts on AI agents; no single model for orchestration (DAG), human oversight (HITL), or for running the same system in both fast and regulated environments.

Proposal: Five coordinated drafts — (A) shared architecture and terminology, (B) DAG execution with checkpoints and rollback, (C) HITL primitives (approval, override, explainability), (D) protocol-agnostic binding so any A2A protocol can participate, (E) assurance profiles so the same stack works in K8s or in a proof-heavy regulated regime.

Outcome: One holistic agent ecosystem: DAG + HITL built in, agnostic and extensible, applicable everywhere from relaxed to fully proven.
