feat: add draft data, gap analysis report, and workspace config

2026-04-06 18:47:15 +02:00
parent 4f310407b0
commit 2506b6325a
189 changed files with 62649 additions and 0 deletions
--- a/workspace/draft-team/cycles/agent-error-recovery-rollback/30-outline.md
+++ b/workspace/draft-team/cycles/agent-error-recovery-rollback/30-outline.md
@@ -0,0 +1,79 @@
+# Draft Outline
+
+## Abstract
+
+State that the document defines experimental recovery semantics for multi-agent task execution, including failure signaling, rollback requests, rollback results, and checkpoint references. Make clear that it is protocol-agnostic and intended to improve interoperable recovery behavior across agent ecosystems.
+
+## Section plan
+
+1. Introduction
+2. Terminology
+3. Problem Statement and Design Goals
+4. Recovery Model Overview
+5. Event Types and Required Fields
+6. Task States and Recovery Procedures
+7. Rollback Scope and Dependency Handling
+8. Error Conditions and Partial Rollback
+9. Security Considerations
+10. Privacy Considerations
+11. IANA Considerations
+12. References
+
+## Author guidance by section
+
+### 1. Introduction
+
+Explain why autonomous multi-agent systems need interoperable recovery behavior. Keep this grounded in failure propagation and operational safety, not generic AI rhetoric.
+
+### 2. Terminology
+
+Define only the core terms needed for this document: task, dependency, checkpoint, failure event, rollback set, recovery record, coordinator. Keep terms stable and conservative.
+
+### 3. Problem Statement and Design Goals
+
+Describe the exact gap: current drafts define communication and orchestration patterns, but no common rollback semantics. Include explicit goals such as idempotency, partial rollback transparency, and protocol-agnostic applicability.
+
+### 4. Recovery Model Overview
+
+Describe the model at a high level before any field-level detail. Separate local failure handling from cross-agent recovery signaling. Make clear what this document does not define.
+
+### 5. Event Types and Required Fields
+
+Define `checkpoint`, `failure`, `rollback-request`, and `rollback-result`. This section must specify required versus optional fields and avoid vague "metadata may include" language where interoperability depends on a field.
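To illustrate what "required versus optional" could look like in practice, here is a minimal sketch of a per-event required-field table with a validator. The four event type names come from the outline; every field name (`event_id`, `task_id`, `failure_class`, and so on) is a hypothetical placeholder, not something the draft has defined.

```python
# Hypothetical required-field table. The event type names come from the
# outline; the field names are illustrative assumptions only.
REQUIRED_FIELDS = {
    "checkpoint": {"event_id", "task_id", "checkpoint_id"},
    "failure": {"event_id", "task_id", "failure_class"},
    "rollback-request": {"event_id", "task_id", "request_id", "checkpoint_id"},
    "rollback-result": {"event_id", "task_id", "request_id", "outcome"},
}


def missing_fields(event: dict) -> set:
    """Return the required fields absent from an event.

    Raises ValueError when the event type is not one of the four known types.
    """
    event_type = event.get("type")
    if event_type not in REQUIRED_FIELDS:
        raise ValueError(f"unknown event type: {event_type!r}")
    return REQUIRED_FIELDS[event_type] - set(event)


incomplete = {"type": "failure", "event_id": "e1", "task_id": "t1"}
complete = dict(incomplete, failure_class="timeout")
```

A table like this makes the interoperability contract checkable: a receiver can reject an event whose `missing_fields` result is non-empty instead of guessing at absent metadata.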
+
+### 6. Task States and Recovery Procedures
+
+Define the state transitions relevant to failure and rollback. Include procedure ordering: detect failure, emit failure event, decide rollback scope, send rollback request, emit rollback result. If escalation is possible, state the conditions that trigger it.
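One way to pin down the transitions is an explicit transition table. The sketch below assumes a hypothetical set of states (`RUNNING`, `FAILED`, `ROLLBACK_REQUESTED`, and so on) chosen to mirror the procedure ordering above; the draft itself would need to name and justify the real ones.

```python
from enum import Enum


class TaskState(Enum):
    # Hypothetical state names, chosen to mirror the procedure ordering.
    RUNNING = "running"
    FAILED = "failed"
    ROLLBACK_REQUESTED = "rollback-requested"
    ROLLED_BACK = "rolled-back"
    PARTIALLY_ROLLED_BACK = "partially-rolled-back"
    ESCALATED = "escalated"


# Allowed transitions: each state maps to the states it may move to next.
ALLOWED = {
    TaskState.RUNNING: {TaskState.FAILED},
    TaskState.FAILED: {TaskState.ROLLBACK_REQUESTED, TaskState.ESCALATED},
    TaskState.ROLLBACK_REQUESTED: {
        TaskState.ROLLED_BACK,
        TaskState.PARTIALLY_ROLLED_BACK,
        TaskState.ESCALATED,
    },
    TaskState.PARTIALLY_ROLLED_BACK: {TaskState.ESCALATED},
    TaskState.ROLLED_BACK: set(),
    TaskState.ESCALATED: set(),
}


def transition(current: TaskState, target: TaskState) -> TaskState:
    """Return the target state if the move is allowed, else raise ValueError."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target
```

Writing the table out forces the escalation question into the open: in this sketch escalation is reachable from `FAILED`, `ROLLBACK_REQUESTED`, and `PARTIALLY_ROLLED_BACK`, which is an assumption the section would have to confirm or change.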
+
+### 7. Rollback Scope and Dependency Handling
+
+Define how dependencies influence rollback. Be explicit about direct versus transitive effects, what happens when scope is uncertain, and how the scope actually applied is reported back.
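The direct-versus-transitive distinction can be made concrete with a small graph walk. This is a sketch under stated assumptions: `dependents` is a hypothetical mapping from a task to the tasks that directly depend on it, and `scope_report` is an illustrative way to compare requested scope with applied scope, not a format from the draft.

```python
from collections import deque


def rollback_set(failed_task: str, dependents: dict, transitive: bool = True) -> set:
    """Collect the failed task plus its dependents.

    With transitive=False only direct dependents are included.
    """
    scope = {failed_task}
    queue = deque([failed_task])
    while queue:
        task = queue.popleft()
        for child in dependents.get(task, ()):
            if child not in scope:
                scope.add(child)
                if transitive:
                    # Keep walking so that dependents of dependents are found.
                    queue.append(child)
    return scope


def scope_report(requested: set, applied: set) -> dict:
    """Summarize requested scope, applied scope, and what was left out."""
    return {
        "requested": sorted(requested),
        "applied": sorted(applied),
        "not_applied": sorted(requested - applied),
    }


# Hypothetical graph: b depends on a, c depends on b.
deps = {"a": ["b"], "b": ["c"], "c": []}
```

Here `rollback_set("a", deps)` covers all three tasks, while `transitive=False` stops at `a` and `b`. A non-empty `not_applied` list is the kind of explicit signal a requester needs when the applied scope is narrower than the requested one.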
+
+### 8. Error Conditions and Partial Rollback
+
+Handle non-reversible tasks, refusal, timeout, duplicate requests, and partial success. This section is important for implementability and must not collapse into generic prose.
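Duplicate requests and partial success interact, so a sketch may help authors keep the two cases apart. The handler below assumes a hypothetical `request_id` used as an idempotency key and hypothetical outcome labels (`success`, `partial`, `refused`); none of these names are defined by the draft.

```python
class RollbackHandler:
    """Illustrative handler: replays cached results for duplicate requests."""

    def __init__(self, reversible):
        self._reversible = set(reversible)  # tasks this agent is able to undo
        self._results = {}                  # request_id -> cached result

    def handle(self, request_id: str, task_ids: list) -> dict:
        # Duplicate request: return the earlier result without redoing work.
        if request_id in self._results:
            return self._results[request_id]

        rolled_back = [t for t in task_ids if t in self._reversible]
        skipped = [t for t in task_ids if t not in self._reversible]

        if not skipped:
            outcome = "success"
        elif rolled_back:
            outcome = "partial"
        else:
            outcome = "refused"

        result = {
            "request_id": request_id,
            "outcome": outcome,
            "rolled_back": rolled_back,
            "skipped": skipped,
        }
        self._results[request_id] = result
        return result


handler = RollbackHandler(reversible={"t1"})
first = handler.handle("r1", ["t1", "t2"])
second = handler.handle("r1", ["t1", "t2"])
```

In this sketch `first` and `second` are the same cached result with outcome `partial`, because `t2` is treated as non-reversible. Timeouts are not modeled here; the section would still need to say how a requester distinguishes a slow rollback from a lost request.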
+
+### 9. Security Considerations
+
+Address spoofing, replay, unauthorized rollback, false failure signaling, topology leakage, and abuse of partial rollback states. The section should be mechanism-specific.
+
+### 10. Privacy Considerations
+
+Address exposure of task identifiers, failure causes, dependency graphs, and sensitive operational details.
+
+### 11. IANA Considerations
+
+Either state explicitly that the document has no IANA actions, or request small registries for failure classes and rollback outcomes. Do not hand-wave this.
+
+### 12. References
+
+Use placeholders where necessary, but include adjacent drafts that informed the design and any underlying execution-evidence substrate if referenced.
+
+## Issues that must not be hand-waved
+
+- what fields are mandatory in each event
+- what counts as a successful versus partial rollback
+- how rollback requests remain idempotent
+- what an agent does when a requested rollback is impossible
+- how dependency-driven rollback scope is determined and reported
+- what security properties the mechanism relies on from lower layers