# Draft Outline

## Abstract

State that the document defines experimental recovery semantics for multi-agent task execution, including failure signaling, rollback requests, rollback results, and checkpoint references. Make clear that the mechanism is protocol-agnostic and intended to improve interoperable recovery behavior across agent ecosystems.
## Section plan

1. Introduction
2. Terminology
3. Problem Statement and Design Goals
4. Recovery Model Overview
5. Event Types and Required Fields
6. Task States and Recovery Procedures
7. Rollback Scope and Dependency Handling
8. Error Conditions and Partial Rollback
9. Security Considerations
10. Privacy Considerations
11. IANA Considerations
12. References
## Author guidance by section

### 1. Introduction

Explain why autonomous multi-agent systems need interoperable recovery behavior. Keep this grounded in failure propagation and operational safety, not generic AI rhetoric.
### 2. Terminology

Define only the core terms needed for this document: task, dependency, checkpoint, failure event, rollback set, recovery record, coordinator. Keep terms stable and conservative.
### 3. Problem Statement and Design Goals

Describe the exact gap: current drafts define communication and orchestration patterns, but none defines common rollback semantics. Include explicit goals such as idempotency, partial rollback transparency, and protocol-agnostic applicability.
### 4. Recovery Model Overview

Describe the model at a high level before any field-level detail. Separate local failure handling from cross-agent recovery signaling. Make clear what this document does not define.
### 5. Event Types and Required Fields

Define `checkpoint`, `failure`, `rollback-request`, and `rollback-result`. This section must specify required versus optional fields and avoid vague "metadata may include" language where interoperability depends on a field.
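One way to pin down required versus optional fields while drafting is a typed sketch like the one below. Only the four event type names come from this outline; every field name (`event_id`, `task_id`, `failure_class`, and so on) and the outcome values are hypothetical placeholders that the draft would have to decide on.

```python
from dataclasses import MISSING, dataclass, fields
from typing import Optional, Tuple

# Hypothetical field names throughout. A field without a default is treated
# as required; a field with a default is treated as optional.

@dataclass(frozen=True)
class Checkpoint:
    checkpoint_id: str
    task_id: str

@dataclass(frozen=True)
class Failure:
    event_id: str
    task_id: str
    failure_class: str
    checkpoint_id: Optional[str] = None  # last known good checkpoint, if any

@dataclass(frozen=True)
class RollbackRequest:
    request_id: str
    task_ids: Tuple[str, ...]
    checkpoint_id: str
    reason_event_id: Optional[str] = None  # failure event that triggered this

@dataclass(frozen=True)
class RollbackResult:
    request_id: str
    outcome: str  # assumed values: "complete", "partial", "refused"
    rolled_back: Tuple[str, ...] = ()
    not_rolled_back: Tuple[str, ...] = ()

def missing_required(event_cls, payload):
    """Return the names of required fields that are absent from payload."""
    return [
        f.name
        for f in fields(event_cls)
        if f.default is MISSING
        and f.default_factory is MISSING
        and f.name not in payload
    ]
```

A receiver could call `missing_required(Failure, payload)` before acting on an event and reject the event when the returned list is non-empty.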
### 6. Task States and Recovery Procedures

Define the state transitions relevant to failure and rollback. Include procedure ordering: detect failure, emit failure event, decide rollback scope, send rollback request, emit rollback result. If escalation is possible, say when.
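The state transitions could be prototyped as an explicit transition table before the prose is written. The sketch below is only an illustration: the state names and the allowed edges are assumptions, and the draft may settle on a different set, particularly for when escalation is permitted.

```python
from enum import Enum

class TaskState(Enum):
    # Hypothetical state names, chosen to mirror the procedure ordering.
    RUNNING = "running"
    FAILED = "failed"
    ROLLBACK_REQUESTED = "rollback-requested"
    ROLLED_BACK = "rolled-back"
    PARTIALLY_ROLLED_BACK = "partially-rolled-back"
    ESCALATED = "escalated"

# Assumed edges: escalation is allowed after a failure, while a rollback is
# pending, or after a partial rollback. Terminal states map to empty sets.
ALLOWED = {
    TaskState.RUNNING: {TaskState.FAILED},
    TaskState.FAILED: {TaskState.ROLLBACK_REQUESTED, TaskState.ESCALATED},
    TaskState.ROLLBACK_REQUESTED: {
        TaskState.ROLLED_BACK,
        TaskState.PARTIALLY_ROLLED_BACK,
        TaskState.ESCALATED,
    },
    TaskState.PARTIALLY_ROLLED_BACK: {TaskState.ESCALATED},
    TaskState.ROLLED_BACK: set(),
    TaskState.ESCALATED: set(),
}

def transition(current, target):
    """Return target if the edge is allowed, otherwise raise ValueError."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target
```

Writing the table first makes it easy to check that every state named in the prose has a defined exit, or is explicitly terminal.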
### 7. Rollback Scope and Dependency Handling

Define how dependencies influence rollback. Be explicit about direct versus transitive effects, what happens when scope is uncertain, and how actual applied scope is reported back.
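To make the direct versus transitive distinction concrete, the section could work from a small example such as the following. It assumes a hypothetical `dependents` mapping from each task to the tasks that depend on it directly; the function name and the graph representation are illustrative, not proposed wire format.

```python
from collections import deque

def rollback_set(dependents, failed_task, transitive=True):
    """Compute the set of tasks affected by a failure.

    dependents maps a task id to the ids of tasks that depend on it directly.
    With transitive=False only the failed task and its direct dependents are
    returned; with transitive=True the whole downstream closure is returned.
    """
    scope = {failed_task}
    queue = deque([failed_task])
    while queue:
        task = queue.popleft()
        for child in dependents.get(task, ()):
            if child not in scope:  # also guards against dependency cycles
                scope.add(child)
                if transitive:
                    queue.append(child)
    return scope

# Example graph: b depends on a, c depends on b, x is unrelated.
graph = {"a": ["b"], "b": ["c"], "x": []}
direct = rollback_set(graph, "a", transitive=False)
full = rollback_set(graph, "a")
```

Here `direct` contains `a` and `b`, while `full` also contains `c`. The difference between a requested set and the set an agent actually rolled back is what the applied-scope report would need to convey.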
### 8. Error Conditions and Partial Rollback

Handle non-reversible tasks, refusal, timeout, duplicate requests, and partial success. This section is important for implementability and must not collapse into generic prose.
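A toy handler can help test whether the rules for duplicates and partial success are precise enough to implement. The sketch below assumes a hypothetical `request_id` used as an idempotency key and three assumed outcome values ("complete", "partial", "refused"); timeouts are not modeled.

```python
class RollbackHandler:
    """Toy agent-side handler; all names and outcome values are assumptions."""

    def __init__(self, reversible):
        self._reversible = set(reversible)  # tasks this agent can undo
        self._results = {}                  # request_id -> recorded result

    def handle(self, request_id, task_ids):
        # Duplicate request: return the recorded result, do no further work.
        if request_id in self._results:
            return self._results[request_id]

        done = [t for t in task_ids if t in self._reversible]
        skipped = [t for t in task_ids if t not in self._reversible]

        if not skipped:
            outcome = "complete"
        elif done:
            outcome = "partial"
        else:
            outcome = "refused"

        result = {
            "request_id": request_id,
            "outcome": outcome,
            "rolled_back": done,
            "not_rolled_back": skipped,
        }
        self._results[request_id] = result
        return result

handler = RollbackHandler(reversible={"a", "b"})
first = handler.handle("r1", ["a", "c"])
again = handler.handle("r1", ["a", "c"])
```

In this example `first["outcome"]` is `"partial"` because task `c` is not reversible, and `again` is the same recorded result rather than a second rollback attempt.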
### 9. Security Considerations

Address spoofing, replay, unauthorized rollback, false failure signaling, topology leakage, and abuse of partial rollback states. The section should be mechanism-specific.
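As an example of what "mechanism-specific" could look like for spoofing and replay, the sketch below authenticates events with an HMAC over a canonical JSON encoding and rejects event identifiers it has already seen. The shared-key model, the `event_id` field, and the encoding are all assumptions made for illustration; the draft may instead rely on properties of lower layers.

```python
import hashlib
import hmac
import json

def sign(key, event):
    """Return a hex HMAC-SHA256 tag over a canonical JSON encoding."""
    body = json.dumps(event, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(key, body, hashlib.sha256).hexdigest()

class Verifier:
    """Accepts an event once if its tag is valid (hypothetical design)."""

    def __init__(self, key):
        self._key = key
        self._seen = set()  # event ids already accepted

    def accept(self, event, signature):
        # Spoofing check: constant-time comparison of the expected tag.
        if not hmac.compare_digest(sign(self._key, event), signature):
            return False
        # Replay check: each event id is accepted at most once.
        if event["event_id"] in self._seen:
            return False
        self._seen.add(event["event_id"])
        return True
```

Rejecting repeated identifiers interacts with the idempotency goal for rollback requests, so the draft would need to say whether a duplicate request is dropped or answered with the earlier result.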
### 10. Privacy Considerations

Address exposure of task identifiers, failure causes, dependency graphs, and sensitive operational details.
### 11. IANA Considerations

Either state clearly that there are no IANA actions, or request small registries for failure classes and rollback outcomes. Do not hand-wave this.
### 12. References

Use placeholders where necessary, but include the adjacent drafts that informed the design and, if the text references one, any underlying execution-evidence substrate.
## Issues that must not be hand-waved

- what fields are mandatory in each event
- what counts as a successful versus partial rollback
- how rollback requests remain idempotent
- what an agent does when a requested rollback is impossible
- how dependency-driven rollback scope is determined and reported
- what security properties the mechanism relies on from lower layers