feat: add draft data, gap analysis report, and workspace config
@@ -0,0 +1,70 @@
# User Spec

## Topic

Agent Error Recovery and Rollback for Multi-Agent Systems

## Goal

Produce a credible IETF-style Internet-Draft for a narrowly scoped mechanism that standardizes how cooperating agents report failures, define rollback scope, and execute coordinated recovery without cascading damage.

## Intended status

Experimental.

Rationale: the problem is clearly real and under-specified, but the ecosystem is still young and the mechanism should not pretend to have full deployment consensus yet.

## Problem to solve

Current AI-agent and autonomous-operations drafts define communication, identity, and orchestration patterns, but the landscape analysis shows no common mechanism for:

- signaling execution failure in a machine-actionable way
- declaring rollback boundaries and blast radius
- coordinating rollback across dependent agents
- recording recovery outcomes for audit and future trust decisions

This creates high interoperability and safety risk for autonomous systems that act across multiple services or domains.

## What must be true in the final draft

- The draft stays tightly scoped to recovery and rollback semantics, not a full agent architecture.
- The mechanism is protocol-agnostic enough to work across multiple agent ecosystems.
- The draft defines concrete states, triggers, and recovery procedures that two implementers could follow consistently.
- Security Considerations meaningfully address spoofed rollback, unauthorized override, replay, and denial-of-service by false failure signaling.
- The text is shaped like a real Internet-Draft, not a product design memo.
- The draft clearly states what is in scope now and what is deferred to later work such as richer workflow orchestration or dynamic trust scoring.

## Constraints

- Scope constraints: keep this to rollback and recovery coordination. Do not absorb lifecycle management, full workflow DAG standardization, or human override into the core mechanism except where needed as interfaces.
- Compatibility constraints: reuse adjacent concepts where possible from existing IETF-style work on execution evidence, attestation, or agent communication. Do not invent a full new identity or transport stack.
- Terminology constraints: use conservative standards language. Prefer terms like agent, execution, checkpoint, rollback set, dependency, and recovery record. Avoid buzzwords and branding.

## Source materials to prioritize

- `/home/c/projects/ietf-draft-analyzer/data/reports/gaps.md`
- `/home/c/projects/ietf-draft-analyzer/data/reports/holistic-agent-ecosystem-draft-outlines.md`
- `/home/c/projects/ietf-draft-analyzer/data/reports/ideas.md`
- `/home/c/projects/ietf-draft-analyzer/data/reports/overview.md`
- `draft-yue-anima-agent-recovery-networks`
- `draft-li-dmsc-macp`
- `draft-fu-nmop-agent-communication-framework`
- `draft-srijal-agents-policy`
- related WIMSE or ECT materials when they help avoid redefining execution evidence

## Success criteria

- A reader can tell exactly what an agent must emit or process when a task fails.
- A reader can tell how rollback scope is determined and how dependent agents respond.
- The draft includes enough structure to support interoperability testing later.
- Specialist reviewers can criticize the draft on substance rather than on missing basic sections or obvious ambiguity.

## Questions for the team

- What is the smallest interoperable core for rollback semantics?
- Should checkpoints and recovery records be abstract objects, protocol messages, or profileable metadata on top of another carrier?
- What information is mandatory in a failure signal versus optional?
- How should rollback interact with partially completed downstream work?

@@ -0,0 +1,27 @@
# Cycle Status

## Summary

- cycle: agent-error-recovery-rollback
- version: v1
- last updated: 2026-03-02 18:00 UTC

## Artifact Status

- `00-user-spec.md`: written
- `10-research-brief.md`: written
- `20-architecture-brief.md`: written
- `30-outline.md`: written
- `40-draft-v1.md`: written
- `50-reviews-v1/security.md`: written
- `50-reviews-v1/software.md`: written
- `50-reviews-v1/architecture.md`: written
- `50-reviews-v1/ietf-senior.md`: written
- `55-review-synthesis-v1.md`: written
- `60-revision-plan-v1.md`: written

## Notes

- `written` means the artifact contains substantive content.
- `stub` means the file exists but still appears to be a placeholder.
- `missing` means the expected file has not been created.

@@ -0,0 +1,27 @@
# Cycle Status

## Summary

- cycle: agent-error-recovery-rollback
- version: v2
- last updated: 2026-03-02 18:06 UTC

## Artifact Status

- `00-user-spec.md`: written
- `10-research-brief.md`: written
- `20-architecture-brief.md`: written
- `30-outline.md`: written
- `40-draft-v2.md`: written
- `50-reviews-v2/security.md`: stub
- `50-reviews-v2/software.md`: stub
- `50-reviews-v2/architecture.md`: stub
- `50-reviews-v2/ietf-senior.md`: stub
- `55-review-synthesis-v2.md`: stub
- `60-revision-plan-v2.md`: stub

## Notes

- `written` means the artifact contains substantive content.
- `stub` means the file exists but still appears to be a placeholder.
- `missing` means the expected file has not been created.

@@ -0,0 +1,60 @@
# Research Brief

## Problem framing

Fact: the analyzer identifies Agent Error Recovery and Rollback as a critical gap in the current IETF AI/agent landscape, especially within autonomous netops. Fact: the gap statement is specific: current drafts discuss communication and coordination, but do not define a common mechanism for machine-actionable failure signaling, rollback boundaries, or coordinated recovery across dependent agents.

Inference: this is a good first draft topic because it is narrower and more defensible than a full agent orchestration architecture, while still addressing a real interoperability and safety problem. Hypothesis: the best initial document is an experimental protocol or profile for failure, checkpoint, rollback-request, and rollback-result semantics, not a complete workflow language.

## Evidence from existing drafts

Fact: the gap report cites only six extracted ideas that partially touch this area. The strongest adjacent ideas are "Task-Oriented Multi-Agent Recovery Framework", "Inter-Agent Communication Protocol Requirements", and "State Consistency Management" from `draft-yue-anima-agent-recovery-networks`, plus "Mandatory restrictive failure behavior" from `draft-srijal-agents-policy`.

Fact: adjacent drafts in the space include `draft-li-dmsc-macp`, `draft-fu-nmop-agent-communication-framework`, `draft-mallick-muacp`, and `draft-zyyhl-agent-networks-framework`. These appear to focus on collaboration or communication frameworks, not interoperable rollback semantics.

Fact: the landscape overview shows high activity and overlap in adjacent categories, but not maturity on recovery. `draft-li-dmsc-macp` scores well overall, while `draft-fu-nmop-agent-communication-framework` is relevant but lower maturity. This suggests there is ecosystem pressure for operational coordination, yet no shared recovery core has emerged.

Fact: the ideas corpus also shows related building blocks such as agent context propagation, working memory, authorization profiles, attestation, and policy enforcement. These matter because rollback decisions depend on shared execution context and trustworthy signaling, even if the rollback draft should not standardize those mechanisms itself.

## Overlap and adjacent work

Fact: `holistic-agent-ecosystem-draft-outlines.md` already frames recovery as part of a broader family and recommends using an execution-evidence substrate such as ECT rather than inventing a second DAG or token format. That same document suggests rollback should be represented through explicit checkpoint, error, rollback-request, and rollback-result events.

Inference: the closest collision risk is not another rollback standard, but accidental overreach into three nearby topics:

- full task DAG and orchestration semantics
- human override and intervention
- dynamic trust and assurance

Inference: the architect should treat those as interfaces, not as primary scope. The rollback draft should define how recovery interacts with dependencies and checkpoints, while leaving workflow planning, trust scoring, and human escalation to companion work or future drafts.

## Gaps and unresolved questions

Fact: the current evidence does not yet establish a canonical wire format or transport for rollback signaling. Fact: the analyzer materials argue for reusing adjacent execution-evidence work, but do not prove that one specific substrate is mature enough to normatively depend on.

Open questions:

- What is the minimum mandatory information in a failure signal? Likely candidates are task identifier, parent dependency, failure class, reversibility, checkpoint reference, and rollback scope, but the exact set still needs comparison against existing drafts.
- Should rollback scope be defined as explicit dependency closure, implementation-local policy, or both?
- How should partially completed downstream actions be marked when they are not cleanly reversible?
- Which failures require automatic circuit breaking versus optional operator or policy input?
- Can the draft stay protocol-agnostic while still being testable by independent implementers?

## Additional data worth investigating

- Verify whether WIMSE or ECT-related drafts already define reusable execution identifiers, parent linkage, or signed event records that would let this draft avoid inventing its own carrier.
- Inspect `draft-yue-anima-agent-recovery-networks` directly for concrete recovery states, not just its analyzer summary.
- Compare `draft-li-dmsc-macp` and `draft-fu-nmop-agent-communication-framework` for any existing error taxonomy, dependency model, or task lifecycle signaling.
- Search the ideas set for `checkpoint`, `rollback`, `error`, `failure`, `compensation`, and `circuit breaker` to see whether additional partially related mechanisms were missed by the headline gap report.

## Recommendation to the architect

Design the first draft as a narrowly scoped experimental specification for coordinated recovery semantics in multi-agent execution. Keep the document centered on:

- failure and checkpoint vocabulary
- task state transitions
- rollback request and result signaling
- dependency-aware rollback scope
- minimal security requirements for authentic and authorized recovery events

Avoid defining a new identity system, full orchestration language, human override workflow, or trust-scoring model. If a reusable execution-evidence substrate exists, bind to it; otherwise define a minimal abstract event model that can later be profiled onto specific carriers.

@@ -0,0 +1,121 @@
# Architecture Brief

## Scope

Define an experimental, protocol-agnostic recovery model for multi-agent execution that standardizes:

- failure signaling
- checkpoint references
- rollback request and rollback result semantics
- dependency-aware rollback scope
- minimum task state transitions relevant to recovery

The document should be narrow enough that an existing agent protocol or execution-evidence carrier can adopt it as a profile or extension.

## Non-goals

- defining a full workflow or DAG language
- defining human override or approval workflows beyond a hook for escalation
- defining identity, authentication, or attestation systems
- defining global trust scoring or reputation exchange
- defining scheduler behavior, quota fairness, or resource arbitration beyond optional future hooks

## Terminology and actors

- `agent`: autonomous software entity performing one or more tasks
- `task`: a discrete unit of work whose execution and outcome can be referenced
- `dependency`: another task whose outcome affects whether the current task may continue or must roll back
- `checkpoint`: a recorded pre-action or recovery-safe state from which rollback may proceed
- `failure event`: a machine-actionable signal that a task or dependency failed
- `rollback set`: the set of tasks and effects that the sender requests to revert or compensate
- `recovery record`: a record of rollback attempt, success, partial success, or failure
- `coordinator`: optional role that computes rollback scope across multiple dependent agents

Actors:

- originating agent that detects failure
- dependent agent that receives failure or rollback signals
- optional coordination service or gateway
- policy authority or operator, involved only when automatic rollback is disallowed

## Protocol or data model shape

Use an abstract event model with four core event types:

1. `checkpoint`
2. `failure`
3. `rollback-request`
4. `rollback-result`

Each event should carry a minimum common envelope:

- event identifier
- task identifier
- workflow or execution context identifier if available
- sender identity reference
- timestamp
- referenced parent task or dependency identifiers where relevant

Event-specific content:

- `checkpoint`: checkpoint identifier, reversibility class, optional expiry
- `failure`: failure class, severity, reversibility indicator, blast-radius hint, failed dependency reference
- `rollback-request`: target checkpoint or rollback boundary, requested rollback scope, reason code, urgency, idempotency token
- `rollback-result`: outcome status, actual scope applied, partial rollback indicators, residual risk or manual follow-up required

State model:

- `pending`
- `running`
- `completed`
- `failed`
- `rollback-requested`
- `rolled-back`
- `rollback-failed`
- `compensation-required`

Design choice: keep the carrier abstract in this first draft, but include a section describing how the model may bind to existing execution-evidence formats if such a substrate is available and sufficiently mature.
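To make the carrier-neutral idea concrete, the Python sketch below flattens one abstract `failure` event into a JSON object. JSON, the member names (`event_type`, `context_id`, `parent_ids`, `failure_class`, `reversible`), and the example values are assumptions chosen for illustration. This brief does not select a carrier, and a real profile would map the same information onto an existing execution-evidence format rather than define its own.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class Envelope:
    # Minimum common envelope from this brief; member names are hypothetical.
    event_type: str      # "checkpoint", "failure", "rollback-request", "rollback-result"
    event_id: str
    task_id: str
    sender: str          # sender identity reference, opaque to this model
    timestamp: str       # RFC 3339 text is assumed here
    context_id: Optional[str] = None                # workflow or execution context, if available
    parent_ids: list = field(default_factory=list)  # parent task or dependency identifiers


def to_json(envelope: Envelope, body: dict) -> str:
    """Flatten the envelope and event-specific content into one JSON object.

    A real profile would map these members onto an existing carrier instead.
    """
    # Drop optional members that carry no value.
    doc = {k: v for k, v in asdict(envelope).items() if v not in (None, [])}
    clash = doc.keys() & body.keys()
    if clash:
        raise ValueError(f"event body redefines envelope members: {sorted(clash)}")
    doc.update(body)
    return json.dumps(doc, sort_keys=True)


env = Envelope("failure", "evt-17", "task-42", "agent-a",
               "2026-03-02T18:00:00Z", parent_ids=["task-41"])
print(to_json(env, {"failure_class": "dependency-unavailable", "reversible": True}))
```

Rejecting body members that collide with envelope members is one simple way to keep the common envelope authoritative when event-specific content is merged in.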

## Normative requirements candidates

- Agents MUST emit a failure event when a task failure can affect dependent execution outside local process scope.
- Failure events MUST identify the failed task and SHOULD identify affected dependencies when known.
- Rollback requests MUST be idempotent and uniquely identifiable.
- Agents receiving a rollback request MUST return a rollback result, even when rollback is refused or only partially completed.
- A rollback result MUST indicate one of: success, partial success, refusal, irreversible, or failure.
- Agents MUST NOT claim successful rollback unless the referenced effects were actually reverted or explicitly compensated.
- If a task is not reversible, the agent MUST signal that fact explicitly rather than silently ignoring rollback.
- Implementations SHOULD support checkpoint references when a task has externally visible side effects.
- The specification SHOULD allow policy-controlled escalation rather than requiring automatic rollback for every failure.
- The document MUST distinguish rollback of prior effects from cancellation of work that has not yet executed.

## Security, privacy, and abuse considerations

- unauthorized rollback requests could be used as denial-of-service
- spoofed failure signals could trigger cascading rollback
- replayed rollback requests could repeatedly unwind completed work
- rollback metadata may expose internal topology or sensitive task relationships
- partial rollback can create inconsistent downstream state that attackers can exploit
- signed or otherwise authenticated event carriage is strongly preferred, but the draft should avoid redefining base authentication
- the draft should require clear handling of refusal, partial rollback, and policy escalation to avoid silent unsafe states

Privacy is likely a secondary concern, but not a negligible one: task identifiers, dependency graphs, and failure reasons can leak operational details.

## IANA impact

Most likely minimal for the first version.

If the draft defines abstract event or reason-code registries, keep them compact:

- rollback event types
- failure classes
- rollback outcome codes

If an existing registry from an underlying carrier can be reused, prefer that.

## Open design questions

- Should rollback scope be defined normatively as dependency closure, or left partially implementation-specific with mandatory disclosure of actual scope?
- Is a separate `cancellation` event needed, or is that explicitly out of scope for this draft?
- How much of checkpoint semantics should be mandatory versus profile-specific?
- Can one draft stay both carrier-agnostic and implementable, or does it need a non-normative binding example to avoid vagueness?

@@ -0,0 +1,79 @@
# Draft Outline

## Abstract

State that the document defines experimental recovery semantics for multi-agent task execution, including failure signaling, rollback requests, rollback results, and checkpoint references. Make clear it is protocol-agnostic and intended to improve interoperable recovery behavior across agent ecosystems.

## Section plan

1. Introduction
2. Terminology
3. Problem Statement and Design Goals
4. Recovery Model Overview
5. Event Types and Required Fields
6. Task States and Recovery Procedures
7. Rollback Scope and Dependency Handling
8. Error Conditions and Partial Rollback
9. Security Considerations
10. Privacy Considerations
11. IANA Considerations
12. References

## Author guidance by section

### 1. Introduction

Explain why autonomous multi-agent systems need interoperable recovery behavior. Keep this grounded in failure propagation and operational safety, not generic AI rhetoric.

### 2. Terminology

Define only the core terms needed for this document: task, dependency, checkpoint, failure event, rollback set, recovery record, coordinator. Keep terms stable and conservative.

### 3. Problem Statement and Design Goals

Describe the exact gap: current drafts define communication and orchestration patterns, but no common rollback semantics. Include explicit goals such as idempotency, partial rollback transparency, and protocol-agnostic applicability.

### 4. Recovery Model Overview

Describe the model at a high level before any field-level detail. Separate local failure handling from cross-agent recovery signaling. Make clear what this document does not define.

### 5. Event Types and Required Fields

Define `checkpoint`, `failure`, `rollback-request`, and `rollback-result`. This section must specify required versus optional fields and avoid vague "metadata may include" language where interoperability depends on a field.

### 6. Task States and Recovery Procedures

Define the state transitions relevant to failure and rollback. Include procedure ordering: detect failure, emit failure event, decide rollback scope, send rollback request, emit rollback result. If escalation is possible, say when.

### 7. Rollback Scope and Dependency Handling

Define how dependencies influence rollback. Be explicit about direct versus transitive effects, what happens when scope is uncertain, and how actual applied scope is reported back.

### 8. Error Conditions and Partial Rollback

Handle non-reversible tasks, refusal, timeout, duplicate requests, and partial success. This section is important for implementability and must not collapse into generic prose.

### 9. Security Considerations

Address spoofing, replay, unauthorized rollback, false failure signaling, topology leakage, and abuse of partial rollback states. The section should be mechanism-specific.

### 10. Privacy Considerations

Address exposure of task identifiers, failure causes, dependency graphs, and sensitive operational details.

### 11. IANA Considerations

Either clearly say none, or request small registries for failure classes and rollback outcomes. Do not hand-wave this.

### 12. References

Use placeholders where necessary, but include adjacent drafts that informed the design and any underlying execution-evidence substrate if referenced.

## Issues that must not be hand-waved

- what fields are mandatory in each event
- what counts as a successful versus partial rollback
- how rollback requests remain idempotent
- what an agent does when a requested rollback is impossible
- how dependency-driven rollback scope is determined and reported
- what security properties the mechanism relies on from lower layers

@@ -0,0 +1,216 @@
# Draft

## Abstract

This document defines experimental recovery semantics for multi-agent task execution. It specifies common event types for failure signaling, checkpoint reference, rollback requests, and rollback results so that cooperating agents can coordinate recovery after operational faults. The mechanism is protocol-agnostic and is intended to be profiled onto existing agent communication or execution-evidence substrates. The goal is to improve interoperability when autonomous systems must contain failures, report rollback scope, and communicate partial or unsuccessful recovery without silent divergence.

## 1. Introduction

Multi-agent systems increasingly perform coordinated work across services, tools, and administrative domains. In such systems, one task failure can invalidate downstream work, require compensating actions, or force a broader rollback of externally visible effects. Existing drafts define communication frameworks, discovery, identity, and broader orchestration concepts, but they do not define a shared recovery core that independent implementations can follow.

Absent common recovery semantics, one implementation may silently retry while another expects explicit rollback, and a third may report only local failure without describing downstream consequences. That mismatch creates interoperability risk and operational safety risk, especially when agents act without immediate human supervision.

This document defines a narrow recovery model for cross-agent failure handling. It does not define a full workflow language, a transport binding, or a human override system. Instead, it defines event semantics and minimum procedure rules so that agents can exchange recovery-relevant information consistently.

## 2. Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

Agent: an autonomous software entity that performs one or more tasks and may exchange recovery events with peers.

Task: a discrete unit of work whose execution and outcome can be identified.

Dependency: a relationship in which one task relies on the prior completion, state, or side effects of another task.

Checkpoint: a recorded state or recovery-safe reference from which rollback can proceed.

Failure Event: a machine-actionable record that a task or dependency failed in a way that can affect other participants.

Rollback Set: the set of tasks, effects, or checkpoints that a rollback request identifies as the intended recovery scope.

Recovery Record: a record of a rollback attempt, refusal, partial rollback, success, or failure.

Coordinator: an optional component that computes or distributes rollback scope across multiple agents.

Compensation: a follow-up action that mitigates an irreversible effect when direct rollback is not possible.

## 3. Problem Statement and Design Goals

Current agent ecosystems have uneven support for failure handling. Some drafts discuss task coordination or operational recovery, but existing work still lacks a common method to express:

- that a task failed in a way that is relevant to other agents,
- which dependencies are affected,
- which checkpoint or rollback boundary should be used, and
- whether rollback succeeded, only partially succeeded, or was impossible.

The absence of these common semantics makes independent implementation difficult. An originating agent may believe it has requested rollback, while a receiving agent may treat the same signal as informational. Similarly, partial rollback can leave downstream agents operating on inconsistent assumptions if outcome reporting is underspecified.

The design goals for this document are:

- protocol-agnostic applicability,
- minimal mandatory fields for interoperability,
- idempotent rollback requests,
- explicit reporting of partial or impossible rollback, and
- compatibility with existing lower-layer identity and integrity mechanisms.

## 4. Recovery Model Overview

This document defines four event types:

- `checkpoint`
- `failure`
- `rollback-request`
- `rollback-result`

These events MAY be carried in a message protocol, stored as execution records, or embedded in a larger workflow substrate. This document does not standardize the carrier. It standardizes the meaning of the events and the minimum information needed for interoperable recovery behavior.

Each event carries a common envelope drawn from the following fields; Section 5 specifies which of them are mandatory for each event type:

- an event identifier,
- a task identifier,
- a sender identity reference,
- a timestamp, and
- a workflow or execution context identifier, when one is available.

The recovery model distinguishes local failures from failures that are relevant to other agents. Local failures that cannot affect any external dependency do not require signaling under this document. When a failure can affect dependent work outside local scope, the originating agent MUST emit a `failure` event.

If rollback is needed, the requester sends a `rollback-request` identifying the requested scope. The receiver returns a `rollback-result` stating whether the requested recovery succeeded, partially succeeded, was refused, was impossible, or failed.
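The envelope and the signaling rule in this section can be sketched in Python as follows. The class and function names, the UUID event identifiers, and the ISO 8601 timestamps are assumptions of this sketch, not requirements of this document. The event-specific fields of Section 5 are omitted so that only the envelope and the local-versus-cross-agent decision are shown.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class EventType(Enum):
    CHECKPOINT = "checkpoint"
    FAILURE = "failure"
    ROLLBACK_REQUEST = "rollback-request"
    ROLLBACK_RESULT = "rollback-result"


@dataclass(frozen=True)
class Envelope:
    event_type: EventType
    event_id: str
    task_id: str
    sender: str                       # sender identity reference; opaque here
    timestamp: str
    context_id: Optional[str] = None  # workflow or execution context, when available


def new_envelope(event_type: EventType, task_id: str, sender: str,
                 context_id: Optional[str] = None) -> Envelope:
    # Hypothetical choices: random UUIDs for event ids, UTC ISO 8601 timestamps.
    return Envelope(event_type, str(uuid.uuid4()), task_id, sender,
                    datetime.now(timezone.utc).isoformat(), context_id)


def on_task_failure(task_id: str, sender: str,
                    external_dependents: list) -> Optional[Envelope]:
    """Signal a failure only when it can affect work outside local scope."""
    if not external_dependents:
        return None  # purely local failure: no signaling required
    return new_envelope(EventType.FAILURE, task_id, sender)
```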

## 5. Event Types and Required Fields

### 5.1 Checkpoint

A `checkpoint` event identifies a recovery-safe reference that a later rollback may target. A checkpoint event MUST include:

- event identifier,
- task identifier,
- checkpoint identifier,
- sender identity reference,
- timestamp.

A checkpoint event SHOULD include a reversibility class and MAY include checkpoint expiry or retention information.
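A receiver that honors the optional expiry information might check it before accepting a checkpoint as a rollback target. The sketch below uses hypothetical field names and a hypothetical reversibility value (`"reversible"`). This document does not define reversibility classes or say how a missing expiry should be interpreted, so treating "no expiry" as "still usable" is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class Checkpoint:
    event_id: str
    task_id: str
    checkpoint_id: str
    sender: str
    timestamp: datetime
    reversibility: Optional[str] = None  # SHOULD; class names are hypothetical
    expires: Optional[datetime] = None   # MAY: expiry or retention limit


def usable_as_target(cp: Checkpoint, now: datetime) -> bool:
    """Return True if a rollback request may still target this checkpoint.

    Assumption: a checkpoint without expiry information is treated as usable;
    a profile could instead defer to local retention policy.
    """
    return cp.expires is None or now < cp.expires


now = datetime(2026, 3, 2, 18, 0, tzinfo=timezone.utc)
cp = Checkpoint("evt-1", "task-42", "cp-7", "agent-a", now,
                reversibility="reversible", expires=now + timedelta(hours=1))
print(usable_as_target(cp, now), usable_as_target(cp, now + timedelta(hours=2)))
# prints: True False
```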

### 5.2 Failure

A `failure` event reports a task failure that can affect dependent execution outside local process scope. A failure event MUST include:

- event identifier,
- failed task identifier,
- sender identity reference,
- timestamp,
- failure class,
- reversibility indicator.

A failure event SHOULD include affected dependency identifiers when they are known, and MAY include a severity, a blast-radius hint, or a checkpoint reference.
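One way to check a received `failure` event against the list above is sketched below, treating the event as a Python dictionary. The member names are hypothetical labels for the listed items. Because the SHOULD field applies only when dependencies are known, its absence is reported as a note rather than an error.

```python
from typing import Dict, List, Tuple

# Hypothetical member names for the Section 5.2 items.
MUST = ("event_id", "task_id", "sender", "timestamp", "failure_class", "reversible")
SHOULD = ("affected_dependencies",)  # include when known


def check_failure_event(event: Dict[str, object]) -> Tuple[List[str], List[str]]:
    """Return (errors, notes) for a failure event represented as a dict.

    A field counts as absent only when missing or None, so a reversibility
    indicator of False is still a valid value.
    """
    errors = [f"missing mandatory field: {f}" for f in MUST if event.get(f) is None]
    notes = [f"recommended field absent: {f}" for f in SHOULD if event.get(f) is None]
    return errors, notes
```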

### 5.3 Rollback Request

A `rollback-request` event asks another participant to revert or compensate previously applied effects. A rollback request MUST include:

- event identifier,
- requester identity reference,
- target task identifier or checkpoint identifier,
- requested rollback scope,
- idempotency token,
- timestamp.

A rollback request SHOULD include a reason code and an urgency indication. A rollback request MAY include dependency evidence or a policy reference supporting the request.
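The sketch below shows one possible requester-side representation of a `rollback-request`. Beyond what this section states, it assumes that a retransmission is a new event with a fresh event identifier and timestamp but the same idempotency token. Under that assumption each transmission stays distinguishable, while receivers can still detect the duplicate. All names are hypothetical.

```python
import uuid
from dataclasses import dataclass, field, replace
from datetime import datetime, timezone
from typing import Optional, Tuple


def _now() -> str:
    return datetime.now(timezone.utc).isoformat()


def _new_id() -> str:
    return str(uuid.uuid4())


@dataclass(frozen=True)
class RollbackRequest:
    requester: str                       # requester identity reference
    scope: Tuple[str, ...]               # requested rollback scope (task ids here)
    target_task: Optional[str] = None
    target_checkpoint: Optional[str] = None
    idempotency_token: str = field(default_factory=_new_id)
    event_id: str = field(default_factory=_new_id)
    timestamp: str = field(default_factory=_now)
    reason_code: Optional[str] = None    # SHOULD
    urgency: Optional[str] = None        # SHOULD

    def __post_init__(self):
        if self.target_task is None and self.target_checkpoint is None:
            raise ValueError("a target task or checkpoint identifier is required")


def retransmit(req: RollbackRequest) -> RollbackRequest:
    """Resend as a new event that keeps the original idempotency token."""
    return replace(req, event_id=_new_id(), timestamp=_now())
```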

### 5.4 Rollback Result

A `rollback-result` event reports the outcome of processing a rollback request. A rollback result MUST include:

- event identifier,
- referenced rollback-request identifier,
- responder identity reference,
- outcome code,
- timestamp,
- actual scope applied.

The outcome code MUST be one of:

- `success`
- `partial-success`
- `refused`
- `irreversible`
- `failure`

A rollback result SHOULD include a description of residual risk when the outcome is not `success`. A rollback result MAY include compensation details.
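The closed set of outcome codes lends itself to an enumeration. The sketch below rejects any other code when a result is built, and it emits a Python warning when a non-`success` result omits the recommended residual-risk description. The class and field names are assumptions, and representing the applied scope as a set of task identifiers is likewise only illustrative.

```python
import warnings
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, Optional


class Outcome(str, Enum):
    SUCCESS = "success"
    PARTIAL_SUCCESS = "partial-success"
    REFUSED = "refused"
    IRREVERSIBLE = "irreversible"
    FAILURE = "failure"


@dataclass(frozen=True)
class RollbackResult:
    event_id: str
    request_id: str                      # referenced rollback-request identifier
    responder: str                       # responder identity reference
    outcome: Outcome
    timestamp: str
    applied_scope: FrozenSet[str]        # actual scope applied; may be empty
    residual_risk: Optional[str] = None  # SHOULD when outcome is not success
    compensation: Optional[str] = None   # MAY

    def __post_init__(self):
        # Outcome(...) raises ValueError for any code outside the closed set.
        object.__setattr__(self, "outcome", Outcome(self.outcome))
        if self.outcome is not Outcome.SUCCESS and not self.residual_risk:
            warnings.warn(f"{self.outcome.value} result lacks a residual-risk description")
```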

## 6. Task States and Recovery Procedures

For purposes of this document, the relevant task states are:

- `pending`
- `running`
- `completed`
- `failed`
- `rollback-requested`
- `rolled-back`
- `rollback-failed`
- `compensation-required`

When an agent detects a task failure that can affect external dependents, it MUST transition the affected task to `failed` and emit a `failure` event. If policy permits automatic recovery, the originating agent or coordinator SHOULD determine the rollback set and issue one or more `rollback-request` events. If policy does not permit automatic rollback, the implementation SHOULD enter a local hold or escalation path rather than silently continuing.

An agent receiving a `rollback-request` MUST process duplicate requests idempotently. If the request can be honored, the agent applies rollback or compensation as appropriate and emits a `rollback-result`. If the request cannot be honored because the effect is irreversible or the request is not authorized, the agent MUST emit a `rollback-result` with the corresponding outcome code (`irreversible` or `refused`).

This document distinguishes rollback from cancellation. Cancellation of work that has not yet started is out of scope, except where a local implementation uses cancellation internally to satisfy a rollback request.
|
||||
|
||||
## 7. Rollback Scope and Dependency Handling
|
||||
|
||||
Rollback scope is central to interoperability. A rollback request MUST identify either:
|
||||
|
||||
- a target checkpoint, or
|
||||
- an explicit rollback set.
|
||||
|
||||
When transitive dependencies are known, the requester SHOULD include them or indicate that transitive evaluation is required. When dependency knowledge is incomplete, the requester MUST still identify the minimum known affected scope and the responder MUST report the actual scope applied in the rollback result.
|
||||
|
||||
An implementation MUST NOT report successful rollback for effects outside the applied scope. If only part of the requested rollback set is reversed, the responder MUST return `partial-success` and describe any remaining irreversible or uncompensated effects.
|
||||
|
||||
A coordinator MAY compute rollback scope across multiple agents, but this document does not require a coordinator role. Peers can interoperate directly as long as they provide the required event information.
|
||||
|
||||
## 8. Error Conditions and Partial Rollback
|
||||
|
||||
The following conditions require explicit handling:
|
||||
|
||||
- duplicate rollback requests,
|
||||
- timeout while waiting for rollback completion,
|
||||
- refusal due to insufficient authorization,
|
||||
- irreversible effects,
|
||||
- partial rollback where some effects are reversed and others remain,
|
||||
- failure of the rollback procedure itself.
|
||||
|
||||
If a requested rollback is impossible, the responding agent MUST indicate `irreversible` or `failure` as appropriate and SHOULD indicate whether compensation is available. If a request is refused for policy reasons, the agent MUST indicate `refused` and SHOULD include a reason that is usable by the requester or an external policy authority.
|
||||
|
||||
Implementations SHOULD avoid silent downgrade from rollback to best-effort local cleanup. If only local cleanup occurred, the rollback result SHOULD say so clearly.
|
||||
|
||||
## 9. Security Considerations
|
||||
|
||||
Unauthorized rollback requests can be used to deny service or corrupt coordinated work. Implementations therefore need an authenticated and authorized carriage for the events defined here, even though this document does not define the underlying security protocol.
|
||||
|
||||
Spoofed failure events can trigger unnecessary rollback. Replay of old rollback requests can repeatedly unwind valid work. Implementations SHOULD provide replay resistance and SHOULD bind requests and results to stable task and requester identifiers.
|
||||
|
||||
Partial rollback is itself a security concern because it can leave downstream systems in an inconsistent state that an attacker can exploit. For that reason, responders MUST explicitly report residual scope and any remaining irreversible effects.
|
||||
|
||||
Failure and rollback metadata can also reveal topology, task dependencies, and operational weaknesses. Deployments SHOULD minimize unnecessary disclosure and SHOULD apply least-privilege access to recovery records.
|
||||
|
||||
## 10. Privacy Considerations
|
||||
|
||||
Task identifiers, failure classes, dependency relationships, and reason codes may expose sensitive operational details. In some deployments, these details can reveal user behavior, internal service structure, or policy logic.
|
||||
|
||||
Implementations SHOULD disclose only the information necessary for interoperable recovery. If a deployment requires broader analytics or audit retention, that policy is deployment-specific and outside the scope of this document.
|
||||
|
||||
## 11. IANA Considerations
|
||||
|
||||
This document currently requests no IANA action.
|
||||
|
||||
Future versions may request compact registries for failure classes, rollback outcome codes, or event type identifiers if implementation experience shows that fixed interoperation points are needed.
|
||||
|
||||
## 12. References
|
||||
|
||||
- [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.
|
||||
- [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, May 2017.
|
||||
- Placeholder reference for adjacent execution-evidence substrate, if adopted.
|
||||
- Placeholder reference for `draft-yue-anima-agent-recovery-networks`.
|
||||
- Placeholder reference for `draft-li-dmsc-macp`.
|
||||
- Placeholder reference for `draft-fu-nmop-agent-communication-framework`.
|
||||
@@ -0,0 +1,242 @@
|
||||
# Draft
|
||||
|
||||
## Abstract
|
||||
|
||||
This document defines experimental recovery semantics for multi-agent task execution. It specifies interoperable event semantics for failure signaling, checkpoint reference, rollback requests, and rollback results so that cooperating agents can coordinate recovery after operational faults. The mechanism is carrier-agnostic and is intended to be profiled onto existing agent communication or execution-evidence substrates. It addresses an interoperability gap in current agent systems: different implementations can detect the same failure yet diverge materially in how they request rollback, report applied scope, and disclose partial or irreversible outcomes.
|
||||
|
||||
## 1. Introduction
|
||||
|
||||
Multi-agent systems increasingly perform coordinated work across services, tools, and administrative domains. In such systems, one task failure can invalidate downstream work, require compensating actions, or force a broader rollback of externally visible effects. Existing drafts define communication frameworks, discovery, identity, and broader orchestration concepts, but they do not yet provide a small interoperable recovery core that independent implementations can share.
|
||||
|
||||
Without common recovery behavior, one implementation may silently retry while another expects explicit rollback, and a third may report only local failure without describing downstream consequences. Those differences are not just operationally inconvenient; they create genuine safety and interoperability risk when agents act without immediate human supervision.
|
||||
|
||||
This document therefore defines an abstract recovery protocol model for cross-agent failure handling. It does not define a workflow language, a transport binding, or a human override system. It does define required event meaning, minimum fields, authorization and replay expectations, rollback-scope reporting, and outcome reporting sufficient for interoperable recovery behavior.
|
||||
|
||||
The intended status of this document is Experimental.
|
||||
|
||||
## 2. Terminology
|
||||
|
||||
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.
|
||||
|
||||
Agent: an autonomous software entity that performs one or more tasks and may exchange recovery events with peers.
|
||||
|
||||
Task: a discrete unit of work whose execution and outcome can be identified.
|
||||
|
||||
Dependency: a relationship in which one task relies on the prior completion, state, or side effects of another task.
|
||||
|
||||
Checkpoint: a recorded recovery-safe reference from which rollback or compensation planning can proceed.
|
||||
|
||||
Failure Event: a machine-actionable record indicating that a task or dependency failed in a way that can affect other participants.
|
||||
|
||||
Rollback Set: the abstract set of task identifiers, checkpoint identifiers, or effect identifiers that a rollback request identifies as in scope.
|
||||
|
||||
Recovery Record: a record of a rollback attempt and its result, whether refusal, partial rollback, success, or failure.
|
||||
|
||||
Compensation: a follow-up action that mitigates an irreversible effect when direct rollback is not possible.
|
||||
|
||||
## 3. Problem Statement
|
||||
|
||||
Current agent ecosystems have uneven support for failure handling. Some existing drafts discuss task coordination or operational recovery, but the drafts analyzed for this document still lack a common method to express:
|
||||
|
||||
- that a task failed in a way that is relevant to other agents,
|
||||
- which dependencies are affected,
|
||||
- which checkpoint or rollback boundary should be used,
|
||||
- what rollback scope is being requested, and
|
||||
- whether rollback succeeded, only partially succeeded, was refused, or was impossible.
|
||||
|
||||
The absence of these common semantics makes independent implementation difficult. An originating agent may believe it has requested rollback, while a receiving agent may treat the same signal as informational. Similarly, partial rollback can leave downstream agents operating on inconsistent assumptions if outcome reporting is underspecified.
|
||||
|
||||
The design goals for this document are:
|
||||
|
||||
- protocol-agnostic applicability,
|
||||
- minimal mandatory fields for interoperability,
|
||||
- idempotent rollback requests,
|
||||
- explicit authorization and replay handling,
|
||||
- explicit reporting of partial or impossible rollback, and
|
||||
- compatibility with existing lower-layer identity and integrity mechanisms.
|
||||
|
||||
## 4. Recovery Model Overview
|
||||
|
||||
This document defines four event types:
|
||||
|
||||
- `checkpoint`
|
||||
- `failure`
|
||||
- `rollback-request`
|
||||
- `rollback-result`
|
||||
|
||||
These events MAY be carried in a message protocol, stored as execution records, or embedded in a larger workflow substrate. This document does not standardize the carrier. It standardizes the abstract protocol behavior and the minimum information needed for interoperable recovery.
|
||||
|
||||
Each event has a common envelope containing:
|
||||
|
||||
- an event identifier,
|
||||
- a task identifier,
|
||||
- a sender identity reference,
|
||||
- a timestamp, and
|
||||
- a workflow or execution-context identifier, when one applies.
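The following non-normative Python sketch shows one way to hold the common envelope in memory. The field names (`event_type`, `event_id`, `task_id`, `sender_ref`, `timestamp`, `context_id`) and the timezone-aware timestamp check are assumptions made for illustration; this document does not define a serialization or data model.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

# The four event types defined in Section 4.
EVENT_TYPES = frozenset({"checkpoint", "failure", "rollback-request", "rollback-result"})


@dataclass(frozen=True)
class Envelope:
    """Common envelope shared by all recovery events (hypothetical field names)."""

    event_type: str
    event_id: str
    task_id: str
    sender_ref: str                   # sender identity reference
    timestamp: datetime
    context_id: Optional[str] = None  # workflow or execution context, when one applies

    def __post_init__(self) -> None:
        if self.event_type not in EVENT_TYPES:
            raise ValueError(f"unknown event type: {self.event_type}")
        for name in ("event_id", "task_id", "sender_ref"):
            if not getattr(self, name):
                raise ValueError(f"missing required envelope field: {name}")
        # Assumption: timezone-aware timestamps avoid cross-agent clock ambiguity.
        if self.timestamp.tzinfo is None:
            raise ValueError("timestamp must be timezone-aware")


env = Envelope("failure", "e-1", "t-12", "agent-b", datetime.now(timezone.utc))
```

Event-specific fields, such as the failure class of a `failure` event, would sit alongside this envelope in whatever structure a carrier profile chooses.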
|
||||
|
||||
The recovery model distinguishes local failures from failures that are relevant to other agents. A local failure that cannot affect any external dependency does not require signaling under this document. When a failure can affect dependent work outside local scope, the originating agent MUST emit a `failure` event.
|
||||
|
||||
If rollback is needed, the requester sends a `rollback-request` identifying the requested scope. The receiver evaluates authorization, replay status, and local reversibility before acting. The receiver then returns a `rollback-result` stating whether the requested recovery succeeded, partially succeeded, was refused, was impossible, or failed.
|
||||
|
||||
## 5. Event Types and Required Fields
|
||||
|
||||
### 5.1 Checkpoint
|
||||
|
||||
A `checkpoint` event identifies a recovery-safe reference that later rollback may target. A checkpoint event MUST include:
|
||||
|
||||
- event identifier,
|
||||
- task identifier,
|
||||
- checkpoint identifier,
|
||||
- sender identity reference,
|
||||
- timestamp.
|
||||
|
||||
A checkpoint event SHOULD include a reversibility class and MAY include checkpoint expiry or retention information.
|
||||
|
||||
### 5.2 Failure
|
||||
|
||||
A `failure` event reports a task failure that can affect dependent execution outside local process scope. A failure event MUST include:
|
||||
|
||||
- event identifier,
|
||||
- failed task identifier,
|
||||
- sender identity reference,
|
||||
- timestamp,
|
||||
- failure class,
|
||||
- reversibility indicator.
|
||||
|
||||
A failure event SHOULD include affected dependency identifiers when known, and MAY include severity, blast-radius hint, or checkpoint reference.
|
||||
|
||||
### 5.3 Rollback Request
|
||||
|
||||
A `rollback-request` event asks another participant to revert or compensate previously applied effects. A rollback request MUST include:
|
||||
|
||||
- event identifier,
|
||||
- requester identity reference,
|
||||
- target task identifier or checkpoint identifier,
|
||||
- requested rollback scope,
|
||||
- idempotency token,
|
||||
- timestamp.
|
||||
|
||||
A rollback request SHOULD include a reason code and an urgency indication. A rollback request MAY include dependency evidence or a policy reference supporting the request.
|
||||
|
||||
Before applying rollback, a receiver MUST evaluate whether the requester is authorized to request rollback for the identified scope. If authorization fails, the receiver MUST NOT apply rollback and MUST emit a `rollback-result` with outcome `refused`.
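A minimal non-normative sketch of that authorization decision point, assuming a hypothetical per-requester mapping from requester identity reference to the task and checkpoint identifiers it may roll back. How such authority is provisioned is left to lower layers; the names `RollbackRequest`, `handle_request`, and `apply_rollback` are illustrative only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet


@dataclass(frozen=True)
class RollbackRequest:
    event_id: str
    requester_ref: str
    target: str                 # target task identifier or checkpoint identifier
    scope: FrozenSet[str]       # requested rollback scope
    idempotency_token: str


def handle_request(
    req: RollbackRequest,
    authority: Dict[str, FrozenSet[str]],
    apply_rollback: Callable[[RollbackRequest], dict],
) -> dict:
    """Authorize the whole requested scope before any rollback action is attempted."""
    # Hypothetical task-scope authority: requester -> items it may roll back.
    allowed = authority.get(req.requester_ref, frozenset())
    unauthorized = req.scope - allowed
    if unauthorized:
        # Authorization failed: apply nothing and report `refused` with a reason.
        return {
            "ref": req.event_id,
            "outcome": "refused",
            "actual_scope": frozenset(),
            "reason": "requester not authorized for " + ", ".join(sorted(unauthorized)),
        }
    return apply_rollback(req)
```

Checking the entire requested scope up front, rather than item by item during rollback, means a partially authorized request is refused before any state changes.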
|
||||
|
||||
### 5.4 Rollback Result
|
||||
|
||||
A `rollback-result` event reports the outcome of processing a rollback request. A rollback result MUST include:
|
||||
|
||||
- event identifier,
|
||||
- referenced rollback-request identifier,
|
||||
- responder identity reference,
|
||||
- outcome code,
|
||||
- timestamp,
|
||||
- actual scope applied.
|
||||
|
||||
The outcome code MUST be one of:
|
||||
|
||||
- `success`
|
||||
- `partial-success`
|
||||
- `refused`
|
||||
- `irreversible`
|
||||
- `failure`
|
||||
|
||||
If the outcome code is not `success`, the rollback result MUST include enough detail to indicate remaining unapplied scope, residual irreversible effects, or refusal reason. A rollback result MAY include compensation details.
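The constraints on `rollback-result` can be captured as construction-time checks. This non-normative sketch assumes hypothetical field names, represents the timestamp as an opaque string, and uses a single free-text `detail` field where a carrier profile might use structured data.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

OUTCOME_CODES = frozenset({"success", "partial-success", "refused", "irreversible", "failure"})


@dataclass(frozen=True)
class RollbackResult:
    event_id: str
    request_ref: str                    # identifier of the referenced rollback-request
    responder_ref: str                  # responder identity reference
    outcome: str
    timestamp: str                      # opaque string; the format is an assumption
    actual_scope: FrozenSet[str]
    detail: Optional[str] = None        # unapplied scope, residual effects, or refusal reason
    compensation: Optional[str] = None  # optional compensation details

    def __post_init__(self) -> None:
        if self.outcome not in OUTCOME_CODES:
            raise ValueError(f"outcome code not allowed: {self.outcome}")
        # Any non-success outcome must carry enough detail for the requester to act on.
        if self.outcome != "success" and not self.detail:
            raise ValueError("non-success outcomes must include detail")
```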
|
||||
|
||||
## 6. Task States and Recovery Procedures
|
||||
|
||||
For purposes of this document, relevant task states are:
|
||||
|
||||
- `pending`
|
||||
- `running`
|
||||
- `completed`
|
||||
- `failed`
|
||||
- `rollback-requested`
|
||||
- `rolled-back`
|
||||
- `rollback-failed`
|
||||
- `compensation-required`
|
||||
|
||||
When an agent detects a task failure that can affect external dependents, it MUST transition the affected task to `failed` and emit a `failure` event. If policy permits automatic recovery, the originating agent SHOULD determine the rollback set and issue one or more `rollback-request` events. If policy does not permit automatic rollback, the implementation SHOULD enter a local hold or escalation path rather than silently continuing.
|
||||
|
||||
An agent receiving a `rollback-request` MUST process duplicate requests idempotently. To do so, the receiver MUST correlate each request by its event identifier and idempotency token and MUST reject or safely ignore stale replayed requests according to local replay policy. A request recognized as a stale replay MUST NOT cause a second rollback action.
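One way a receiver might meet the idempotency and replay rules is sketched below. It assumes that requests carry a sender timestamp in epoch seconds and that local replay policy is a simple maximum age; neither assumption is mandated by this document, and `ReplayGuard` is a hypothetical name. For brevity the sketch keys duplicates on requester identity and idempotency token; a fuller receiver would also record event identifiers for audit.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class ReplayGuard:
    """Receiver-side duplicate and stale-replay handling under an illustrative policy."""

    def __init__(self, max_age_s: float, clock: Callable[[], float] = time.time) -> None:
        self.max_age_s = max_age_s
        self.clock = clock
        # (requester, idempotency token) -> result already emitted for that request
        self.results: Dict[Tuple[str, str], dict] = {}
        self.applied_count = 0

    def process(self, requester: str, token: str, sent_at: float,
                apply: Callable[[], dict]) -> Optional[dict]:
        key = (requester, token)
        if key in self.results:
            # Duplicate request: answer with the earlier result and do not act again.
            return self.results[key]
        if self.clock() - sent_at > self.max_age_s:
            # Stale replay under local policy: ignore safely, take no rollback action.
            return None
        result = apply()
        self.applied_count += 1
        self.results[key] = result
        return result
```

The duplicate check runs before the staleness check, so a retransmission of an already processed request still receives the original result. A deployment would typically expire cached entries no sooner than `max_age_s`, since older requests are rejected as stale anyway.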
|
||||
|
||||
If the request is authorized and can be honored, the agent applies rollback or compensation as appropriate and emits a `rollback-result`. If the request cannot be honored because the requester is not authorized, because the effect is irreversible, or because the rollback procedure itself fails, the agent MUST emit a `rollback-result` with the corresponding outcome code: `refused`, `irreversible`, or `failure`, respectively.
|
||||
|
||||
This document distinguishes rollback from cancellation. Cancellation of work not yet started is out of scope except where a local implementation uses cancellation internally while fulfilling a rollback request.
|
||||
|
||||
### 6.1 State Transition Guidance
|
||||
|
||||
| Current State | Trigger | Next State | Required Output |
|---|---|---|---|
| `running` | cross-agent relevant failure detected | `failed` | `failure` |
| `completed` | authorized rollback requested | `rollback-requested` | none immediately |
| `rollback-requested` | rollback fully applied | `rolled-back` | `rollback-result(success)` |
| `rollback-requested` | rollback partially applied | `compensation-required` | `rollback-result(partial-success)` |
| `rollback-requested` | rollback impossible | `rollback-failed` or `compensation-required` | `rollback-result(irreversible)` |
| `rollback-requested` | processing failure | `rollback-failed` | `rollback-result(failure)` |

This table is intentionally minimal. Local implementations MAY track finer-grained states, but interoperable outputs MUST remain consistent with the transitions above.
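The transition table can be checked mechanically. In this non-normative sketch the trigger names (`failure-detected`, `fully-applied`, and so on) are hypothetical labels for the table's trigger column, and `step` is an illustrative helper that rejects any transition the table does not list.

```python
from typing import Dict, Optional, Tuple

# (current state, trigger) -> (allowed next states, required output, if any)
TRANSITIONS: Dict[Tuple[str, str], Tuple[Tuple[str, ...], Optional[str]]] = {
    ("running", "failure-detected"): (("failed",), "failure"),
    ("completed", "authorized-rollback-requested"): (("rollback-requested",), None),
    ("rollback-requested", "fully-applied"): (("rolled-back",), "rollback-result(success)"),
    ("rollback-requested", "partially-applied"):
        (("compensation-required",), "rollback-result(partial-success)"),
    ("rollback-requested", "impossible"):
        (("rollback-failed", "compensation-required"), "rollback-result(irreversible)"),
    ("rollback-requested", "processing-failure"):
        (("rollback-failed",), "rollback-result(failure)"),
}


def step(state: str, trigger: str, next_state: str) -> Optional[str]:
    """Validate one interoperable transition and return the output it requires."""
    try:
        allowed, output = TRANSITIONS[(state, trigger)]
    except KeyError:
        raise ValueError(f"no interoperable transition for {state!r} on {trigger!r}") from None
    if next_state not in allowed:
        raise ValueError(f"{state!r} on {trigger!r} cannot move to {next_state!r}")
    return output
```

Finer-grained local states can be mapped onto these entries before an interoperable output is emitted.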
|
||||
|
||||
## 7. Rollback Scope and Dependency Handling
|
||||
|
||||
Rollback scope is central to interoperability. A rollback request MUST identify either:
|
||||
|
||||
- a target checkpoint, or
|
||||
- an explicit rollback set.
|
||||
|
||||
At minimum, a rollback set MUST identify one or more affected task identifiers, checkpoint identifiers, or effect identifiers. When transitive dependencies are known, the requester SHOULD indicate whether the scope includes only direct dependencies or includes transitive dependencies as well.
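A minimal abstract shape for a rollback set is sketched below, assuming Python frozensets of opaque identifiers and a hypothetical `Depth` flag for the direct-versus-transitive indication. A `depth` of `None` models a requester that did not indicate depth, which this document permits because that indication is only a SHOULD.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, Optional


class Depth(Enum):
    DIRECT = "direct"          # only direct dependencies are in scope
    TRANSITIVE = "transitive"  # transitive dependencies are in scope as well


@dataclass(frozen=True)
class RollbackSet:
    task_ids: FrozenSet[str] = frozenset()
    checkpoint_ids: FrozenSet[str] = frozenset()
    effect_ids: FrozenSet[str] = frozenset()
    depth: Optional[Depth] = None  # None: the requester did not indicate depth

    def __post_init__(self) -> None:
        if not (self.task_ids or self.checkpoint_ids or self.effect_ids):
            raise ValueError("a rollback set must identify at least one item")

    def members(self) -> FrozenSet[str]:
        # Assumes identifiers are unique across the three kinds.
        return self.task_ids | self.checkpoint_ids | self.effect_ids
```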
|
||||
|
||||
When dependency knowledge is incomplete, the requester MUST still identify the minimum known affected scope and the responder MUST report the actual scope applied in the rollback result. A responder MUST NOT report successful rollback for effects outside the applied scope.
|
||||
|
||||
If only part of the requested rollback set is reversed, the responder MUST return `partial-success` and MUST describe any remaining irreversible or uncompensated effects.
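The scope rules in this section determine the outcome code. The hypothetical helper below derives the code and the remaining unreversed scope from the requested and applied sets. Because it cannot know why nothing was reversed, the caller supplies an `irreversible` flag to choose between `irreversible` and `failure`; that flag is an assumption of this sketch.

```python
from typing import FrozenSet, Tuple


def scope_outcome(
    requested: FrozenSet[str], applied: FrozenSet[str], irreversible: bool = False
) -> Tuple[str, FrozenSet[str]]:
    """Map requested vs. actually applied scope to an outcome code and residual scope."""
    # Never claim items outside the request as rolled back.
    applied_in_scope = applied & requested
    remaining = requested - applied_in_scope
    if not remaining:
        return "success", remaining
    if applied_in_scope:
        return "partial-success", remaining
    # Nothing in scope was reversed; the caller says whether the effects are irreversible.
    return ("irreversible" if irreversible else "failure"), remaining
```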
|
||||
|
||||
## 8. Error Conditions and Partial Rollback
|
||||
|
||||
The following conditions require explicit handling:
|
||||
|
||||
- duplicate rollback requests,
|
||||
- stale replay of prior rollback requests,
|
||||
- timeout while waiting for rollback completion,
|
||||
- refusal due to insufficient authorization,
|
||||
- irreversible effects,
|
||||
- partial rollback where some effects are reversed and others remain,
|
||||
- failure of the rollback procedure itself.
|
||||
|
||||
If a requested rollback is impossible, the responding agent MUST indicate `irreversible` or `failure` as appropriate and SHOULD indicate whether compensation is available. If a request times out after some scope has been applied, the responder SHOULD return `partial-success` rather than silently collapsing to generic failure.
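A non-normative sketch of the timeout rule using Python's `asyncio.wait_for`, assuming each rollback item has its own asynchronous undo step (a hypothetical structure). Items are recorded as applied only after their undo step completes, so a timeout yields `partial-success` when some items finished, rather than a generic failure.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List


async def rollback_with_deadline(
    steps: Dict[str, Callable[[], Awaitable[None]]], deadline_s: float
) -> dict:
    """Apply per-item undo steps in order until all finish or the deadline expires."""
    applied: List[str] = []

    async def run_all() -> None:
        for item, undo in steps.items():
            await undo()
            applied.append(item)  # record only after the undo step completes

    try:
        await asyncio.wait_for(run_all(), timeout=deadline_s)
    except asyncio.TimeoutError:
        pass  # fall through and report what was actually applied
    remaining = [item for item in steps if item not in applied]
    if not remaining:
        outcome = "success"
    elif applied:
        outcome = "partial-success"  # do not collapse to generic failure
    else:
        outcome = "failure"
    return {"outcome": outcome, "actual_scope": applied, "unapplied": remaining}
```

Cancellation interrupts whichever step was running at the deadline; a real responder would also need to report whether that step left a half-applied effect.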
|
||||
|
||||
Implementations SHOULD avoid silent downgrade from rollback to best-effort local cleanup. If only local cleanup occurred, the rollback result SHOULD say so clearly.
|
||||
|
||||
### 8.1 Non-Normative Example Flow
|
||||
|
||||
Agent A executes task `t-17`, which depends on Agent B having applied task `t-12`. Agent B later detects that `t-12` wrote invalid external state and emits `failure(failed-task=t-12, affected-dependency=t-17)`. Agent A determines that rollback is required for `t-17` and sends `rollback-request(request-id=r-8, target-task=t-17, scope={t-17, ckpt-17-precommit}, idempotency-token=abc123)`.
|
||||
|
||||
The receiving peer evaluates requester authorization and replay status and rolls back `t-17`, but it cannot reverse one externally visible notification. It therefore emits `rollback-result(ref=r-8, outcome=partial-success, actual-scope={t-17, ckpt-17-precommit}, residual="notification already delivered")`. A downstream relying party can now distinguish partial rollback from full recovery and act accordingly.
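The exchange in this example can be rendered in a hypothetical JSON form to show how a relying party might check correlation and residual disclosure. The field names mirror the example's notation and are not a normative serialization; `fully_recovered` is an illustrative helper.

```python
import json

request = json.loads("""{
  "type": "rollback-request", "request-id": "r-8", "target-task": "t-17",
  "scope": ["t-17", "ckpt-17-precommit"], "idempotency-token": "abc123"
}""")
result = json.loads("""{
  "type": "rollback-result", "ref": "r-8", "outcome": "partial-success",
  "actual-scope": ["t-17", "ckpt-17-precommit"],
  "residual": "notification already delivered"
}""")


def fully_recovered(req: dict, res: dict) -> bool:
    """Relying-party check: only a correlated `success` counts as full recovery."""
    if res.get("ref") != req.get("request-id"):
        raise ValueError("result does not reference this request")
    if res.get("outcome") != "success" and not res.get("residual"):
        raise ValueError("non-success result lacks residual detail")
    return res.get("outcome") == "success"
```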
|
||||
|
||||
## 9. Security Considerations
|
||||
|
||||
Unauthorized rollback requests can be used to deny service or corrupt coordinated work. Implementations therefore need authenticated carriage and explicit authorization checks for the events defined here, even though this document does not define the underlying security protocol.
|
||||
|
||||
Spoofed failure events can trigger unnecessary rollback. Replay of old rollback requests can repeatedly unwind valid work. Implementations MUST prevent replayed requests from causing repeated rollback actions and SHOULD bind requests and results to stable task and requester identifiers.
|
||||
|
||||
Partial rollback is itself a security concern because it can leave downstream systems in an inconsistent state that an attacker can exploit. For that reason, responders MUST explicitly report residual scope and any remaining irreversible effects.
|
||||
|
||||
Failure and rollback metadata can also reveal topology, task dependencies, and operational weaknesses. Deployments SHOULD minimize unnecessary disclosure and SHOULD apply least-privilege access to recovery records.
|
||||
|
||||
## 10. Privacy Considerations
|
||||
|
||||
Task identifiers, failure classes, dependency relationships, and reason codes may expose sensitive operational details. In some deployments, these details can reveal user behavior, internal service structure, or policy logic.
|
||||
|
||||
Implementations SHOULD disclose only the information necessary for interoperable recovery. If a deployment requires broader analytics or audit retention, that policy is deployment-specific and outside the scope of this document.
|
||||
|
||||
## 11. IANA Considerations
|
||||
|
||||
This document currently requests no IANA action.
|
||||
|
||||
Future versions may request compact registries for failure classes, rollback outcome codes, or event type identifiers if implementation experience shows that fixed interoperation points are needed.
|
||||
|
||||
## 12. References
|
||||
|
||||
- [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.
|
||||
- [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, May 2017.
|
||||
- Placeholder reference for adjacent execution-evidence substrate, if adopted.
|
||||
- Placeholder reference for `draft-yue-anima-agent-recovery-networks`.
|
||||
- Placeholder reference for `draft-li-dmsc-macp`.
|
||||
- Placeholder reference for `draft-fu-nmop-agent-communication-framework`.
|
||||
@@ -0,0 +1,24 @@
|
||||
# Architecture Review
|
||||
|
||||
## Findings
|
||||
|
||||
### Medium: the draft is mostly well scoped, but it wavers between abstract event semantics and protocol behavior
|
||||
|
||||
The document says it is carrier-agnostic and not a transport binding, which is correct. However, several MUST-level statements already imply protocol behavior. That is acceptable, but the architecture should acknowledge that the document defines an abstract protocol model, not only vocabulary.
|
||||
|
||||
### Medium: coordinator role is introduced but not integrated into the model
|
||||
|
||||
The coordinator is defined as optional, yet no section explains how peers distinguish coordinator-computed scope from sender-local scope. That leaves a conceptual hole in the actor model.
|
||||
|
||||
### Medium: cancellation is declared out of scope, but the boundary with rollback is not fully clean
|
||||
|
||||
The text says cancellation of work not yet started is out of scope, except when used internally to satisfy rollback. That line is defensible, but it should be expressed more rigorously to prevent readers from assuming cancellation semantics are standardized here.
|
||||
|
||||
## Open questions
|
||||
|
||||
- Should the draft describe itself as an abstract recovery protocol profile rather than only "semantics"?
|
||||
- Does the optional coordinator need one or two normative constraints, or should it be deferred entirely?
|
||||
|
||||
## Residual risk
|
||||
|
||||
Scope discipline is good overall. The main remaining architectural risk is ambiguity about whether this document is merely descriptive or actually defines interoperable protocol behavior. It should explicitly choose the latter in a carefully bounded way.
|
||||
@@ -0,0 +1,28 @@
|
||||
# IETF Senior Review
|
||||
|
||||
## Findings
|
||||
|
||||
### High: the draft still reads more like a design sketch than a publishable Internet-Draft
|
||||
|
||||
The overall structure is right, but several sections stop at high-level intent. A publishable draft needs more disciplined distinction between required behavior, optional behavior, and explanatory rationale. Sections 5 through 8 are closest to publishable, but they still need slightly more rigor.
|
||||
|
||||
### Medium: the abstract is acceptable but could better state the interoperability problem and deployment value
|
||||
|
||||
The current abstract says what the document defines, but it could more directly explain why existing agent systems fail to interoperate during recovery and why this document matters.
|
||||
|
||||
### Medium: References and IANA sections are too provisional
|
||||
|
||||
It is fine to keep placeholders at this stage, but the text currently signals that core dependencies are undecided. Before wider circulation, the draft should either name the expected adjacent substrate or state clearly that no substrate dependency is required.
|
||||
|
||||
### Medium: terminology is mostly clean, but some items still need RFC-style definition form
|
||||
|
||||
The terms are understandable, yet a few are written more like explanations than stable definitions. Tightening the definition style would help the document feel more standards-native.
|
||||
|
||||
## Open questions
|
||||
|
||||
- Does the draft intend to progress as a standalone individual draft or as part of a family with a shared terminology base?
|
||||
- Should the document explicitly call itself Experimental in the introduction rather than only in external cycle metadata?
|
||||
|
||||
## Residual publishability risk
|
||||
|
||||
This is a credible start. The remaining publishability risk is not the idea; it is the need for one more iteration of standards-style precision and dependency cleanup.
|
||||
@@ -0,0 +1,28 @@
|
||||
# Security Review
|
||||
|
||||
## Findings
|
||||
|
||||
### High: rollback authorization is left entirely to the lower layer without a required authorization decision point
|
||||
|
||||
The draft says recovery events need authenticated and authorized carriage, but it never states when a receiver is required to evaluate authorization before acting on a `rollback-request`. Two compliant implementations could therefore both authenticate the requester yet differ on whether task-level rollback authority is required. The draft should require an explicit authorization check before any irreversible rollback action is attempted.
|
||||
|
||||
### High: replay protection is mentioned but underspecified for interoperable use
|
||||
|
||||
The draft says implementations SHOULD provide replay resistance, but `rollback-request` already defines an idempotency token and stable identifiers. That is enough structure to make stronger requirements possible. Without a minimum replay-handling rule, an attacker can reuse stale rollback requests in a way that different implementations will treat inconsistently.
|
||||
|
||||
### Medium: failure-event spoofing risk is identified, but the draft does not require correlation between failure and rollback flows
|
||||
|
||||
An attacker who can inject a plausible `failure` event may induce unnecessary rollback decisions. The draft should at least require that a `rollback-request` reference a specific task or failure context and that receivers preserve the linkage in the `rollback-result`.
|
||||
|
||||
### Medium: partial rollback can leave exploitable inconsistent state, but no minimum disclosure is mandated
|
||||
|
||||
The draft correctly notes the risk, yet "residual risk description" is only a SHOULD. For partial-success and irreversible outcomes, a stronger requirement is warranted so downstream agents can react safely.
|
||||
|
||||
## Open questions
|
||||
|
||||
- Should authorization be expressed as a generic requirement only, or should the document define a task-scope authorization concept for rollback actions?
|
||||
- Should replay resistance be a MUST for all deployments, or only when rollback has externally visible effects?
|
||||
|
||||
## Residual risk
|
||||
|
||||
Even with the fixes above, the draft will still depend heavily on lower-layer identity and authorization systems. That is acceptable, but the security section should say so more concretely and bind protocol behavior to those assumptions.
|
||||
@@ -0,0 +1,28 @@
|
||||
# Software Review
|
||||
|
||||
## Findings
|
||||
|
||||
### High: required fields are defined, but no concrete message shape or example flow is provided
|
||||
|
||||
The event model is understandable, but two implementers could still serialize or correlate it differently. A non-normative example showing `failure -> rollback-request -> rollback-result` with task identifiers, dependency references, and partial-success handling would materially reduce ambiguity.
|
||||
|
||||
### High: task state transitions are incomplete at the procedure level
|
||||
|
||||
The draft lists states but does not specify enough transition rules. For example, can a task move from `completed` directly to `rollback-requested`? Can `compensation-required` be terminal? Can `rollback-failed` later transition to `rolled-back` after manual intervention? Without a transition table or explicit rules, interoperability tests will be hard to design.
|
||||
|
||||
### Medium: rollback scope remains too abstract for independent implementations
|
||||
|
||||
The draft requires a target checkpoint or explicit rollback set, but it does not describe the structure of a rollback set or how direct and transitive dependencies are represented. The draft needs at least a minimal abstract shape for scope membership.
|
||||
|
||||
### Medium: timeout behavior is named but not operationalized
|
||||
|
||||
Timeout is listed as an error condition, but no rule says whether timeout yields `failure`, `partial-success`, or local retry. This will fragment behavior.
|
||||
|
||||
## Open questions
|
||||
|
||||
- Is a compact transition table sufficient, or does the draft need a separate state machine subsection?
|
||||
- Should rollback set representation be a list of task identifiers, checkpoint identifiers, or both?
|
||||
|
||||
## Residual risk
|
||||
|
||||
The current draft is close to implementable, but it still needs one more layer of precision around flow shape and state progression before two vendors would likely build compatible behavior.
|
||||
@@ -0,0 +1,7 @@
|
||||
# Architecture Review
|
||||
|
||||
## Findings
|
||||
|
||||
## Open questions
|
||||
|
||||
## Residual risk
|
||||
@@ -0,0 +1,7 @@
|
||||
# IETF Senior Review
|
||||
|
||||
## Findings
|
||||
|
||||
## Open questions
|
||||
|
||||
## Residual publishability risk
|
||||
@@ -0,0 +1,7 @@
|
||||
# Security Review
|
||||
|
||||
## Findings
|
||||
|
||||
## Open questions
|
||||
|
||||
## Residual risk
|
||||
@@ -0,0 +1,7 @@
|
||||
# Software Review
|
||||
|
||||
## Findings
|
||||
|
||||
## Open questions
|
||||
|
||||
## Residual risk
|
||||
@@ -0,0 +1,26 @@
|
||||
# Review Synthesis
|
||||
|
||||
## Blocking findings
|
||||
|
||||
- Add an explicit authorization-decision requirement before acting on rollback requests. The security review correctly identifies this as the biggest missing control.
|
||||
- Tighten replay handling by linking idempotency, request identity, and stale-request rejection into one interoperable rule.
|
||||
- Add one concrete non-normative flow example and a compact transition table. The software review is right that the draft is still too abstract for two independent implementations.
|
||||
|
||||
## Major findings
|
||||
|
||||
- Clarify whether the document is an abstract protocol model or only event vocabulary. The architecture review recommends choosing the former in a bounded way.
|
||||
- Specify minimum disclosure rules for partial-success, irreversible, and refused outcomes so downstream agents can react safely.
|
||||
- Clarify rollback-scope representation at the abstract level: what a rollback set minimally contains and how direct versus transitive scope is reported.
|
||||
- Improve the abstract and introduction to frame the interoperability problem more directly.
|
||||
|
||||
## Minor findings
|
||||
|
||||
- Tighten terminology definitions into more RFC-like form.
|
||||
- Clarify the coordinator role or remove it if not needed in this revision.
|
||||
- Clarify the cancellation boundary.
|
||||
- Reduce placeholder feel in References and dependency text.
|
||||
|
||||
## Conflicts resolved
|
||||
|
||||
- No meaningful reviewer conflict exists on scope. All reviewers favor keeping the document narrow.
|
||||
- The only tension is between remaining carrier-agnostic and becoming implementable. Resolution: keep the model carrier-agnostic, but add one non-normative example and stronger abstract structure rather than binding to a specific substrate in v1.
|
||||
@@ -0,0 +1,9 @@
|
||||
# Review Synthesis
|
||||
|
||||
## Blocking findings
|
||||
|
||||
## Major findings
|
||||
|
||||
## Minor findings
|
||||
|
||||
## Conflicts resolved
|
||||
@@ -0,0 +1,28 @@
|
||||
# Revision Plan
|
||||
|
||||
## Blocking changes
|
||||
|
||||
- Add a normative requirement that receivers evaluate authorization before honoring a rollback request.
|
||||
- Add a normative replay-handling rule tying request identity, idempotency token, and stale-request rejection together.
|
||||
- Add a compact state-transition table covering normal failure, rollback request, partial success, irreversible outcome, and compensation-required cases.
|
||||
- Add one non-normative end-to-end example flow with concrete identifiers and a partial-success outcome.
|
||||
|
||||
## High-value improvements
|
||||
|
||||
- Clarify rollback-set structure and how transitive scope is represented or reported.
|
||||
- Strengthen `rollback-result` requirements for partial-success, refused, and irreversible outcomes.
|
||||
- Tighten the abstract, introduction, and terminology wording to sound more like an actual I-D.
|
||||
- Either define the coordinator role more clearly or remove it from this version.
|
||||
|
||||
## Deferred items
|
||||
|
||||
- Binding to a specific execution-evidence substrate
|
||||
- Human override or operator approval flow
|
||||
- Registries for failure classes and rollback outcomes unless implementation feedback requires them
|
||||
|
||||
## Draft order for next iteration
|
||||
|
||||
1. Revise abstract and terminology.
|
||||
2. Revise Sections 5 through 8 for authorization, replay, scope shape, and state transitions.
|
||||
3. Add non-normative example flow.
|
||||
4. Revisit Security, Privacy, IANA, and References after the protocol text settles.
|
||||
@@ -0,0 +1,9 @@
|
||||
# Revision Plan
|
||||
|
||||
## Blocking changes
|
||||
|
||||
## High-value improvements
|
||||
|
||||
## Deferred items
|
||||
|
||||
## Draft order for next iteration
|
||||
@@ -0,0 +1,69 @@
|
||||
# User Spec
|
||||
|
||||
## Topic
|
||||
|
||||
Dynamic Trust and Reputation for Multi-Agent Systems
|
||||
|
||||
## Goal
|
||||
|
||||
Produce a credible IETF-style Internet-Draft for an interoperable way to represent and exchange runtime trust signals about agents, so systems can adapt authorization, routing, or collaboration decisions as agent behavior changes over time.
|
||||
|
||||
## Intended status
|
||||
|
||||
Experimental.
|
||||
|
||||
Rationale: the need is clear, but trust scoring models are easy to overclaim and likely need deployment experience before standards-track treatment.
|
||||
|
||||
## Problem to solve
|
||||
|
||||
The analyzer identifies Dynamic Trust and Reputation as a high-severity gap. Current work is dominated by static identity and certificate-style authentication, but long-running agent ecosystems need a way to incorporate runtime behavior, observed failures, successful execution history, and policy violations into ongoing trust decisions.
|
||||
|
||||
The draft should address:
|
||||
|
||||
- how trust-relevant events are represented
|
||||
- how trust assertions or trust updates are shared
|
||||
- how recipients understand freshness, confidence, and scope
|
||||
- how dynamic trust interacts with but does not replace identity and authorization
|
||||
|
||||
## What must be true in the final draft
|
||||
|
||||
- The draft distinguishes identity, attestation, authorization, and trust; it must not collapse them into one concept.
|
||||
- Dynamic trust is presented as supplemental runtime evidence, not as a security guarantee in its own right.
|
||||
- The mechanism is narrow enough to be interoperable and testable.
|
||||
- Security and privacy analysis address manipulation, collusion, replay, reputational poisoning, and unwanted disclosure.
|
||||
- The document remains grounded in observable events and explicit confidence, not vague AI safety rhetoric.
|
||||
|
||||
## Constraints
|
||||
|
||||
- scope constraints
|
||||
Do not try to standardize a universal reputation economy or global scoring service. Focus on exchangeable trust signals and their interpretation boundaries.
|
||||
- compatibility constraints
|
||||
Reuse adjacent identity, attestation, and execution-evidence work when possible. Do not redefine base authentication or token exchange.
|
||||
- terminology constraints
|
||||
Separate trust event, trust assertion, confidence, freshness, subject, issuer, and scope. Avoid anthropomorphic language.
|
||||
|
||||
## Source materials to prioritize
|
||||
|
||||
- `/home/c/projects/ietf-draft-analyzer/data/reports/gaps.md`
|
||||
- `/home/c/projects/ietf-draft-analyzer/data/reports/holistic-agent-ecosystem-draft-outlines.md`
|
||||
- `/home/c/projects/ietf-draft-analyzer/data/reports/ideas.md`
|
||||
- `/home/c/projects/ietf-draft-analyzer/data/reports/overview.md`
|
||||
- `draft-cosmos-protocol-specification`
|
||||
- `draft-jiang-seat-dynamic-attestation`
|
||||
- `draft-aylward-daap-v2`
|
||||
- `draft-birkholz-verifiable-agent-conversations`
|
||||
- relevant WIMSE, RATS, or attestation-adjacent drafts when they help prevent reinvention
|
||||
|
||||
## Success criteria
|
||||
|
||||
- A reader can tell what trust-relevant event data must be present and how it is scoped.
|
||||
- A reader can tell how trust assertions expire, how confidence is expressed, and how misuse is limited.
|
||||
- Reviewers can challenge the design on substance rather than on fuzzy terminology or missing threat analysis.
|
||||
- The draft makes clear what decisions dynamic trust can inform and what it must not be trusted to do alone.
|
||||
|
||||
## Questions for the team
|
||||
|
||||
- What is the minimum interoperable trust event model?
|
||||
- Should trust updates be absolute assertions, delta adjustments, or both?
|
||||
- How should confidence, issuer scope, and freshness be represented?
|
||||
- What privacy risks arise when sharing negative trust events across domains?
|
||||
@@ -0,0 +1,27 @@
|
||||
# Cycle Status
|
||||
|
||||
## Summary
|
||||
|
||||
- cycle: dynamic-trust-and-reputation
|
||||
- version: v1
|
||||
- last updated: 2026-03-02 18:00 UTC
|
||||
|
||||
## Artifact Status
|
||||
|
||||
- `00-user-spec.md`: written
|
||||
- `10-research-brief.md`: written
|
||||
- `20-architecture-brief.md`: written
|
||||
- `30-outline.md`: written
|
||||
- `40-draft-v1.md`: written
|
||||
- `50-reviews-v1/security.md`: written
|
||||
- `50-reviews-v1/software.md`: written
|
||||
- `50-reviews-v1/architecture.md`: written
|
||||
- `50-reviews-v1/ietf-senior.md`: written
|
||||
- `55-review-synthesis-v1.md`: written
|
||||
- `60-revision-plan-v1.md`: written
|
||||
|
||||
## Notes
|
||||
|
||||
- written means the artifact contains substantive content.
|
||||
- stub means the file exists but still appears to be a placeholder.
|
||||
- missing means the expected file has not been created.
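The notes above do not say how an artifact is classified. The sketch below assumes that a file containing only Markdown headings and blank lines counts as a stub, which matches the empty review templates in this commit. `artifact_status` is a hypothetical helper and is not part of any existing tooling:

```python
from pathlib import Path


def artifact_status(path: Path) -> str:
    """Classify a cycle artifact as "written", "stub", or "missing".

    Heuristic (an assumption): a file whose non-blank lines are all
    Markdown headings is still a placeholder.
    """
    if not path.exists():
        return "missing"
    lines = [line.strip() for line in path.read_text(encoding="utf-8").splitlines()]
    # Only headings and blank lines: the template exists but has no content yet.
    if all(not line or line.startswith("#") for line in lines):
        return "stub"
    return "written"
```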
|
||||
@@ -0,0 +1,27 @@
|
||||
# Cycle Status
|
||||
|
||||
## Summary
|
||||
|
||||
- cycle: dynamic-trust-and-reputation
|
||||
- version: v2
|
||||
- last updated: 2026-03-02 18:06 UTC
|
||||
|
||||
## Artifact Status
|
||||
|
||||
- `00-user-spec.md`: written
|
||||
- `10-research-brief.md`: written
|
||||
- `20-architecture-brief.md`: written
|
||||
- `30-outline.md`: written
|
||||
- `40-draft-v2.md`: written
|
||||
- `50-reviews-v2/security.md`: stub
|
||||
- `50-reviews-v2/software.md`: stub
|
||||
- `50-reviews-v2/architecture.md`: stub
|
||||
- `50-reviews-v2/ietf-senior.md`: stub
|
||||
- `55-review-synthesis-v2.md`: stub
|
||||
- `60-revision-plan-v2.md`: stub
|
||||
|
||||
## Notes
|
||||
|
||||
- written means the artifact contains substantive content.
|
||||
- stub means the file exists but still appears to be a placeholder.
|
||||
- missing means the expected file has not been created.
|
||||
@@ -0,0 +1,60 @@
|
||||
# Research Brief
|
||||
|
||||
## Problem framing
|
||||
|
||||
Fact: the analyzer marks Dynamic Trust and Reputation as a high-severity gap in the agent identity and authorization space. Fact: the stated problem is that static authentication is not enough for long-running autonomous systems, because past behavior, policy violations, successful task history, and environmental changes can alter whether another agent should be trusted.
|
||||
|
||||
Inference: this topic is worth pursuing, but it is easy to overreach. The most defensible first draft is not a universal reputation system. It is a narrow mechanism for representing and exchanging trust-relevant runtime assertions with freshness, confidence, issuer scope, and revocation semantics.
|
||||
|
||||
## Evidence from existing drafts
|
||||
|
||||
Fact: the gap report identifies only five partially related ideas across the full corpus. The clearest named mechanism is `Trust Scoring` from `draft-cosmos-protocol-specification`. Other partial signals include `Trust Score-based Policy Enforcement`, `Cryptographic Proof-Based Autonomy`, and dynamic attestation work.
|
||||
|
||||
Fact: the gap report points to several related drafts: `draft-jiang-seat-dynamic-attestation`, `draft-cosmos-protocol-specification`, `draft-diaconu-agents-authz-info-sharing`, `draft-agent-gw`, and `draft-li-dmsc-inf-architecture`. These appear to provide fragments such as attestation, trust-native semantics, or information sharing, but not a generally reusable dynamic trust exchange core.
|
||||
|
||||
Fact: the broader overview shows stronger maturity in adjacent accountability and attestation drafts such as `draft-aylward-daap-v2`, `draft-guy-bary-stamp-protocol`, and `draft-birkholz-verifiable-agent-conversations`. Those are important not because they solve dynamic trust directly, but because they provide candidate evidence sources from which trust events might be derived.
|
||||
|
||||
Fact: the holistic ecosystem outline places dynamic trust alongside assurance, cross-domain security, and provenance rather than as a standalone identity replacement. That is a strong scope signal.
|
||||
|
||||
## Overlap and adjacent work
|
||||
|
||||
Inference: the main collision risks are:
|
||||
|
||||
- collapsing trust into identity or attestation
|
||||
- drifting into full behavior verification and assurance profiles
|
||||
- defining global reputation semantics that are impossible to standardize early
|
||||
|
||||
Inference: the architect should treat dynamic trust as a supplemental decision input. A trust assertion should help receivers adjust risk posture, routing, delegation, or policy thresholds, but should not replace authentication, authorization, or local policy.
|
||||
|
||||
Inference: there is also a likely layering opportunity. Trust events may be derived from signed execution evidence, attestation results, policy compliance checks, or observed protocol outcomes. This suggests the first draft should define a trust event model and trust assertion envelope rather than invent a new base proof system.
|
||||
|
||||
## Gaps and unresolved questions
|
||||
|
||||
Fact: the available analyzer artifacts do not yet show a shared vocabulary for freshness, confidence, negative trust evidence, or revocation of prior trust assertions. Fact: the ideas corpus surfaced less direct material than expected, which suggests the field is genuinely underdefined rather than merely fragmented.
|
||||
|
||||
Open questions:
|
||||
|
||||
- What is the minimum trust event payload? Subject, issuer, event type, score or delta, confidence, freshness, scope, and evidence reference are likely candidates, but the set needs careful architectural pruning.
|
||||
- Should trust be represented as absolute score, bounded level, delta adjustment, or a combination?
|
||||
- How should a receiver distinguish local opinion from portable inter-domain assertion?
|
||||
- How should negative trust events be shared without creating privacy, defamation, or poisoning problems?
|
||||
- What revocation or expiry mechanism is needed so stale trust does not silently persist?
|
||||
|
||||
## Additional data worth investigating
|
||||
|
||||
- Inspect `draft-cosmos-protocol-specification` directly for the semantics of trust scoring and whether any parts are salvageable without importing the whole model.
|
||||
- Inspect `draft-jiang-seat-dynamic-attestation` for how runtime attestation changes over time and whether it offers reusable freshness or confidence patterns.
|
||||
- Compare `draft-aylward-daap-v2` and `draft-birkholz-verifiable-agent-conversations` for event formats that could serve as evidence references.
|
||||
- Search more deeply for `confidence`, `reputation`, `revocation`, `behavioral`, `policy violation`, and `provenance` in raw draft text if the architect needs a stronger evidence base.
|
||||
|
||||
## Recommendation to the architect
|
||||
|
||||
Design the first draft as an experimental representation for dynamic trust assertions and trust events, not as a global scoring system. Keep the document centered on:
|
||||
|
||||
- trust event vocabulary
|
||||
- trust assertion envelope and required fields
|
||||
- issuer, subject, scope, freshness, and confidence semantics
|
||||
- revocation or expiry behavior
|
||||
- security and privacy limits on exchanging negative or cross-domain trust information
|
||||
|
||||
Avoid redefining identity, token exchange, attestation, or full behavior verification. If evidence references from adjacent drafts can be reused, bind to them rather than creating a new proof substrate.
|
||||
@@ -0,0 +1,113 @@
|
||||
# Architecture Brief
|
||||
|
||||
## Scope
|
||||
|
||||
Define an experimental, interoperable representation for dynamic trust information exchanged between agents or agent-adjacent services. The draft should standardize:
|
||||
|
||||
- trust event vocabulary
|
||||
- trust assertion envelope
|
||||
- issuer, subject, scope, freshness, and confidence semantics
|
||||
- expiry or revocation behavior
|
||||
- minimal rules for how receivers interpret portable trust information
|
||||
|
||||
The document should remain supplemental to identity, attestation, and authorization systems.
|
||||
|
||||
## Non-goals
|
||||
|
||||
- creating a global reputation network or universal score
|
||||
- replacing authentication, attestation, or authorization
|
||||
- standardizing all behavior-verification evidence formats
|
||||
- requiring a single scoring algorithm
|
||||
- defining economic incentives, penalties, or marketplace reputation
|
||||
|
||||
## Terminology and actors
|
||||
|
||||
- `trust event`: an observed runtime occurrence relevant to trust assessment
|
||||
- `trust assertion`: a structured statement by an issuer about a subject's trust-relevant state
|
||||
- `issuer`: the party making the assertion
|
||||
- `subject`: the agent or service described by the assertion
|
||||
- `confidence`: how strongly the issuer stands behind the assertion
|
||||
- `freshness`: how current the assertion is and how long it remains usable
|
||||
- `scope`: the context in which the assertion is intended to apply
|
||||
- `evidence reference`: pointer to supporting execution, attestation, or compliance evidence
|
||||
- `revocation`: withdrawal or supersession of a prior trust assertion
|
||||
|
||||
Actors:
|
||||
|
||||
- observing agent or service
|
||||
- trust assertion issuer
|
||||
- relying party that consumes trust information
|
||||
- optional policy authority governing how trust affects decisions
|
||||
|
||||
## Protocol or data model shape
|
||||
|
||||
Use two related objects:
|
||||
|
||||
1. a trust event record
|
||||
2. a trust assertion
|
||||
|
||||
Trust event record minimum fields:
|
||||
|
||||
- event identifier
|
||||
- subject identifier
|
||||
- issuer or observer identifier
|
||||
- event type
|
||||
- timestamp
|
||||
- scope
|
||||
|
||||
Trust assertion minimum fields:
|
||||
|
||||
- assertion identifier
|
||||
- subject identifier
|
||||
- issuer identifier
|
||||
- trust statement value
|
||||
- confidence value
|
||||
- freshness or expiry information
|
||||
- scope
|
||||
|
||||
Optional fields:
|
||||
|
||||
- evidence reference
|
||||
- delta-from-prior assertion
|
||||
- revokes or supersedes assertion identifier
|
||||
- explanation code
|
||||
|
||||
Design choice: do not require one numeric scoring model. Allow bounded levels, numeric values, or deltas as long as the representation states which model is being used and how confidence and expiry apply.
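The sketch below shows one way to implement the model-agnostic design choice. It assumes three hypothetical model identifiers (`level`, `numeric`, `delta`) and Python field names that this brief does not define. Each statement carries its model. If a receiver cannot interpret a value under the declared model, it refuses the value rather than guessing:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical model identifiers; the brief defines no registry.
KNOWN_MODELS = {"level", "numeric", "delta"}


@dataclass(frozen=True)
class TrustStatement:
    model: str                             # which statement model `value` uses
    value: object                          # meaning depends on `model`
    levels: tuple = ()                     # ordered labels, for model "level"
    bounds: Optional[tuple] = None         # (low, high), for model "numeric"
    prior_assertion: Optional[str] = None  # assertion being adjusted, for "delta"

    def __post_init__(self) -> None:
        if self.model not in KNOWN_MODELS:
            # Refuse to interpret an unknown model instead of coercing it.
            raise ValueError(f"unknown statement model: {self.model!r}")
        if self.model == "level" and self.value not in self.levels:
            raise ValueError("level value is not among the declared levels")
        if self.model == "numeric":
            if self.bounds is None:
                raise ValueError("numeric model requires declared bounds")
            low, high = self.bounds
            if not low <= self.value <= high:
                raise ValueError("numeric value outside declared bounds")
        if self.model == "delta" and not self.prior_assertion:
            raise ValueError("delta model requires the prior assertion it adjusts")
```

Rejecting an unknown model, rather than treating it as a number, keeps the representation from implying precision that the issuer never stated.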
|
||||
|
||||
## Normative requirements candidates
|
||||
|
||||
- A trust assertion MUST identify both issuer and subject.
|
||||
- A trust assertion MUST indicate scope and freshness.
|
||||
- A trust assertion MUST NOT be treated as a substitute for authentication or authorization.
|
||||
- If a trust assertion supersedes or revokes a prior assertion, it MUST identify the prior assertion.
|
||||
- Receivers MUST be able to distinguish portable trust assertions from local-only trust state.
|
||||
- Trust assertions SHOULD include evidence references when the underlying evidence is available and shareable.
|
||||
- Implementations SHOULD define local policy for how negative assertions are consumed; this document should not hardcode one response.
|
||||
- Issuers MUST NOT present stale assertions as current.
|
||||
|
||||
## Security, privacy, and abuse considerations
|
||||
|
||||
- false negative or false positive trust assertions can manipulate routing or authorization decisions
|
||||
- colluding issuers could amplify reputational poisoning
|
||||
- replayed stale assertions can preserve obsolete trust
|
||||
- over-shared negative trust information can leak sensitive incident details
|
||||
- portable trust data may be misread as global truth rather than scoped issuer opinion
|
||||
|
||||
The draft should strongly emphasize that trust assertions are context-bound statements requiring authenticated origin, explicit freshness, and local policy interpretation.
|
||||
|
||||
## IANA impact
|
||||
|
||||
Potentially small registries only if needed by implementation experience:
|
||||
|
||||
- trust event types
|
||||
- trust assertion statement models
|
||||
- explanation codes
|
||||
|
||||
Avoid large registries or score semantics that imply false precision.
|
||||
|
||||
## Open design questions
|
||||
|
||||
- Should the primary trust statement model be level-based, numeric, delta-based, or model-agnostic?
|
||||
- How much explanation should be mandatory when sharing negative trust?
|
||||
- How should a receiver compare assertions from different issuers with different confidence models?
|
||||
- Should revocation be a first-class assertion type or simply a superseding assertion?
|
||||
@@ -0,0 +1,79 @@
|
||||
# Draft Outline
|
||||
|
||||
## Abstract
|
||||
|
||||
State that the document defines experimental semantics for exchanging dynamic trust assertions and trust-relevant runtime events in multi-agent systems. Make clear that the mechanism supplements, but does not replace, identity, attestation, and authorization.
|
||||
|
||||
## Section plan
|
||||
|
||||
1. Introduction
|
||||
2. Terminology
|
||||
3. Problem Statement and Design Goals
|
||||
4. Trust Model Overview
|
||||
5. Trust Events
|
||||
6. Trust Assertions
|
||||
7. Freshness, Confidence, and Revocation
|
||||
8. Receiver Processing and Policy Boundaries
|
||||
9. Security Considerations
|
||||
10. Privacy Considerations
|
||||
11. IANA Considerations
|
||||
12. References
|
||||
|
||||
## Author guidance by section
|
||||
|
||||
### 1. Introduction
|
||||
|
||||
Anchor the problem in long-running agent interactions where static identity is insufficient. Avoid implying that trust scores solve security by themselves.
|
||||
|
||||
### 2. Terminology
|
||||
|
||||
Define trust event, trust assertion, issuer, subject, confidence, freshness, scope, evidence reference, and revocation. Be disciplined about these distinctions.
|
||||
|
||||
### 3. Problem Statement and Design Goals
|
||||
|
||||
Explain the gap between static authentication and runtime trust decisions. State that the document aims to standardize representation and exchange, not one universal scoring algorithm.
|
||||
|
||||
### 4. Trust Model Overview
|
||||
|
||||
Show the layering clearly: identity and attestation remain below; trust assertions sit above them as supplemental runtime signals interpreted by local policy.
|
||||
|
||||
### 5. Trust Events
|
||||
|
||||
Define the observable events that can feed trust changes. Avoid overloading this section with algorithmic scoring guidance.
|
||||
|
||||
### 6. Trust Assertions
|
||||
|
||||
Define the required fields of a portable trust assertion and how issuer, subject, scope, confidence, and statement value are represented.
|
||||
|
||||
### 7. Freshness, Confidence, and Revocation
|
||||
|
||||
This is the core interoperability section. Be precise about expiry, supersession, stale data, and the difference between confidence and trust value.
|
||||
|
||||
### 8. Receiver Processing and Policy Boundaries
|
||||
|
||||
Explain what a receiver may infer and what remains local policy. This section must prevent readers from treating portable trust as universal authorization.
|
||||
|
||||
### 9. Security Considerations
|
||||
|
||||
Address poisoning, collusion, replay, spoofing, and misuse of trust assertions in access-control flows.
|
||||
|
||||
### 10. Privacy Considerations
|
||||
|
||||
Address cross-domain disclosure of incidents, behavior, and negative assertions.
|
||||
|
||||
### 11. IANA Considerations
|
||||
|
||||
Either no action or minimal registries for event types and assertion models.
|
||||
|
||||
### 12. References
|
||||
|
||||
Keep placeholders if needed, but cite adjacent attestation, accountability, and evidence-bearing drafts that influenced the layering.
|
||||
|
||||
## Issues that must not be hand-waved
|
||||
|
||||
- whether trust assertions are scoped issuer opinions or universal facts
|
||||
- how freshness and expiry are represented
|
||||
- how revocation or supersession works
|
||||
- how confidence differs from trust value
|
||||
- what evidence reference means and when it is optional
|
||||
- how receivers avoid using trust as a drop-in replacement for authorization
|
||||
@@ -0,0 +1,172 @@
|
||||
# Draft
|
||||
|
||||
## Abstract
|
||||
|
||||
This document defines experimental semantics for exchanging dynamic trust assertions and trust-relevant runtime events in multi-agent systems. The mechanism allows one party to communicate scoped, time-bounded statements about another party's observed trust-relevant behavior, together with confidence and optional evidence references. The mechanism supplements identity, attestation, and authorization systems; it does not replace them. The goal is to improve interoperability where long-running agent interactions require trust decisions that evolve over time rather than remaining fixed at initial authentication.
|
||||
|
||||
## 1. Introduction
|
||||
|
||||
Many agent systems authenticate peers once and then rely on static identity or long-lived authorization artifacts for the remainder of an interaction. That approach is often insufficient for long-running or cross-domain systems in which runtime behavior, policy violations, attestation changes, or observed failures should affect how much confidence one participant places in another.
|
||||
|
||||
Several existing drafts address accountability, attestation, authorization, or cross-domain information sharing. However, the current landscape still lacks a compact, reusable way to represent and exchange dynamic trust information as an interoperable runtime signal. As a result, systems that do attempt dynamic trust tend to use proprietary or locally scoped semantics that are hard to compare or consume across implementations.
|
||||
|
||||
This document defines a narrow mechanism for trust events and trust assertions. It standardizes how such information is represented and how relying parties distinguish issuer opinion, freshness, scope, and confidence. It does not define a single global scoring algorithm, a reputation marketplace, or a replacement for authorization policy.
|
||||
|
||||
## 2. Terminology
|
||||
|
||||
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.
|
||||
|
||||
Trust Event: an observed runtime occurrence relevant to trust assessment.
|
||||
|
||||
Trust Assertion: a structured statement by an issuer regarding the trust-relevant state of a subject.
|
||||
|
||||
Issuer: the party that originates a trust assertion.
|
||||
|
||||
Subject: the agent or service that a trust assertion describes.
|
||||
|
||||
Relying Party: a party that consumes a trust assertion for local decision-making.
|
||||
|
||||
Scope: the context in which a trust assertion is intended to apply.
|
||||
|
||||
Confidence: the issuer's stated degree of certainty in a trust assertion, expressed separately from the trust statement value.
|
||||
|
||||
Freshness: the temporal validity information for a trust assertion, including creation time, expiry, or other recency limits.
|
||||
|
||||
Evidence Reference: a pointer to supporting execution, attestation, compliance, or observational evidence.
|
||||
|
||||
Revocation: withdrawal or supersession of a previously issued trust assertion.
|
||||
|
||||
Portable Trust Assertion: a trust assertion intended for use outside the issuer's local trust store.
|
||||
|
||||
Local Trust State: trust information maintained only within one implementation and not intended for exchange under this document.
|
||||
|
||||
## 3. Problem Statement and Design Goals
|
||||
|
||||
Static identity answers who a peer claims to be. Dynamic trust concerns whether recent behavior, evidence, and context justify continuing to rely on that peer in the same way. In current practice, systems often blur these concepts, leading to three recurring problems:
|
||||
|
||||
- trust information is shared without clear scope or expiry,
|
||||
- negative trust signals are propagated without confidence or evidence context, and
|
||||
- receivers treat portable trust statements as universal authorization decisions.
|
||||
|
||||
The design goals for this document are therefore:
|
||||
|
||||
- to define a compact representation for trust events and trust assertions,
|
||||
- to require issuer, subject, scope, freshness, and confidence information,
|
||||
- to support revocation or supersession of stale assertions,
|
||||
- to preserve local policy discretion, and
|
||||
- to avoid false precision by not mandating one global trust algorithm.
|
||||
|
||||
## 4. Trust Model Overview
|
||||
|
||||
This document defines two related objects:
|
||||
|
||||
- a `trust-event`, and
|
||||
- a `trust-assertion`.
|
||||
|
||||
A trust event is an observed occurrence that may justify a trust update. Examples include successful execution, attestation degradation, repeated policy violation, or verified protocol misbehavior. This document standardizes the representation of such events but does not require that every event be exchanged externally.
|
||||
|
||||
A trust assertion is a portable statement derived from local observation, policy processing, or supporting evidence. An issuer exchanges a trust assertion when it intends a relying party to take that information into account.
|
||||
|
||||
This document is layered above identity, attestation, and authorization systems. A trust assertion MUST NOT be treated as proof of identity and MUST NOT be used as a substitute for authentication. Likewise, it MUST NOT by itself grant authorization. Instead, it provides a supplemental input to local policy.
|
||||
|
||||
## 5. Trust Events
|
||||
|
||||
A trust event record MUST include:
|
||||
|
||||
- event identifier,
|
||||
- subject identifier,
|
||||
- issuer or observer identifier,
|
||||
- event type,
|
||||
- timestamp,
|
||||
- scope.
|
||||
|
||||
A trust event SHOULD include an evidence reference when supporting evidence exists and can be shared. A trust event MAY include an explanation code or local severity value.
|
||||
|
||||
This document does not require that all trust events be externally exchanged. An implementation MAY use local-only trust events to derive portable trust assertions. However, if a portable trust assertion references a trust event, the implementation SHOULD preserve enough linkage that a relying party can understand the event context.
|
||||
|
||||
Example trust-event categories include:
|
||||
|
||||
- successful verified execution,
|
||||
- attestation downgrade,
|
||||
- policy violation,
|
||||
- repeated protocol error,
|
||||
- trust recovery after remediation.
|
||||
|
||||
This list is illustrative only.
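Non-normative illustration: the sketch below expresses a trust event record as a Python dataclass carrying the required fields of this section. The field names, the open-string `event_type`, and the timezone rule are assumptions of the sketch. They are not wire-format definitions:

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class TrustEventRecord:
    # Required fields.
    event_id: str
    subject_id: str
    observer_id: str   # issuer or observer identifier
    event_type: str    # open string; the categories above are only illustrative
    timestamp: datetime
    scope: str
    # Optional fields.
    evidence_ref: Optional[str] = None
    explanation_code: Optional[str] = None
    local_severity: Optional[int] = None

    def __post_init__(self) -> None:
        for name in ("event_id", "subject_id", "observer_id", "event_type", "scope"):
            if not getattr(self, name):
                raise ValueError(f"missing required field: {name}")
        if self.timestamp.tzinfo is None:
            # Assumption: naive timestamps cannot be compared reliably across domains.
            raise ValueError("timestamp must carry a timezone")


event = TrustEventRecord(
    event_id="ev-0001",
    subject_id="agent-b.example",
    observer_id="agent-a.example",
    event_type="policy-violation",
    timestamp=datetime(2026, 3, 2, 18, 0, tzinfo=timezone.utc),
    scope="payments",
)
```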
|
||||
|
||||
## 6. Trust Assertions
|
||||
|
||||
A trust assertion MUST include:
|
||||
|
||||
- assertion identifier,
|
||||
- issuer identifier,
|
||||
- subject identifier,
|
||||
- trust statement value,
|
||||
- confidence value,
|
||||
- freshness information,
|
||||
- scope.
|
||||
|
||||
A trust assertion MAY include:
|
||||
|
||||
- evidence reference,
|
||||
- explanation code,
|
||||
- delta-from-prior value,
|
||||
- revokes or supersedes assertion identifier.
|
||||
|
||||
This document permits multiple trust statement models, including bounded levels, numeric values, and delta updates. Every trust assertion MUST identify the statement model it uses, so that a relying party can interpret the trust statement value.
|
||||
|
||||
An issuer MUST distinguish portable trust assertions from local trust state. A relying party MUST be able to determine whether the received assertion is intended to travel across administrative boundaries or is only meaningful within the issuer's local environment.
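Non-normative illustration: the function below is a receiver-side check for the required fields of this section. It runs on an assertion that has already been decoded into a Python dictionary. This document does not define a serialization, so the field names (`statement_model`, `portable`, `expires`, `validity_policy`, and so on) are hypothetical:

```python
from typing import Any

# Hypothetical field names; this document does not define a serialization.
REQUIRED_FIELDS = (
    "assertion_id", "issuer", "subject", "statement_model",
    "statement_value", "confidence", "created", "scope", "portable",
)


def assertion_problems(a: dict[str, Any]) -> list[str]:
    """Return the requirements an assertion fails; an empty list means well-formed."""
    problems = [f"missing {name}" for name in REQUIRED_FIELDS if a.get(name) in (None, "")]
    # Freshness: creation time plus either an expiry or a validity policy (Section 7).
    if a.get("expires") is None and a.get("validity_policy") is None:
        problems.append("missing expiry or validity policy")
    # Portable versus local must be stated explicitly, not inferred from absence.
    if "portable" in a and not isinstance(a["portable"], bool):
        problems.append("portable must be true or false")
    return problems
```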
|
||||
|
||||
## 7. Freshness, Confidence, and Revocation
|
||||
|
||||
Freshness is mandatory. An issuer MUST include enough temporal information for a relying party to detect stale assertions. At minimum, that means creation time and either expiry time or a validity policy that can be interpreted consistently.
|
||||
|
||||
Confidence is distinct from trust value. The trust statement says what the issuer believes about the subject; the confidence value says how strongly the issuer stands behind that statement. A relying party MUST NOT assume that a high trust value implies high confidence, or vice versa.
|
||||
|
||||
If an issuer revokes or supersedes a prior assertion, the new assertion MUST identify the prior assertion. A relying party receiving both old and new assertions SHOULD prefer the newer assertion when freshness and issuer identity indicate that supersession is valid.
|
||||
|
||||
Issuers MUST NOT present stale assertions as current. Relying parties SHOULD reject or downgrade stale assertions according to local policy.
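Non-normative illustration of the freshness and supersession rules. The sketch assumes timezone-aware timestamps, an explicit expiry time on every assertion, and hypothetical names. It honors supersession only when the superseding assertion comes from the same issuer and is newer than the assertion it replaces:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class Assertion:
    assertion_id: str
    issuer: str
    subject: str
    created: datetime
    expires: datetime
    supersedes: Optional[str] = None


def is_stale(a: Assertion, now: datetime) -> bool:
    # Expiry is treated as exclusive: at or after `expires` the assertion is stale.
    return now >= a.expires


def current_assertions(received: list[Assertion], now: datetime) -> list[Assertion]:
    by_id = {a.assertion_id: a for a in received}
    retired = set()
    for a in received:
        prior = by_id.get(a.supersedes) if a.supersedes else None
        # Supersession counts only from the same issuer, and only when newer.
        if prior is not None and prior.issuer == a.issuer and a.created > prior.created:
            retired.add(prior.assertion_id)
    return [a for a in received if a.assertion_id not in retired and not is_stale(a, now)]
```

In this sketch, a superseding assertion retires its predecessor even after the superseding assertion has itself expired. Whether an expired revocation should still suppress the prior assertion is a policy choice the sketch does not settle.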
|
||||
|
||||
## 8. Receiver Processing and Policy Boundaries
|
||||
|
||||
Relying parties consume trust assertions as local policy input. This document does not require one decision algorithm. However, receivers MUST preserve the following distinctions:
|
||||
|
||||
- issuer opinion versus objective fact,
|
||||
- trust value versus confidence,
|
||||
- portable assertion versus local trust state,
|
||||
- trust input versus authorization decision.
|
||||
|
||||
A relying party MAY combine assertions from multiple issuers, but comparison across issuers is inherently local-policy dependent. This document therefore does not define issuer ranking, quorum rules, or mandatory aggregation algorithms.
|
||||
|
||||
When a negative assertion lacks sufficient freshness, scope, or issuer clarity, a relying party SHOULD reduce its weight or ignore it. When a positive assertion lacks an evidence reference where such evidence is normally expected, the relying party MAY reduce its weight.
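Non-normative illustration of one possible local policy. It is not a recommended algorithm. The weighting factors and the threshold are assumptions of this sketch, as is the rule that positive assertions may offset negative ones but never override a denial. Authorization is evaluated first, and trust input can only narrow the outcome:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReceivedAssertion:
    negative: bool        # statement lowers trust in the subject
    fresh: bool           # outcome of the freshness check (Section 7)
    scope_matches: bool   # asserted scope covers the relying party's context
    issuer_known: bool    # issuer is authenticated and recognized locally
    has_evidence: bool    # an evidence reference is present
    confidence: float     # assumed already normalized to [0, 1]


def weight(a: ReceivedAssertion) -> float:
    """Hypothetical local weighting for a single assertion."""
    if not (a.fresh and a.scope_matches and a.issuer_known):
        return 0.0  # unclear freshness, scope, or issuer: ignore
    w = a.confidence
    if not a.negative and not a.has_evidence:
        w *= 0.5  # positive claim lacking expected evidence: reduced weight
    return w


def allow(authorized: bool, assertions: list[ReceivedAssertion], threshold: float = 0.5) -> bool:
    """Trust input can narrow, but never widen, the authorization outcome."""
    if not authorized:
        return False
    concern = sum(weight(a) for a in assertions if a.negative)
    reassurance = sum(weight(a) for a in assertions if not a.negative)
    return concern - reassurance < threshold
```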
|
||||
|
||||
## 9. Security Considerations
|
||||
|
||||
Dynamic trust information is vulnerable to spoofing, replay, collusion, and reputational poisoning. Implementations therefore need authenticated origin and integrity protection for portable trust assertions, even though this document does not define the underlying cryptographic transport or token format.
|
||||
|
||||
Replay of stale trust assertions can preserve outdated trust long after behavior has changed. For this reason, freshness is mandatory and receivers SHOULD apply explicit stale-data handling.
|
||||
|
||||
Colluding issuers can amplify false claims. This document does not solve collusion, but it reduces ambiguity by requiring issuer identification, scope, and confidence. Deployments SHOULD avoid treating multiple assertions as independent when they originate from closely related sources.
|
||||
|
||||
Trust assertions can also be misused as unauthorized access-control surrogates. Implementers MUST NOT treat a trust assertion alone as granting access absent normal authorization checks.
|
||||
|
||||
## 10. Privacy Considerations
|
||||
|
||||
Trust events and trust assertions may reveal sensitive operational information, including policy violations, remediation history, attestation degradation, or other indicators of weakness. Negative assertions may also expose behavior that a subject does not expect to be shared across domains.
|
||||
|
||||
Implementations SHOULD minimize disclosure to what is necessary for the intended scope. Evidence references SHOULD avoid exposing raw sensitive details when a narrower reference suffices. Cross-domain sharing of negative assertions deserves particular caution because it can create lasting reputational effects outside the original operational context.
|
||||
|
||||
## 11. IANA Considerations
|
||||
|
||||
This document currently requests no IANA action.
|
||||
|
||||
If implementation experience later shows clear need for shared registries, suitable candidates include trust-event categories, trust statement model identifiers, and explanation codes. Such registries should remain compact and avoid implying false precision.
|
||||
|
||||
## 12. References
|
||||
|
||||
- [RFC2119] Key words for use in RFCs to Indicate Requirement Levels.
|
||||
- [RFC8174] Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words.
|
||||
- Placeholder reference for `draft-cosmos-protocol-specification`.
|
||||
- Placeholder reference for `draft-jiang-seat-dynamic-attestation`.
|
||||
- Placeholder reference for `draft-aylward-daap-v2`.
|
||||
- Placeholder reference for `draft-birkholz-verifiable-agent-conversations`.
|
||||
@@ -0,0 +1,190 @@
|
||||
# Draft
|
||||
|
||||
## Abstract
|
||||
|
||||
This document defines experimental semantics for exchanging portable dynamic trust assertions and associated trust-relevant runtime events in multi-agent systems. The mechanism allows one party to communicate a scoped, time-bounded opinion about another party's observed trust-relevant behavior, together with model identification, confidence, freshness, and optional evidence or explanation data. The mechanism supplements identity, attestation, and authorization systems; it does not replace them. Its purpose is to improve interoperability where long-running agent interactions require trust decisions that evolve over time rather than remaining fixed at initial authentication.
|
||||
|
||||
## 1. Introduction
|
||||
|
||||
Many agent systems authenticate peers once and then rely on static identity or long-lived authorization artifacts for the remainder of an interaction. That approach is often insufficient for long-running or cross-domain systems in which runtime behavior, policy violations, attestation changes, or observed failures should affect how much confidence one participant places in another.
|
||||
|
||||
Several existing drafts address accountability, attestation, authorization, or cross-domain information sharing. However, the current landscape still lacks a compact, reusable way to represent and exchange dynamic trust information as an interoperable runtime signal. As a result, systems that do attempt dynamic trust tend to use proprietary or locally scoped semantics that are hard to compare or consume across implementations.
|
||||
|
||||
This document defines a narrow mechanism for trust assertions and supporting trust events. It standardizes how portable trust information is represented and how relying parties distinguish issuer opinion, freshness, scope, confidence, and model type. It does not define a global scoring algorithm, a reputation marketplace, or a replacement for authorization policy.
|
||||
|
||||
The intended status of this document is Experimental.
|
||||
|
||||
## 2. Terminology
|
||||
|
||||
The key words "MUST", "MUST NOT", "SHOULD", "SHOULD NOT", and "MAY" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals.
|
||||
|
||||
Trust Event: an observed runtime occurrence relevant to trust assessment.
|
||||
|
||||
Trust Assertion: a structured statement by an issuer regarding the trust-relevant state of a subject.
|
||||
|
||||
Portable Trust Assertion: a trust assertion intended for use outside the issuer's local trust store.
|
||||
|
||||
Issuer: the party that originates a trust assertion.
|
||||
|
||||
Subject: the agent or service that a trust assertion describes.
|
||||
|
||||
Relying Party: a party that consumes a trust assertion for local decision-making.
|
||||
|
||||
Scope: the context in which a trust assertion is intended to apply.
|
||||
|
||||
Confidence: the issuer's stated degree of confidence in a trust assertion.
|
||||
|
||||
Freshness: the temporal validity information for a trust assertion, including creation time and expiry or equivalent validity bound.
|
||||
|
||||
Model Identifier: an identifier indicating how the trust statement value is to be interpreted, such as level-based, numeric, or delta-based.
|
||||
|
||||
Evidence Reference: a pointer to supporting execution, attestation, compliance, or observational evidence.
|
||||
|
||||
Explanation Code: a compact issuer-supplied explanation label associated with a trust assertion.
|
||||
|
||||
Revocation: invalidation of a previously issued trust assertion.
|
||||
|
||||
Supersession: replacement of a prior trust assertion by a newer assertion from the same issuer.
|
||||
|
||||
Local Trust State: trust information maintained only within one implementation and not intended for exchange under this document.
|
||||
|
||||
## 3. Problem Statement and Design Goals
|
||||
|
||||
Static identity answers who a peer claims to be. Dynamic trust concerns whether recent behavior, evidence, and context justify continuing to rely on that peer in the same way. In current practice, systems often blur these concepts, leading to three recurring problems:
|
||||
|
||||
- trust information is shared without clear scope or expiry,
|
||||
- negative trust signals are propagated without confidence or evidence context, and
|
||||
- receivers treat portable trust statements as universal authorization decisions.
|
||||
|
||||
The design goals for this document are therefore:
|
||||
|
||||
- to define a compact portable trust assertion envelope,
|
||||
- to require issuer, subject, scope, freshness, confidence, and model identification,
|
||||
- to support revocation or supersession of stale assertions,
|
||||
- to preserve local policy discretion, and
|
||||
- to avoid false precision by not mandating one global trust algorithm.
|
||||
|
||||
## 4. Trust Model Overview
|
||||
|
||||
This document standardizes portable trust assertions as the primary interoperable object. Trust events are supporting input objects that MAY be exchanged or MAY remain local, depending on deployment needs.
|
||||
|
||||
A trust event is an observed occurrence that may justify a trust update. Examples include successful execution, attestation degradation, repeated policy violation, or verified protocol misbehavior. This document standardizes the minimal representation of such events, but portable trust assertions are the main interoperability target.
|
||||
|
||||
A portable trust assertion is a scoped issuer opinion derived from local observation, policy processing, or supporting evidence. It can be exchanged between participants when the issuer intends one or more relying parties to consider that information.
|
||||
|
||||
This document is layered above identity, attestation, and authorization systems. A portable trust assertion MUST NOT be treated as proof of identity and MUST NOT be used as a substitute for authentication. Likewise, it MUST NOT by itself grant authorization. Instead, it provides supplemental input to local policy.
|
||||
|
||||
## 5. Trust Events
|
||||
|
||||
A trust event record MUST include:
|
||||
|
||||
- event identifier,
|
||||
- subject identifier,
|
||||
- issuer or observer identifier,
|
||||
- event type,
|
||||
- timestamp,
|
||||
- scope.
|
||||
|
||||
A trust event SHOULD include an evidence reference when supporting evidence exists and can be shared. A trust event MAY include an explanation code or local severity value.
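A minimal Python sketch of a trust event record follows. The field names (`event_id`, `subject`, `observer`, `event_type`, `timestamp`, `scope`, `evidence_ref`, `explanation_code`, `local_severity`) and the sample values `te-101` and `policy-violation` are hypothetical, since this document does not define a wire encoding or an event vocabulary. Required fields are constructor arguments; the optional fields default to `None`.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class TrustEvent:
    # Required fields from Section 5; names are illustrative, not normative.
    event_id: str
    subject: str
    observer: str    # issuer or observer identifier
    event_type: str  # profile-specific; no global vocabulary is assumed
    timestamp: datetime
    scope: str
    # Optional fields.
    evidence_ref: Optional[str] = None
    explanation_code: Optional[str] = None
    local_severity: Optional[int] = None

    def __post_init__(self) -> None:
        # Reject empty identifiers so a record cannot silently omit a required field.
        for name in ("event_id", "subject", "observer", "event_type", "scope"):
            if not getattr(self, name):
                raise ValueError(f"trust event is missing required field: {name}")
        # Require an explicit UTC offset so observers in different domains agree on ordering.
        if self.timestamp.tzinfo is None:
            raise ValueError("timestamp must be timezone-aware")


event = TrustEvent(
    event_id="te-101",
    subject="agent:example:planner7",
    observer="agent:example:gateway2",
    event_type="policy-violation",
    timestamp=datetime(2026, 3, 2, 16, 55, tzinfo=timezone.utc),
    scope="cross-domain-task-routing",
)
```

Such a record may stay entirely local; it only needs to be exchanged when a deployment chooses to share trust events.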
|
||||
|
||||
This document does not require that all trust events be externally exchanged. An implementation MAY use local-only trust events to derive portable trust assertions. If a portable trust assertion references a trust event, the implementation SHOULD preserve enough linkage that a relying party can understand the event context.
|
||||
|
||||
This revision does not standardize a mandatory global event vocabulary. Event-type names MAY be profile-specific unless later implementation experience shows the need for shared registries.
|
||||
|
||||
## 6. Trust Assertions
|
||||
|
||||
A portable trust assertion MUST include:
|
||||
|
||||
- assertion identifier,
|
||||
- issuer identifier,
|
||||
- subject identifier,
|
||||
- trust statement value,
|
||||
- model identifier,
|
||||
- confidence value,
|
||||
- freshness information,
|
||||
- scope.
|
||||
|
||||
A portable trust assertion MAY include:
|
||||
|
||||
- evidence reference,
|
||||
- explanation code,
|
||||
- delta-from-prior value,
|
||||
- revokes assertion identifier,
|
||||
- supersedes assertion identifier.
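The model identifier tells a relying party how to read the trust statement value. The sketch below assumes three model tokens, `level`, `numeric`, and `delta`, matching the kinds named in the Model Identifier definition. The value domains (a hypothetical level set containing `untrusted`, `caution`, and `trusted`, numeric values in [0, 1], and deltas in [-1, 1]) are assumptions, because this document registers no models or ranges.

```python
from typing import Union

# Hypothetical value domains; this document does not register models or value ranges.
LEVELS = ("untrusted", "caution", "trusted")

TrustValue = Union[str, float]


def interpret_trust_value(model: str, value: TrustValue) -> TrustValue:
    """Check a trust statement value against the model named by `model`."""
    if model == "level":
        if value not in LEVELS:
            raise ValueError(f"unknown level: {value!r}")
        return value
    if model == "numeric":
        number = float(value)
        if not 0.0 <= number <= 1.0:
            raise ValueError("numeric trust value out of range [0, 1]")
        return number
    if model == "delta":
        # A delta expresses change relative to a prior assertion, not an absolute level.
        number = float(value)
        if not -1.0 <= number <= 1.0:
            raise ValueError("delta trust value out of range [-1, 1]")
        return number
    # An unrecognized model cannot be interpreted, so the value is not guessed at.
    raise ValueError(f"unrecognized model identifier: {model!r}")
```

Failing closed on an unknown model, rather than coercing the value into a familiar scale, avoids the false precision that Section 3 warns against.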
|
||||
|
||||
An issuer MUST distinguish portable trust assertions from local trust state. A relying party MUST be able to determine whether the received assertion is intended to travel across administrative boundaries or is only meaningful within the issuer's local environment.
|
||||
|
||||
If a portable trust assertion carries a negative or cautionary trust statement, it MUST include either an evidence reference or an explanation code. It MAY include both.
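A receiver-side check for the required fields and the negative-assertion rule could look like the following sketch. It assumes a dictionary whose keys mirror the labels used in the Section 8.1 example, assumes freshness is carried as `created-at` plus `expires-at`, and assumes which values count as "cautionary" (a hypothetical level set, and any negative delta). Real profiles would define these.

```python
from typing import Any, List, Mapping

REQUIRED_FIELDS = (
    "assertion-id", "issuer", "subject", "trust-value",
    "model", "confidence", "created-at", "expires-at", "scope",
)

# Hypothetical: which values are negative or cautionary is model- and profile-specific.
CAUTIONARY_LEVELS = {"caution", "untrusted"}


def is_cautionary(assertion: Mapping[str, Any]) -> bool:
    model, value = assertion["model"], assertion["trust-value"]
    if model == "level":
        return value in CAUTIONARY_LEVELS
    if model == "delta":
        return float(value) < 0  # any downward change is treated as cautionary
    return False  # numeric values would need a profile threshold; none is assumed here


def check_assertion(assertion: Mapping[str, Any]) -> List[str]:
    """Return conformance problems found; an empty list means none were found."""
    problems = [f"missing {name}" for name in REQUIRED_FIELDS if name not in assertion]
    if problems:
        return problems
    if is_cautionary(assertion) and not (
        assertion.get("evidence-ref") or assertion.get("explanation-code")
    ):
        problems.append("cautionary assertion lacks evidence-ref and explanation-code")
    return problems
```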
|
||||
|
||||
## 7. Freshness, Confidence, and Revocation
|
||||
|
||||
Freshness is mandatory. An issuer MUST include enough temporal information for a relying party to detect stale assertions. At minimum, that means creation time and either expiry time or a validity bound that can be interpreted consistently.
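One abstract freshness shape is a creation time plus an expiry time. The sketch below classifies an assertion's validity window under that shape; the 30-second `CLOCK_SKEW` tolerance is a hypothetical local parameter, included so that only clearly expired (or clearly premature) assertions are flagged.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical tolerance for clock differences between issuer and relying party.
CLOCK_SKEW = timedelta(seconds=30)


def freshness_state(
    created_at: datetime, expires_at: datetime, now: Optional[datetime] = None
) -> str:
    """Classify a validity window as 'not-yet-valid', 'fresh', or 'expired'."""
    if now is None:
        now = datetime.now(timezone.utc)
    if expires_at <= created_at:
        raise ValueError("expires_at must be later than created_at")
    if now + CLOCK_SKEW < created_at:
        return "not-yet-valid"
    if now - CLOCK_SKEW >= expires_at:
        return "expired"
    return "fresh"
```

Using the timestamps from the Section 8.1 example, a check at 17:30Z yields `fresh` and a check at 18:01Z yields `expired`.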
|
||||
|
||||
Confidence is distinct from trust value. The trust statement says what the issuer believes about the subject; the confidence value says how strongly the issuer stands behind that statement. A relying party MUST NOT assume that a high trust value implies high confidence, or vice versa.
|
||||
|
||||
Revocation and supersession are distinct. Revocation invalidates a prior assertion without necessarily replacing it with a new positive or negative assertion. Supersession replaces a prior assertion with a newer one from the same issuer. If an issuer revokes or supersedes a prior assertion, the new assertion MUST identify the prior assertion in its revokes assertion identifier or supersedes assertion identifier field, respectively.
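The following sketch shows how a relying party might apply the two operations to a hypothetical local store of currently valid assertions. It assumes that an assertion carrying a `revokes` identifier acts purely as a withdrawal and is not itself kept as a current statement, and that only the original issuer may revoke or supersede an assertion; both are interpretations, not requirements stated above.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Assertion:
    assertion_id: str
    issuer: str
    trust_value: str
    revokes: Optional[str] = None     # identifier of an assertion being withdrawn
    supersedes: Optional[str] = None  # identifier of an assertion being replaced


def apply_assertion(store: Dict[str, Assertion], incoming: Assertion) -> None:
    """Update a local view of currently valid assertions."""
    prior_id = incoming.revokes or incoming.supersedes
    if prior_id is not None:
        prior = store.get(prior_id)
        # Only the issuer of the prior assertion may withdraw or replace it.
        if prior is not None and prior.issuer != incoming.issuer:
            raise PermissionError(
                f"{incoming.issuer} cannot replace {prior_id} issued by {prior.issuer}"
            )
        store.pop(prior_id, None)
    if incoming.revokes is None:
        # A new or superseding assertion becomes the current statement.
        store[incoming.assertion_id] = incoming
```

Keeping the issuer check inside the update path means a spoofed supersession from a different issuer cannot silently displace an existing assertion.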
|
||||
|
||||
Issuers MUST NOT present stale assertions as current. A relying party MUST NOT accept a clearly expired portable trust assertion as conformant input, although it MAY retain such an assertion locally for audit or diagnostic purposes.
|
||||
|
||||
## 8. Receiver Processing and Policy Boundaries
|
||||
|
||||
Portable trust assertions serve as input to local policy. This document does not require any particular decision algorithm. However, receivers MUST preserve the following distinctions:
|
||||
|
||||
- issuer opinion versus objective fact,
|
||||
- trust value versus confidence,
|
||||
- portable assertion versus local trust state,
|
||||
- trust input versus authorization decision.
|
||||
|
||||
A relying party MUST NOT treat an unauthenticated portable trust assertion as conformant input under this specification. Likewise, a relying party MUST NOT treat a portable trust assertion alone as granting access absent normal authorization checks.
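These receiver rules can be read as an admission gate that runs before any local trust policy. In the sketch below, `origin_verified` is a hypothetical stand-in for whatever authentication and integrity check a deployment uses (this document does not define one), and the `Disposition` names are illustrative.

```python
from datetime import datetime, timezone
from enum import Enum


class Disposition(Enum):
    REJECT_UNAUTHENTICATED = "reject-unauthenticated"
    RETAIN_FOR_AUDIT = "retain-for-audit"  # clearly expired: not conformant input
    POLICY_INPUT = "policy-input"          # may be weighed by local policy


def admit(origin_verified: bool, expires_at: datetime, now: datetime) -> Disposition:
    """Decide whether a received portable assertion may feed local trust policy."""
    if not origin_verified:
        return Disposition.REJECT_UNAUTHENTICATED
    if now >= expires_at:
        return Disposition.RETAIN_FOR_AUDIT
    # Admission only makes the assertion policy input; it never grants access by itself.
    return Disposition.POLICY_INPUT


expiry = datetime(2026, 3, 2, 18, 0, tzinfo=timezone.utc)
decision = admit(True, expiry, datetime(2026, 3, 2, 17, 30, tzinfo=timezone.utc))
```

The gate returns a disposition rather than an access decision, so the normal authorization check still runs separately for any request the subject makes.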
|
||||
|
||||
A relying party MAY combine assertions from multiple issuers, but comparison across issuers is inherently local-policy dependent. This document therefore does not define issuer ranking, quorum rules, or mandatory aggregation algorithms. Implementations SHOULD take care not to treat closely related issuers as independent corroboration sources.
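One way to avoid double-counting related issuers is to count corroboration per operator rather than per issuer. The sketch assumes a hypothetical, locally maintained `operator_of` mapping from issuer identifiers to the organization that runs them; issuers absent from the mapping are counted as their own operator.

```python
from typing import Dict, Iterable


def independent_corroborations(issuers: Iterable[str], operator_of: Dict[str, str]) -> int:
    """Count distinct operators behind a set of issuers that made the same claim."""
    return len({operator_of.get(issuer, issuer) for issuer in issuers})


count = independent_corroborations(
    ["issuer-a1", "issuer-a2", "issuer-b"],
    {"issuer-a1": "operator-a", "issuer-a2": "operator-a"},
)
```

Here three issuers yield two independent sources, because `issuer-a1` and `issuer-a2` share an operator. A stricter local policy might instead discard unmapped issuers.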
|
||||
|
||||
### 8.1 Non-Normative Assertion Example
|
||||
|
||||
An issuer may send a portable trust assertion with:
|
||||
|
||||
- assertion-id `ta-44`
|
||||
- subject `agent:example:planner7`
|
||||
- issuer `agent:example:gateway2`
|
||||
- model `level`
|
||||
- trust-value `caution`
|
||||
- confidence `0.8`
|
||||
- created-at `2026-03-02T17:00:00Z`
|
||||
- expires-at `2026-03-02T18:00:00Z`
|
||||
- scope `cross-domain-task-routing`
|
||||
- explanation-code `policy-violation-recent`
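For illustration only, the same assertion can be encoded as JSON with keys taken from the labels above. This document defines no wire encoding, so JSON is an assumption here, and authenticated origin would still have to come from a transport or token format outside this sketch.

```python
import json
from datetime import datetime

# Keys mirror the example labels; the JSON encoding itself is an assumption.
assertion = {
    "assertion-id": "ta-44",
    "subject": "agent:example:planner7",
    "issuer": "agent:example:gateway2",
    "model": "level",
    "trust-value": "caution",
    "confidence": 0.8,
    "created-at": "2026-03-02T17:00:00Z",
    "expires-at": "2026-03-02T18:00:00Z",
    "scope": "cross-domain-task-routing",
    "explanation-code": "policy-violation-recent",
}


def parse_utc(stamp: str) -> datetime:
    # datetime.fromisoformat accepts a trailing "Z" only from Python 3.11, so normalize it.
    return datetime.fromisoformat(stamp.replace("Z", "+00:00"))


decoded = json.loads(json.dumps(assertion))
window = parse_utc(decoded["expires-at"]) - parse_utc(decoded["created-at"])
```

The decoded validity window is one hour, which bounds how long the cautionary statement can be replayed as current input.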
|
||||
|
||||
### 8.2 Non-Normative Multi-Issuer Conflict Example
|
||||
|
||||
Issuer A sends a fresh assertion with model `level`, trust-value `trusted`, and confidence `0.6` for a subject in scope `document-translation`. Issuer B sends a newer assertion for the same subject and scope with model `level`, trust-value `caution`, and confidence `0.9`, referencing a recent attestation downgrade. This document does not require one aggregation outcome. It does require that the relying party preserve issuer identity, freshness, scope, and confidence rather than collapsing the two assertions into an unexplained average.
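The sketch below shows one hypothetical local policy for this conflict: among assertions in the requested scope, prefer the newest cautionary one, and otherwise the newest of any kind. The creation times and issuer names are invented for the example, and the cautionary level set is an assumption. The policy returns a whole record, so issuer identity, confidence, and freshness survive the decision instead of being averaged away.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List

CAUTIONARY = {"caution", "untrusted"}  # hypothetical level vocabulary


@dataclass(frozen=True)
class Received:
    issuer: str
    scope: str
    trust_value: str
    confidence: float
    created_at: datetime


def most_cautious_recent(assertions: List[Received], scope: str) -> Received:
    """Prefer the newest cautionary assertion in scope; otherwise the newest overall."""
    in_scope = [a for a in assertions if a.scope == scope]
    if not in_scope:
        raise LookupError(f"no assertions in scope {scope!r}")
    cautionary = [a for a in in_scope if a.trust_value in CAUTIONARY]
    return max(cautionary or in_scope, key=lambda a: a.created_at)


a = Received("issuer-a", "document-translation", "trusted", 0.6,
             datetime(2026, 3, 2, 10, 0, tzinfo=timezone.utc))
b = Received("issuer-b", "document-translation", "caution", 0.9,
             datetime(2026, 3, 2, 10, 5, tzinfo=timezone.utc))
chosen = most_cautious_recent([a, b], "document-translation")
```

Another deployment could just as reasonably require a minimum confidence or corroboration from an independent operator before acting on either assertion.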
|
||||
|
||||
## 9. Security Considerations
|
||||
|
||||
Dynamic trust information is vulnerable to spoofing, replay, collusion, and reputational poisoning. Implementations therefore need authenticated origin and integrity protection for portable trust assertions, even though this document does not define the underlying cryptographic transport or token format.
|
||||
|
||||
Replay of stale trust assertions can preserve outdated trust long after behavior has changed. For this reason, freshness is mandatory, and a relying party MUST NOT accept a clearly expired portable trust assertion as valid current input.
|
||||
|
||||
Colluding issuers can amplify false claims. This document does not solve collusion, but it reduces ambiguity by requiring issuer identification, scope, confidence, and model identification. Deployments SHOULD avoid treating multiple assertions as independent when they originate from closely related sources.
|
||||
|
||||
Trust assertions can also be misused as unauthorized access-control surrogates. Implementers MUST NOT treat a portable trust assertion alone as granting access absent normal authorization checks.
|
||||
|
||||
## 10. Privacy Considerations
|
||||
|
||||
Trust events and trust assertions may reveal sensitive operational information, including policy violations, remediation history, attestation degradation, or other indicators of weakness. Negative assertions may also expose behavior that a subject does not expect to be shared across domains.
|
||||
|
||||
Implementations SHOULD minimize disclosure to what is necessary for the intended scope. Evidence references SHOULD avoid exposing raw sensitive details when a narrower reference suffices. Cross-domain sharing of negative assertions deserves particular caution because it can create lasting reputational effects outside the original operational context.
|
||||
|
||||
## 11. IANA Considerations
|
||||
|
||||
This document currently requests no IANA action.
|
||||
|
||||
If implementation experience later shows clear need for shared registries, suitable candidates include model identifiers, trust-event categories, and explanation codes. Such registries should remain compact and avoid implying false precision.
|
||||
|
||||
## 12. References
|
||||
|
||||
- [RFC2119] Key words for use in RFCs to Indicate Requirement Levels.
|
||||
- [RFC8174] Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words.
|
||||
- Placeholder reference for `draft-cosmos-protocol-specification`.
|
||||
- Placeholder reference for `draft-jiang-seat-dynamic-attestation`.
|
||||
- Placeholder reference for `draft-aylward-daap-v2`.
|
||||
- Placeholder reference for `draft-birkholz-verifiable-agent-conversations`.
|
||||
@@ -0,0 +1,24 @@
|
||||
# Architecture Review
|
||||
|
||||
## Findings
|
||||
|
||||
### Medium: scope discipline is good, but the draft risks under-specifying the portable core
|
||||
|
||||
The draft correctly avoids becoming a universal reputation system. The remaining risk is that so much is left to local policy that the portable assertion core becomes too thin. The architecture should define a firmer minimum portable envelope.
|
||||
|
||||
### Medium: the trust-event object may be more than the first revision needs
|
||||
|
||||
The draft has both trust events and trust assertions. That layering is sensible, but the architecture should say more directly whether trust-event interoperability is a primary goal or merely a feeder model for assertions. Otherwise readers may assume both layers are equally mature.
|
||||
|
||||
### Medium: revocation and supersession deserve a cleaner conceptual split
|
||||
|
||||
The draft treats revocation as withdrawal or supersession, but those are not always the same. One invalidates a prior assertion; the other replaces it with a newer one. This distinction should be sharper.
|
||||
|
||||
## Open questions
|
||||
|
||||
- Is the first implementable milestone portable assertions only, with trust events described as optional supporting input?
|
||||
- Should revocation be kept as a general umbrella term or split explicitly into revoke and supersede actions?
|
||||
|
||||
## Residual risk
|
||||
|
||||
The document has good boundaries. The main architectural risk is not scope creep but insufficient commitment to a concrete portable core.
|
||||
@@ -0,0 +1,28 @@
|
||||
# IETF Senior Review
|
||||
|
||||
## Findings
|
||||
|
||||
### High: the draft is credible, but it still reads more like an architecture note than a standards-ready specification
|
||||
|
||||
The structure is sound and the layering is disciplined. What it still lacks is the slight extra formality that makes an Internet-Draft feel publishable: clearer field requirements, fewer conceptual transitions, and less reliance on explanatory prose in Sections 5 through 8.
|
||||
|
||||
### Medium: the abstract should emphasize scoped issuer opinion sooner
|
||||
|
||||
That point is present later in the document and is central to avoiding misuse. It should appear earlier and more explicitly in the abstract.
|
||||
|
||||
### Medium: IANA and references remain intentionally provisional
|
||||
|
||||
That is acceptable at this stage, but before circulation beyond an internal drafting loop, the document should either define a tiny initial model registry or clearly state that all model identifiers are profile-specific pending later work.
|
||||
|
||||
### Medium: terminology is good, but a few terms could be made more standards-native
|
||||
|
||||
Portable Trust Assertion and Local Trust State are useful distinctions, though they currently read slightly informal. Tightening those definitions would improve the document.
|
||||
|
||||
## Open questions
|
||||
|
||||
- Is the intended status Experimental explicitly stated in the draft text anywhere, or only in the cycle metadata?
|
||||
- Should the document explicitly note that it does not define trust aggregation across issuers?
|
||||
|
||||
## Residual publishability risk
|
||||
|
||||
This is a strong first version. The remaining work is mainly to replace architectural vagueness with just enough protocol discipline to withstand IETF-style scrutiny.
|
||||
@@ -0,0 +1,28 @@
|
||||
# Security Review
|
||||
|
||||
## Findings
|
||||
|
||||
### High: assertion authenticity is assumed but not tied to required receiver behavior
|
||||
|
||||
The draft correctly says portable trust assertions need authenticated origin and integrity protection, but it does not make rejection behavior explicit. A receiver should not be allowed to consume an unauthenticated portable assertion and still claim conformance.
|
||||
|
||||
### High: replay handling depends on freshness, but the minimum stale-data rule is too soft
|
||||
|
||||
Freshness is mandatory, which is good, but receivers only SHOULD reject or downgrade stale assertions. For clearly expired assertions, that is too weak. A stronger interoperability floor is warranted.
|
||||
|
||||
### Medium: negative trust sharing creates reputational poisoning risk without minimum evidence discipline
|
||||
|
||||
The document warns about poisoning and privacy, yet evidence references remain entirely optional. That is reasonable for all assertions, but negative portable assertions may need a stronger requirement for explanation or evidence linkage.
|
||||
|
||||
### Medium: collusion risk is identified but not operationalized
|
||||
|
||||
The draft notes that multiple issuers may not be independent, but it gives no guidance on how a relying party should avoid double-counting related issuers. Even a brief cautionary requirement or implementation note would help.
|
||||
|
||||
## Open questions
|
||||
|
||||
- Should unauthenticated portable trust assertions be explicitly non-conformant?
|
||||
- Should negative assertions require either evidence reference or explanation code?
|
||||
|
||||
## Residual risk
|
||||
|
||||
Even with improvements, dynamic trust will remain vulnerable to social and operational abuse that pure wire semantics cannot prevent. The draft should state those limits plainly.
|
||||
@@ -0,0 +1,28 @@
|
||||
# Software Review
|
||||
|
||||
## Findings
|
||||
|
||||
### High: trust statement models are allowed to vary, but model identification is still too abstract
|
||||
|
||||
The draft says a trust assertion must identify its model clearly enough for interpretation, but it never sketches the minimum structure of that identifier. Implementers need at least an abstract field or named model token.
|
||||
|
||||
### Medium: receiver processing lacks concrete examples of multi-issuer conflict
|
||||
|
||||
The text is directionally correct that aggregation is local policy, but a non-normative example of conflicting assertions with different confidence and freshness would make implementation much easier.
|
||||
|
||||
### Medium: trust-event categories are illustrative only, which is safe, but leaves event producers with little interoperability anchor
|
||||
|
||||
The draft should either define a small initial event vocabulary or state more clearly that event categories are profile-specific and not intended to interoperate by name in v1.
|
||||
|
||||
### Medium: freshness requirements need a clearer shape
|
||||
|
||||
The text requires creation time and either expiry or validity policy, but two implementations could still encode validity very differently. The document would benefit from one abstract freshness shape or example.
|
||||
|
||||
## Open questions
|
||||
|
||||
- Should the document standardize a tiny base model such as `level`, `numeric`, and `delta`?
|
||||
- Should it include a compact example trust assertion object?
|
||||
|
||||
## Residual risk
|
||||
|
||||
The draft is conceptually coherent, but still needs one more layer of data-shape clarity before implementation teams are likely to converge cleanly.
|
||||
@@ -0,0 +1,7 @@
|
||||
# Architecture Review
|
||||
|
||||
## Findings
|
||||
|
||||
## Open questions
|
||||
|
||||
## Residual risk
|
||||
@@ -0,0 +1,7 @@
|
||||
# IETF Senior Review
|
||||
|
||||
## Findings
|
||||
|
||||
## Open questions
|
||||
|
||||
## Residual publishability risk
|
||||
@@ -0,0 +1,7 @@
|
||||
# Security Review
|
||||
|
||||
## Findings
|
||||
|
||||
## Open questions
|
||||
|
||||
## Residual risk
|
||||
@@ -0,0 +1,7 @@
|
||||
# Software Review
|
||||
|
||||
## Findings
|
||||
|
||||
## Open questions
|
||||
|
||||
## Residual risk
|
||||
@@ -0,0 +1,25 @@
|
||||
# Review Synthesis
|
||||
|
||||
## Blocking findings
|
||||
|
||||
- Add an explicit conformance rule that portable trust assertions require authenticated origin and integrity protection; unauthenticated portable assertions must not be treated as conformant input.
|
||||
- Tighten stale-data handling so clearly expired assertions are rejected rather than merely "downgraded" at implementer discretion.
|
||||
- Define a firmer minimum portable data shape for trust assertions, including explicit model identification.
|
||||
|
||||
## Major findings
|
||||
|
||||
- Clarify whether trust-event interoperability is core to the document or whether trust events are primarily feeder objects for portable assertions.
|
||||
- Strengthen the handling of negative assertions by requiring either evidence reference or explanation code when such assertions are exchanged portably.
|
||||
- Clarify revocation versus supersession.
|
||||
- Add one compact example of conflicting assertions from different issuers to make receiver processing easier to implement.
|
||||
|
||||
## Minor findings
|
||||
|
||||
- Tighten abstract wording around scoped issuer opinion.
|
||||
- Make a few terminology definitions more RFC-like.
|
||||
- Reduce provisional tone in IANA and dependency text.
|
||||
|
||||
## Conflicts resolved
|
||||
|
||||
- No major reviewer conflict exists. All reviewers support the narrow scope.
|
||||
- The only tension is between remaining model-agnostic and becoming implementable. Resolution: keep algorithm choice open, but define a stronger minimum portable assertion envelope and clearer stale-data behavior.
|
||||
@@ -0,0 +1,9 @@
|
||||
# Review Synthesis
|
||||
|
||||
## Blocking findings
|
||||
|
||||
## Major findings
|
||||
|
||||
## Minor findings
|
||||
|
||||
## Conflicts resolved
|
||||
@@ -0,0 +1,29 @@
|
||||
# Revision Plan
|
||||
|
||||
## Blocking changes
|
||||
|
||||
- Add explicit rejection behavior for unauthenticated portable trust assertions.
|
||||
- Strengthen stale-data handling for expired assertions.
|
||||
- Add a clearer abstract field or token for trust statement model identification.
|
||||
- Clarify whether negative portable assertions require evidence reference, explanation code, or one of the two.
|
||||
|
||||
## High-value improvements
|
||||
|
||||
- Add one compact example assertion and one multi-issuer conflict example.
|
||||
- Clarify revocation versus supersession.
|
||||
- Decide whether trust events are first-class interoperable objects in v1 or primarily internal feeder records.
|
||||
- Tighten abstract and terminology wording.
|
||||
|
||||
## Deferred items
|
||||
|
||||
- cross-issuer aggregation algorithms
|
||||
- global reputation semantics
|
||||
- large shared registries
|
||||
- mandatory numeric scoring
|
||||
|
||||
## Draft order for next iteration
|
||||
|
||||
1. Tighten Sections 4 through 8 around portable assertion conformance.
|
||||
2. Add explicit model identification and stale-data rules.
|
||||
3. Add negative-assertion handling rules and examples.
|
||||
4. Revisit Security, Privacy, IANA, and References for final consistency.
|
||||
@@ -0,0 +1,9 @@
|
||||
# Revision Plan
|
||||
|
||||
## Blocking changes
|
||||
|
||||
## High-value improvements
|
||||
|
||||
## Deferred items
|
||||
|
||||
## Draft order for next iteration
|
||||
25
workspace/draft-team/cycles/example-topic/00-user-spec.md
Normal file
@@ -0,0 +1,25 @@
|
||||
# User Spec
|
||||
|
||||
## Topic
|
||||
|
||||
## Goal
|
||||
|
||||
## Intended status
|
||||
|
||||
Informational, Experimental, or Standards Track.
|
||||
|
||||
## Problem to solve
|
||||
|
||||
## What must be true in the final draft
|
||||
|
||||
## Constraints
|
||||
|
||||
- scope constraints
|
||||
- compatibility constraints
|
||||
- terminology constraints
|
||||
|
||||
## Source materials to prioritize
|
||||
|
||||
## Success criteria
|
||||
|
||||
## Questions for the team
|
||||
@@ -0,0 +1,13 @@
|
||||
# Research Brief
|
||||
|
||||
## Problem framing
|
||||
|
||||
## Evidence from existing drafts
|
||||
|
||||
## Overlap and adjacent work
|
||||
|
||||
## Gaps and unresolved questions
|
||||
|
||||
## Additional data worth investigating
|
||||
|
||||
## Recommendation to the architect
|
||||
@@ -0,0 +1,17 @@
|
||||
# Architecture Brief
|
||||
|
||||
## Scope
|
||||
|
||||
## Non-goals
|
||||
|
||||
## Terminology and actors
|
||||
|
||||
## Protocol or data model shape
|
||||
|
||||
## Normative requirements candidates
|
||||
|
||||
## Security, privacy, and abuse considerations
|
||||
|
||||
## IANA impact
|
||||
|
||||
## Open design questions
|
||||
9
workspace/draft-team/cycles/example-topic/30-outline.md
Normal file
@@ -0,0 +1,9 @@
|
||||
# Draft Outline
|
||||
|
||||
## Abstract
|
||||
|
||||
## Section plan
|
||||
|
||||
## Author guidance by section
|
||||
|
||||
## Issues that must not be hand-waved
|
||||
21
workspace/draft-team/cycles/example-topic/40-draft-v1.md
Normal file
@@ -0,0 +1,21 @@
|
||||
# Draft
|
||||
|
||||
## Abstract
|
||||
|
||||
## 1. Introduction
|
||||
|
||||
## 2. Terminology
|
||||
|
||||
## 3. Problem Statement
|
||||
|
||||
## 4. Protocol Overview
|
||||
|
||||
## 5. Detailed Specification
|
||||
|
||||
## 6. Security Considerations
|
||||
|
||||
## 7. Privacy Considerations
|
||||
|
||||
## 8. IANA Considerations
|
||||
|
||||
## 9. References
|
||||
@@ -0,0 +1,9 @@
|
||||
# Review Report
|
||||
|
||||
## Findings
|
||||
|
||||
## Open questions
|
||||
|
||||
## Strengths
|
||||
|
||||
## Residual publishability risk
|
||||
@@ -0,0 +1,9 @@
|
||||
# Revision Plan
|
||||
|
||||
## Blocking changes
|
||||
|
||||
## High-value improvements
|
||||
|
||||
## Deferred items
|
||||
|
||||
## Draft order for next iteration
|
||||
@@ -0,0 +1,25 @@
|
||||
# User Spec
|
||||
|
||||
## Topic
|
||||
|
||||
## Goal
|
||||
|
||||
## Intended status
|
||||
|
||||
Informational, Experimental, or Standards Track.
|
||||
|
||||
## Problem to solve
|
||||
|
||||
## What must be true in the final draft
|
||||
|
||||
## Constraints
|
||||
|
||||
- scope constraints
|
||||
- compatibility constraints
|
||||
- terminology constraints
|
||||
|
||||
## Source materials to prioritize
|
||||
|
||||
## Success criteria
|
||||
|
||||
## Questions for the team
|
||||
@@ -0,0 +1,13 @@
|
||||
# Research Brief
|
||||
|
||||
## Problem framing
|
||||
|
||||
## Evidence from existing drafts
|
||||
|
||||
## Overlap and adjacent work
|
||||
|
||||
## Gaps and unresolved questions
|
||||
|
||||
## Additional data worth investigating
|
||||
|
||||
## Recommendation to the architect
|
||||
@@ -0,0 +1,17 @@
|
||||
# Architecture Brief
|
||||
|
||||
## Scope
|
||||
|
||||
## Non-goals
|
||||
|
||||
## Terminology and actors
|
||||
|
||||
## Protocol or data model shape
|
||||
|
||||
## Normative requirements candidates
|
||||
|
||||
## Security, privacy, and abuse considerations
|
||||
|
||||
## IANA impact
|
||||
|
||||
## Open design questions
|
||||
@@ -0,0 +1,9 @@
|
||||
# Draft Outline
|
||||
|
||||
## Abstract
|
||||
|
||||
## Section plan
|
||||
|
||||
## Author guidance by section
|
||||
|
||||
## Issues that must not be hand-waved
|
||||
21
workspace/draft-team/cycles/review-board-test/40-draft-v1.md
Normal file
@@ -0,0 +1,21 @@
|
||||
# Draft
|
||||
|
||||
## Abstract
|
||||
|
||||
## 1. Introduction
|
||||
|
||||
## 2. Terminology
|
||||
|
||||
## 3. Problem Statement
|
||||
|
||||
## 4. Protocol Overview
|
||||
|
||||
## 5. Detailed Specification
|
||||
|
||||
## 6. Security Considerations
|
||||
|
||||
## 7. Privacy Considerations
|
||||
|
||||
## 8. IANA Considerations
|
||||
|
||||
## 9. References
|
||||
@@ -0,0 +1,7 @@
|
||||
# Architecture Review
|
||||
|
||||
## Findings
|
||||
|
||||
## Open questions
|
||||
|
||||
## Residual risk
|
||||
@@ -0,0 +1,7 @@
|
||||
# IETF Senior Review
|
||||
|
||||
## Findings
|
||||
|
||||
## Open questions
|
||||
|
||||
## Residual publishability risk
|
||||
@@ -0,0 +1,7 @@
|
||||
# Security Review
|
||||
|
||||
## Findings
|
||||
|
||||
## Open questions
|
||||
|
||||
## Residual risk
|
||||
@@ -0,0 +1,7 @@
|
||||
# Software Review
|
||||
|
||||
## Findings
|
||||
|
||||
## Open questions
|
||||
|
||||
## Residual risk
|
||||
@@ -0,0 +1,9 @@
|
||||
# Review Synthesis
|
||||
|
||||
## Blocking findings
|
||||
|
||||
## Major findings
|
||||
|
||||
## Minor findings
|
||||
|
||||
## Conflicts resolved
|
||||
@@ -0,0 +1,9 @@
|
||||
# Revision Plan
|
||||
|
||||
## Blocking changes
|
||||
|
||||
## High-value improvements
|
||||
|
||||
## Deferred items
|
||||
|
||||
## Draft order for next iteration
|
||||