Internet-Draft anima
Intended status: standards-track March 2026
Expires: September 05, 2026

Agent Task DAG: A Framework for Directed Acyclic Graph Execution in Multi-Agent Systems

draft-agent-ecosystem-agent-task-a-00

Abstract

As AI agent systems become increasingly complex, there is a
growing need for structured approaches to orchestrate multi-step
tasks across multiple autonomous agents. This document defines the
Agent Task DAG (Directed Acyclic Graph) framework, which provides
a standardized approach for representing, executing, and managing
complex workflows in multi-agent environments. The framework
addresses key challenges including task decomposition, dependency
management, parallel execution, failure recovery, and human
oversight integration. By building upon existing agent
authorization profiles and task negotiation protocols, this
specification enables agents to coordinate complex workflows while
maintaining security, auditability, and the ability to incorporate
human-in-the-loop decision points. The framework supports both
fast execution in trusted environments and rigorous verification
in regulated contexts through configurable assurance profiles.

Status of This Memo

This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.

This document is intended to have standards-track status.
Distribution of this memo is unlimited.

Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED",
"MAY", and "OPTIONAL" in this document are to be interpreted as
described in BCP 14 [RFC2119] [RFC8174] when, and only when, they
appear in all capitals, as shown here.

Agent Task DAG
   A directed acyclic graph representing a complex workflow where
   nodes represent individual tasks and edges represent
   dependencies between tasks

Task Node
   An individual unit of work within a DAG that can be executed by
   one or more agents

Execution Context
   The runtime environment and state information associated with
   DAG execution, including agent assignments, intermediate
   results, and checkpoint data

Checkpoint
   A persistent snapshot of DAG execution state that enables
   rollback and recovery operations

Task Binding
   The association of a task node with specific agent capabilities
   or agent instances

DAG Coordinator
   An agent or system component responsible for orchestrating the
   execution of a complete DAG workflow

Table of Contents

1. Introduction
2. Terminology
3. Problem Statement
4. Agent Task DAG Framework
5. Task Execution Protocol
6. Checkpoint and Recovery Mechanisms
7. Integration with Existing Agent Protocols
8. Security Considerations
9. IANA Considerations
10. References

1. Introduction

The increasing sophistication of AI agent systems has created a
demand for structured approaches to orchestrate complex,
multi-step tasks across autonomous agents. While individual agents have
become capable of handling sophisticated reasoning and execution
tasks, real-world applications often require coordinating multiple
agents to complete workflows that involve parallel processing,
sequential dependencies, and dynamic task allocation. Current
approaches to multi-agent coordination typically rely on ad-hoc
communication patterns or simple request-response chains, which
lack the expressiveness and reliability needed for complex
enterprise and research applications.

This document defines the Agent Task DAG (Directed Acyclic Graph)
framework, which provides a standardized approach for
representing, executing, and managing complex workflows in
multi-agent environments. The framework builds upon existing agent
protocols, particularly the Agent Authorization Profile
[draft-aap-oauth-profile] for security and authorization, and agent task
coordination mechanisms [draft-cui-ai-agent-task] for basic task
execution. By representing workflows as directed acyclic graphs,
the framework enables explicit modeling of task dependencies,
parallel execution opportunities, and conditional branching while
maintaining guarantees about workflow termination and consistency.

The Agent Task DAG framework addresses several critical challenges
in multi-agent systems: task decomposition and dependency
management, efficient parallel execution across heterogeneous
agents, robust failure recovery and rollback mechanisms, and
integration of human oversight at critical decision points. The
framework leverages structured claims for agent context
[draft-aap-oauth-profile] to enable context-aware task assignment and
supports agent context distribution mechanisms
[draft-chang-agent-context-interaction] to maintain coherent state
across complex multi-round workflows. This approach ensures that
agents can coordinate effectively while maintaining security
boundaries and audit trails required in enterprise environments.

The specification is designed to be protocol-agnostic and can
operate over various transport mechanisms including HTTP
[RFC9110], message queuing systems, and specialized agent
communication protocols. The framework integrates with existing
OAuth 2.0 [RFC6749] and JWT [RFC7519] infrastructure through the
Agent Authorization Profile, enabling seamless deployment in
environments that already support agent authentication and
authorization. The DAG representation follows JSON [RFC8259]
encoding standards to ensure broad compatibility and easy
integration with existing agent development frameworks.

This document focuses specifically on the DAG execution framework
and does not address broader questions of agent discovery,
capability matching, or task marketplace mechanisms, which are
covered by complementary specifications. The framework assumes the
existence of agent authorization infrastructure and builds upon
established patterns for agent-to-agent communication while
providing the additional structure needed for complex workflow
coordination.

2. Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED",
"MAY", and "OPTIONAL" in this document are to be interpreted as
described in BCP 14 [RFC2119] [RFC8174] when, and only when, they
appear in all capitals, as shown here.

This specification builds upon terminology established in the
Agent Authorization Profile [draft-aap-oauth-profile], AI Agent
Task specifications [draft-cui-ai-agent-task], and Agent Context
Interaction mechanisms [draft-chang-agent-context-interaction].
The following terms are defined for use throughout this document:

Agent Task DAG: A directed acyclic graph data structure
representing a complex multi-step workflow where nodes correspond
to individual tasks and directed edges represent dependency
relationships between tasks. The DAG enforces execution ordering
constraints while enabling parallel execution of independent task
branches. Each DAG maintains metadata including creation time,
ownership, and execution policies that govern how the workflow may
be executed across multiple agents.

Task Node: An individual unit of work within an Agent Task DAG
that encapsulates a specific operation to be performed by one or
more AI agents. Each task node contains task specifications,
input/output schemas, execution constraints, and binding
requirements that determine which agents are capable of executing
the task. Task nodes maintain state information including
execution status, assigned agents, and result data as defined in
[draft-cui-ai-agent-task].

Execution Context: The runtime environment and associated state
information that governs the execution of an Agent Task DAG. The
execution context includes agent assignments, intermediate task
results, security credentials, operational constraints from Agent
Authorization Profiles [draft-aap-oauth-profile], and distributed
context information as specified in
[draft-chang-agent-context-interaction]. The execution context
ensures consistency and provides necessary information for task
coordination across multiple agents.

Checkpoint: A persistent, immutable snapshot of Agent Task DAG
execution state captured at a specific point in time. Checkpoints
contain the complete execution context, task completion status,
intermediate results, and sufficient metadata to enable rollback
and recovery operations. Checkpoints serve as recovery points for
failure scenarios and decision points for human-in-the-loop
interventions.

Task Binding: The process and resulting association between a task
node and specific agent capabilities or agent instances that will
execute the task. Task binding considers agent authorization
profiles, capability matching, resource availability, and security
constraints. The binding process may be performed statically
during DAG planning or dynamically during execution based on
runtime conditions.

DAG Coordinator: An agent or system component responsible for
orchestrating the complete lifecycle of Agent Task DAG execution.
The DAG Coordinator manages task scheduling, monitors execution
progress, handles inter-agent communication, enforces security
policies, and coordinates checkpoint and recovery operations. The
coordinator maintains the authoritative view of DAG execution
state and serves as the primary interface for human oversight and
intervention.

3. Problem Statement

Current approaches to multi-agent task coordination suffer from
several fundamental limitations that impede the development of
robust, scalable autonomous systems. Existing coordination
mechanisms typically rely on ad-hoc communication patterns, simple
request-response protocols, or basic workflow engines that were
not designed for the dynamic, autonomous nature of AI agents.
While protocols like those defined in [draft-cui-ai-agent-task]
provide foundations for individual task execution, they lack
standardized approaches for managing complex workflows involving
multiple interdependent tasks across heterogeneous agent
populations. The Agent Authorization Profile
[draft-aap-oauth-profile] establishes important primitives for agent
identity and authorization, but does not address the orchestration
challenges that arise when multiple authorized agents must
coordinate to complete complex, multi-step objectives.

The complexity of real-world AI agent applications demands
structured approaches to task decomposition and dependency
management that current protocols do not adequately address.
Agents operating in domains such as scientific research, business
process automation, or infrastructure management often require
workflows where tasks have intricate dependencies, may execute in
parallel when possible, and must handle partial failures
gracefully. Without standardized mechanisms for representing these
relationships, agent systems resort to brittle, custom
coordination logic that is difficult to audit, debug, or modify.
The lack of formal workflow representation also prevents effective
human oversight integration, as stakeholders cannot easily
understand or intervene in complex multi-agent processes.

Agent Context Distribution mechanisms
[draft-chang-agent-context-interaction] have demonstrated that
context sharing among agents significantly impacts execution
success rates, but current approaches do not provide systematic
ways to manage context propagation through complex workflows. In
multi-step processes, intermediate results from one task often
serve as inputs to downstream tasks, creating context dependencies
that must be carefully managed to ensure workflow integrity.
Existing protocols lack standardized approaches for maintaining
execution context across task boundaries, leading to information
loss, redundant computation, and coordination failures that
compromise overall system reliability.

Fault tolerance and recovery represent critical gaps in current
multi-agent coordination approaches. Real-world agent systems must
handle various failure modes including agent unavailability, task
timeouts, resource constraints, and partial execution failures.
Without systematic checkpoint and recovery mechanisms, workflows
often must restart completely when any component fails, leading to
inefficient resource utilization and poor user experience. The
absence of standardized rollback capabilities also complicates
human intervention scenarios, where domain experts may need to
modify workflow parameters or task assignments based on
intermediate results or changing requirements.

Scalability challenges emerge when current coordination approaches
encounter workflows with dozens or hundreds of interdependent
tasks distributed across multiple agent instances. Simple
centralized coordination quickly becomes a bottleneck, while fully
decentralized approaches struggle with consistency and deadlock
prevention. The lack of standardized protocols for parallel task
execution, resource allocation, and progress monitoring prevents
agent systems from efficiently utilizing available computational
resources. Additionally, without formal workflow representation,
it becomes difficult to optimize task scheduling, predict resource
requirements, or provide meaningful progress indicators to human
stakeholders.

These limitations necessitate a framework that provides:
structured representation of complex workflows with explicit
dependency management; standardized protocols for parallel
execution and agent coordination; systematic checkpoint and
recovery mechanisms that enable fault tolerance and human
intervention; integration with existing agent authorization and
context distribution mechanisms; and scalable execution patterns
that can accommodate workflows ranging from simple sequential
processes to complex parallel computations involving multiple
agent populations.

4. Agent Task DAG Framework

This section defines the core data model and execution semantics
for the Agent Task DAG framework. The framework provides a
structured approach for representing complex multi-agent workflows
as directed acyclic graphs, where individual tasks are modeled as
nodes and dependencies between tasks are represented as edges. The
data model builds upon existing agent protocol foundations while
introducing specific constructs needed for distributed workflow
orchestration.

4.1. DAG Data Model

An Agent Task DAG MUST be represented as a JSON object [RFC8259]
that contains the complete specification of a workflow. The DAG
structure consists of three primary components: metadata
describing the overall workflow, a collection of task nodes
representing individual units of work, and dependency
relationships that define execution ordering constraints. Each DAG
MUST include a unique identifier, version information, and
execution parameters that govern how the workflow should be
processed.

Task nodes within the DAG represent atomic units of work that can
be executed by autonomous agents. Each task node MUST specify its
execution requirements, including required agent capabilities,
input and output data schemas, and execution constraints such as
timeouts or resource limits. Task nodes SHOULD reference
standardized task types as defined in [draft-cui-ai-agent-task]
where applicable, enabling interoperability across different agent
implementations. The task specification MUST include sufficient
information for agents to determine their capability to execute
the task and negotiate execution parameters.

Dependency relationships between task nodes are expressed through
edge definitions that establish partial ordering constraints over
the DAG. Each edge MUST specify source and target task nodes, with
the semantic meaning that the target task cannot begin execution
until the source task has completed successfully. Edges MAY
include conditional execution logic, allowing for branching
workflows based on the results of predecessor tasks. The framework
supports both data dependencies, where output from one task serves
as input to another, and control dependencies, where task ordering
is required for correctness without direct data flow.

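As a non-normative illustration of the data model described above, the following Python sketch builds a small DAG as a JSON-compatible object and checks two structural properties: every edge refers to a declared task node, and the graph contains no cycle. All field names (`dag_id`, `version`, `execution`, `nodes`, `edges`, `source`, `target`, `condition`) and the example values are assumptions made for this sketch; this section does not fix a concrete schema.

```python
# Hypothetical DAG document; every field name here is illustrative only.
EXAMPLE_DAG = {
    "dag_id": "dag-example-001",
    "version": "1",
    "execution": {"max_parallel_tasks": 4},
    "nodes": {
        "fetch": {"capabilities": ["http-fetch"], "timeout_s": 30},
        "parse": {"capabilities": ["parsing"], "timeout_s": 60},
        "summarize": {"capabilities": ["summarization"], "timeout_s": 120},
    },
    "edges": [
        {"source": "fetch", "target": "parse"},
        {"source": "parse", "target": "summarize", "condition": "parse.ok"},
    ],
}


def validate_dag(dag):
    """Return a list of structural problems; an empty list means none were found."""
    problems = [f"missing field: {f}"
                for f in ("dag_id", "version", "execution", "nodes", "edges")
                if f not in dag]
    if problems:
        return problems
    nodes = dag["nodes"]
    successors = {name: [] for name in nodes}
    for edge in dag["edges"]:
        src, dst = edge.get("source"), edge.get("target")
        if src not in nodes or dst not in nodes:
            problems.append(f"edge references unknown node: {src} -> {dst}")
        else:
            successors[src].append(dst)

    # Depth-first search with three colours: reaching a GREY node means a cycle.
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {name: WHITE for name in nodes}

    def has_cycle_from(name):
        colour[name] = GREY
        for nxt in successors[name]:
            if colour[nxt] == GREY:
                return True
            if colour[nxt] == WHITE and has_cycle_from(nxt):
                return True
        colour[name] = BLACK
        return False

    for name in nodes:
        if colour[name] == WHITE and has_cycle_from(name):
            problems.append("graph contains a cycle")
            break
    return problems
```

A coordinator could run a check of this kind during initialization and refuse to start a workflow for which `validate_dag` returns any problem.
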
4.2. Execution Context Management

The Execution Context provides the runtime environment for DAG
processing and maintains state information throughout workflow
execution. The execution context MUST track the current state of
each task node, intermediate results produced during execution,
and metadata about agent assignments for each task. Context
information SHOULD be distributed among participating agents using
the mechanisms defined in [draft-chang-agent-context-interaction]
to ensure consistent state visibility across the multi-agent
system.

Agent binding within the execution context associates task nodes
with specific agent instances or agent capability requirements.
The framework supports both static binding, where task assignments
are predetermined before execution begins, and dynamic binding,
where task assignments are resolved at runtime based on agent
availability and capability matching. When integrated with Agent
Authorization Profiles [draft-aap-oauth-profile], the execution
context MUST validate that assigned agents possess the necessary
authorization claims to execute their bound tasks.

Checkpoint creation within the execution context enables
persistent state management and recovery capabilities. The
framework MUST support checkpoint creation at configurable
intervals, capturing the complete state of DAG execution including
task completion status, intermediate results, and current agent
assignments. Checkpoints SHOULD be created automatically before
task nodes that are marked as requiring human oversight, enabling
rollback to known-good states when human intervention modifies the
workflow execution path.

4.3. Task Execution Semantics

Task execution within the DAG framework follows a coordination
model where a DAG Coordinator orchestrates workflow progress while
individual agents execute assigned tasks autonomously. The
coordinator MUST maintain the global view of DAG state and
determine when task dependencies have been satisfied, enabling
parallel execution of independent task branches. Task scheduling
MUST respect dependency constraints while maximizing parallel
execution opportunities to optimize overall workflow completion
time.

The framework defines specific execution states for task nodes
including pending, ready, executing, completed, failed, and
skipped. State transitions MUST be coordinated through the DAG
Coordinator to ensure consistency across the distributed system.
When a task transitions to the ready state, the coordinator SHOULD
initiate agent assignment and task negotiation protocols to begin
execution. Failed tasks MAY trigger rollback procedures or
alternate execution paths depending on the configured failure
handling policies.

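The six task states named above can be sketched as a small state machine owned by the coordinator. The text lists the states but does not enumerate the permitted transitions, so the transition table below (including the retry path from failed back to ready) is an assumption made for illustration, as are the `Coordinator` class and its method names.

```python
from enum import Enum


class TaskState(Enum):
    PENDING = "pending"
    READY = "ready"
    EXECUTING = "executing"
    COMPLETED = "completed"
    FAILED = "failed"
    SKIPPED = "skipped"


# Assumed transition table; the specification text does not define one.
ALLOWED = {
    TaskState.PENDING: {TaskState.READY, TaskState.SKIPPED},
    TaskState.READY: {TaskState.EXECUTING, TaskState.SKIPPED},
    TaskState.EXECUTING: {TaskState.COMPLETED, TaskState.FAILED},
    TaskState.FAILED: {TaskState.READY},  # retry path, an assumption
    TaskState.COMPLETED: set(),
    TaskState.SKIPPED: set(),
}


class Coordinator:
    """Holds the authoritative state of every task node (hypothetical class)."""

    def __init__(self, node_ids):
        self.state = {node: TaskState.PENDING for node in node_ids}
        self.log = []  # (node, old_state, new_state) tuples for auditing

    def transition(self, node, new_state):
        old = self.state[node]
        if new_state not in ALLOWED[old]:
            raise ValueError(f"illegal transition {old.value} -> {new_state.value}")
        self.state[node] = new_state
        self.log.append((node, old, new_state))
```

Routing every change through a single `transition` method is one way to satisfy the requirement that state transitions be coordinated centrally, and the log doubles as raw material for an audit trail.
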
Integration with existing agent protocols occurs through
standardized interfaces that abstract the underlying communication
mechanisms. The framework MUST support protocol-agnostic bindings
that allow integration with different agent discovery,
authorization, and communication protocols. Task execution
requests SHOULD include structured claims as defined in
[draft-aap-oauth-profile] when agent authorization is required, ensuring
that security and audit requirements are maintained throughout the
distributed workflow execution.

5. Task Execution Protocol

The Agent Task DAG execution protocol defines a standardized
approach for coordinating the execution of complex workflows
across multiple autonomous agents. The protocol builds upon
existing agent communication mechanisms and authorization
frameworks, particularly the Agent Authorization Profile
[draft-aap-oauth-profile], to enable secure and auditable workflow
execution. The execution model supports both centralized
coordination through a designated DAG Coordinator and distributed
execution patterns where agents negotiate task assignments
dynamically.

The execution protocol operates through a series of well-defined
phases: initialization, task scheduling, parallel execution, and
completion verification. During initialization, the DAG
Coordinator validates the workflow structure, resolves task
bindings to available agents, and establishes the execution
context. Task scheduling follows topological ordering of the DAG,
with the coordinator identifying executable tasks (those with
satisfied dependencies) and dispatching them to appropriate
agents. The protocol supports parallel execution of independent
tasks while maintaining strict dependency ordering through state
synchronization mechanisms.

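The scheduling rule described above (dispatch every task whose dependencies are satisfied, then repeat) corresponds to a level-by-level variant of Kahn's topological sort. The following non-normative Python sketch groups tasks into "waves" that could run in parallel; the function name and the representation of edges as `(source, target)` pairs are assumptions made for this example.

```python
def execution_waves(nodes, edges):
    """Group task ids into waves; tasks within one wave do not depend on each other."""
    remaining = {n: 0 for n in nodes}      # unmet dependency count per task
    successors = {n: [] for n in nodes}
    for src, dst in edges:
        remaining[dst] += 1
        successors[src].append(dst)

    ready = sorted(n for n, count in remaining.items() if count == 0)
    waves, done = [], 0
    while ready:
        waves.append(ready)
        done += len(ready)
        next_ready = []
        for task in ready:
            for nxt in successors[task]:
                remaining[nxt] -= 1
                if remaining[nxt] == 0:
                    next_ready.append(nxt)
        ready = sorted(next_ready)

    # Any task never reached sits on a cycle, so the input was not a DAG.
    if done != len(remaining):
        raise ValueError("dependency cycle detected; not a DAG")
    return waves
```

For a graph with edges a→c, b→c, and c→d, the function returns `[["a", "b"], ["c"], ["d"]]`: tasks a and b can be dispatched together, and c becomes ready only once both have completed. A real coordinator would release successors as individual tasks finish rather than waiting for a whole wave.
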
Agent coordination during DAG execution relies on structured
message exchanges that convey task assignments, status updates,
and result propagation. Task assignment messages MUST include the
complete task specification, execution context parameters, and any
required authorization tokens following the Agent Authorization
Profile format [draft-aap-oauth-profile]. Agents respond with
acceptance confirmations that include estimated execution time and
resource requirements. Status update messages provide real-time
execution progress and MUST be sent at configurable intervals to
enable failure detection and recovery operations.

State synchronization across the multi-agent system is achieved
through a combination of checkpoint mechanisms and distributed
context sharing. The DAG Coordinator maintains the authoritative
execution state, including task completion status, intermediate
results, and dependency satisfaction tracking. Agent Context
Distribution mechanisms [draft-chang-agent-context-interaction]
are employed to efficiently share relevant context information
among participating agents, reducing redundant data transfer while
ensuring each agent has access to necessary execution context.
Intermediate results from completed tasks are propagated to
dependent tasks through structured result messages that preserve
data lineage and enable audit trail construction.

The protocol defines specific message formats for each phase of
execution, using JSON [RFC8259] structures that can be embedded
within existing agent communication protocols. Task execution
requests include fields for task identification, input parameters,
execution constraints, and callback endpoints for status
reporting. Result messages contain structured output data,
execution metadata, and quality indicators that enable downstream
tasks to validate input requirements. Error and exception messages
provide detailed failure information including error codes,
diagnostic data, and suggested recovery actions.

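To make the task execution request described above more concrete, the following non-normative Python sketch serializes and parses such a message. The text names the kinds of fields a request carries (task identification, input parameters, execution constraints, callback endpoint) but not their wire names, so `type`, `dag_id`, `task_id`, `inputs`, `constraints`, `timeout_s`, and `callback_url` are all hypothetical, as is the example URL.

```python
import json

# Assumed field names; the specification text does not define them.
REQUIRED_FIELDS = ("type", "dag_id", "task_id", "inputs", "constraints", "callback_url")


def build_task_request(dag_id, task_id, inputs, timeout_s, callback_url):
    """Serialize a task execution request as a JSON string."""
    message = {
        "type": "task_execution_request",
        "dag_id": dag_id,
        "task_id": task_id,
        "inputs": inputs,
        "constraints": {"timeout_s": timeout_s},
        "callback_url": callback_url,
    }
    # Sorted keys keep the encoding stable, which helps logging and diffing.
    return json.dumps(message, sort_keys=True)


def parse_task_request(raw):
    """Parse a request and reject it if any assumed required field is absent."""
    message = json.loads(raw)
    missing = [f for f in REQUIRED_FIELDS if f not in message]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")
    return message
```

An agent receiving such a message would typically validate it with `parse_task_request` before accepting the task, and report progress to the callback endpoint carried in the request.
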
Parallel execution coordination addresses the challenges of
resource contention and optimal scheduling across heterogeneous
agent capabilities. The protocol supports both push-based task
assignment, where the coordinator actively distributes work, and
pull-based execution, where agents request tasks based on their
availability and capabilities. Load balancing mechanisms consider
agent capacity, current workload, and task affinity when making
scheduling decisions. The protocol also defines procedures for
dynamic rescheduling when agents become unavailable or when
execution time estimates prove inaccurate, ensuring workflow
completion despite individual agent failures.

6. Checkpoint and Recovery Mechanisms

The Agent Task DAG framework MUST provide robust checkpoint and
recovery mechanisms to ensure workflow resilience and enable
graceful handling of failures, interruptions, and human
intervention points. Checkpoints represent persistent snapshots of
the DAG execution state at specific points in the workflow,
capturing sufficient information to resume execution from that
point or roll back to a previous stable state. The framework
defines three types of checkpoints: automatic checkpoints created
at predefined intervals or task completion boundaries, explicit
checkpoints requested by agents or human operators, and recovery
checkpoints generated immediately before high-risk operations that
may require rollback.

Checkpoint creation MUST capture the complete execution context as
defined in Section 4, including the current state of all task
nodes, intermediate results, agent assignments, and security
context derived from Agent Authorization Profiles
[draft-aap-oauth-profile]. Each checkpoint MUST include a unique identifier,
timestamp, DAG version, execution state hash, and references to
any external resources or agent context information as specified
in [draft-chang-agent-context-interaction]. The checkpoint data
structure SHOULD be serialized using JSON [RFC8259] with optional
compression for large state objects, and MUST be digitally signed
to ensure integrity and authenticity. Checkpoints MAY be stored in
distributed storage systems to ensure availability across multiple
DAG Coordinators.

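The checkpoint fields listed above (unique identifier, timestamp, DAG version, execution state hash) can be sketched as follows. This is a non-normative example under stated assumptions: the field names are hypothetical, and an HMAC-SHA256 tag from the Python standard library stands in for the digital signature, which in a real deployment would more likely be an asymmetric signature so that verifiers do not need the signing key.

```python
import hashlib
import hmac
import json
import uuid
from datetime import datetime, timezone


def canonical(obj):
    """Deterministic JSON bytes, so that equal states always hash identically."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def create_checkpoint(dag_version, execution_state, key):
    """Build a checkpoint record; all field names are illustrative assumptions."""
    body = {
        "checkpoint_id": str(uuid.uuid4()),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "dag_version": dag_version,
        "state_hash": hashlib.sha256(canonical(execution_state)).hexdigest(),
        "state": execution_state,
    }
    # HMAC is a stand-in for a real digital signature in this sketch.
    tag = hmac.new(key, canonical(body), hashlib.sha256).hexdigest()
    return {"body": body, "signature": tag}


def verify_checkpoint(checkpoint, key):
    """Check both the state hash and the integrity tag over the whole body."""
    body = checkpoint["body"]
    expected = hmac.new(key, canonical(body), hashlib.sha256).hexdigest()
    state_ok = hashlib.sha256(canonical(body["state"])).hexdigest() == body["state_hash"]
    return state_ok and hmac.compare_digest(expected, checkpoint["signature"])
```

Hashing a canonical encoding (sorted keys, fixed separators) matters here: two coordinators serializing the same state with different key orders would otherwise compute different state hashes.
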
The rollback procedure enables the DAG execution to revert to a
previous checkpoint when failures occur or human intervention
requires undoing completed work. When a rollback is initiated, the
DAG Coordinator MUST notify all participating agents of the
rollback operation, invalidate any results produced after the
target checkpoint, and restore the execution context to the
checkpoint state. Agents MUST acknowledge the rollback operation
and may need to perform agent-specific cleanup operations such as
releasing resources or notifying external systems. The rollback
operation MUST preserve audit trails by maintaining records of
both the original execution and the rollback event, ensuring
compliance with security and regulatory requirements.

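The rollback steps described above (notify agents, collect acknowledgements, invalidate later results, restore state, keep the audit trail) can be sketched in Python as follows. This is a minimal in-memory illustration, not a definitive implementation: the `RollbackCoordinator` class, its attributes, and the convention that each agent is a callable returning `True` to acknowledge are all assumptions.

```python
import copy


class RollbackCoordinator:
    """In-memory sketch of checkpoint and rollback bookkeeping (hypothetical)."""

    def __init__(self, agents):
        self.agents = agents      # agent id -> callable(checkpoint_id) returning an ack bool
        self.state = {}           # task id -> status string
        self.results = {}         # task id -> result value
        self.checkpoints = {}     # checkpoint id -> (state, results) snapshot
        self.audit = []           # append-only list of event tuples

    def checkpoint(self, checkpoint_id):
        self.checkpoints[checkpoint_id] = copy.deepcopy((self.state, self.results))
        self.audit.append(("checkpoint", checkpoint_id))

    def record_result(self, task_id, result):
        self.state[task_id] = "completed"
        self.results[task_id] = result
        self.audit.append(("completed", task_id))

    def rollback(self, checkpoint_id):
        saved_state, saved_results = self.checkpoints[checkpoint_id]
        # Tasks whose results first appeared after the checkpoint are invalidated.
        invalidated = sorted(set(self.results) - set(saved_results))
        unacked = [a for a, notify in self.agents.items() if not notify(checkpoint_id)]
        if unacked:
            raise RuntimeError(f"agents did not acknowledge rollback: {unacked}")
        self.state = copy.deepcopy(saved_state)
        self.results = copy.deepcopy(saved_results)
        # Earlier "completed" events stay in the log; a rollback event is appended.
        self.audit.append(("rollback", checkpoint_id, tuple(invalidated)))
        return invalidated
```

Because the audit log is append-only, the record of the original execution survives the rollback, which is the property the text requires for compliance purposes.
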
Failure recovery strategies operate at multiple levels within the
DAG execution framework, from individual task failures to complete
coordinator failures. For task-level failures, the framework
supports automatic retry with exponential backoff, task
reassignment to alternative agents with compatible capabilities,
and conditional continuation where dependent tasks may proceed
with degraded inputs. When coordinator failures occur, recovery
mechanisms leverage distributed checkpoints and coordinator
election protocols to restore execution state on alternative
infrastructure. The framework MUST support human-in-the-loop
recovery scenarios where automated recovery is insufficient,
providing interfaces for human operators to inspect checkpoint
states, approve recovery actions, and inject corrective context
information.

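Two of the task-level strategies named above, retry with exponential backoff and reassignment to an alternative agent, can be combined as in the following non-normative Python sketch. The function name, its parameters, the default delay values, and the convention that an agent signals failure by raising an exception are assumptions made for illustration.

```python
import time


def run_with_recovery(task, agents, max_attempts=3, base_delay_s=0.5, sleep=time.sleep):
    """Try each agent in turn, retrying each with exponential backoff.

    `agents` is a list of (agent_id, callable) pairs; a callable signals failure
    by raising. Returns (agent_id, result) from the first successful attempt.
    """
    errors = []
    for agent_id, execute in agents:
        for attempt in range(max_attempts):
            try:
                return agent_id, execute(task)
            except Exception as exc:  # broad on purpose: any failure triggers recovery
                errors.append((agent_id, attempt, repr(exc)))
                # Delay doubles per attempt; no sleep after an agent's last attempt.
                if attempt < max_attempts - 1:
                    sleep(base_delay_s * (2 ** attempt))
    raise RuntimeError(f"task failed on all agents: {errors}")
```

The `sleep` parameter is injectable so that the backoff schedule can be observed or skipped in tests. Production schedulers commonly add random jitter to the delay so that many retrying tasks do not synchronize; that is omitted here for brevity.
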
The checkpoint and recovery mechanisms MUST integrate with the
agent authorization framework to ensure that recovery operations
maintain appropriate security boundaries and access controls.
Recovery operations SHOULD verify that participating agents still
possess valid authorization profiles and may require
re-authentication if significant time has elapsed since checkpoint
creation. The framework MUST provide configurable retention
policies for checkpoints, balancing storage efficiency with
recovery requirements, and MUST support secure deletion of
checkpoint data containing sensitive information when retention
periods expire or workflows complete successfully.

7. Integration with Existing Agent Protocols

This section describes how the Agent Task DAG framework integrates
with existing agent authorization, discovery, and communication
protocols to provide a comprehensive multi-agent workflow
execution environment. The framework is designed to be
protocol-agnostic while providing specific bindings for commonly used agent
protocols, enabling organizations to adopt DAG-based workflows
within their existing agent infrastructure.

The DAG framework builds upon the Agent Authorization Profile
(AAP) [draft-aap-oauth-profile] to establish secure task execution
contexts. When a DAG Coordinator initiates workflow execution, it
MUST obtain appropriate authorization tokens for each
participating agent using the structured claims defined in AAP.
The task context claim within the agent's JWT includes the
DAG identifier, task node assignments, and operational constraints
specific to the workflow. This approach ensures that agents can
verify their authorization to execute specific tasks within the
broader workflow context while maintaining the delegation chains
and human oversight requirements established in their
authorization profiles.

Agent discovery and capability matching for DAG execution
|
|
leverages existing agent discovery protocols while extending them
|
|
with DAG-specific metadata. Agents participating in DAG workflows
|
|
SHOULD advertise their capabilities using structured capability
|
|
descriptors that include supported task types, execution
|
|
constraints, and checkpoint compatibility. The DAG Coordinator
|
|
uses this information during the task binding process to assign
|
|
task nodes to appropriate agents. When multiple agents are capable
|
|
of executing a particular task type, the coordinator MAY use load
|
|
balancing, geographic distribution, or other selection criteria to
|
|
optimize workflow execution.
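
The task binding step can be sketched as follows. This is one
possible selection strategy (least-loaded capable agent), not a
required algorithm; the descriptor fields, the function name, and
the sample agents are hypothetical and stand in for whatever
capability descriptor format a deployment uses.

```python
from dataclasses import dataclass


@dataclass
class AgentDescriptor:
    agent_id: str
    task_types: set          # task types the agent advertises
    checkpointing: bool      # whether the agent supports checkpoints
    active_tasks: int = 0    # current load, used as the selection criterion


def bind_task(task_type: str, needs_checkpoint: bool, agents: list) -> AgentDescriptor:
    """Pick the least-loaded agent able to run task_type."""
    candidates = [
        a for a in agents
        if task_type in a.task_types and (a.checkpointing or not needs_checkpoint)
    ]
    if not candidates:
        raise LookupError(f"no capable agent for task type {task_type!r}")
    chosen = min(candidates, key=lambda a: a.active_tasks)
    chosen.active_tasks += 1  # record the new assignment
    return chosen


agents = [
    AgentDescriptor("a1", {"ocr", "summarize"}, True, active_tasks=3),
    AgentDescriptor("a2", {"summarize"}, True, active_tasks=1),
    AgentDescriptor("a3", {"summarize"}, False),
]
```

With these descriptors, a "summarize" task that needs checkpoints
is bound to "a2", because "a3" lacks checkpoint support and "a1"
carries more load.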

Context distribution among agents executing DAG workflows follows
the mechanisms defined in [draft-chang-agent-context-interaction],
with specific extensions for DAG execution state management. The
execution context for a DAG workflow includes the complete graph
structure, current execution state, intermediate task results, and
checkpoint metadata. Agents MUST receive sufficient context to
execute their assigned tasks, while the distribution of sensitive
information is limited to agents authorized to receive it. The
framework supports both push-based context distribution, where the
DAG Coordinator sends relevant context to agents before task
execution, and pull-based approaches, where agents request
specific context elements as needed.
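
A minimal sketch of push-based distribution follows, assuming that
"sufficient context" for a node means the results of its direct
dependencies and nothing else. That interpretation, the function
name, and the sample graph are assumptions for illustration; a real
deployment might also include graph structure or checkpoint
metadata.

```python
def context_for(node_id: str, dependencies: dict, results: dict) -> dict:
    """Build the context pushed to the agent that will run node_id.

    dependencies maps each node to the nodes it directly depends on;
    results maps completed nodes to their outputs.
    """
    needed = dependencies.get(node_id, [])
    missing = [dep for dep in needed if dep not in results]
    if missing:
        raise ValueError(f"{node_id} is not ready; missing results from {missing}")
    # Only direct dependencies are included; other results stay with
    # the coordinator.
    return {dep: results[dep] for dep in needed}


dependencies = {
    "fetch": [],
    "parse": ["fetch"],
    "score": ["parse"],
    "report": ["parse", "score"],
}
results = {"fetch": "raw bytes", "parse": {"rows": 10}, "score": 0.93}
```

Here the agent running "score" receives only the output of "parse",
and never sees the raw output of "fetch".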

The framework provides protocol bindings for common agent
communication patterns including HTTP-based REST APIs [RFC9110],
message queuing systems, and real-time communication protocols.
Each binding specifies how DAG execution messages are encoded, how
task results are reported, and how checkpoint operations are
coordinated across the distributed agent environment.
Protocol-specific considerations such as connection management,
retry mechanisms, and error handling are addressed within each
binding specification. For HTTP-based bindings, the framework
defines standardized endpoints for task execution, status
reporting, and checkpoint operations that can be implemented by
any agent supporting the DAG execution protocol.

Integration with existing agent task protocols
[draft-cui-ai-agent-task] is achieved through task node adapters
that translate between DAG task specifications and
protocol-specific task representations. These adapters handle
differences in task parameterization, result formatting, and
execution semantics while preserving the dependency relationships
and execution guarantees required by the DAG framework. The
framework also supports integration with audit and compliance
systems through standardized logging interfaces that capture task
execution events, authorization decisions, and checkpoint
operations in formats compatible with existing security and
compliance tools.

8. Security Considerations

The Agent Task DAG framework introduces security challenges that
extend beyond those of traditional single-agent systems.
Multi-agent workflows create expanded attack surfaces through
inter-agent communication channels, shared execution contexts, and
distributed state management. Malicious actors may attempt to
inject unauthorized tasks into DAG structures, manipulate task
dependencies to create privilege escalation paths, or exploit
checkpoint mechanisms to gain persistent access to workflow state.
The distributed nature of DAG execution also amplifies risks
related to agent impersonation, context poisoning, and
unauthorized workflow modification during execution.

Task authorization within DAG workflows MUST leverage the Agent
Authorization Profile [draft-aap-oauth-profile] to establish
fine-grained permissions for each task node. Each task node SHOULD
include authorization requirements that specify which agent
capabilities, delegation chains, and operational constraints are
required for execution. The DAG Coordinator MUST verify that
assigned agents possess valid JWTs with appropriate structured
claims before initiating task execution. When tasks involve
sensitive operations or access to protected resources,
implementations SHOULD require fresh token validation rather than
relying on cached authorization state. Multi-step workflows that
span extended time periods MUST implement token refresh mechanisms
so that tokens remain valid throughout DAG execution.
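
The token handling rules above can be sketched as a small decision
function. The "exp" claim and its semantics come from [RFC7519];
the function name, the four action labels, and the 60-second
refresh margin are assumptions chosen for illustration.

```python
import time
from typing import Optional


def token_action(claims: dict, sensitive: bool,
                 now: Optional[float] = None, refresh_margin: int = 60) -> str:
    """Decide how a coordinator treats a token before dispatching a task.

    Returns "reject", "refresh", "revalidate", or "accept".
    """
    now = time.time() if now is None else now
    exp = claims.get("exp")
    if exp is None or exp <= now:
        return "reject"        # missing or expired "exp" claim
    if exp - now <= refresh_margin:
        return "refresh"       # long-running workflow: renew before expiry
    if sensitive:
        return "revalidate"    # do not rely on cached authorization state
    return "accept"
```

For example, a token that expires in 30 seconds yields "refresh",
while a long-lived token presented for a sensitive task yields
"revalidate".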

Context isolation represents a critical security boundary in
multi-agent DAG systems. Execution contexts MUST be isolated
between different DAG instances to prevent information leakage and
unauthorized access to intermediate results. Implementations
SHOULD use cryptographic techniques to protect context data in
transit and at rest, particularly when context distribution
mechanisms [draft-chang-agent-context-interaction] are employed
across network boundaries. Task nodes that handle sensitive data
MUST implement appropriate data classification and handling
controls, ensuring that context information is only accessible to
authorized agents within the workflow. The framework SHOULD
support configurable context sharing policies that allow
administrators to define which context elements can be shared
between tasks and which must remain isolated.

Audit trail requirements for DAG execution are more complex than
in single-agent scenarios because task execution is distributed
and potentially parallel. Implementations MUST maintain
comprehensive logs that capture DAG initiation, task assignments,
agent authorizations, execution outcomes, and any human
intervention points. Audit records SHOULD include cryptographic
signatures or integrity mechanisms to prevent tampering and
support forensic analysis. The checkpoint and recovery mechanisms
introduce additional logging requirements, as rollback operations
and failure recovery attempts MUST be fully auditable.
Organizations operating in regulated environments might require
enhanced audit capabilities that provide real-time monitoring of
DAG execution state and automated alerts for security policy
violations.
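
One possible integrity mechanism for audit records is a hash chain,
sketched below with SHA-256. This is an illustrative choice, not a
mechanism mandated by this document; the record layout, function
names, and sample events are assumptions.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder digest preceding the first record


def _digest(prev: str, event: dict) -> str:
    body = json.dumps({"prev": prev, "event": event}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_record(log: list, event: dict) -> dict:
    """Append event to log, chaining it to the previous record's digest."""
    prev = log[-1]["digest"] if log else GENESIS
    record = {"prev": prev, "event": event, "digest": _digest(prev, event)}
    log.append(record)
    return record


def verify_chain(log: list) -> bool:
    """Recompute every digest; an edited, reordered, or dropped interior
    record makes verification fail."""
    prev = GENESIS
    for record in log:
        if record["prev"] != prev or record["digest"] != _digest(prev, record["event"]):
            return False
        prev = record["digest"]
    return True


log = []
append_record(log, {"type": "dag_start", "dagid": "dag-42"})
append_record(log, {"type": "task_done", "node": "parse", "outcome": "failure"})
append_record(log, {"type": "rollback", "checkpoint": "cp-0007"})
```

A chain alone does not detect truncation of the newest records or a
full rewrite by a party able to recompute every digest, so the
latest digest would still need to be signed or stored outside the
log.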

The integration of human oversight points within DAG workflows
creates additional security considerations around authentication,
authorization, and workflow integrity. Human operators MUST be
properly authenticated before approving task continuations or
modifying workflow parameters. The framework SHOULD support
multi-factor authentication and role-based access controls for
human intervention points. Implementations MUST ensure that human
approval requirements cannot be bypassed through agent
coordination or DAG manipulation. When human operators modify
workflow parameters or approve exceptional conditions, these
actions MUST be cryptographically signed and integrated into the
workflow's audit trail to maintain end-to-end accountability.
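
The sketch below shows how an approval can be bound to a specific
DAG and task node so that it cannot be replayed against another
node. It uses HMAC-SHA256 from the Python standard library as a
stand-in: an HMAC is a shared-key integrity tag, not a digital
signature, and a deployment that needs non-repudiation would use an
asymmetric signature scheme instead. The record fields, key, and
function names are hypothetical.

```python
import hashlib
import hmac
import json


def sign_approval(key: bytes, approval: dict) -> str:
    """Return a hex HMAC-SHA256 tag over a canonical JSON encoding."""
    payload = json.dumps(approval, sort_keys=True, separators=(",", ":"))
    return hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_approval(key: bytes, approval: dict, tag: str) -> bool:
    """Constant-time check that tag matches the approval record."""
    return hmac.compare_digest(sign_approval(key, approval), tag)


key = b"demo-key-not-for-production"
approval = {
    "dagid": "dag-42",
    "node": "deploy",
    "operator": "alice",
    "decision": "approve",
}
tag = sign_approval(key, approval)
```

Changing any field of the record, such as the node name, causes
verify_approval to return False for the original tag.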

9. IANA Considerations

This document introduces several new protocol elements and
identifiers that require IANA registration to ensure global
uniqueness and interoperability across implementations. The Agent
Task DAG framework extends existing agent communication protocols
with new message types, node classifications, and execution state
identifiers that must be standardized for consistent
implementation.

The specification requires the establishment of a new "Agent Task
DAG Parameters" registry to manage the various identifiers used
within the framework. This registry MUST include sub-registries
for DAG node types, edge relationship types, execution states,
checkpoint types, and recovery action identifiers. Each
sub-registry MUST follow the "Specification Required" registration
policy as defined in [RFC8126], with designated experts reviewing
submissions for technical correctness and consistency with the
overall framework architecture. The registry MUST also accommodate
extensions that integrate with existing agent authorization
profiles as defined in [draft-aap-oauth-profile].

A new "application/vnd.ietf.agent-task-dag+json" media type
registration is REQUIRED for DAG workflow documents. This media
type MUST reference this specification and follow the JSON format
requirements specified in [RFC8259]. The media type enables proper
content negotiation when agents exchange DAG definitions and
execution state information. Additionally, new URI schemes
"agent-dag:" and "agent-task:" are proposed for identifying DAG
instances and individual task nodes respectively, requiring
registration in the "Uniform Resource Identifier (URI) Schemes"
registry maintained by IANA.

The framework introduces new JWT claim names for representing DAG
execution context and task bindings within agent authorization
tokens, extending the structured claims mechanism defined in
[draft-aap-oauth-profile]. These claim names MUST be registered in
the "JSON Web Token Claims" registry established by [RFC7519]. The
new claims include "dagid", "tasknode", "executioncontext",
"checkpointref", and "recovery_state", each with specific semantic
meanings within the DAG execution protocol. Registration of these
claims ensures consistent interpretation across different agent
implementations and authorization servers.

Finally, new HTTP header fields "DAG-Execution-ID" and
"DAG-Checkpoint" are introduced for coordination between agents
during DAG execution. These headers MUST be registered in the
"Hypertext Transfer Protocol (HTTP) Field Name Registry" as
defined in [RFC9110]. The headers enable stateless coordination
mechanisms and support the checkpoint and recovery procedures
specified in this framework, while maintaining compatibility with
existing HTTP-based agent communication protocols.
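
As a rough illustration of how these header fields could travel
with a task execution request, the sketch below builds (but does
not send) a request using Python's urllib. Only the two header
field names come from this document; the URL, the header value
formats, the JSON body, and the function name are hypothetical.

```python
import json
import urllib.request


def build_task_request(url: str, execution_id: str, checkpoint: str,
                       task: dict) -> urllib.request.Request:
    """Build a POST request carrying the DAG coordination headers."""
    return urllib.request.Request(
        url,
        data=json.dumps(task).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "DAG-Execution-ID": execution_id,
            "DAG-Checkpoint": checkpoint,
        },
        method="POST",
    )


req = build_task_request(
    "https://agent.example/tasks",  # hypothetical endpoint
    "dag-42",
    "cp-0007",
    {"node": "parse"},
)
```

urllib stores field names with normalized capitalization, which is
harmless because HTTP field names are case-insensitive [RFC9110].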

10. References

10.1. Normative References

[RFC2119]
RFC 2119

[RFC8174]
RFC 8174

[RFC8259]
RFC 8259

[RFC7519]
RFC 7519

[RFC8126]
RFC 8126

[draft-aap-oauth-profile]
draft-aap-oauth-profile

[draft-cui-ai-agent-task]
draft-cui-ai-agent-task

[draft-guy-bary-stamp-protocol]
draft-guy-bary-stamp-protocol

10.2. Informative References

[RFC6749]
RFC 6749

[RFC9110]
RFC 9110

[draft-chang-agent-context-interaction]
draft-chang-agent-context-interaction

[draft-liu-dmsc-acps-arc]
draft-liu-dmsc-acps-arc

[draft-rosenberg-aiproto-framework]
draft-rosenberg-aiproto-framework

[draft-song-oauth-ai-agent-collaborate-authz]
draft-song-oauth-ai-agent-collaborate-authz

[draft-mao-rtgwg-apn-framework-for-ioa]
draft-mao-rtgwg-apn-framework-for-ioa

[draft-nandakumar-ai-agent-moq-transport]
draft-nandakumar-ai-agent-moq-transport

Author's Address

Generated by IETF Draft Analyzer
Family: agent-ecosystem
2026-03-04