feat: add draft data, gap analysis report, and workspace config

2026-04-06 18:47:15 +02:00
parent 4f310407b0
commit 2506b6325a
189 changed files with 62649 additions and 0 deletions
--- a/workspace/drafts/new-drafts/draft-b-atd-agent-task-dag-01.md
+++ b/workspace/drafts/new-drafts/draft-b-atd-agent-task-dag-01.md
@@ -0,0 +1,725 @@
+---
+title: "Agent Task DAG (ATD): Execution Model, Checkpoints, and Recovery"
+abbrev: "ATD"
+category: std
+docname: draft-atd-agent-task-dag-01
+submissiontype: IETF
+number:
+date:
+v: 3
+area: "OPS"
+workgroup: "NMOP"
+keyword:
+  - agent DAG
+  - checkpoint
+  - rollback
+  - error recovery
+  - circuit breaker
+
+author:
+  -
+    fullname: TBD
+    organization: Independent
+    email: placeholder@example.com
+
+normative:
+  RFC2119:
+  RFC8174:
+  RFC8446:
+  RFC9110:
+  RFC8615:
+  I-D.nennemann-wimse-ect:
+    title: "Execution Context Tokens for Distributed Agentic Workflows"
+    target: https://datatracker.ietf.org/doc/draft-nennemann-wimse-ect/
+  I-D.nennemann-agent-dag-hitl-safety:
+    title: "Agent Context Policy Token: DAG Delegation with Human Override"
+    target: https://datatracker.ietf.org/doc/draft-nennemann-agent-dag-hitl-safety/
+
+informative:
+
+--- abstract
+
+This document defines the Agent Task DAG (ATD) specification:
+execution semantics, checkpoints, error signaling, circuit
+breakers, and rollback for agent workflows.  ATD does not define a
+new DAG or token format.  Instead, it specifies when agents emit
+Execution Context Token (ECT) nodes, what those nodes mean, and how
+agents recover from failures.  Checkpoints, errors, and rollback
+results are ECT nodes with specific `exec_act` values and `ext`
+claims.  Rollback walks the ECT DAG backwards, circuit breakers
+contain cascading failures, and resource hints inform scheduling.
+The protocol is transport-agnostic and builds on ECT for execution
+evidence and on ACP-DAG-HITL (Agent Context Policy Token) for
+delegation policy.
+
+--- middle
+
+# Introduction
+
+Autonomous agents increasingly make unsupervised decisions, yet no
+standard exists for how agents checkpoint state, signal errors to
+peers, contain cascading failures, or roll back decisions gone
+wrong.
+
+ATD borrows proven patterns from distributed systems: checkpoints
+from database transactions, circuit breakers from microservice
+architectures, and rollback from version control.  It adapts these
+to agent workflows where actions may be partially reversible and
+where the agent that caused the error may not be the best one to
+fix it.
+
+ATD does not define a new DAG format.  The ECT DAG
+{{I-D.nennemann-wimse-ect}} IS the execution graph.  ATD defines
+the semantics of specific node types within that graph.
+
+Design principles:
+
+1. Agents that take consequential actions MUST be able to undo
+   them, or MUST declare them irreversible upfront.
+2. Failure containment takes priority over failure diagnosis.
+3. The protocol adds minimal overhead to the happy path.
+
+# Conventions and Definitions
+
+{::boilerplate bcp14-tagged}
+
+Checkpoint:
+: An ECT node recording agent state before a consequential action,
+  sufficient to restore the system to that state.
+
+Circuit Breaker:
+: A mechanism that stops an agent from propagating requests to a
+  failing downstream agent, preventing cascading failures.
+
+Rollback:
+: The process of reverting an agent's actions and state to a
+  previously recorded checkpoint.
+
+Blast Radius:
+: The set of agents and systems affected by a single failure.
+
+Consequential Action:
+: An action that modifies external state (network configuration,
+  database records, API calls with side effects) such that
+  reversal requires explicit effort.
+
+# Execution Semantics {#execution}
+
+## Topological Order
+
+Tasks in the ECT DAG MUST execute in topological order: a task
+MUST NOT begin execution until all tasks referenced by its ECT
+`par` claims are in state `done`.
+
+Two tasks MAY execute concurrently when neither is an ancestor of
+the other in the DAG (no chain of `par` links connects them) and
+each task's own parents are `done`.  Orchestrators SHOULD exploit
+this parallelism for performance.
+
+Circular dependencies are prohibited.  Agents MUST reject
+ACP-DAG-HITL delegation graphs that contain cycles.
+
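+A minimal sketch of how an orchestrator might apply these rules,
+assuming it has already collected each task's `par` list into a
+dictionary; the function and exception names are hypothetical.  It
+uses Kahn's algorithm: a task becomes ready only when all of its
+parents are done, tasks released together form a wave that may run
+concurrently, and any task that is never released lies on or behind
+a cycle.
+
+~~~python
+class CycleError(ValueError):
+    """Raised when the delegation graph is not acyclic."""
+
+
+def execution_waves(par: dict) -> list:
+    """Group task IDs into waves that respect topological order.
+
+    `par` maps each task ID to the IDs in its ECT `par` claim.  Every
+    task in a wave has all of its parents in earlier waves.
+    """
+    # Collect every task, including parents that never appear as keys.
+    tasks = set(par)
+    for parents in par.values():
+        tasks.update(parents)
+
+    indegree = {t: 0 for t in tasks}
+    children = {t: [] for t in tasks}
+    for task, parents in par.items():
+        for p in set(parents):
+            indegree[task] += 1
+            children[p].append(task)
+
+    ready = sorted(t for t in tasks if indegree[t] == 0)
+    waves, scheduled = [], 0
+    while ready:
+        waves.append(ready)
+        scheduled += len(ready)
+        released = []
+        for t in ready:
+            for c in children[t]:
+                indegree[c] -= 1
+                if indegree[c] == 0:
+                    released.append(c)
+        ready = sorted(released)
+
+    if scheduled != len(tasks):
+        # Tasks never released lie on or behind a cycle: reject the graph.
+        raise CycleError("delegation graph contains a cycle")
+    return waves
+~~~
+
+For example, `{"n2": ["n1"], "n3": ["n1"], "n4": ["n2", "n3"]}`
+yields `[["n1"], ["n2", "n3"], ["n4"]]`.  Waves are a conservative
+simplification: an orchestrator can also start each task as soon as
+its own parents are `done`, without waiting for the rest of its wave.
+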
+## Workflow Boundary ECTs
+
+When a workflow begins, the initiating agent MUST emit:
+
+~~~json
+{
+  "exec_act": "atd:workflow_start",
+  "ext": {
+    "atd.wf_id": "wf-uuid",
+    "atd.description": "BGP failover workflow",
+    "atd.node_count": 5
+  }
+}
+~~~
+{: #fig-wf-start title="Workflow Start ECT"}
+
+When the workflow reaches a terminal state (all leaf nodes
+complete or any node failed with no rollback path), the
+orchestrator MUST emit:
+
+~~~json
+{
+  "exec_act": "atd:workflow_complete",
+  "par": ["wf-start-ect-uuid"],
+  "ext": {
+    "atd.wf_id": "wf-uuid",
+    "atd.terminal_status": "success",
+    "atd.elapsed_s": 42
+  }
+}
+~~~
+{: #fig-wf-complete title="Workflow Complete ECT"}
+
+Terminal status values: `success`, `partial`, `failed`,
+`rolled_back`, `escalated`.
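+
+A sketch of how an orchestrator could build the two boundary ECTs,
+assuming random UUIDs for `jti` and `atd.wf_id` and a monotonic clock
+for `atd.elapsed_s`.  Only the claims shown in the figures above are
+produced; other ECT claims and signing are out of scope, and the
+function names are hypothetical.
+
+~~~python
+import time
+import uuid
+
+TERMINAL_STATUSES = {"success", "partial", "failed", "rolled_back", "escalated"}
+
+
+def workflow_start(description: str, node_count: int) -> dict:
+    """Claims for an atd:workflow_start ECT."""
+    return {
+        "jti": str(uuid.uuid4()),
+        "exec_act": "atd:workflow_start",
+        "ext": {
+            "atd.wf_id": str(uuid.uuid4()),
+            "atd.description": description,
+            "atd.node_count": node_count,
+        },
+    }
+
+
+def workflow_complete(start: dict, status: str, started_at: float) -> dict:
+    """Claims for the matching atd:workflow_complete ECT."""
+    if status not in TERMINAL_STATUSES:
+        raise ValueError(f"unknown terminal status: {status!r}")
+    return {
+        "jti": str(uuid.uuid4()),
+        "exec_act": "atd:workflow_complete",
+        "par": [start["jti"]],  # link back to the start ECT
+        "ext": {
+            "atd.wf_id": start["ext"]["atd.wf_id"],
+            "atd.terminal_status": status,
+            "atd.elapsed_s": int(time.monotonic() - started_at),
+        },
+    }
+~~~
+
+Rejecting unknown terminal status values at construction time keeps
+malformed `atd:workflow_complete` ECTs out of the audit DAG.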
+
+# Node States {#node-states}
+
+Each task node in the ECT DAG has an implicit state derived from
+subsequent ECT nodes:
+
+- **pending**: A delegation node exists in ACP-DAG-HITL but no
+  corresponding ECT has been emitted.
+- **running**: An ECT matching the task type has been emitted
+  but no completion or error ECT follows.
+- **done**: A completion ECT (or the next `par`-linked ECT) exists.
+- **failed**: An `atd:error` ECT references this node.
+- **rolled_back**: An `atd:rollback_result` ECT references this
+  node's checkpoint.
+- **escalated**: The task failed and a human has been notified
+  per HITL escalation rules.
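+
+The derivation above can be sketched as follows, assuming ECTs are
+available as dictionaries of claims and that the caller already knows
+a delegation node exists for the task.  Two choices here are
+assumptions this document does not fix: ATD bookkeeping ECTs
+(`atd:checkpoint`, `atd:error`, and the rollback ECTs) do not count as
+the "next `par`-linked ECT" for `done`, and when several states apply
+the precedence is `rolled_back`, `escalated`, `failed`, `done`,
+`running`.  Whether a human was notified comes from the HITL system
+and is modelled as a flag; the function name is hypothetical.
+
+~~~python
+# ATD ECTs that do not, on their own, mark a task as done (assumption).
+BOOKKEEPING = {"atd:checkpoint", "atd:error",
+               "atd:rollback_request", "atd:rollback_result"}
+
+
+def node_state(task_jti: str, ects: list, escalated: bool = False) -> str:
+    """Derive the implicit state of the task whose ECT has `task_jti`."""
+    if not any(e["jti"] == task_jti for e in ects):
+        return "pending"  # delegated, but no ECT emitted yet
+    # ECTs that list this task in their `par` claim.
+    refs = [e for e in ects if task_jti in e.get("par", [])]
+    checkpoints = {e["jti"] for e in refs if e["exec_act"] == "atd:checkpoint"}
+    if any(e["exec_act"] == "atd:rollback_result"
+           and e.get("ext", {}).get("atd.checkpoint_id") in checkpoints
+           for e in ects):
+        return "rolled_back"
+    if any(e["exec_act"] == "atd:error" for e in refs):
+        return "escalated" if escalated else "failed"
+    if any(e["exec_act"] not in BOOKKEEPING for e in refs):
+        return "done"
+    return "running"
+~~~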
+
+# Checkpoint Mechanism {#checkpoints}
+
+## Checkpoint Placement Policy
+
+An ATD-compliant agent MUST create a checkpoint before any action
+it classifies as consequential.  The following actions are always
+consequential and MUST be checkpointed:
+
+1. Any modification to network device configuration.
+2. Any write to a shared database or external data store.
+3. Any API call with side effects (i.e., using an unsafe HTTP method
+   as defined in {{Section 9.2.1 of RFC9110}}, such as POST, PUT,
+   PATCH, or DELETE).
+4. Any delegation to another agent that will itself take
+   consequential actions.
+
+The following SHOULD be checkpointed:
+
+1. Long-running computations (> `atd.resource_timeout_s`).
+2. Actions that cannot be verified without external state.
+
+The following are exempt from checkpoint requirements:
+
+1. Read-only queries.
+2. Sending notifications with no side effects.
+3. Internal state computations with no external observable effect.
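+
+The placement policy can be expressed as a small classifier.  This is
+a sketch under stated assumptions: the `Action` fields and `kind`
+labels are hypothetical, any HTTP method other than the RFC 9110 safe
+methods (GET, HEAD, OPTIONS, TRACE) is treated as having side effects,
+and the SHOULD rules are checked before the exemptions, so long-running
+work is flagged even if it would otherwise be exempt.
+
+~~~python
+from dataclasses import dataclass
+from typing import Optional
+
+# Safe methods per RFC 9110; every other method is treated as having
+# side effects (a missing method is treated conservatively as unsafe).
+SAFE_METHODS = {"GET", "HEAD", "OPTIONS", "TRACE"}
+ALWAYS_CONSEQUENTIAL = {"device_config", "db_write"}
+
+
+@dataclass
+class Action:
+    """Hypothetical description of an action an agent is about to take."""
+    kind: str                          # "device_config", "http", "read", ...
+    http_method: Optional[str] = None
+    delegate_acts: bool = False        # delegate takes consequential actions
+    expected_s: float = 0.0            # expected duration in seconds
+    timeout_s: Optional[float] = None  # the node's atd.resource_timeout_s
+    verifiable: bool = True            # verifiable without external state
+
+
+def checkpoint_policy(a: Action) -> str:
+    """Return "MUST", "SHOULD", or "exempt" for checkpointing `a`."""
+    if a.kind in ALWAYS_CONSEQUENTIAL:
+        return "MUST"
+    if a.kind == "http" and (a.http_method or "").upper() not in SAFE_METHODS:
+        return "MUST"
+    if a.kind == "delegate" and a.delegate_acts:
+        return "MUST"
+    if a.timeout_s is not None and a.expected_s > a.timeout_s:
+        return "SHOULD"
+    if not a.verifiable:
+        return "SHOULD"
+    return "exempt"  # read-only queries, notifications, internal work
+~~~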
+
+## Checkpoint ECT Format
+
+A checkpoint is an ECT with:
+
+- `exec_act`: `"atd:checkpoint"`
+- `par`: the ECT of the action being checkpointed
+
+~~~json
+{
+  "jti": "ckpt-uuid",
+  "exec_act": "atd:checkpoint",
+  "par": ["action-ect-uuid"],
+  "out_hash": "sha256-of-agent-state-snapshot",
+  "ext": {
+    "atd.reversible": true,
+    "atd.rollback_uri": "https://agent-b.example.com/.well-known/atd/rollback",
+    "atd.target": "router-07.example.com",
+    "atd.description": "Update BGP peer config",
+    "atd.ttl": 86400
+  }
+}
+~~~
+{: #fig-checkpoint title="Checkpoint ECT"}
+
+The `atd.reversible` field MUST be present.  If `false`, the agent
+declares that this action cannot be automatically undone and
+rollback requests MUST be escalated per the ACP-DAG-HITL
+`unreachable_human` policy.
+
+The `out_hash` provides integrity verification: the agent hashes
+its state at checkpoint time so that rollback can verify it is
+restoring to an authentic prior state.
+
+Checkpoints MUST be stored for at least `atd.ttl` seconds.  Agents
+SHOULD store checkpoints in durable storage that survives restarts.
+
+The rollback URI MUST be a well-known URI per {{RFC8615}} at the
+path `/.well-known/atd/rollback`.
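+
+A sketch of building a checkpoint ECT and later verifying a restore
+against its `out_hash`, assuming the state snapshot is a
+JSON-serializable dictionary and that `out_hash` is the lowercase hex
+SHA-256 of a simple canonical JSON encoding (sorted keys, compact
+separators).  The ECT specification may define a different hash
+encoding, and a production system would use a full scheme such as the
+JSON Canonicalization Scheme (RFC 8785).  Function names are
+hypothetical.
+
+~~~python
+import hashlib
+import json
+import uuid
+from typing import Optional
+from urllib.parse import urlsplit
+
+ROLLBACK_PATH = "/.well-known/atd/rollback"
+
+
+def state_hash(state: dict) -> str:
+    """SHA-256 over a simplified canonical JSON form of `state`."""
+    canonical = json.dumps(state, sort_keys=True, separators=(",", ":"))
+    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
+
+
+def make_checkpoint(action_jti: str, state: dict, *, reversible: bool,
+                    rollback_uri: str, ttl: int,
+                    extra: Optional[dict] = None) -> dict:
+    """Build the claims of an atd:checkpoint ECT for `action_jti`."""
+    if urlsplit(rollback_uri).path != ROLLBACK_PATH:
+        raise ValueError(f"rollback URI must use {ROLLBACK_PATH}")
+    if ttl <= 0:
+        raise ValueError("atd.ttl must be positive")
+    ext = {"atd.reversible": reversible,  # always present
+           "atd.rollback_uri": rollback_uri,
+           "atd.ttl": ttl}
+    ext.update(extra or {})
+    return {"jti": str(uuid.uuid4()), "exec_act": "atd:checkpoint",
+            "par": [action_jti], "out_hash": state_hash(state), "ext": ext}
+
+
+def verify_restore(checkpoint: dict, restored_state: dict) -> bool:
+    """True if the restored state matches the hash taken at checkpoint time."""
+    return state_hash(restored_state) == checkpoint["out_hash"]
+~~~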
+
+## Hierarchical Checkpoints
+
+Agents MAY create hierarchical checkpoints where a parent groups
+multiple child checkpoints from a multi-step operation.  Rolling
+back the parent rolls back all children.  The parent checkpoint's
+`par` array references all child checkpoint `jti` values.
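+
+Rolling back a parent therefore means rolling back every checkpoint
+beneath it.  A minimal sketch of computing that set, assuming
+`children` maps each parent checkpoint `jti` to the child `jti` values
+in its `par` array in creation order.  Undoing children before their
+parent, most recent child first, is an ordering assumption that this
+document does not state; the function name is hypothetical.
+
+~~~python
+def rollback_order(root: str, children: dict) -> list:
+    """Checkpoint IDs to undo for `root`, children before parents."""
+    order, seen = [], set()
+
+    def visit(jti: str) -> None:
+        if jti in seen:
+            return
+        seen.add(jti)
+        # Most recently created child first.
+        for child in reversed(children.get(jti, [])):
+            visit(child)
+        order.append(jti)  # a parent is undone after all its children
+
+    visit(root)
+    return order
+~~~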
+
+## Summary of ATD `exec_act` Values
+
+| `exec_act` value | When emitted | Required `ext` fields |
+|-----------------|-------------|----------------------|
+| `atd:checkpoint` | Before consequential action | `atd.reversible`, `atd.rollback_uri`, `atd.ttl` |
+| `atd:error` | On failure detection | `atd.severity`, `atd.error_type`, `atd.checkpoint_id` |
+| `atd:circuit_open` | When error rate exceeds threshold | `atd.downstream_agent`, `atd.error_rate`, `atd.window_s` |
+| `atd:circuit_close` | When probe succeeds in HALF-OPEN | `atd.downstream_agent`, `atd.cooldown_s` |
+| `atd:rollback_request` | To initiate rollback | `atd.reason`, `atd.cascade` |
+| `atd:rollback_result` | Rollback complete or failed | `atd.status`, `atd.checkpoint_id`, `atd.cascaded` |
+| `atd:workflow_start` | Workflow begins | `atd.wf_id`, `atd.description` |
+| `atd:workflow_complete` | Workflow terminal | `atd.wf_id`, `atd.terminal_status` |
+{: #fig-actions title="ATD exec_act Values"}
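+
+A receiver could check incoming ATD ECTs against this table with a
+lookup such as the following sketch.  The table is transcribed as
+written; the function name is hypothetical, and only the presence of
+fields is checked, not their types or values.
+
+~~~python
+# Required `ext` fields per exec_act, transcribed from the table above.
+REQUIRED_EXT = {
+    "atd:checkpoint": {"atd.reversible", "atd.rollback_uri", "atd.ttl"},
+    "atd:error": {"atd.severity", "atd.error_type", "atd.checkpoint_id"},
+    "atd:circuit_open": {"atd.downstream_agent", "atd.error_rate",
+                         "atd.window_s"},
+    "atd:circuit_close": {"atd.downstream_agent", "atd.cooldown_s"},
+    "atd:rollback_request": {"atd.reason", "atd.cascade"},
+    "atd:rollback_result": {"atd.status", "atd.checkpoint_id",
+                            "atd.cascaded"},
+    "atd:workflow_start": {"atd.wf_id", "atd.description"},
+    "atd:workflow_complete": {"atd.wf_id", "atd.terminal_status"},
+}
+
+
+def missing_ext(ect: dict) -> set:
+    """Return the required `ext` fields absent from an ATD ECT."""
+    act = ect.get("exec_act")
+    if act not in REQUIRED_EXT:
+        raise ValueError(f"not an ATD exec_act: {act!r}")
+    return REQUIRED_EXT[act] - set(ect.get("ext", {}))
+~~~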
+
+# Error Signaling {#errors}
+
+When an agent detects an error, it MUST emit an error ECT:
+
+- `exec_act`: `"atd:error"`
+- `par`: the ECT of the failed action
+
+~~~json
+{
+  "jti": "error-uuid",
+  "exec_act": "atd:error",
+  "par": ["failed-action-ect-uuid"],
+  "ext": {
+    "atd.severity": "critical",
+    "atd.error_type": "action_failed",
+    "atd.description": "BGP session did not establish",
+    "atd.checkpoint_id": "ckpt-uuid",
+    "atd.upstream_errors": []
+  }
+}
+~~~
+{: #fig-error title="Error ECT"}
+
+Severity levels (in increasing order): `info`, `warning`,
+`error`, `critical`.
+
+Error types: `action_failed`, `timeout`, `constraint_violation`,
+`resource_exhausted`, `upstream_cascade`, `unknown`.
+
+When an agent receives an error signal caused by an action it
+initiated, it MUST do one of the following:
+
+1. Attempt automatic rollback to the checkpoint for that action; or
+2. If the action was irreversible, escalate per the ACP-DAG-HITL
+   escalation rules.
+
+The `atd.upstream_errors` array allows agents to chain error
+context, building a causal trace from symptom to root cause.
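+
+This document does not fix what `atd.upstream_errors` contains.
+Assuming it lists the `jti` values of earlier error ECTs, the sketch
+below walks the chain from a symptom back to its root causes and
+compares errors using the severity order defined above.  Treating an
+error whose upstream entries are all unknown as a root cause is an
+assumption, and the function names are hypothetical.
+
+~~~python
+SEVERITY_ORDER = ["info", "warning", "error", "critical"]  # increasing
+
+
+def root_causes(error_jti: str, errors: dict) -> list:
+    """Follow atd.upstream_errors back to errors with no known upstream."""
+    roots, seen, stack = set(), set(), [error_jti]
+    while stack:
+        jti = stack.pop()
+        if jti in seen or jti not in errors:
+            continue
+        seen.add(jti)  # also guards against malformed cyclic chains
+        upstream = [u for u in errors[jti]["ext"].get("atd.upstream_errors", [])
+                    if u in errors]
+        if not upstream:
+            roots.add(jti)  # nothing further back that we know of
+        stack.extend(upstream)
+    return sorted(roots)
+
+
+def worst_severity(jtis, errors: dict) -> str:
+    """Highest severity among the given error ECTs."""
+    return max((errors[j]["ext"]["atd.severity"] for j in jtis),
+               key=SEVERITY_ORDER.index)
+~~~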
+
+## HITL Escalation on Error
+
+Error ECTs with severity `critical` SHOULD trigger HITL
+escalation.  Deployments SHOULD define ACP-DAG-HITL rules such
+as:
+
+~~~json
+{
+  "id": "r-critical-error",
+  "trigger": {
+    "kind": "keyword_match",
+    "op": "eq",
+    "value": "critical",
+    "input_ref": "atd.severity"
+  },
+  "required_role": "operator:oncall",
+  "action": "escalate",
+  "allow_override": true,
+  "override_action": "continue"
+}
+~~~
+{: #fig-error-hitl title="HITL Rule for Critical Errors"}
+
+# Circuit Breaker Pattern {#circuit-breaker}
+
+Each agent MUST implement a circuit breaker for every downstream
+agent it communicates with.  The circuit breaker has three states:
+
+CLOSED (normal):
+: Requests flow through.  The agent tracks the error rate over a
+  sliding window (default: 60 seconds).
+
+OPEN (failure detected):
+: When the error rate exceeds a threshold (default: 50%), the
+  breaker opens.  All requests are immediately rejected.  The
+  agent MUST emit a circuit breaker open ECT:
+
+~~~json
+{
+  "exec_act": "atd:circuit_open",
+  "ext": {
+    "atd.downstream_agent": "spiffe://example.com/agent/b",
+    "atd.error_rate": 0.75,
+    "atd.window_s": 60
+  }
+}
+~~~
+{: #fig-circuit-open title="Circuit Breaker Open ECT"}
+
+HALF-OPEN (recovery probe):
+: After a cooldown period (default: 30 seconds), the breaker allows
+  a single probe request.  If the probe fails, the breaker returns to
+  OPEN and doubles the cooldown (exponential backoff, capped at 300
+  seconds).  If the probe succeeds, the breaker returns to CLOSED and
+  the agent MUST emit:
+
+~~~json
+{
+  "exec_act": "atd:circuit_close",
+  "ext": {
+    "atd.downstream_agent": "spiffe://example.com/agent/b",
+    "atd.cooldown_s": 30
+  }
+}
+~~~
+{: #fig-circuit-close title="Circuit Breaker Close ECT"}
+
+## Circuit Breaker State Machine
+
+~~~
+          error_rate > threshold
+  CLOSED -----------------------> OPEN <-----------+
+    ^                              |               |
+    | probe success                | cooldown      | probe failure
+    |                              | expires       | (cooldown * 2,
+    |                              v               |  max 300s)
+    +------------------------- HALF-OPEN ----------+
+~~~
+{: #fig-fsm title="Circuit Breaker State Machine"}
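+
+The state machine can be sketched as a small class, one instance per
+downstream agent.  Defaults follow the text (50% threshold, 60-second
+window, 30-second cooldown doubling up to 300 seconds).  The
+injectable clock, the `emitted` list standing in for publishing ECTs,
+and resetting the cooldown to its default after a successful probe
+are assumptions.  Like the text, it opens on the error rate alone;
+production breakers often also require a minimum number of requests
+in the window, which this document does not specify.
+
+~~~python
+import time
+from collections import deque
+from typing import Callable
+
+
+class CircuitBreaker:
+    """Circuit breaker for one downstream agent: CLOSED/OPEN/HALF-OPEN."""
+
+    def __init__(self, downstream: str, threshold: float = 0.5,
+                 window_s: float = 60.0, cooldown_s: float = 30.0,
+                 max_cooldown_s: float = 300.0,
+                 clock: Callable[[], float] = time.monotonic):
+        self.downstream = downstream
+        self.threshold, self.window_s = threshold, window_s
+        self.base_cooldown_s = self.cooldown_s = cooldown_s
+        self.max_cooldown_s = max_cooldown_s
+        self.clock = clock
+        self.state = "CLOSED"
+        self.events = deque()  # (timestamp, succeeded) within the window
+        self.opened_at = 0.0
+        self.probe_sent = False
+        self.emitted = []      # stand-in for publishing ECTs
+
+    def _open(self, now: float) -> None:
+        self.state, self.opened_at = "OPEN", now
+
+    def allow_request(self) -> bool:
+        now = self.clock()
+        if self.state == "OPEN" and now - self.opened_at >= self.cooldown_s:
+            self.state, self.probe_sent = "HALF-OPEN", False
+        if self.state == "CLOSED":
+            return True
+        if self.state == "HALF-OPEN" and not self.probe_sent:
+            self.probe_sent = True  # exactly one probe per cooldown
+            return True
+        return False  # OPEN, or the probe is already in flight
+
+    def record(self, succeeded: bool) -> None:
+        now = self.clock()
+        if self.state == "HALF-OPEN":
+            if succeeded:
+                self.emitted.append({"exec_act": "atd:circuit_close", "ext": {
+                    "atd.downstream_agent": self.downstream,
+                    "atd.cooldown_s": self.cooldown_s}})
+                self.state, self.cooldown_s = "CLOSED", self.base_cooldown_s
+                self.events.clear()
+            else:
+                self.cooldown_s = min(self.cooldown_s * 2, self.max_cooldown_s)
+                self._open(now)
+            return
+        if self.state != "CLOSED":
+            return  # late results while OPEN are ignored
+        self.events.append((now, succeeded))
+        while now - self.events[0][0] > self.window_s:
+            self.events.popleft()  # slide the window
+        rate = sum(not ok for _, ok in self.events) / len(self.events)
+        if rate > self.threshold:
+            self._open(now)
+            self.emitted.append({"exec_act": "atd:circuit_open", "ext": {
+                "atd.downstream_agent": self.downstream,
+                "atd.error_rate": rate, "atd.window_s": self.window_s}})
+~~~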
+
+## Coordinated Circuit Breaking
+
+When multiple agents share a downstream dependency, each maintains
+its own circuit breaker independently.  However, agents SHOULD
+publish circuit breaker state via their ECT stream so peers can
+observe the signal.
+
+If an orchestrator observes circuit breakers at N or more agents
+(N being a deployment-configured threshold) opening for the same
+downstream agent within a configured time window, it SHOULD initiate
+a HITL escalation rather than allowing N parallel recovery probes.
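+
+A sketch of the orchestrator-side check, assuming it sees each
+`atd:circuit_open` ECT as a `(timestamp, reporting_agent,
+downstream_agent)` tuple.  N and the window are deployment parameters,
+and the function name is hypothetical.  It slides a time window over
+each downstream agent's open events and counts distinct reporting
+agents.
+
+~~~python
+from collections import defaultdict
+
+
+def downstreams_to_escalate(observations, n: int, window_s: float) -> set:
+    """Downstream agents whose breakers opened at >= n distinct agents
+    within `window_s` seconds of each other.
+
+    `observations` holds (timestamp, reporting_agent, downstream_agent).
+    """
+    by_downstream = defaultdict(list)
+    for ts, reporter, downstream in observations:
+        by_downstream[downstream].append((ts, reporter))
+    flagged = set()
+    for downstream, events in by_downstream.items():
+        events.sort()
+        start = 0
+        for end in range(len(events)):
+            # Shrink the window from the left until it spans <= window_s.
+            while events[end][0] - events[start][0] > window_s:
+                start += 1
+            if len({r for _, r in events[start:end + 1]}) >= n:
+                flagged.add(downstream)
+                break
+    return flagged
+~~~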
+
+## Circuit Breaker Policy Configuration
+
+Circuit breaker thresholds can be configured as ACP-DAG-HITL
+node constraints:
+
+~~~json
+{
+  "constraints": {
+    "atd.circuit_threshold": 0.5,
+    "atd.circuit_window_s": 60
+  }
+}
+~~~
+{: #fig-circuit-policy title="Circuit Breaker Policy"}
+
+# Rollback Protocol {#rollback}
+
+## Basic Rollback
+
+A rollback is initiated by emitting a rollback request ECT and
+sending an HTTP POST request {{RFC9110}} to the target agent's
+rollback endpoint:
+
+~~~
+POST /.well-known/atd/rollback HTTP/1.1
+Host: agent-b.example.com
+Content-Type: application/json
+Execution-Context: <rollback-request-ect>
+~~~
+
+- `exec_act`: `"atd:rollback_request"`
+- `par`: the checkpoint ECT to roll back to
+
+~~~json
+{
+  "exec_act": "atd:rollback_request",
+  "par": ["ckpt-uuid"],
+  "ext": {
+    "atd.reason": "Upstream action caused cascading failure",
+    "atd.cascade": true
+  }
+}
+~~~
+{: #fig-rollback-req title="Rollback Request ECT"}
+
+When `atd.cascade` is `true`, the receiving agent MUST also
+initiate rollback of any downstream checkpoints created as a
+consequence of the checkpointed action.
+
+The agent MUST respond with a rollback result ECT:
+
+~~~json
+{
+  "exec_act": "atd:rollback_result",
+  "par": ["rollback-request-uuid"],
+  "out_hash": "sha256-of-restored-state",
+  "ext": {
+    "atd.status": "completed",
+    "atd.checkpoint_id": "ckpt-uuid",
+    "atd.cascaded": [
+      {"agent": "spiffe://example.com/agent/c", "status": "completed"},
+      {"agent": "spiffe://example.com/agent/d", "status": "escalated"}
+    ]
+  }
+}
+~~~
+{: #fig-rollback-result title="Rollback Result ECT"}
+
+Status values: `completed`, `partial`, `escalated`, `failed`.
+
+`escalated` means the action was irreversible and a human operator
+has been notified per ACP-DAG-HITL `unreachable_human` policy.
+
+## Partial Rollback and Blast Radius Containment
+
+When a failure occurs in the middle of a DAG, it is often
+undesirable to roll back the entire workflow.  ATD defines
+partial rollback as rolling back the failed subgraph while
+preserving completed sibling branches.
+
+Partial rollback MUST NOT proceed unless both of the following hold:
+
+1. The checkpoints to be rolled back are in the same workflow
+   (`atd.wf_id`).
+2. No completed sibling task depends on the output of the
+   failed task (verified by walking the DAG forward from the
+   checkpoint).
+
+The blast radius is the set of agents holding checkpoints that
+are descendants of the failed node.  Orchestrators SHOULD
+compute blast radius before initiating cascade rollback to
+avoid unnecessary disruption.
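+
+Both checks above reduce to a forward walk over the task graph.  A
+minimal sketch, assuming `children` maps each task to the tasks whose
+`par` claims reference it, `holder` maps tasks to the agent holding
+their checkpoint, and `wf_of` and `state_of` give each task's
+`atd.wf_id` and derived state.  Including the failed task's own
+checkpoint holder in the blast radius, and reading condition 2 as "no
+completed descendant is left outside the rollback set", are
+assumptions; all names are hypothetical.
+
+~~~python
+def descendants(task: str, children: dict) -> set:
+    """All tasks reachable from `task` by following par links forward."""
+    seen, stack = set(), [task]
+    while stack:
+        for child in children.get(stack.pop(), []):
+            if child not in seen:
+                seen.add(child)
+                stack.append(child)
+    return seen
+
+
+def blast_radius(failed: str, children: dict, holder: dict) -> set:
+    """Agents holding checkpoints for the failed task or its descendants."""
+    affected = descendants(failed, children) | {failed}
+    return {holder[t] for t in affected if t in holder}
+
+
+def partial_rollback_allowed(failed: str, rollback_set: set, children: dict,
+                             wf_of: dict, state_of: dict) -> bool:
+    # Condition 1: everything to be rolled back is in one workflow.
+    if len({wf_of[t] for t in rollback_set}) != 1:
+        return False
+    # Condition 2: no completed task outside the rollback set depends,
+    # directly or transitively, on the failed task's output.
+    return all(t in rollback_set or state_of.get(t) != "done"
+               for t in descendants(failed, children))
+~~~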
+
+## Rollback Timeout and Escalation
+
+Each rollback request has an implicit timeout derived from the
+original checkpoint's `atd.ttl`.  If rollback is not completed
+within `atd.ttl / 2` seconds, the agent MUST:
+
+1. Emit an `atd:error` with `error_type: "timeout"` and
+   `atd.description` noting rollback timeout.
+2. Escalate to HITL per {{hitl-escalation}}.
+
+Agents MUST implement idempotent rollback: receiving the same
+rollback request ECT `jti` twice MUST return the same result.
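+
+A sketch of a receiving agent's handler combining idempotency with the
+`atd.ttl / 2` deadline.  It assumes checkpoints are held in memory
+keyed by `jti`, the timeout is measured from receipt of the request,
+and `restore` and `escalate` are hypothetical callbacks supplied by
+the agent.  Reporting a timed-out rollback with status `escalated`
+and severity `critical` is an assumption.  The timeout stops the wait,
+not the restore thread itself, and the result cache is not protected
+against concurrent duplicate requests.
+
+~~~python
+import concurrent.futures
+from typing import Callable
+
+
+class RollbackHandler:
+    def __init__(self, checkpoints: dict, restore: Callable[[dict], str],
+                 escalate: Callable[[dict], None]):
+        self.checkpoints = checkpoints  # checkpoint jti -> checkpoint ECT
+        self.restore = restore          # performs the undo, returns a status
+        self.escalate = escalate        # hands an ECT to the HITL system
+        self.results = {}               # request jti -> result ECT
+        self.pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)
+
+    def handle(self, request: dict) -> dict:
+        jti = request["jti"]
+        if jti in self.results:  # same request jti -> same result
+            return self.results[jti]
+        ckpt_id = request["par"][0]
+        ext = self.checkpoints[ckpt_id]["ext"]
+        if not ext["atd.reversible"]:
+            self.escalate(request)
+            status = "escalated"
+        else:
+            future = self.pool.submit(self.restore, self.checkpoints[ckpt_id])
+            try:
+                status = future.result(timeout=ext["atd.ttl"] / 2)
+            except concurrent.futures.TimeoutError:
+                self.escalate({"exec_act": "atd:error", "par": [jti], "ext": {
+                    "atd.severity": "critical", "atd.error_type": "timeout",
+                    "atd.description": "rollback timeout",
+                    "atd.checkpoint_id": ckpt_id}})
+                status = "escalated"
+        result = {"exec_act": "atd:rollback_result", "par": [jti],
+                  "ext": {"atd.status": status, "atd.checkpoint_id": ckpt_id,
+                          "atd.cascaded": []}}
+        self.results[jti] = result
+        return result
+~~~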
+
+## Rollback Authorization {#rollback-authz}
+
+Only agents within the same workflow (`wid`) with checkpoint
+lineage in the DAG SHOULD be authorized to request rollback.
+Rollback requests from outside the originating workflow MUST be
+rejected with HTTP 403.
+
+# Interaction with HITL {#hitl-escalation}
+
+ATD escalates to HITL in the following scenarios:
+
+1. **Irreversible action failure**: An error ECT whose
+   `atd.checkpoint_id` references a checkpoint with
+   `atd.reversible: false` MUST trigger HITL Level 2 (approval
+   required) per the companion HITL specification.
+
+2. **Rollback failure**: A rollback result with `atd.status:
+   "failed"` MUST trigger HITL Level 3 (STOP) on the workflow.
+
+3. **Cascaded rollback of critical nodes**: When `atd.cascade:
+   true` rollback propagates to a node with `atd.severity:
+   critical`, HITL SHOULD be triggered at Level 1 (PAUSE)
+   to allow human review before proceeding.
+
+4. **Circuit breaker permanent open**: If a circuit breaker returns
+   to OPEN after 3 successive failed HALF-OPEN probes, HITL Level 2
+   escalation SHOULD be triggered.
+
+ATD-to-HITL escalation is recorded as an ECT linked to both
+the triggering error ECT and the HITL override ECT, preserving
+the causal chain in the audit DAG.
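+
+The four escalation scenarios can be condensed into a lookup such as
+this sketch.  It assumes the caller tracks consecutive failed
+HALF-OPEN probes per breaker and decides separately whether a cascade
+has reached a critical node, since this document does not define how
+that is determined.  The level constants and function name are
+hypothetical.
+
+~~~python
+from typing import Optional
+
+PAUSE, APPROVAL, STOP = 1, 2, 3  # HITL levels named in the scenarios
+
+
+def hitl_level(ect: dict, checkpoints: dict, *, failed_probes: int = 0,
+               cascade_hits_critical: bool = False) -> Optional[int]:
+    """Return the HITL level an ATD ECT should trigger, or None."""
+    act, ext = ect.get("exec_act"), ect.get("ext", {})
+    if act == "atd:rollback_result" and ext.get("atd.status") == "failed":
+        return STOP      # scenario 2 (MUST)
+    if act == "atd:error":
+        ckpt = checkpoints.get(ext.get("atd.checkpoint_id"), {})
+        if ckpt.get("ext", {}).get("atd.reversible") is False:
+            return APPROVAL  # scenario 1 (MUST)
+    if (act == "atd:rollback_request" and ext.get("atd.cascade")
+            and cascade_hits_critical):
+        return PAUSE     # scenario 3 (SHOULD)
+    if act == "atd:circuit_open" and failed_probes >= 3:
+        return APPROVAL  # scenario 4 (SHOULD)
+    return None
+~~~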
+
+# Resource Hints {#resources}
+
+## Resource Claim Format
+
+Agents MAY declare resource requirements as ACP-DAG-HITL node
+constraints:
+
+~~~json
+{
+  "constraints": {
+    "atd.resource_cpu": "2",
+    "atd.resource_memory_mb": 4096,
+    "atd.resource_timeout_s": 300,
+    "atd.resource_priority": "high",
+    "atd.resource_gpu": "0",
+    "atd.resource_network_mbps": 100
+  }
+}
+~~~
+{: #fig-resources title="Resource Hints as Node Constraints"}
+
+## Priority Levels
+
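+The `atd.resource_priority` field MUST be one of: `critical`,
+`high`, `normal`, `low`.  Orchestrators SHOULD map these to the
+platform's scheduling priority mechanism (e.g., Kubernetes
+PriorityClasses).  Where the platform also derives quality-of-service
+classes from resource requests and limits, orchestrators can align
+them with these levels, e.g., in Kubernetes terms `critical` →
+Guaranteed, `high`/`normal` → Burstable, `low` → BestEffort.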
+
+## Fair-Share Scheduling
+
+When multiple agents compete for a shared resource pool,
+orchestrators SHOULD implement fair-share scheduling:
+
+1. Each active workflow receives an equal base allocation.
+2. Unused allocation from `low` priority agents is redistributed
+   to `high`/`critical` agents within the same scheduling cycle.
+3. Starvation prevention: `low` priority agents MUST eventually
+   be scheduled within a configurable maximum wait (default: 300s).
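+
+A sketch of steps 1 and 2 for a single scheduling cycle.  It assumes
+each active workflow is described by its demand and a single
+priority, and that only capacity left unused by `low`-priority
+workflows is redistributed, water-filled equally across `high` and
+`critical` workflows with unmet demand.  Starvation prevention (step
+3) needs state across cycles and is not shown; the function name is
+hypothetical.
+
+~~~python
+def fair_share(capacity: float, demands: dict) -> dict:
+    """Allocate `capacity` across workflows for one scheduling cycle.
+
+    `demands` maps workflow ID -> (requested amount, priority).
+    """
+    if not demands:
+        return {}
+    base = capacity / len(demands)  # step 1: equal base allocation
+    alloc = {w: min(req, base) for w, (req, _) in demands.items()}
+    # Step 2: capacity left unused by low-priority workflows ...
+    spare = sum(base - alloc[w] for w, (_, p) in demands.items() if p == "low")
+    # ... is water-filled across high/critical workflows with unmet demand.
+    hungry = [w for w, (req, p) in demands.items()
+              if p in ("high", "critical") and req > alloc[w]]
+    while spare > 1e-9 and hungry:
+        share = spare / len(hungry)
+        for w in list(hungry):
+            give = min(share, demands[w][0] - alloc[w])
+            alloc[w] += give
+            spare -= give
+            if demands[w][0] - alloc[w] <= 1e-9:
+                hungry.remove(w)
+    return alloc
+~~~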
+
+## Unsatisfiable Resource Hints
+
+Resource hints are advisory; agents MUST NOT depend on them for
+correctness.  When resource hints cannot be satisfied:
+
+- If `atd.resource_priority` is `critical`: orchestrator SHOULD
+  pre-empt lower-priority tasks.
+- If `critical` tasks still cannot be scheduled within 60s:
+  emit `atd:error` with `error_type: "resource_exhausted"` and
+  escalate to HITL.
+- All other priorities: proceed with degraded resources; log
+  a warning via `atd:error` with severity `warning`.
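+
+The rules above can be sketched as a decision function an
+orchestrator calls while a task waits for resources.  The `step`
+entries are hypothetical placeholders for orchestrator actions.  The
+`critical` severity on the escalation error and the
+`resource_exhausted` type on the warning are assumptions, since the
+text fixes only the error type in the first case and the severity in
+the second; other required `ext` fields are omitted.
+
+~~~python
+CRITICAL_GRACE_S = 60  # from the text above
+
+
+def on_unsatisfiable(priority: str, waited_s: float) -> list:
+    """Return the steps to take when resource hints cannot be met."""
+    if priority == "critical":
+        if waited_s < CRITICAL_GRACE_S:
+            return [{"step": "preempt_lower_priority"}]
+        return [{"exec_act": "atd:error",
+                 "ext": {"atd.severity": "critical",
+                         "atd.error_type": "resource_exhausted"}},
+                {"step": "escalate_to_hitl"}]
+    # Hints are advisory: run with degraded resources and log a warning.
+    return [{"step": "run_degraded"},
+            {"exec_act": "atd:error",
+             "ext": {"atd.severity": "warning",
+                     "atd.error_type": "resource_exhausted"}}]
+~~~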
+
+# Optional Declarative Workflow Format {#workflow-format}
+
+To support pre-run planning and tooling, ATD defines an optional
+declarative workflow descriptor.  This is a planning artifact
+only; at runtime it is realized as ECTs per this specification.
+
+~~~json
+{
+  "wf_id": "bgp-failover-v2",
+  "description": "BGP peer failover with validation",
+  "nodes": [
+    {
+      "id": "n1",
+      "label": "validate-config",
+      "reversible": true,
+      "hitl_required": false,
+      "resource_hints": {
+        "priority": "normal",
+        "timeout_s": 30
+      }
+    },
+    {
+      "id": "n2",
+      "label": "update-bgp-peer",
+      "reversible": true,
+      "hitl_required": true,
+      "resource_hints": {
+        "priority": "critical",
+        "timeout_s": 120
+      }
+    },
+    {
+      "id": "n3",
+      "label": "verify-session",
+      "reversible": false,
+      "hitl_required": false,
+      "resource_hints": {
+        "priority": "high",
+        "timeout_s": 60
+      }
+    }
+  ],
+  "edges": [
+    {"from": "n1", "to": "n2"},
+    {"from": "n2", "to": "n3"}
+  ]
+}
+~~~
+{: #fig-workflow title="Declarative Workflow Descriptor"}
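+
+Tooling could check a descriptor before it is run and derive the
+`par` lineage its ECTs will carry at runtime.  A minimal sketch with
+hypothetical function names; it checks only node identifiers, edge
+endpoints, and acyclicity, and `par_map` assumes the descriptor has
+already been validated.
+
+~~~python
+def validate_descriptor(desc: dict) -> list:
+    """Return a list of problems found in a workflow descriptor."""
+    problems = []
+    ids = [n["id"] for n in desc.get("nodes", [])]
+    if len(ids) != len(set(ids)):
+        problems.append("duplicate node id")
+    indegree = {i: 0 for i in ids}
+    out = {i: [] for i in ids}
+    for e in desc.get("edges", []):
+        if e["from"] not in out or e["to"] not in out:
+            problems.append(f"edge {e['from']}->{e['to']} names an unknown node")
+            continue
+        out[e["from"]].append(e["to"])
+        indegree[e["to"]] += 1
+    # Kahn's algorithm: any node never reached lies on or behind a cycle.
+    ready = [i for i, d in indegree.items() if d == 0]
+    visited = 0
+    while ready:
+        node = ready.pop()
+        visited += 1
+        for nxt in out[node]:
+            indegree[nxt] -= 1
+            if indegree[nxt] == 0:
+                ready.append(nxt)
+    if visited != len(indegree):
+        problems.append("edges form a cycle")
+    return problems
+
+
+def par_map(desc: dict) -> dict:
+    """Runtime `par` lineage implied by the edges: node -> parent nodes."""
+    par = {n["id"]: [] for n in desc.get("nodes", [])}
+    for e in desc.get("edges", []):
+        par[e["to"]].append(e["from"])
+    return par
+~~~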
+
+The workflow descriptor media type is
+`application/atd-workflow+json`.  Orchestrators MAY store and
+version workflow descriptors independently of their ECT runtime
+realization.
+
+When `hitl_required` is `true`, the HITL system MUST place an
+approval gate on this node as defined in the companion HITL
+specification.
+
+# Security Considerations
+
+## Rollback Authorization
+
+Rollback requests are high-privilege operations.  Agents MUST
+authenticate rollback requests using the ECT identity binding
+(L2/L3).  The rollback endpoint MUST require mutual TLS
+{{RFC8446}} or a signed JWT from an agent within the same workflow
+DAG.
+
+Only agents that are ancestors in the ECT DAG of the checkpoint
+being rolled back SHOULD be authorized to request that rollback.
+
+## Checkpoint Confidentiality
+
+Checkpoint data may contain sensitive system state (API keys,
+session tokens, configuration).  Agents:
+
+- MUST encrypt stored checkpoints at rest;
+- MUST refer to checkpoint state in ECTs only via `out_hash`; and
+- MUST NOT include checkpoint contents in error ECTs.
+
+## False Error Injection
+
+A malicious agent could send false `atd:error` ECTs to trigger
+unnecessary rollbacks and disrupt workflows.  Mitigation:
+
+- Agents SHOULD verify that error ECTs reference valid `par`
+  values within their own workflow DAG (`wid` claim).
+- Rollback MUST require authentication (see {{rollback-authz}}).
+- L2/L3 ECT signing prevents unauthenticated error injection.
+
+## Checkpoint Flooding
+
+An adversary could exhaust checkpoint storage by triggering
+many checkpoints.  Mitigation:
+
+- Agents SHOULD enforce a maximum checkpoint count per workflow.
+- Expired checkpoints (past `atd.ttl`) MUST be purged.
+- Checkpoint creation rate SHOULD be rate-limited per calling
+  workflow.
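+
+The three mitigations can be combined in one store, sketched here
+with a per-workflow token bucket for the rate limit.  The limits
+(`max_per_workflow`, `rate_per_s`, `burst`), the in-memory layout, and
+purging lazily on each insertion are assumptions; this document
+leaves these values to deployments.
+
+~~~python
+import time
+from typing import Callable
+
+
+class CheckpointStore:
+    def __init__(self, max_per_workflow: int = 1000, rate_per_s: float = 10.0,
+                 burst: float = 20.0,
+                 clock: Callable[[], float] = time.monotonic):
+        self.max_per_workflow = max_per_workflow
+        self.rate, self.burst, self.clock = rate_per_s, burst, clock
+        self.expiry = {}   # wf_id -> {checkpoint jti: expiry time}
+        self.buckets = {}  # wf_id -> (tokens, last refill time)
+
+    def purge_expired(self, now: float) -> None:
+        for ckpts in self.expiry.values():
+            for jti in [j for j, exp in ckpts.items() if exp <= now]:
+                del ckpts[jti]
+
+    def add(self, wf_id: str, jti: str, ttl: float) -> bool:
+        """Store a checkpoint; False means it was rejected by a limit."""
+        now = self.clock()
+        self.purge_expired(now)  # checkpoints past atd.ttl are purged
+        tokens, last = self.buckets.get(wf_id, (self.burst, now))
+        tokens = min(self.burst, tokens + (now - last) * self.rate)
+        ckpts = self.expiry.setdefault(wf_id, {})
+        if tokens < 1 or len(ckpts) >= self.max_per_workflow:
+            self.buckets[wf_id] = (tokens, now)
+            return False
+        self.buckets[wf_id] = (tokens - 1, now)
+        ckpts[jti] = now + ttl
+        return True
+~~~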
+
+## Circuit Breaker State Leakage
+
+The `atd:circuit_open` ECT reveals system health topology.  The
+audit ledger SHOULD enforce access controls: only agents within
+the same workflow or authorized operators SHOULD be able to query
+circuit breaker history.
+
+# IANA Considerations
+
+This document requests registration of the following values in
+the AEM Ecosystem Extension Registry established by
+draft-aem-agent-ecosystem-model:
+
+## `exec_act` Values
+
+| Value | Description | Reference |
+|-------|-------------|-----------|
+| `atd:checkpoint` | State snapshot before consequential action | This document |
+| `atd:error` | Error signal with severity and type | This document |
+| `atd:circuit_open` | Circuit breaker opened to downstream agent | This document |
+| `atd:circuit_close` | Circuit breaker returned to CLOSED state | This document |
+| `atd:rollback_request` | Initiate rollback to named checkpoint | This document |
+| `atd:rollback_result` | Result of rollback attempt | This document |
+| `atd:workflow_start` | Workflow began execution | This document |
+| `atd:workflow_complete` | Workflow reached terminal state | This document |
+{: #fig-iana-actions title="ATD exec_act Registrations"}
+
+## Well-Known URI
+
+This document requests registration of `atd` as a well-known URI
+suffix per {{RFC8615}}.  The rollback endpoint defined in this
+document is located beneath it at `/.well-known/atd/rollback`.
+
+## Media Type
+
+This document requests registration of
+`application/atd-workflow+json` for the declarative workflow
+descriptor format defined in {{workflow-format}}.
+
+--- back
+
+# Acknowledgments
+{:numbered="false"}
+
+ATD builds on ECT {{I-D.nennemann-wimse-ect}} for execution
+evidence and ACP-DAG-HITL {{I-D.nennemann-agent-dag-hitl-safety}}
+for delegation policy.  The circuit breaker pattern is adapted
+from microservice architecture best practices.  The declarative
+workflow format is inspired by workflow description languages
+(BPEL, BPMN) adapted for lightweight agent coordination.