feat: add draft data, gap analysis report, and workspace config

2026-04-06 18:47:15 +02:00
parent 4f310407b0
commit 2506b6325a
189 changed files with 62649 additions and 0 deletions
--- a/workspace/drafts/new-drafts/draft-c-hitl-human-in-the-loop-00.md
+++ b/workspace/drafts/new-drafts/draft-c-hitl-human-in-the-loop-00.md
@@ -0,0 +1,368 @@
+---
+title: "Human-in-the-Loop (HITL) Primitives for Agent Ecosystems"
+abbrev: "HITL"
+category: std
+docname: draft-hitl-human-in-the-loop-00
+submissiontype: IETF
+number:
+date:
+v: 3
+area: "OPS"
+workgroup: "NMOP"
+keyword:
+  - human override
+  - HITL
+  - emergency stop
+  - agentic safety
+
+author:
+  -
+    fullname: TBD
+    organization: Independent
+    email: placeholder@example.com
+
+normative:
+  RFC2119:
+  RFC8174:
+  RFC7519:
+  RFC8446:
+  RFC8615:
+  I-D.nennemann-wimse-ect:
+    title: "Execution Context Tokens for Distributed Agentic Workflows"
+    target: https://datatracker.ietf.org/doc/draft-nennemann-wimse-ect/
+  I-D.nennemann-agent-dag-hitl-safety:
+    title: "Agent Context Policy Token: DAG Delegation with Human Override"
+    target: https://datatracker.ietf.org/doc/draft-nennemann-agent-dag-hitl-safety/
+
+informative:
+
+--- abstract
+
+This document defines runtime HITL (Human-in-the-Loop) primitives
+for agent ecosystems: four escalating override levels, approval
+gates, escalation paths, and explainability hooks.  ACP-DAG-HITL
+defines WHEN humans must intervene (policy rules and triggers).
+This specification defines HOW the intervention actually happens at
+the protocol level: the HTTP endpoints, override semantics, agent
+compliance requirements, and acknowledgment flows.  All overrides
+and decisions produce ECT nodes, making human interventions part of
+the same auditable DAG as agent actions.
+
+--- middle
+
+# Introduction
+
+The current ratio of autonomous capability drafts to human
+oversight drafts in the IETF is roughly 7:1.  Agents can act, but
+humans cannot reliably stop them.
+
+ACP-DAG-HITL {{I-D.nennemann-agent-dag-hitl-safety}} defines the
+policy: trigger conditions, required roles, and actions (`pause`,
+`escalate`, `abort`).  It deliberately defers the runtime
+protocol, however.  How does an operator actually send a stop
+command?  How does the agent acknowledge it?  What happens if the
+operator is unreachable?
+
+This specification fills that gap.  It is the runtime enforcement
+companion to ACP-DAG-HITL, inspired by industrial safety systems:
+the e-stop button on factory equipment, the circuit breaker in
+electrical systems, and the kill switch in robotics.
+
+HITL is deliberately not a governance framework, policy language,
+or accountability protocol.  It is a panic button with a
+well-defined interface.
+
+# Conventions and Definitions
+
+{::boilerplate bcp14-tagged}
+
+Override:
+: A human-initiated command that alters an agent's autonomous
+  operation, taking precedence over the agent's own decisions.
+
+Operator:
+: A human user authorized to issue override commands.
+
+Approval Gate:
+: A DAG node that blocks workflow progression until a human
+  approves or rejects continuation.
+
+# Relationship to ACP-DAG-HITL {#mapping}
+
+ACP-DAG-HITL defines three HITL actions.  This specification maps
+each of them to a runtime override level and adds a fourth level,
+CONSTRAIN (partial restriction), which has no ACP-DAG-HITL
+equivalent:
+
+| ACP-DAG-HITL action | HITL Override Level | Behavior |
+|---------------------|---------------------|----------|
+| `pause` | Level 1: PAUSE | Suspend autonomous actions, hold state |
+| (no equivalent) | Level 2: CONSTRAIN | Restrict to an allowlist of actions |
+| `abort` | Level 3: STOP | Cease all actions, enter inert state |
+| `escalate` | Level 4: TAKEOVER | Transfer control to human operator |
+{: #fig-mapping title="ACP-DAG-HITL to HITL Level Mapping"}
+
+When ACP-DAG-HITL rules trigger, the runtime system uses the
+corresponding HITL level to enforce the action.
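The mapping table can be expressed as a small lookup. This is a minimal sketch, not part of the specification: the names `OverrideLevel`, `ACTION_TO_LEVEL`, and `level_for_action` are hypothetical and chosen only for illustration.

```python
from enum import IntEnum


class OverrideLevel(IntEnum):
    """The four runtime override levels defined in this document."""
    PAUSE = 1
    CONSTRAIN = 2
    STOP = 3
    TAKEOVER = 4


# ACP-DAG-HITL action -> HITL override level, per the mapping table.
# CONSTRAIN has no ACP-DAG-HITL equivalent, so no action maps to it.
ACTION_TO_LEVEL = {
    "pause": OverrideLevel.PAUSE,
    "abort": OverrideLevel.STOP,
    "escalate": OverrideLevel.TAKEOVER,
}


def level_for_action(action: str) -> OverrideLevel:
    """Return the override level used to enforce an ACP-DAG-HITL action."""
    try:
        return ACTION_TO_LEVEL[action]
    except KeyError:
        raise ValueError(f"unknown ACP-DAG-HITL action: {action!r}") from None
```

Because CONSTRAIN is reachable only through a direct operator override, it appears in the enum but not in the lookup.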
+
+# Override Levels {#levels}
+
+## Level 1: PAUSE
+
+The agent MUST suspend all autonomous actions and hold current
+state.  It MUST NOT initiate new actions but MAY complete
+in-progress actions if stopping mid-execution would cause harm
+(e.g., an in-flight database transaction).  The agent resumes
+when a RESUME command is received.
+
+## Level 2: CONSTRAIN
+
+The agent MUST restrict its actions to a specified subset.  The
+override command includes an allowlist of permitted action types.
+The agent MUST reject any action not on the allowlist.
+
+## Level 3: STOP
+
+The agent MUST immediately cease all autonomous actions and enter
+an inert state.  It MUST NOT take any autonomous actions until
+explicitly restarted.  This is the e-stop.
+
+## Level 4: TAKEOVER
+
+The agent MUST transfer operational control to the human operator.
+It enters a pass-through mode where it executes only explicit
+operator commands.  The agent's sensors and outputs remain
+available to the operator as tools.
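One way to read the four levels is as a single gate that an agent consults before starting any new action. The sketch below assumes a hypothetical `may_start_action` helper and a boolean `operator_commanded` flag. Neither is defined by this document, and the function deliberately ignores the PAUSE allowance for finishing in-progress work.

```python
from typing import Collection, Optional

PAUSE, CONSTRAIN, STOP, TAKEOVER = 1, 2, 3, 4


def may_start_action(
    level: Optional[int],
    action_type: str,
    allowlist: Optional[Collection[str]] = None,
    operator_commanded: bool = False,
) -> bool:
    """Return True if the agent may start a new action of action_type.

    level is the active override level, or None when no override is active.
    """
    if level is None:
        return True  # no override: fully autonomous
    if level in (PAUSE, STOP):
        return False  # no new actions until resumed or restarted
    if level == CONSTRAIN:
        # Only allowlisted action types pass; a missing allowlist permits nothing.
        return allowlist is not None and action_type in allowlist
    if level == TAKEOVER:
        return operator_commanded  # pass-through: explicit operator commands only
    raise ValueError(f"unknown override level: {level!r}")
```

For example, `may_start_action(CONSTRAIN, "block_ip", allowlist=["read_logs"])` evaluates to `False`, since `block_ip` is not on the allowlist.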
+
+# Override Protocol {#protocol}
+
+## Override Command
+
+Override commands are sent as HTTP POST to the agent's well-known
+endpoint:
+
+~~~
+POST /.well-known/hitl/override HTTP/1.1
+Content-Type: application/json
+Authorization: Bearer <operator-jwt>
+Execution-Context: <override-ect>
+~~~
+
+The override ECT {{I-D.nennemann-wimse-ect}} MUST contain:
+
+- `exec_act`: `"hitl:override"`
+- `par`: the most recent ECT from the agent being overridden
+  (linking the override into the workflow DAG)
+
+~~~json
+{
+  "exec_act": "hitl:override",
+  "par": ["agent-last-action-ect"],
+  "ext": {
+    "hitl.level": 3,
+    "hitl.reason": "Agent blocking legitimate traffic",
+    "hitl.operator_id": "user:alice",
+    "hitl.scope": "*",
+    "hitl.constraints": null,
+    "hitl.ttl": null
+  }
+}
+~~~
+{: #fig-override title="Override ECT"}
+
+Field definitions:
+
+- `hitl.level`: Integer 1-4. MUST be present.
+- `hitl.reason`: Human-readable text. MUST be logged.
+- `hitl.operator_id`: Identifier of the operator issuing the
+  override.
+- `hitl.scope`: `"*"` for all functions, or an array of function
+  IDs for partial override.
+- `hitl.constraints`: For Level 2 only. Array of permitted action
+  types.
+- `hitl.ttl`: Duration in seconds. If set, override auto-expires.
+  If null, persists until explicitly lifted.
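The field definitions above translate fairly directly into a validation step. The following sketch uses a hypothetical `validate_override_ext` helper. It also assumes two things the text does not state outright: that a Level 2 override without a `hitl.constraints` array is malformed, and that a TTL, when set, must be a positive number.

```python
from typing import Any, Dict, List


def validate_override_ext(ext: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in an override ECT's `ext` claims."""
    problems: List[str] = []

    level = ext.get("hitl.level")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(level, bool) or not isinstance(level, int) or not 1 <= level <= 4:
        problems.append("hitl.level must be an integer from 1 to 4")

    scope = ext.get("hitl.scope")
    if scope is not None and scope != "*" and not isinstance(scope, list):
        problems.append('hitl.scope must be "*" or an array of function IDs')

    # Assumption: a Level 2 override without an allowlist is malformed.
    if level == 2 and not isinstance(ext.get("hitl.constraints"), list):
        problems.append("hitl.constraints must be an array for Level 2")

    ttl = ext.get("hitl.ttl")
    # Assumption: a TTL, when set, is a positive number of seconds.
    if ttl is not None and (
        isinstance(ttl, bool) or not isinstance(ttl, (int, float)) or ttl <= 0
    ):
        problems.append("hitl.ttl must be null or a positive number of seconds")

    return problems
```

Returning every problem rather than stopping at the first lets the operator console report all of them in one round trip. Signature checks are out of scope for this sketch.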
+
+## Acknowledgment
+
+The agent MUST respond with an acknowledgment ECT:
+
+- `exec_act`: `"hitl:ack"`
+- `par`: the override ECT
+
+~~~json
+{
+  "exec_act": "hitl:ack",
+  "par": ["override-ect-uuid"],
+  "ext": {
+    "hitl.status": "accepted",
+    "hitl.prior_state": "autonomous",
+    "hitl.current_state": "stopped",
+    "hitl.effective_at": "2026-03-01T12:00:00.123Z"
+  }
+}
+~~~
+{: #fig-ack title="Acknowledgment ECT"}
+
+The override/ack ECT pair serves as the Decision Record defined
+in ACP-DAG-HITL Section 6.5.  No separate audit mechanism is
+needed.
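An agent might assemble the acknowledgment claims as sketched here. `build_ack` and `STATE_FOR_LEVEL` are hypothetical names. Of the state labels, only `autonomous` and `stopped` appear in this document; `paused`, `constrained`, and `takeover` are assumptions made for illustration. Signing and ECT envelope fields are omitted.

```python
from datetime import datetime, timezone
from typing import Optional

# Only "stopped" appears in this document; the other labels are assumed.
STATE_FOR_LEVEL = {1: "paused", 2: "constrained", 3: "stopped", 4: "takeover"}


def build_ack(
    override_ect_id: str,
    level: int,
    prior_state: str,
    fully_complied: bool = True,
    now: Optional[datetime] = None,
) -> dict:
    """Build the claims of an acknowledgment ECT for an override."""
    if now is None:
        now = datetime.now(timezone.utc)
    # Render as UTC with millisecond precision and a trailing "Z".
    effective_at = (
        now.astimezone(timezone.utc)
        .isoformat(timespec="milliseconds")
        .replace("+00:00", "Z")
    )
    return {
        "exec_act": "hitl:ack",
        "par": [override_ect_id],  # links the ack to the override in the DAG
        "ext": {
            "hitl.status": "accepted" if fully_complied else "partial",
            "hitl.prior_state": prior_state,
            "hitl.current_state": STATE_FOR_LEVEL[level],
            "hitl.effective_at": effective_at,
        },
    }
```

The `now` parameter exists so that tests and replays can supply a fixed timestamp.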
+
+## Resume and Lift
+
+To resume from PAUSE:
+
+~~~
+POST /.well-known/hitl/resume HTTP/1.1
+Execution-Context: <resume-ect with exec_act="hitl:resume">
+~~~
+
+To lift any override:
+
+~~~
+POST /.well-known/hitl/lift HTTP/1.1
+Execution-Context: <lift-ect with exec_act="hitl:lift">
+~~~
+
+Both produce ECTs linked to the original override ECT via `par`.
+
+# Agent Compliance Requirements {#compliance}
+
+Every HITL-compliant agent MUST:
+
+1. Implement the `/.well-known/hitl/override` endpoint.
+
+2. Process override commands within 1 second of receipt.  The
+   override path MUST be independent of the agent's main
+   processing loop.
+
+3. Acknowledge every override with an ECT response.
+
+4. Accept every override.  An agent MUST NOT respond with status
+   `rejected`; overrides are mandatory.  If the agent cannot fully
+   comply, it MUST respond with status `partial` and describe what
+   it could not do.
+
+5. Expose current override status at:
+
+~~~
+GET /.well-known/hitl/status
+~~~
+
+~~~json
+{
+  "agent_id": "spiffe://example.com/agent/firewall",
+  "override_active": true,
+  "current_level": 3,
+  "override_ect": "override-ect-uuid",
+  "since": "2026-03-01T12:00:00Z",
+  "operator_id": "user:alice"
+}
+~~~
+{: #fig-status title="Override Status Response"}
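A status handler could derive this document from the agent's current override record, roughly as follows. `ActiveOverride` and `status_document` are hypothetical names. The response for an agent with no active override is not defined above, so the null-valued form used here is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class ActiveOverride:
    """The override currently in force on an agent."""
    level: int
    override_ect: str
    since: str  # timestamp string, as in the status example
    operator_id: str


def status_document(agent_id: str, active: Optional[ActiveOverride]) -> Dict[str, Any]:
    """Build the JSON body for GET /.well-known/hitl/status."""
    if active is None:
        # Assumption: with no override in force, the detail fields are null.
        return {
            "agent_id": agent_id,
            "override_active": False,
            "current_level": None,
            "override_ect": None,
            "since": None,
            "operator_id": None,
        }
    return {
        "agent_id": agent_id,
        "override_active": True,
        "current_level": active.level,
        "override_ect": active.override_ect,
        "since": active.since,
        "operator_id": active.operator_id,
    }
```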
+
+# Approval Gates {#approval-gates}
+
+An approval gate is a DAG node that blocks workflow progression
+until a human approves.  Unlike overrides (which interrupt running
+agents), approval gates are planned checkpoints in the workflow.
+
+Approval gates are defined as ACP-DAG-HITL nodes with HITL rules:
+
+~~~json
+{
+  "dag": {
+    "nodes": [
+      {
+        "id": "n-approve",
+        "type": "hitl:approval_gate",
+        "agent": "system:hitl-gateway",
+        "constraints": {
+          "hitl.required_role": "clinician:oncall",
+          "hitl.timeout_s": 300,
+          "hitl.timeout_action": "safe_pause"
+        }
+      }
+    ]
+  }
+}
+~~~
+{: #fig-gate title="Approval Gate as DAG Node"}
+
+When the workflow reaches an approval gate, the system:
+
+1. Emits an ECT with `exec_act: "hitl:approval_request"`
+2. Notifies the required human role
+3. Waits for approval (ECT: `"hitl:approval_granted"`) or
+   rejection (ECT: `"hitl:approval_denied"`)
+4. On timeout, applies `hitl.timeout_action`
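The four steps above amount to a small decision function evaluated each time the gateway checks on a pending gate. This is a sketch under stated assumptions: `gate_outcome` and the `"approve"`/`"reject"` decision strings are hypothetical, and a human decision takes priority over the timeout, which the text does not specify.

```python
from typing import Any, Collection, Dict, Optional


def gate_outcome(
    constraints: Dict[str, Any],
    elapsed_s: float,
    decision: Optional[str] = None,
    approver_roles: Collection[str] = (),
) -> Optional[str]:
    """Return the gate's outcome, or None while it is still waiting.

    decision is "approve", "reject", or None if no human has responded yet.
    """
    if decision is not None:
        # Only a human holding the required role may decide.
        if constraints["hitl.required_role"] not in approver_roles:
            raise PermissionError("approver lacks the required role")
        if decision == "approve":
            return "hitl:approval_granted"
        if decision == "reject":
            return "hitl:approval_denied"
        raise ValueError(f"unknown decision: {decision!r}")
    if elapsed_s >= constraints["hitl.timeout_s"]:
        return constraints["hitl.timeout_action"]
    return None  # keep waiting
```

With the constraints from the example gate, `gate_outcome(constraints, 300)` yields `"safe_pause"` once the 300-second timeout elapses without a decision.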
+
+# Broadcast Override {#broadcast}
+
+For environments with many agents, an operator MAY send a
+broadcast override to a management endpoint:
+
+~~~
+POST /hitl/broadcast HTTP/1.1
+Content-Type: application/json
+Execution-Context: <broadcast-override-ect>
+
+{
+  "targets": ["spiffe://example.com/agent/a",
+               "spiffe://example.com/agent/b"],
+  "level": 3,
+  "reason": "Coordinated emergency stop"
+}
+~~~
+
+The broadcast endpoint fans out individual override ECTs to each
+target and returns per-agent results.
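The fan-out can be sketched as a loop that records one result per target and keeps going when an individual agent fails. `broadcast_override` and the injected `send_override` callable are hypothetical. The callable is assumed to build, sign, and deliver the per-agent override ECT and to return the agent's acknowledgment.

```python
from typing import Callable, Dict, Iterable


def broadcast_override(
    targets: Iterable[str],
    level: int,
    reason: str,
    send_override: Callable[[str, int, str], dict],
) -> Dict[str, dict]:
    """Send one override per target and collect per-agent results."""
    results: Dict[str, dict] = {}
    for agent_id in targets:
        try:
            ack = send_override(agent_id, level, reason)
            results[agent_id] = {"delivered": True, "ack": ack}
        except Exception as exc:
            # One unreachable agent must not prevent stopping the others.
            results[agent_id] = {"delivered": False, "error": str(exc)}
    return results
```

The loop is sequential for clarity. A deployment that must stop many agents quickly would likely dispatch the sends concurrently.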
+
+# Dead Man's Switch {#dead-man}
+
+For maximum reliability, agents SHOULD implement a heartbeat
+mechanism: the agent periodically pings an operator heartbeat
+endpoint.  If no heartbeat succeeds within a configurable
+duration, the agent automatically enters Level 1 (PAUSE).
+
+This provides a safety net when network connectivity to the
+operator is lost.  The `unreachable_human` policy from
+ACP-DAG-HITL governs behavior when the dead man's switch
+activates: either `abort` or `safe_pause`.
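A dead man's switch reduces to tracking the time of the last successful heartbeat. The sketch below uses a hypothetical `DeadMansSwitch` class with an injectable clock. What the agent does once the switch trips is left to the caller.

```python
import time
from typing import Callable


class DeadMansSwitch:
    """Trips when no heartbeat has succeeded for max_silence_s seconds."""

    def __init__(
        self,
        max_silence_s: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._max_silence_s = max_silence_s
        self._clock = clock
        self._last_ok = clock()

    def heartbeat_succeeded(self) -> None:
        """Record that the operator heartbeat endpoint answered."""
        self._last_ok = self._clock()

    def tripped(self) -> bool:
        """Return True once the silence has lasted max_silence_s or longer."""
        return self._clock() - self._last_ok >= self._max_silence_s
```

Defaulting to a monotonic clock avoids false trips when the wall clock is adjusted. The agent would call `tripped()` from the same independent path that handles overrides.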
+
+# Security Considerations
+
+Override commands are high-privilege operations.  All override
+endpoints MUST require authentication via mutual TLS {{RFC8446}}
+or signed JWTs {{RFC7519}}.
+
+Override ECTs MUST be signed at L2 or L3.  Agents MUST verify
+signatures before processing.
+
+To prevent replay attacks, agents MUST reject override ECTs with
+`iat` more than 30 seconds in the past.  The `jti` MUST be unique;
+agents MUST reject duplicate `jti` values.
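The two replay rules can be enforced together with a bounded cache of recently seen `jti` values. This is a minimal sketch with a hypothetical `ReplayGuard` class. It assumes `iat` is a Unix timestamp in seconds and does not address clock skew or `iat` values in the future.

```python
import time
from typing import Callable, Dict


class ReplayGuard:
    """Rejects override ECTs that are stale or reuse a jti."""

    def __init__(
        self,
        max_age_s: float = 30.0,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._max_age_s = max_age_s
        self._clock = clock
        self._seen: Dict[str, float] = {}  # jti -> iat

    def check(self, iat: float, jti: str) -> bool:
        """Return True if the ECT is fresh and its jti has not been seen."""
        now = self._clock()
        # Forget jti values whose iat is outside the window; a replay of
        # one of those carries the same iat and fails the age check anyway.
        self._seen = {
            j: t for j, t in self._seen.items() if now - t <= self._max_age_s
        }
        if now - iat > self._max_age_s:
            return False  # issued more than max_age_s ago
        if jti in self._seen:
            return False  # duplicate jti
        self._seen[jti] = iat
        return True
```

Pruning by `iat` keeps the cache from growing without bound while preserving the guarantee that no accepted `jti` is accepted twice.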
+
+Deployments SHOULD implement multi-operator approval for Level 4
+(TAKEOVER), requiring two independent operator identities.
+
+The override endpoint SHOULD be served on a separate port or
+network interface from the agent's main API to ensure availability
+during overload.
+
+# IANA Considerations
+
+This document requests the following registrations:
+
+1. Well-known URI registrations for `hitl/override`,
+   `hitl/resume`, `hitl/lift`, and `hitl/status` per {{RFC8615}}.
+
+2. Registration of `exec_act` values: `hitl:override`,
+   `hitl:ack`, `hitl:resume`, `hitl:lift`,
+   `hitl:approval_request`, `hitl:approval_granted`,
+   `hitl:approval_denied` in a future ECT action type registry.
+
+--- back
+
+# Acknowledgments
+{:numbered="false"}
+
+This specification is the runtime enforcement companion to
+ACP-DAG-HITL {{I-D.nennemann-agent-dag-hitl-safety}}.  Override
+design is inspired by industrial safety systems (IEC 62061,
+ISO 13849).