feat: add draft data, gap analysis report, and workspace config
368
workspace/drafts/new-drafts/draft-c-hitl-human-in-the-loop-00.md
Normal file
@@ -0,0 +1,368 @@
---
title: "Human-in-the-Loop (HITL) Primitives for Agent Ecosystems"
abbrev: "HITL"
category: std
docname: draft-hitl-human-in-the-loop-00
submissiontype: IETF
number:
date:
v: 3
area: "OPS"
workgroup: "NMOP"
keyword:
 - human override
 - HITL
 - emergency stop
 - agentic safety

author:
 -
    fullname: TBD
    organization: Independent
    email: placeholder@example.com

normative:
  RFC2119:
  RFC8174:
  RFC7519:
  RFC8446:
  RFC8615:
  I-D.nennemann-wimse-ect:
    title: "Execution Context Tokens for Distributed Agentic Workflows"
    target: https://datatracker.ietf.org/doc/draft-nennemann-wimse-ect/
  I-D.nennemann-agent-dag-hitl-safety:
    title: "Agent Context Policy Token: DAG Delegation with Human Override"
    target: https://datatracker.ietf.org/doc/draft-nennemann-agent-dag-hitl-safety/

informative:

--- abstract

This document defines runtime HITL (Human-in-the-Loop) primitives
for agent ecosystems: four escalating override levels, approval
gates, escalation paths, and explainability hooks. ACP-DAG-HITL
defines WHEN humans must intervene (policy rules and triggers).
This specification defines HOW the intervention actually happens at
the protocol level: the HTTP endpoints, override semantics, agent
compliance requirements, and acknowledgment flows. All overrides
and decisions produce ECT nodes, making human interventions part of
the same auditable DAG as agent actions.

--- middle

# Introduction

The current ratio of autonomous capability drafts to human
oversight drafts in the IETF is roughly 7:1. Agents can act but
humans cannot reliably stop them.

ACP-DAG-HITL {{I-D.nennemann-agent-dag-hitl-safety}} defines the
policy: trigger conditions, required roles, and actions (`pause`,
`escalate`, `abort`). But it deliberately defers the runtime
protocol: how does an operator actually send a stop command? How
does the agent acknowledge it? What happens if the operator is
unreachable?

This specification fills that gap. It is the runtime enforcement
companion to ACP-DAG-HITL, inspired by industrial safety systems:
the e-stop button on factory equipment, the circuit breaker in
electrical systems, and the kill switch in robotics.

HITL is deliberately not a governance framework, policy language,
or accountability protocol. It is a panic button with a
well-defined interface.

# Conventions and Definitions

{::boilerplate bcp14-tagged}

Override:
: A human-initiated command that alters an agent's autonomous
  operation, taking precedence over the agent's own decisions.

Operator:
: A human user authorized to issue override commands.

Approval Gate:
: A DAG node that blocks workflow progression until a human
  approves or rejects continuation.

# Relationship to ACP-DAG-HITL {#mapping}

ACP-DAG-HITL defines three HITL actions. This specification
maps each of them to a runtime override level and adds a fourth
level, CONSTRAIN (partial restriction), that has no ACP-DAG-HITL
equivalent:

| ACP-DAG-HITL action | HITL Override Level | Behavior |
|---------------------|---------------------|----------|
| `pause` | Level 1: PAUSE | Suspend autonomous actions, hold state |
| (no equivalent) | Level 2: CONSTRAIN | Restrict to an allowlist of actions |
| `abort` | Level 3: STOP | Cease all actions, enter inert state |
| `escalate` | Level 4: TAKEOVER | Transfer control to human operator |
{: #fig-mapping title="ACP-DAG-HITL to HITL Level Mapping"}

When an ACP-DAG-HITL rule triggers, the runtime system uses the
corresponding HITL level to enforce the action.

# Override Levels {#levels}

## Level 1: PAUSE

The agent MUST suspend all autonomous actions and hold current
state. It MUST NOT initiate new actions but MAY complete
in-progress actions if stopping mid-execution would cause harm
(e.g., an in-flight database transaction). The agent resumes
when a RESUME command is received.

## Level 2: CONSTRAIN

The agent MUST restrict its actions to a specified subset. The
override command includes an allowlist of permitted action types.
The agent MUST reject any action not on the allowlist.

## Level 3: STOP

The agent MUST immediately cease all autonomous actions and enter
an inert state. It MUST NOT take any autonomous actions until
explicitly restarted. This is the e-stop.

## Level 4: TAKEOVER

The agent MUST transfer operational control to the human operator.
It enters a pass-through mode where it executes only explicit
operator commands. The agent's sensors and outputs remain
available to the operator as tools.

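As a non-normative illustration, the four levels can be read as a single
gate that an agent consults before starting any new autonomous action. The
sketch below is one possible reading, not part of the protocol: the `Level`
enum, its `NONE` member, and the function name `may_start_autonomous` are
hypothetical, and operator-issued commands under TAKEOVER are assumed to be
handled on a separate path.

```python
from enum import IntEnum
from typing import Collection, Optional


class Level(IntEnum):
    NONE = 0        # hypothetical value: no override active
    PAUSE = 1
    CONSTRAIN = 2
    STOP = 3
    TAKEOVER = 4


def may_start_autonomous(level: Level, action_type: str,
                         allowlist: Optional[Collection[str]] = None) -> bool:
    """Return True if the agent may begin a new autonomous action."""
    if level == Level.NONE:
        return True
    if level == Level.CONSTRAIN:
        # Only action types on the operator-supplied allowlist are permitted.
        return allowlist is not None and action_type in allowlist
    # PAUSE, STOP and TAKEOVER all forbid new autonomous actions; they differ
    # in how the agent resumes and in who issues commands in the meantime.
    return False
```

Note that the gate only covers new actions. Whether an in-progress action
may run to completion under PAUSE is a separate decision that this sketch
does not model.
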
# Override Protocol {#protocol}

## Override Command

Override commands are sent as HTTP POST to the agent's well-known
endpoint:

~~~
POST /.well-known/hitl/override HTTP/1.1
Content-Type: application/json
Authorization: Bearer <operator-jwt>
Execution-Context: <override-ect>
~~~

The override ECT MUST contain:

- `exec_act`: `"hitl:override"`
- `par`: the most recent ECT from the agent being overridden
  (linking the override into the workflow DAG)

~~~json
{
  "exec_act": "hitl:override",
  "par": ["agent-last-action-ect"],
  "ext": {
    "hitl.level": 3,
    "hitl.reason": "Agent blocking legitimate traffic",
    "hitl.operator_id": "user:alice",
    "hitl.scope": "*",
    "hitl.constraints": null,
    "hitl.ttl": null
  }
}
~~~
{: #fig-override title="Override ECT"}

Field definitions:

- `hitl.level`: Integer 1-4. MUST be present.
- `hitl.reason`: Human-readable text. MUST be logged.
- `hitl.scope`: `"*"` for all functions, or an array of function
  IDs for partial override.
- `hitl.constraints`: For Level 2 only. Array of permitted action
  types.
- `hitl.ttl`: Duration in seconds. If set, override auto-expires.
  If null, persists until explicitly lifted.

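The field definitions above translate fairly directly into a structural
check on the `ext` object. The following is a minimal sketch under stated
assumptions, not a normative validator: the function name
`validate_override_ext` is hypothetical, `hitl.scope` is only checked when
present, a Level 2 override is assumed to require `hitl.constraints`, and a
non-null `hitl.ttl` is assumed to be a positive number.

```python
from typing import Any, Dict, List


def _is_str_list(value: Any) -> bool:
    return isinstance(value, list) and all(isinstance(v, str) for v in value)


def validate_override_ext(ext: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the fields look well-formed."""
    problems: List[str] = []

    level = ext.get("hitl.level")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(level, bool) or not isinstance(level, int) or not 1 <= level <= 4:
        problems.append("hitl.level must be an integer from 1 to 4")

    scope = ext.get("hitl.scope")
    if scope is not None and scope != "*" and not _is_str_list(scope):
        problems.append('hitl.scope must be "*" or an array of function IDs')

    constraints = ext.get("hitl.constraints")
    if level == 2:
        # Assumption: CONSTRAIN is meaningless without an allowlist.
        if not _is_str_list(constraints):
            problems.append("Level 2 requires hitl.constraints as an array")
    elif constraints is not None:
        problems.append("hitl.constraints applies to Level 2 only")

    ttl = ext.get("hitl.ttl")
    if ttl is not None and (
        isinstance(ttl, bool) or not isinstance(ttl, (int, float)) or ttl <= 0
    ):
        problems.append("hitl.ttl must be null or a positive number of seconds")

    return problems
```

A check like this would run only after the ECT signature has been verified;
it says nothing about whether the operator is authorized.
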
## Acknowledgment

The agent MUST respond with an acknowledgment ECT:

- `exec_act`: `"hitl:ack"`
- `par`: the override ECT

~~~json
{
  "exec_act": "hitl:ack",
  "par": ["override-ect-uuid"],
  "ext": {
    "hitl.status": "accepted",
    "hitl.prior_state": "autonomous",
    "hitl.current_state": "stopped",
    "hitl.effective_at": "2026-03-01T12:00:00.123Z"
  }
}
~~~
{: #fig-ack title="Acknowledgment ECT"}

The override/ack ECT pair serves as the Decision Record defined
in ACP-DAG-HITL Section 6.5. No separate audit mechanism is
needed.

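To make the acknowledgment shape concrete, the sketch below assembles the
unsigned ECT payload an agent might return. It is illustrative only: the
helper name `build_ack`, the state names other than `stopped`, and the
`hitl.unmet` field used to describe a `partial` result are assumptions that
this document does not define, and signing of the ECT is omitted.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional, Sequence

# Only "stopped" appears in the example above; the other names are hypothetical.
STATE_BY_LEVEL = {1: "paused", 2: "constrained", 3: "stopped", 4: "takeover"}


def build_ack(override_ect_id: str, level: int, prior_state: str,
              unmet: Optional[Sequence[str]] = None,
              now: Optional[datetime] = None) -> Dict[str, Any]:
    """Build the (unsigned) acknowledgment ECT payload for an override."""
    now = now or datetime.now(timezone.utc)
    ext: Dict[str, Any] = {
        # An agent never rejects; it reports "partial" when it falls short.
        "hitl.status": "partial" if unmet else "accepted",
        "hitl.prior_state": prior_state,
        "hitl.current_state": STATE_BY_LEVEL[level],
        # RFC 3339 UTC timestamp with millisecond precision.
        "hitl.effective_at": now.isoformat(timespec="milliseconds")
                                .replace("+00:00", "Z"),
    }
    if unmet:
        ext["hitl.unmet"] = list(unmet)  # hypothetical field name
    return {"exec_act": "hitl:ack", "par": [override_ect_id], "ext": ext}
```

Linking `par` to the override ECT is what makes the pair usable as a
Decision Record, so the helper takes the override identifier as a required
argument rather than an optional one.
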
## Resume and Lift

To resume from PAUSE:

~~~
POST /.well-known/hitl/resume HTTP/1.1
Execution-Context: <resume-ect with exec_act="hitl:resume">
~~~

To lift any override:

~~~
POST /.well-known/hitl/lift HTTP/1.1
Execution-Context: <lift-ect with exec_act="hitl:lift">
~~~

Both produce ECTs linked to the original override ECT via `par`.

# Agent Compliance Requirements {#compliance}

Every HITL-compliant agent MUST:

1. Implement the `/.well-known/hitl/override` endpoint.

2. Process override commands within 1 second of receipt. The
   override path MUST be independent of the agent's main
   processing loop.

3. Acknowledge every override with an ECT response.

4. Accept every override: the agent MUST NOT respond with status
   `rejected`. Overrides are mandatory. If the agent cannot fully
   comply, it MUST respond with status `partial` and describe what
   it could not do.

5. Expose current override status at:

   ~~~
   GET /.well-known/hitl/status
   ~~~

   ~~~json
   {
     "agent_id": "spiffe://example.com/agent/firewall",
     "override_active": true,
     "current_level": 3,
     "override_ect": "override-ect-uuid",
     "since": "2026-03-01T12:00:00Z",
     "operator_id": "user:alice"
   }
   ~~~
   {: #fig-status title="Override Status Response"}

# Approval Gates {#approval-gates}

An approval gate is a DAG node that blocks workflow progression
until a human approves or rejects continuation. Unlike overrides
(which interrupt running agents), approval gates are planned
checkpoints in the workflow.

Approval gates are defined as ACP-DAG-HITL nodes with HITL rules:

~~~json
{
  "dag": {
    "nodes": [
      {
        "id": "n-approve",
        "type": "hitl:approval_gate",
        "agent": "system:hitl-gateway",
        "constraints": {
          "hitl.required_role": "clinician:oncall",
          "hitl.timeout_s": 300,
          "hitl.timeout_action": "safe_pause"
        }
      }
    ]
  }
}
~~~
{: #fig-gate title="Approval Gate as DAG Node"}

When the workflow reaches an approval gate, the system:

1. Emits an ECT with `exec_act: "hitl:approval_request"`
2. Notifies the required human role
3. Waits for approval (ECT: `"hitl:approval_granted"`) or
   rejection (ECT: `"hitl:approval_denied"`)
4. On timeout, applies `hitl.timeout_action`

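Steps 3 and 4 above amount to a small decision function evaluated each time
the gate is polled. The sketch below shows one way to express it; the
function name `resolve_gate`, the `"waiting"` return value, and the
`decision` encoding are hypothetical, and what the workflow does after a
denial is left out because this document does not specify it.

```python
from typing import Optional


def resolve_gate(decision: Optional[str], waited_s: float,
                 timeout_s: float, timeout_action: str) -> str:
    """Decide what an approval gate does next.

    `decision` is "granted", "denied", or None while no human has answered.
    Returns the exec_act of the ECT to emit, the configured timeout action,
    or "waiting" if the gate should keep blocking.
    """
    if decision == "granted":
        return "hitl:approval_granted"
    if decision == "denied":
        return "hitl:approval_denied"
    if waited_s >= timeout_s:
        # No answer in time: fall back to hitl.timeout_action.
        return timeout_action
    return "waiting"
```

A human decision takes precedence over the timeout in this sketch, so an
approval that arrives at the same moment the timer expires is still honored.
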
# Broadcast Override {#broadcast}

For environments with many agents, an operator MAY send a
broadcast override to a management endpoint:

~~~
POST /hitl/broadcast HTTP/1.1
Content-Type: application/json
Execution-Context: <broadcast-override-ect>

{
  "targets": ["spiffe://example.com/agent/a",
              "spiffe://example.com/agent/b"],
  "level": 3,
  "reason": "Coordinated emergency stop"
}
~~~

The broadcast endpoint fans out individual override ECTs to each
target and returns per-agent results.

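The fan-out behavior can be sketched as a loop that never lets one failing
agent hide the results for the others. This is a simplified illustration:
`fan_out` and the injected `send_override` callable are hypothetical, the
`"unreachable"` status is an assumption rather than a value defined here,
and the per-agent `par` link and ECT signing are omitted.

```python
from typing import Any, Callable, Dict


def fan_out(request: Dict[str, Any],
            send_override: Callable[[str, Dict[str, Any]], Dict[str, Any]]
            ) -> Dict[str, Dict[str, Any]]:
    """Send one override per target and collect per-agent results."""
    results: Dict[str, Dict[str, Any]] = {}
    for target in request["targets"]:
        override = {
            "exec_act": "hitl:override",
            "ext": {
                "hitl.level": request["level"],
                "hitl.reason": request["reason"],
                "hitl.scope": "*",
            },
        }
        try:
            results[target] = send_override(target, override)
        except Exception as exc:  # keep going: other agents must still be stopped
            results[target] = {"hitl.status": "unreachable", "error": str(exc)}
    return results
```

The loop is sequential for clarity. A deployment that must stop many agents
quickly would probably issue the requests concurrently, but the per-target
error isolation would stay the same.
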
# Dead Man's Switch {#dead-man}

For maximum reliability, agents SHOULD implement a heartbeat
mechanism: the agent periodically pings an operator heartbeat
endpoint. If the heartbeat is missed for a configurable duration,
the agent automatically enters Level 1 (PAUSE).

This provides a safety net when network connectivity to the
operator is lost. The `unreachable_human` policy from
ACP-DAG-HITL governs behavior when the dead man's switch
activates: either `abort` or `safe_pause`.

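On the agent side, the dead man's switch reduces to tracking the time of
the last successful heartbeat. The class below is a minimal sketch under
assumptions: `HeartbeatWatchdog` and its method names are hypothetical, the
heartbeat request itself is out of scope, and the clock is injectable only
so that the behavior can be exercised without waiting.

```python
import time
from typing import Callable


class HeartbeatWatchdog:
    """Tracks heartbeat successes and reports when the agent should pause."""

    def __init__(self, max_silence_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max_silence_s = max_silence_s
        self._clock = clock
        self._last_ok = clock()  # treat start-up as a successful heartbeat

    def record_success(self) -> None:
        """Call after each heartbeat the operator endpoint acknowledged."""
        self._last_ok = self._clock()

    def should_pause(self) -> bool:
        """True once heartbeats have been missing longer than the limit."""
        return self._clock() - self._last_ok > self._max_silence_s
```

Using a monotonic clock avoids spurious trips or missed trips when the wall
clock is adjusted. The agent would poll `should_pause()` from the override
path rather than from its main processing loop.
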
# Security Considerations

Override commands are high-privilege operations. All override
endpoints MUST require authentication via mutual TLS or signed
JWTs.

Override ECTs MUST be signed at L2 or L3. Agents MUST verify
signatures before processing.

To prevent replay attacks, agents MUST reject override ECTs with
`iat` more than 30 seconds in the past. The `jti` MUST be unique;
agents MUST reject duplicate `jti` values.

Deployments SHOULD implement multi-operator approval for Level 4
(TAKEOVER), requiring two independent operator identities.

The override endpoint SHOULD be served on a separate port or
network interface from the agent's main API to ensure availability
during overload.

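The replay rule in this section (a 30-second `iat` window plus `jti`
uniqueness) can be sketched as follows. This is an illustration under
assumptions, not a reference implementation: `ReplayGuard` is a hypothetical
name, the claims are assumed to come from an ECT whose signature has already
been verified, and the in-memory cache would need to be shared or persisted
in a multi-instance deployment.

```python
import time
from typing import Any, Dict, Optional

MAX_AGE_S = 30  # from the replay rule: reject iat more than 30 s in the past


class ReplayGuard:
    """Rejects stale override ECTs and duplicate jti values."""

    def __init__(self, max_age_s: float = MAX_AGE_S) -> None:
        self._max_age_s = max_age_s
        self._seen: Dict[str, float] = {}  # jti -> iat of accepted tokens

    def check(self, claims: Dict[str, Any], now: Optional[float] = None) -> bool:
        """Return True if the token is fresh and its jti has not been seen."""
        now = time.time() if now is None else now
        iat, jti = claims.get("iat"), claims.get("jti")
        if not isinstance(iat, (int, float)) or not isinstance(jti, str) or not jti:
            return False
        if now - iat > self._max_age_s:
            return False
        # Entries older than the window can be dropped safely: a replay of
        # such a token carries the same iat and fails the age check above.
        self._seen = {j: t for j, t in self._seen.items()
                      if now - t <= self._max_age_s}
        if jti in self._seen:
            return False
        self._seen[jti] = iat
        return True
```

Because stale entries are pruned on each call, the cache only ever holds
identifiers for tokens that are still inside the acceptance window.
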
# IANA Considerations

This document requests the following registrations:

1. Well-known URI registrations for `hitl/override`,
   `hitl/resume`, `hitl/lift`, and `hitl/status` per {{RFC8615}}.

2. Registration of `exec_act` values: `hitl:override`,
   `hitl:ack`, `hitl:resume`, `hitl:lift`,
   `hitl:approval_request`, `hitl:approval_granted`,
   `hitl:approval_denied` in a future ECT action type registry.

--- back

# Acknowledgments
{:numbered="false"}

This specification is the runtime enforcement companion to
ACP-DAG-HITL {{I-D.nennemann-agent-dag-hitl-safety}}. Override
design is inspired by industrial safety systems (IEC 62061,
ISO 13849).