---
title: "Human-in-the-Loop (HITL) Primitives for Agent Ecosystems"
abbrev: "HITL"
category: std
docname: draft-hitl-human-in-the-loop-00
submissiontype: IETF
number:
date:
v: 3
area: "OPS"
workgroup: "NMOP"
keyword:
 - human override
 - HITL
 - emergency stop
 - agentic safety

author:
 -
    fullname: TBD
    organization: Independent
    email: placeholder@example.com

normative:
  RFC2119:
  RFC8174:
  RFC7519:
  RFC8446:
  RFC8615:
  I-D.nennemann-wimse-ect:
    title: "Execution Context Tokens for Distributed Agentic Workflows"
    target: https://datatracker.ietf.org/doc/draft-nennemann-wimse-ect/
  I-D.nennemann-agent-dag-hitl-safety:
    title: "Agent Context Policy Token: DAG Delegation with Human Override"
    target: https://datatracker.ietf.org/doc/draft-nennemann-agent-dag-hitl-safety/

informative:

--- abstract

This document defines runtime HITL (Human-in-the-Loop) primitives
for agent ecosystems: four escalating override levels, approval
gates, escalation paths, and explainability hooks. ACP-DAG-HITL
defines WHEN humans must intervene (policy rules and triggers).
This specification defines HOW the intervention actually happens at
the protocol level: the HTTP endpoints, override semantics, agent
compliance requirements, and acknowledgment flows. All overrides
and decisions produce ECT nodes, making human interventions part of
the same auditable DAG as agent actions.

--- middle

# Introduction

The current ratio of autonomous capability drafts to human
oversight drafts in the IETF is roughly 7:1. Agents can act, but
humans cannot reliably stop them.

ACP-DAG-HITL {{I-D.nennemann-agent-dag-hitl-safety}} defines the
policy: trigger conditions, required roles, and actions (`pause`,
`escalate`, `abort`). But it deliberately defers the runtime
protocol: how does an operator actually send a stop command? How
does the agent acknowledge it? What happens if the operator is
unreachable?

This specification fills that gap. It is the runtime enforcement
companion to ACP-DAG-HITL, inspired by industrial safety systems:
the e-stop button on factory equipment, the circuit breaker in
electrical systems, and the kill switch in robotics.

HITL is deliberately not a governance framework, policy language,
or accountability protocol. It is a panic button with a
well-defined interface.

# Conventions and Definitions

{::boilerplate bcp14-tagged}

Override:
: A human-initiated command that alters an agent's autonomous
  operation, taking precedence over the agent's own decisions.

Operator:
: A human user authorized to issue override commands.

Approval Gate:
: A DAG node that blocks workflow progression until a human
  approves or rejects continuation.

# Relationship to ACP-DAG-HITL {#mapping}

ACP-DAG-HITL defines three HITL actions. This specification
maps them to three of its four runtime override levels and adds a
fourth, CONSTRAIN (partial restriction), which has no ACP-DAG-HITL
equivalent:

| ACP-DAG-HITL action | HITL Override Level | Behavior |
|---------------------|---------------------|----------|
| `pause` | Level 1: PAUSE | Suspend autonomous actions, hold state |
| (no equivalent) | Level 2: CONSTRAIN | Restrict to an allowlist of actions |
| `abort` | Level 3: STOP | Cease all actions, enter inert state |
| `escalate` | Level 4: TAKEOVER | Transfer control to human operator |
{: #fig-mapping title="ACP-DAG-HITL to HITL Level Mapping"}

When ACP-DAG-HITL rules trigger, the runtime system uses the
corresponding HITL level to enforce the action.

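As a non-normative illustration, the mapping table can be expressed as a small lookup. This is only a sketch: the names `OverrideLevel`, `ACTION_TO_LEVEL`, and `level_for_action` are hypothetical and not defined by this specification or by ACP-DAG-HITL.

```python
from enum import IntEnum


class OverrideLevel(IntEnum):
    """The four runtime override levels (illustrative names)."""
    PAUSE = 1
    CONSTRAIN = 2
    STOP = 3
    TAKEOVER = 4


# ACP-DAG-HITL action -> runtime override level, as in the table above.
# CONSTRAIN has no ACP-DAG-HITL equivalent, so no action maps to it.
ACTION_TO_LEVEL = {
    "pause": OverrideLevel.PAUSE,
    "abort": OverrideLevel.STOP,
    "escalate": OverrideLevel.TAKEOVER,
}


def level_for_action(action: str) -> OverrideLevel:
    """Return the override level used to enforce a triggered policy action."""
    try:
        return ACTION_TO_LEVEL[action]
    except KeyError:
        raise ValueError(f"unknown ACP-DAG-HITL action: {action!r}") from None
```

A runtime that receives a triggered `abort` rule would call `level_for_action("abort")` and issue a Level 3 override.
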
# Override Levels {#levels}

## Level 1: PAUSE

The agent MUST suspend all autonomous actions and hold current
state. It MUST NOT initiate new actions but MAY complete
in-progress actions if stopping mid-execution would cause harm
(e.g., an in-flight database transaction). The agent resumes
when a RESUME command is received.

## Level 2: CONSTRAIN

The agent MUST restrict its actions to a specified subset. The
override command includes an allowlist of permitted action types.
The agent MUST reject any action not on the allowlist.

## Level 3: STOP

The agent MUST immediately cease all autonomous actions and enter
an inert state. It MUST NOT take any autonomous actions until
explicitly restarted. This is the e-stop.

## Level 4: TAKEOVER

The agent MUST transfer operational control to the human operator.
It enters a pass-through mode where it executes only explicit
operator commands. The agent's sensors and outputs remain
available to the operator as tools.

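The four levels can be read as a single gate that an agent consults before starting any new action. The following is a minimal, non-normative sketch; the function name `may_execute` and its parameters are hypothetical, and it deliberately does not model the PAUSE allowance for completing in-progress actions.

```python
from typing import Collection, Optional


def may_execute(
    level: Optional[int],
    action_type: str,
    *,
    allowlist: Collection[str] = (),
    from_operator: bool = False,
) -> bool:
    """Decide whether a NEW action may start under the active override.

    level is None when no override is active, otherwise 1-4.
    """
    if level is None:   # no override: normal autonomous operation
        return True
    if level == 1:      # PAUSE: no new actions
        return False
    if level == 2:      # CONSTRAIN: allowlisted action types only
        return action_type in allowlist
    if level == 3:      # STOP: inert until explicitly restarted
        return False
    if level == 4:      # TAKEOVER: only explicit operator commands
        return from_operator
    raise ValueError(f"invalid override level: {level!r}")
```

For example, under a Level 2 override with `allowlist=["read_logs"]`, a `read_logs` action is permitted while any other action type is refused.
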
# Override Protocol {#protocol}

## Override Command

Override commands are sent as HTTP POST to the agent's well-known
endpoint:

~~~
POST /.well-known/hitl/override HTTP/1.1
Content-Type: application/json
Authorization: Bearer <operator-jwt>
Execution-Context: <override-ect>
~~~

The override ECT MUST contain:

- `exec_act`: `"hitl:override"`
- `par`: the most recent ECT from the agent being overridden
  (linking the override into the workflow DAG)

~~~json
{
  "exec_act": "hitl:override",
  "par": ["agent-last-action-ect"],
  "ext": {
    "hitl.level": 3,
    "hitl.reason": "Agent blocking legitimate traffic",
    "hitl.operator_id": "user:alice",
    "hitl.scope": "*",
    "hitl.constraints": null,
    "hitl.ttl": null
  }
}
~~~
{: #fig-override title="Override ECT"}

Field definitions:

- `hitl.level`: Integer 1-4. MUST be present.
- `hitl.reason`: Human-readable text. MUST be logged.
- `hitl.scope`: `"*"` for all functions, or an array of function
  IDs for partial override.
- `hitl.constraints`: For Level 2 only. Array of permitted action
  types.
- `hitl.ttl`: Duration in seconds. If set, override auto-expires.
  If null, persists until explicitly lifted.

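The field definitions above translate fairly directly into a validation step on the receiving agent. The sketch below is non-normative and makes assumptions the text does not state: it treats `hitl.scope` as required, rejects a non-null `hitl.constraints` at levels other than 2, and requires a positive `hitl.ttl`. The function name `validate_override_ext` is hypothetical.

```python
def validate_override_ext(ext: dict) -> list[str]:
    """Return a list of problems found in an override ECT's `ext` object."""
    errors: list[str] = []

    level = ext.get("hitl.level")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(level, bool) or not isinstance(level, int) or not 1 <= level <= 4:
        errors.append("hitl.level must be an integer from 1 to 4")

    scope = ext.get("hitl.scope")
    scope_is_list = isinstance(scope, list) and all(isinstance(s, str) for s in scope)
    if scope != "*" and not scope_is_list:
        errors.append('hitl.scope must be "*" or an array of function IDs')

    constraints = ext.get("hitl.constraints")
    if level == 2:
        ok = isinstance(constraints, list) and all(
            isinstance(c, str) for c in constraints
        )
        if not ok:
            errors.append("level 2 requires hitl.constraints (array of action types)")
    elif constraints is not None:
        # Assumption: "For Level 2 only" is read as "null at other levels".
        errors.append("hitl.constraints is only valid for level 2")

    ttl = ext.get("hitl.ttl")
    if ttl is not None:
        if isinstance(ttl, bool) or not isinstance(ttl, (int, float)) or ttl <= 0:
            errors.append("hitl.ttl must be a positive number of seconds or null")

    return errors
```

An empty list means the `ext` object is structurally acceptable; signature and freshness checks are separate steps.
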
## Acknowledgment

The agent MUST respond with an acknowledgment ECT:

- `exec_act`: `"hitl:ack"`
- `par`: the override ECT

~~~json
{
  "exec_act": "hitl:ack",
  "par": ["override-ect-uuid"],
  "ext": {
    "hitl.status": "accepted",
    "hitl.prior_state": "autonomous",
    "hitl.current_state": "stopped",
    "hitl.effective_at": "2026-03-01T12:00:00.123Z"
  }
}
~~~
{: #fig-ack title="Acknowledgment ECT"}

The override/ack ECT pair serves as the Decision Record defined
in ACP-DAG-HITL Section 6.5. No separate audit mechanism is
needed.

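A non-normative sketch of how an agent might assemble the acknowledgment payload follows. Several details are assumptions: only the state names `autonomous` and `stopped` appear in this document, so the other entries in `STATE_BY_LEVEL` are invented for illustration, as are the `hitl.unmet` field and the function name `build_ack`. Token signing and encoding are out of scope here.

```python
from datetime import datetime, timezone
from typing import Optional, Sequence

# Assumed state names; only "stopped" appears in this document.
STATE_BY_LEVEL = {1: "paused", 2: "constrained", 3: "stopped", 4: "takeover"}


def build_ack(
    override_ect_id: str,
    level: int,
    prior_state: str,
    *,
    unmet: Optional[Sequence[str]] = None,
    now: Optional[datetime] = None,
) -> dict:
    """Build the claims of a hitl:ack ECT that points at the override ECT."""
    moment = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    stamp = moment.isoformat(timespec="milliseconds").replace("+00:00", "Z")
    ext = {
        # Overrides are mandatory: the status is "accepted" or "partial".
        "hitl.status": "partial" if unmet else "accepted",
        "hitl.prior_state": prior_state,
        "hitl.current_state": STATE_BY_LEVEL[level],
        "hitl.effective_at": stamp,
    }
    if unmet:
        ext["hitl.unmet"] = list(unmet)  # hypothetical field for partial compliance
    return {"exec_act": "hitl:ack", "par": [override_ect_id], "ext": ext}
```

Calling `build_ack("override-ect-uuid", 3, "autonomous")` yields claims of the same shape as the acknowledgment figure, with the current time as `hitl.effective_at`.
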
## Resume and Lift

To resume from PAUSE:

~~~
POST /.well-known/hitl/resume HTTP/1.1
Execution-Context: <resume-ect with exec_act="hitl:resume">
~~~

To lift any override:

~~~
POST /.well-known/hitl/lift HTTP/1.1
Execution-Context: <lift-ect with exec_act="hitl:lift">
~~~

Both produce ECTs linked to the original override ECT via `par`.

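One way an agent could check a resume or lift request before releasing an override is sketched below. It is non-normative and rests on an interpretation: `hitl:resume` is accepted only while a Level 1 (PAUSE) override is active, and both commands must list the active override ECT in `par`. The function name `validate_release` and the shape of `active` are hypothetical.

```python
from typing import Optional, Sequence


def validate_release(
    exec_act: str,
    par: Sequence[str],
    active: Optional[dict],
) -> None:
    """Raise ValueError unless the request may release the active override.

    active is None when no override is in force; otherwise it holds the
    override's ECT identifier under "ect_id" and its level under "level".
    """
    if active is None:
        raise ValueError("no override is active")
    if active["ect_id"] not in par:
        raise ValueError("par does not reference the active override ECT")
    if exec_act == "hitl:resume":
        if active["level"] != 1:
            raise ValueError("hitl:resume applies only to PAUSE overrides")
    elif exec_act != "hitl:lift":
        raise ValueError(f"unexpected exec_act: {exec_act!r}")
```

If the call returns without raising, the agent can clear the override and emit its own ECT for the state change.
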
# Agent Compliance Requirements {#compliance}

Every HITL-compliant agent MUST:

1. Implement the `/.well-known/hitl/override` endpoint.

2. Process override commands within 1 second of receipt. The
   override path MUST be independent of the agent's main
   processing loop.

3. Acknowledge every override with an ECT response.

4. Accept every override. An agent MUST NOT respond with status
   `rejected`; overrides are mandatory. If the agent cannot fully
   comply, it MUST respond with status `partial` and describe what
   it could not do.

5. Expose current override status at:

~~~
GET /.well-known/hitl/status
~~~

~~~json
{
  "agent_id": "spiffe://example.com/agent/firewall",
  "override_active": true,
  "current_level": 3,
  "override_ect": "override-ect-uuid",
  "since": "2026-03-01T12:00:00Z",
  "operator_id": "user:alice"
}
~~~
{: #fig-status title="Override Status Response"}

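The status document can be produced from a small piece of state that the override handler updates. This is a non-normative sketch; the class name `OverrideTracker` is hypothetical, and the response returned when no override is active (only `agent_id` and `override_active`) is an assumption, since this document shows only the active case.

```python
from typing import Optional


class OverrideTracker:
    """Holds the active override, if any, and renders the status body."""

    def __init__(self, agent_id: str) -> None:
        self.agent_id = agent_id
        self._active: Optional[dict] = None

    def apply(self, override_ect: str, level: int, operator_id: str, since: str) -> None:
        """Record an accepted override."""
        self._active = {
            "current_level": level,
            "override_ect": override_ect,
            "since": since,
            "operator_id": operator_id,
        }

    def lift(self) -> None:
        """Clear the active override."""
        self._active = None

    def status(self) -> dict:
        """Body for GET /.well-known/hitl/status."""
        body = {
            "agent_id": self.agent_id,
            "override_active": self._active is not None,
        }
        if self._active is not None:
            body.update(self._active)
        return body
```

Keeping this state separate from the agent's main loop makes it straightforward to serve the status endpoint from the same independent path that handles overrides.
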
# Approval Gates {#approval-gates}

An approval gate is a DAG node that blocks workflow progression
until a human approves. Unlike overrides (which interrupt running
agents), approval gates are planned checkpoints in the workflow.

Approval gates are defined as ACP-DAG-HITL nodes with HITL rules:

~~~json
{
  "dag": {
    "nodes": [
      {
        "id": "n-approve",
        "type": "hitl:approval_gate",
        "agent": "system:hitl-gateway",
        "constraints": {
          "hitl.required_role": "clinician:oncall",
          "hitl.timeout_s": 300,
          "hitl.timeout_action": "safe_pause"
        }
      }
    ]
  }
}
~~~
{: #fig-gate title="Approval Gate as DAG Node"}

When the workflow reaches an approval gate, the system:

1. Emits an ECT with `exec_act: "hitl:approval_request"`
2. Notifies the required human role
3. Waits for approval (ECT: `"hitl:approval_granted"`) or
   rejection (ECT: `"hitl:approval_denied"`)
4. On timeout, applies `hitl.timeout_action`

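The wait-or-timeout behavior in steps 3 and 4 can be sketched with a blocking queue. This is non-normative: the function name `await_approval` is hypothetical, the queue stands in for whatever channel delivers the operator's decision ECT, and emitting the request and notifying the role (steps 1 and 2) are assumed to have happened already.

```python
import queue

DECISIONS = ("hitl:approval_granted", "hitl:approval_denied")


def await_approval(
    decisions: "queue.Queue[str]",
    timeout_s: float,
    timeout_action: str,
) -> str:
    """Block until a decision arrives or the gate times out.

    Returns the decision's exec_act value, or timeout_action if no
    decision arrives within timeout_s seconds.
    """
    try:
        decision = decisions.get(timeout=timeout_s)
    except queue.Empty:
        return timeout_action
    if decision not in DECISIONS:
        raise ValueError(f"unexpected decision: {decision!r}")
    return decision
```

With the gate shown above, the call would use `timeout_s=300` and `timeout_action="safe_pause"`.
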
# Broadcast Override {#broadcast}

For environments with many agents, an operator MAY send a
broadcast override to a management endpoint:

~~~
POST /hitl/broadcast HTTP/1.1
Content-Type: application/json
Execution-Context: <broadcast-override-ect>

{
  "targets": ["spiffe://example.com/agent/a",
              "spiffe://example.com/agent/b"],
  "level": 3,
  "reason": "Coordinated emergency stop"
}
~~~

The broadcast endpoint fans out individual override ECTs to each
target and returns per-agent results.

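A minimal, non-normative sketch of the fan-out follows. The transport is injected as a `send` callable so the example stays self-contained; `broadcast_override`, `send`, and the `unreachable` status value are hypothetical. The sketch also omits `par`, which in practice would need each target's most recent ECT.

```python
from typing import Callable, Dict, Iterable


def broadcast_override(
    targets: Iterable[str],
    level: int,
    reason: str,
    send: Callable[[str, dict], dict],
) -> Dict[str, dict]:
    """Send one override per target and collect per-agent results."""
    results: Dict[str, dict] = {}
    for target in targets:
        override = {
            "exec_act": "hitl:override",
            "ext": {
                "hitl.level": level,
                "hitl.reason": reason,
                "hitl.scope": "*",
            },
        }
        try:
            results[target] = send(target, override)
        except Exception as exc:
            # One failing agent must not prevent overrides to the others.
            results[target] = {"hitl.status": "unreachable", "error": str(exc)}
    return results
```

A production implementation would likely issue the requests concurrently so that one slow agent does not delay the rest; the sequential loop is kept here for clarity.
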
# Dead Man's Switch {#dead-man}

For maximum reliability, agents SHOULD implement a heartbeat
mechanism: the agent periodically pings an operator heartbeat
endpoint. If the heartbeat is missed for a configurable duration,
the agent automatically enters Level 1 (PAUSE).

This provides a safety net when network connectivity to the
operator is lost. The `unreachable_human` policy from
ACP-DAG-HITL governs behavior when the dead man's switch
activates: either `abort` or `safe_pause`.

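The heartbeat logic reduces to tracking the time of the last successful ping. The sketch below is non-normative; the class name `DeadMansSwitch`, its method names, and the injectable `clock` parameter are hypothetical, and the actual ping to the operator endpoint is left to the caller.

```python
import time
from typing import Callable


class DeadMansSwitch:
    """Trips when no successful heartbeat occurs within max_silence_s."""

    def __init__(
        self,
        max_silence_s: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._max_silence_s = max_silence_s
        self._clock = clock
        self._last_ok = clock()

    def heartbeat_ok(self) -> None:
        """Record a successful ping to the operator heartbeat endpoint."""
        self._last_ok = self._clock()

    def tripped(self) -> bool:
        """True once the configured silence duration has been exceeded."""
        return self._clock() - self._last_ok > self._max_silence_s
```

An agent would call `heartbeat_ok()` after each successful ping and check `tripped()` from its independent override path; when it returns true, the agent applies the configured `unreachable_human` behavior.
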
# Security Considerations

Override commands are high-privilege operations. All override
endpoints MUST require authentication via mutual TLS or signed
JWTs.

Override ECTs MUST be signed at L2 or L3. Agents MUST verify
signatures before processing.

To prevent replay attacks, agents MUST reject override ECTs with
`iat` more than 30 seconds in the past. The `jti` MUST be unique;
agents MUST reject duplicate `jti` values.

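The freshness and uniqueness checks can be sketched as follows. This is non-normative and assumes the ECT signature has already been verified and its claims decoded into a dictionary; the class name `ReplayGuard` and the in-memory `jti` store are hypothetical choices.

```python
import time
from typing import Callable, Dict


class ReplayGuard:
    """Rejects stale or previously seen override ECTs."""

    MAX_AGE_S = 30  # from the requirement above

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        self._seen: Dict[str, float] = {}  # jti -> iat

    def check(self, claims: dict) -> None:
        """Raise ValueError if the claims fail the replay checks."""
        now = self._clock()
        iat = claims.get("iat")
        jti = claims.get("jti")
        if isinstance(iat, bool) or not isinstance(iat, (int, float)):
            raise ValueError("missing or invalid iat")
        if not isinstance(jti, str) or not jti:
            raise ValueError("missing or invalid jti")
        if now - iat > self.MAX_AGE_S:
            raise ValueError("iat is more than 30 seconds in the past")
        if jti in self._seen:
            raise ValueError("duplicate jti")
        # Entries older than the window would be rejected as stale anyway,
        # so they can be dropped to keep the store bounded.
        self._seen = {
            j: t for j, t in self._seen.items() if now - t <= self.MAX_AGE_S
        }
        self._seen[jti] = iat
```

Because stale tokens are rejected on `iat` alone, the `jti` store only needs to remember identifiers for the length of the freshness window.
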
Deployments SHOULD implement multi-operator approval for Level 4
(TAKEOVER), requiring two independent operator identities.

The override endpoint SHOULD be served on a separate port or
network interface from the agent's main API to ensure availability
during overload.

# IANA Considerations

This document requests the following registrations:

1. Well-known URI registrations for `hitl/override`,
   `hitl/resume`, `hitl/lift`, and `hitl/status` per {{RFC8615}}.

2. Registration of `exec_act` values: `hitl:override`,
   `hitl:ack`, `hitl:resume`, `hitl:lift`,
   `hitl:approval_request`, `hitl:approval_granted`,
   `hitl:approval_denied` in a future ECT action type registry.

--- back

# Acknowledgments
{:numbered="false"}

This specification is the runtime enforcement companion to
ACP-DAG-HITL {{I-D.nennemann-agent-dag-hitl-safety}}. Override
design is inspired by industrial safety systems (IEC 62061,
ISO 13849).