---
title: "Human-in-the-Loop (HITL) Primitives for Agent Ecosystems"
abbrev: "HITL"
category: std
docname: draft-hitl-human-in-the-loop-00
submissiontype: IETF
number:
date:
v: 3
area: "OPS"
workgroup: "NMOP"
keyword:
  - human override
  - HITL
  - emergency stop
  - agentic safety

author:
  -
    fullname: TBD
    organization: Independent
    email: placeholder@example.com

normative:
  RFC2119:
  RFC8174:
  RFC7519:
  RFC8446:
  RFC8615:
  I-D.nennemann-wimse-ect:
    title: "Execution Context Tokens for Distributed Agentic Workflows"
    target: https://datatracker.ietf.org/doc/draft-nennemann-wimse-ect/
  I-D.nennemann-agent-dag-hitl-safety:
    title: "Agent Context Policy Token: DAG Delegation with Human Override"
    target: https://datatracker.ietf.org/doc/draft-nennemann-agent-dag-hitl-safety/

informative:

--- abstract

This document defines runtime HITL (Human-in-the-Loop) primitives
for agent ecosystems: four escalating override levels, approval
gates, escalation paths, and explainability hooks.  ACP-DAG-HITL
defines WHEN humans must intervene (policy rules and triggers).
This specification defines HOW the intervention actually happens at
the protocol level: the HTTP endpoints, override semantics, agent
compliance requirements, and acknowledgment flows.  All overrides
and decisions produce ECT nodes, making human interventions part of
the same auditable DAG as agent actions.

--- middle

# Introduction

The current ratio of autonomous capability drafts to human
oversight drafts in the IETF is roughly 7:1.  Agents can act but
humans cannot reliably stop them.

ACP-DAG-HITL {{I-D.nennemann-agent-dag-hitl-safety}} defines the
policy: trigger conditions, required roles, and actions (`pause`,
`escalate`, `abort`).  But it deliberately defers the runtime
protocol: how does an operator actually send a stop command?  How
does the agent acknowledge it?  What happens if the operator is
unreachable?

This specification fills that gap.  It is the runtime enforcement
companion to ACP-DAG-HITL, inspired by industrial safety systems:
the e-stop button on factory equipment, the circuit breaker in
electrical systems, and the kill switch in robotics.

HITL is deliberately not a governance framework, policy language,
or accountability protocol.  It is a panic button with a
well-defined interface.

# Conventions and Definitions

{::boilerplate bcp14-tagged}

Override:
: A human-initiated command that alters an agent's autonomous
  operation, taking precedence over the agent's own decisions.

Operator:
: A human user authorized to issue override commands.

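Approval Gate:
: A DAG node that blocks workflow progression until a human
  approves or rejects continuation.

Execution Context Token (ECT):
: A token defined in {{I-D.nennemann-wimse-ect}}.  Each ECT is a
  node in a workflow DAG and is linked to its parent ECTs through
  the `par` claim.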

# Relationship to ACP-DAG-HITL {#mapping}

ACP-DAG-HITL defines three HITL actions.  This specification maps
them onto three of its four runtime override levels and adds a
fourth level, CONSTRAIN (partial restriction), that has no
ACP-DAG-HITL equivalent:

| ACP-DAG-HITL action | HITL Override Level | Behavior |
|---------------------|---------------------|----------|
| `pause` | Level 1: PAUSE | Suspend autonomous actions, hold state |
| (no equivalent) | Level 2: CONSTRAIN | Restrict to an allowlist of actions |
| `abort` | Level 3: STOP | Cease all actions, enter inert state |
| `escalate` | Level 4: TAKEOVER | Transfer control to human operator |
{: #fig-mapping title="ACP-DAG-HITL to HITL Level Mapping"}

When ACP-DAG-HITL rules trigger, the runtime system uses the
corresponding HITL level to enforce the action.

# Override Levels {#levels}

## Level 1: PAUSE

The agent MUST suspend all autonomous actions and hold current
state.  It MUST NOT initiate new actions but MAY complete
in-progress actions if stopping mid-execution would cause harm
(e.g., an in-flight database transaction).  The agent resumes
when a RESUME command is received.

## Level 2: CONSTRAIN

The agent MUST restrict its actions to a specified subset.  The
override command includes an allowlist of permitted action types.
The agent MUST reject any action not on the allowlist.

## Level 3: STOP

The agent MUST immediately cease all autonomous actions and enter
an inert state.  It MUST NOT take any autonomous actions until
explicitly restarted.  This is the e-stop.

## Level 4: TAKEOVER

The agent MUST transfer operational control to the human operator.
It enters a pass-through mode where it executes only explicit
operator commands.  The agent's sensors and outputs remain
available to the operator as tools.
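The following non-normative Python sketch shows one way an agent
might decide whether an action is permitted under each level,
together with the mapping from ACP-DAG-HITL actions to levels given
in the mapping table.  The names `Level`, `ACP_ACTION_TO_LEVEL`, and
`may_execute` are hypothetical, as are the action strings used in
any example.

~~~ python
from enum import IntEnum
from typing import Optional, Set


class Level(IntEnum):
    """HITL override levels (hypothetical names for this sketch)."""
    PAUSE = 1
    CONSTRAIN = 2
    STOP = 3
    TAKEOVER = 4


# ACP-DAG-HITL actions and the runtime level that enforces each one,
# following the ACP-DAG-HITL to HITL level mapping table.
ACP_ACTION_TO_LEVEL = {
    "pause": Level.PAUSE,
    "abort": Level.STOP,
    "escalate": Level.TAKEOVER,
}


def may_execute(level: Optional[int], action: str, *,
                operator_issued: bool = False,
                allowlist: Optional[Set[str]] = None,
                harmful_to_interrupt: bool = False) -> bool:
    """Return True if `action` may run while `level` is in force.

    level                 None when no override is active.
    operator_issued       True for an explicit operator command.
    allowlist             Permitted action types for CONSTRAIN.
    harmful_to_interrupt  True for an in-progress action that would
                          cause harm if stopped mid-execution.
    """
    if level is None:
        return True                      # autonomous operation
    if level == Level.PAUSE:
        return harmful_to_interrupt      # no new actions
    if level == Level.CONSTRAIN:
        return action in (allowlist or set())
    if level == Level.STOP:
        return False                     # inert until restarted
    if level == Level.TAKEOVER:
        return operator_issued           # pass-through mode only
    raise ValueError(f"unknown override level: {level!r}")
~~~

The sketch treats STOP as rejecting every action, including operator
commands, which is the conservative reading of "inert state".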

# Override Protocol {#protocol}

## Override Command

Override commands are sent as HTTP POST to the agent's well-known
endpoint:

~~~
POST /.well-known/hitl/override HTTP/1.1
Content-Type: application/json
Authorization: Bearer <operator-jwt>
Execution-Context: <override-ect>
~~~

The override ECT MUST contain:

- `exec_act`: `"hitl:override"`
- `par`: an array containing the identifier of the most recent ECT
  emitted by the agent being overridden (linking the override into
  the workflow DAG)

~~~json
{
  "exec_act": "hitl:override",
  "par": ["agent-last-action-ect"],
  "ext": {
    "hitl.level": 3,
    "hitl.reason": "Agent blocking legitimate traffic",
    "hitl.operator_id": "user:alice",
    "hitl.scope": "*",
    "hitl.constraints": null,
    "hitl.ttl": null
  }
}
~~~
{: #fig-override title="Override ECT"}

Field definitions:

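- `hitl.level`: Integer from 1 to 4 identifying the override level
  ({{levels}}).  MUST be present.
- `hitl.reason`: Human-readable text explaining the override.  MUST
  be logged.
- `hitl.operator_id`: Identifier of the operator issuing the
  override.
- `hitl.scope`: `"*"` for all functions, or an array of function
  IDs for a partial override.
- `hitl.constraints`: For Level 2 (CONSTRAIN) only.  Array of
  permitted action types.
- `hitl.ttl`: Duration in seconds.  If set, the override expires
  automatically after this duration.  If null, it persists until
  explicitly lifted.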
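A receiving agent might check these fields along the lines of the
following non-normative Python sketch.  The function name
`validate_override_ext` is hypothetical.  Three rules are
assumptions not stated above and are marked in the code: a reason
must be present, a non-null allowlist outside Level 2 is an error,
and a TTL must be positive.  Absent optional fields are accepted.

~~~ python
from typing import Any, Dict, List


def validate_override_ext(ext: Dict[str, Any]) -> List[str]:
    """Check the `ext` claims of a hitl:override ECT.

    Returns a list of problems; an empty list means the claims passed.
    """
    errors: List[str] = []

    level = ext.get("hitl.level")
    # bool is a subclass of int in Python, so rule it out explicitly.
    if (not isinstance(level, int) or isinstance(level, bool)
            or not 1 <= level <= 4):
        errors.append("hitl.level must be an integer from 1 to 4")

    # Assumption: "MUST be logged" implies the reason must be present.
    if not isinstance(ext.get("hitl.reason"), str):
        errors.append("hitl.reason must be a string")

    scope = ext.get("hitl.scope")
    if scope is not None and scope != "*" and not (
            isinstance(scope, list)
            and all(isinstance(s, str) for s in scope)):
        errors.append('hitl.scope must be "*" or an array of function IDs')

    constraints = ext.get("hitl.constraints")
    if level == 2:
        if not (isinstance(constraints, list)
                and all(isinstance(c, str) for c in constraints)):
            errors.append("Level 2 requires hitl.constraints (an array)")
    elif constraints is not None:
        # Assumption: a non-null allowlist at other levels is an error.
        errors.append("hitl.constraints is only valid at Level 2")

    ttl = ext.get("hitl.ttl")
    # Assumption: a TTL must be a positive number of seconds.
    if ttl is not None and (not isinstance(ttl, (int, float))
                            or isinstance(ttl, bool) or ttl <= 0):
        errors.append("hitl.ttl must be null or a positive duration")

    return errors
~~~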

## Acknowledgment

The agent MUST respond with an acknowledgment ECT:

- `exec_act`: `"hitl:ack"`
- `par`: an array containing the identifier of the override ECT

~~~json
{
  "exec_act": "hitl:ack",
  "par": ["override-ect-uuid"],
  "ext": {
    "hitl.status": "accepted",
    "hitl.prior_state": "autonomous",
    "hitl.current_state": "stopped",
    "hitl.effective_at": "2026-03-01T12:00:00.123Z"
  }
}
~~~
{: #fig-ack title="Acknowledgment ECT"}

The override/ack ECT pair serves as the Decision Record defined
in {{Section 6.5 of I-D.nennemann-agent-dag-hitl-safety}}, so no
separate audit mechanism is needed.
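As a non-normative illustration, an agent could assemble the
acknowledgment payload as in the Python sketch below.  The function
name `build_ack` and the `hitl.unmet` claim (used to describe what
could not be done for a `partial` status, see {{compliance}}) are
hypothetical.  Signing and other standard ECT claims are out of
scope here.

~~~ python
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional


def build_ack(override_id: str, prior_state: str, current_state: str,
              unmet: Optional[List[str]] = None) -> Dict[str, Any]:
    """Build the payload of a hitl:ack ECT for one override.

    `unmet` lists what the agent could not do; when it is non-empty the
    status is "partial".  "rejected" is never produced, because
    overrides are mandatory.
    """
    # RFC 3339 UTC timestamp with millisecond precision, e.g.
    # 2026-03-01T12:00:00.123Z
    effective_at = (datetime.now(timezone.utc)
                    .isoformat(timespec="milliseconds")
                    .replace("+00:00", "Z"))
    ext: Dict[str, Any] = {
        "hitl.status": "partial" if unmet else "accepted",
        "hitl.prior_state": prior_state,
        "hitl.current_state": current_state,
        "hitl.effective_at": effective_at,
    }
    if unmet:
        ext["hitl.unmet"] = unmet  # hypothetical claim name
    return {"exec_act": "hitl:ack", "par": [override_id], "ext": ext}
~~~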

## Resume and Lift

To resume from PAUSE:

~~~
POST /.well-known/hitl/resume HTTP/1.1
Execution-Context: <resume-ect with exec_act="hitl:resume">
~~~

To lift any override:

~~~
POST /.well-known/hitl/lift HTTP/1.1
Execution-Context: <lift-ect with exec_act="hitl:lift">
~~~

Both produce ECTs linked to the original override ECT via `par`.
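The override lifecycle (apply, resume, lift, and TTL expiry) can be
sketched as a small state holder.  This is a non-normative Python
sketch: the class `OverrideState` is hypothetical, and it assumes
that a newer override replaces the active one, which this document
does not specify.  `resume` and `lift` return the override ECT
identifier so that the caller can place it in the new ECT's `par`.

~~~ python
import time
from typing import Callable, Optional


class OverrideState:
    """Active override on one agent, with TTL expiry (a sketch)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._level: Optional[int] = None
        self._override_id: Optional[str] = None
        self._expires_at: Optional[float] = None

    def apply(self, override_id: str, level: int,
              ttl: Optional[float] = None) -> None:
        # Assumption: a newer override replaces the active one.
        self._level = level
        self._override_id = override_id
        self._expires_at = None if ttl is None else self._clock() + ttl

    def current_level(self) -> Optional[int]:
        """Active level, or None; expired TTL overrides are dropped."""
        if self._expires_at is not None and self._clock() >= self._expires_at:
            self._clear()
        return self._level

    def resume(self) -> Optional[str]:
        """Handle hitl:resume, valid only from Level 1 (PAUSE)."""
        if self.current_level() != 1:
            raise ValueError("resume is only valid while paused")
        return self._clear()

    def lift(self) -> Optional[str]:
        """Handle hitl:lift, valid for any active override."""
        if self.current_level() is None:
            raise ValueError("no override is active")
        return self._clear()

    def _clear(self) -> Optional[str]:
        override_id = self._override_id
        self._level = None
        self._override_id = None
        self._expires_at = None
        return override_id
~~~

Expiry is evaluated lazily whenever the level is read, so no timer
thread is needed; the injectable clock makes the TTL behavior easy to
test.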

# Agent Compliance Requirements {#compliance}

Every HITL-compliant agent MUST:

1. Implement the `/.well-known/hitl/override` endpoint.

2. Process override commands within 1 second of receipt.  The
   override path MUST be independent of the agent's main
   processing loop.

3. Acknowledge every override with an ECT response.

4. Accept every override.  The agent MUST NOT respond with status
   `rejected`, because overrides are mandatory.  If the agent cannot
   fully comply, it MUST respond with status `partial` and describe
   what it could not do.

5. Expose current override status at:

~~~
GET /.well-known/hitl/status
~~~

~~~json
{
  "agent_id": "spiffe://example.com/agent/firewall",
  "override_active": true,
  "current_level": 3,
  "override_ect": "override-ect-uuid",
  "since": "2026-03-01T12:00:00Z",
  "operator_id": "user:alice"
}
~~~
{: #fig-status title="Override Status Response"}

# Approval Gates {#approval-gates}

An approval gate is a DAG node that blocks workflow progression
until a human approves.  Unlike overrides (which interrupt running
agents), approval gates are planned checkpoints in the workflow.

Approval gates are defined as ACP-DAG-HITL nodes with HITL rules:

~~~json
{
  "dag": {
    "nodes": [
      {
        "id": "n-approve",
        "type": "hitl:approval_gate",
        "agent": "system:hitl-gateway",
        "constraints": {
          "hitl.required_role": "clinician:oncall",
          "hitl.timeout_s": 300,
          "hitl.timeout_action": "safe_pause"
        }
      }
    ]
  }
}
~~~
{: #fig-gate title="Approval Gate as DAG Node"}

When the workflow reaches an approval gate, the system:

1. Emits an ECT with `exec_act: "hitl:approval_request"`
2. Notifies the required human role
3. Waits for approval (ECT: `"hitl:approval_granted"`) or
   rejection (ECT: `"hitl:approval_denied"`)
4. On timeout, applies `hitl.timeout_action`
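The four steps above can be sketched in Python as follows.  This is
non-normative: `run_approval_gate`, the `emit` and `notify`
callbacks, and the `hitl.gate` claim are hypothetical.  It assumes
that a separate component verifies each approver's ECT and role and
then places the resulting `exec_act` value on the `decisions` queue.

~~~ python
import queue
from typing import Any, Callable, Dict


def run_approval_gate(node: Dict[str, Any],
                      decisions: "queue.Queue[str]",
                      emit: Callable[[Dict[str, Any]], None],
                      notify: Callable[[str], None]) -> str:
    """Block at an approval gate until a decision or the timeout.

    Returns "hitl:approval_granted", "hitl:approval_denied", or the
    node's hitl.timeout_action.
    """
    c = node["constraints"]
    # 1. Record the request in the DAG ("hitl.gate" is hypothetical).
    emit({"exec_act": "hitl:approval_request",
          "ext": {"hitl.gate": node["id"]}})
    # 2. Notify the required human role.
    notify(c["hitl.required_role"])
    # 3. Wait for an approval or rejection.
    try:
        return decisions.get(timeout=c["hitl.timeout_s"])
    except queue.Empty:
        # 4. On timeout, apply the configured timeout action.
        return c["hitl.timeout_action"]
~~~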

# Broadcast Override {#broadcast}

For environments with many agents, an operator MAY send a
broadcast override to a management endpoint:

~~~
POST /hitl/broadcast HTTP/1.1
Execution-Context: <broadcast-override-ect>

{
  "targets": ["spiffe://example.com/agent/a",
               "spiffe://example.com/agent/b"],
  "level": 3,
  "reason": "Coordinated emergency stop"
}
~~~

The broadcast endpoint fans out individual override ECTs to each
target and returns per-agent results.
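A broadcast endpoint might fan out in parallel, so that one slow or
unreachable agent does not delay the others, as in this
non-normative Python sketch.  `fan_out` is hypothetical, as is the
`send` callable, which is assumed to fill in `par` from the target's
most recent ECT, sign the override ECT, and POST it to the target's
`/.well-known/hitl/override` endpoint.  The `"error"` status recorded
for failed deliveries is likewise an assumption of this sketch.

~~~ python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List, Tuple

Result = Dict[str, Any]


def fan_out(targets: List[str], level: int, reason: str,
            send: Callable[[str, Dict[str, Any]], Result]
            ) -> Dict[str, Result]:
    """Send one override per target; return per-agent results."""

    def one(target: str) -> Tuple[str, Result]:
        override = {
            "exec_act": "hitl:override",
            "ext": {"hitl.level": level, "hitl.reason": reason,
                    "hitl.scope": "*", "hitl.constraints": None,
                    "hitl.ttl": None},
        }
        try:
            return target, send(target, override)
        except Exception as exc:  # e.g. agent unreachable
            # Hypothetical status: record the failure, keep going.
            return target, {"hitl.status": "error", "detail": str(exc)}

    with ThreadPoolExecutor(max_workers=max(1, len(targets))) as pool:
        return dict(pool.map(one, targets))
~~~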

# Dead Man's Switch {#dead-man}

For maximum reliability, agents SHOULD implement a heartbeat
mechanism: the agent periodically sends a heartbeat to an operator
heartbeat endpoint.  If heartbeats fail for longer than a
configurable duration, the dead man's switch activates and the
agent automatically leaves autonomous operation.

This provides a safety net when network connectivity to the
operator is lost.  The `unreachable_human` policy from
ACP-DAG-HITL selects the state the agent enters when the dead
man's switch activates: `safe_pause` corresponds to Level 1
(PAUSE) and `abort` corresponds to Level 3 (STOP).
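An agent-side heartbeat monitor could look like the following
non-normative Python sketch.  The class `DeadMansSwitch` is
hypothetical, the clock is injectable for testing, and the mapping
of `safe_pause` to Level 1 and `abort` to Level 3 is an assumption
of this sketch.

~~~ python
import time
from typing import Callable, Optional


class DeadMansSwitch:
    """Agent-side heartbeat monitor (a sketch)."""

    def __init__(self, timeout_s: float, unreachable_human: str,
                 clock: Callable[[], float] = time.monotonic):
        if unreachable_human not in ("abort", "safe_pause"):
            raise ValueError("policy must be 'abort' or 'safe_pause'")
        self.timeout_s = timeout_s
        self.policy = unreachable_human
        self._clock = clock
        self._last_ok = clock()

    def heartbeat_succeeded(self) -> None:
        """Call after each successful heartbeat to the operator."""
        self._last_ok = self._clock()

    def check(self) -> Optional[int]:
        """Return the override level to enter, or None while healthy."""
        if self._clock() - self._last_ok <= self.timeout_s:
            return None
        # Assumed mapping: safe_pause -> Level 1 (PAUSE),
        # abort -> Level 3 (STOP).
        return 1 if self.policy == "safe_pause" else 3
~~~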

# Security Considerations

Override commands are high-privilege operations.  All override
endpoints MUST require authentication via mutual TLS {{RFC8446}} or
signed JWTs {{RFC7519}}.

Override ECTs MUST be signed at ECT level L2 or L3
{{I-D.nennemann-wimse-ect}}.  Agents MUST verify signatures before
processing an override.

To prevent replay attacks, agents MUST reject override ECTs with
`iat` more than 30 seconds in the past.  The `jti` MUST be unique;
agents MUST reject duplicate `jti` values.
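The two replay checks combine naturally: once an ECT's `iat` falls
outside the 30-second window, any replay of it fails the age check,
so its `jti` no longer needs to be remembered.  The non-normative
Python sketch below relies on that to bound its cache.  The class
`ReplayGuard` is hypothetical, `iat` is taken to be a JWT NumericDate
(seconds since the epoch), and handling of `iat` values in the future
(clock skew) is not addressed.

~~~ python
import time
from typing import Callable, Dict

MAX_AGE_S = 30  # the 30-second iat window required above


class ReplayGuard:
    """Rejects stale override ECTs and duplicate jti values."""

    def __init__(self, clock: Callable[[], float] = time.time):
        self._clock = clock
        self._seen: Dict[str, float] = {}  # jti -> iat

    def accept(self, jti: str, iat: float) -> bool:
        now = self._clock()
        if now - iat > MAX_AGE_S:
            return False  # too old: reject as a possible replay
        # Forget jti values whose iat has aged out; a replay of any of
        # them is already rejected by the age check above.
        self._seen = {j: t for j, t in self._seen.items()
                      if now - t <= MAX_AGE_S}
        if jti in self._seen:
            return False  # duplicate jti
        self._seen[jti] = iat
        return True
~~~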

Deployments SHOULD implement multi-operator approval for Level 4
(TAKEOVER), requiring two independent operator identities.

The override endpoint SHOULD be served on a separate port or
network interface from the agent's main API to ensure availability
during overload.

# IANA Considerations

This document requests the following registrations:

1. Registration of the well-known URI suffix `hitl` per
   {{RFC8615}}.  The `override`, `resume`, `lift`, and `status`
   resources beneath `/.well-known/hitl/` are defined by this
   document.

2. Registration of `exec_act` values: `hitl:override`,
   `hitl:ack`, `hitl:resume`, `hitl:lift`,
   `hitl:approval_request`, `hitl:approval_granted`,
   `hitl:approval_denied` in a future ECT action type registry.

--- back

# Acknowledgments
{:numbered="false"}

This specification is the runtime enforcement companion to
ACP-DAG-HITL {{I-D.nennemann-agent-dag-hitl-safety}}.  Override
design is inspired by industrial safety systems (IEC 62061,
ISO 13849).