---
title: "Standardized Human Override Protocol for Autonomous Agents"
abbrev: "Agent Override Protocol"
category: std

docname: draft-nennemann-agent-override-protocol-00
submissiontype: IETF
number:
date:
v: 3
area: "OPS"
workgroup: "NMOP"
keyword:
  - human override
  - autonomous agents
  - kill switch
  - override protocol
  - agent safety

author:
  - fullname: Christian Nennemann
    organization: Independent Researcher
    email: ietf@nennemann.de

normative:
  RFC2119:
  RFC8174:
  RFC7519:
  RFC7515:
  RFC9110:
  I-D.nennemann-wimse-ect:
    title: "Execution Context Tokens for Distributed Agentic Workflows"
    target: https://datatracker.ietf.org/doc/draft-nennemann-wimse-ect/
  I-D.nennemann-agent-dag-hitl-safety:
    title: "Agent Context Policy Token: DAG Delegation with Human Override"
    target: https://datatracker.ietf.org/doc/draft-nennemann-agent-dag-hitl-safety/

informative:
  I-D.nennemann-agent-gap-analysis:
    title: "Gap Analysis of IETF Standards for Agentic AI Workflows"
    target: https://datatracker.ietf.org/doc/draft-nennemann-agent-gap-analysis/

--- abstract

This document defines a cross-vendor interoperable protocol for human operators to override autonomous agent decisions at multiple authority levels, with verified compliance and audit trails. It absorbs and supersedes the override mechanisms described in earlier HEOP and HITL drafts, providing a single unified protocol that works across agent implementations from different vendors.

The protocol specifies three override levels (Advisory, Mandatory, Emergency), a JWT-based override signal format, multiple delivery mechanisms, compliance verification, and graceful degradation semantics. Override events are recorded as Execution Context Token (ECT) nodes for tamper-evident audit.

--- middle

# Introduction

Gap 7 of the agentic AI gap analysis {{I-D.nennemann-agent-gap-analysis}} identifies the absence of a standardized human override mechanism as a critical deficiency.
Current human-in-the-loop (HITL) mechanisms are vendor-specific: each agent platform implements its own override interface, authentication scheme, and compliance model. When agents from different vendors collaborate in a shared workflow, there is no universal mechanism for a human operator to intervene.

Earlier drafts addressed portions of this problem. The Human Emergency Override Protocol (HEOP) defined four override levels with ECT integration. The HITL Primitives draft added approval gates, explainability tokens, and timeout policies. This document absorbs and supersedes the override protocol aspects of both, providing a single cross-vendor interoperable specification.

The design draws from industrial safety: the emergency stop button on factory equipment, the circuit breaker in electrical systems, and the kill switch in robotics. The override mechanism must be simpler and more reliable than the system it controls.

The protocol integrates with the Agent Context Policy Token {{I-D.nennemann-agent-dag-hitl-safety}} for authorization and with the Execution Context Token {{I-D.nennemann-wimse-ect}} for audit.

# Terminology

{::boilerplate bcp14-tagged}

Override Signal:
: A signed message from an authorized human operator directing one or more agents to change their autonomous behavior.

Override Authority:
: The authenticated identity and role of a human operator authorized to issue override signals, as defined in ACP-DAG-HITL policy.

Override Scope:
: The set of agents or agent functions targeted by an override signal.

Override Level:
: One of three escalating intervention types: Advisory (Level 1), Mandatory (Level 2), or Emergency (Level 3).

Compliance Verification:
: The process of confirming that an agent has changed its behavior in accordance with an override signal.

Acknowledgment:
: A signed response from an agent confirming receipt and processing of an override signal.

Graceful Degradation:
: The behavior of the override system when the target agent is unreachable or non-responsive.

Kill Switch:
: An Emergency (Level 3) override that requires immediate cessation of all autonomous agent activity.

# Override Protocol

## Override Architecture

The following diagram illustrates the override signal flow from a human operator through the override system to the target agent(s):

~~~
+----------+    Override Signal     +------------------+
|  Human   |---(JWT-signed msg)---->|     Override     |
| Operator |                        |    Dispatcher    |
+----------+                        +------------------+
     ^                                       |
     |                                       |
     |                         +-------------+-------------+
     |                         |             |             |
     |                         v             v             v
     |                    +---------+   +---------+   +---------+
     |                    | Agent A |   | Agent B |   | Agent C |
     |                    | (push)  |   | (pull)  |   | (bcast) |
     |                    +---------+   +---------+   +---------+
     |                         |             |             |
     +--------(Ack ECT)--------+----(Ack)----+----(Ack)----+
                               |             |             |
                          +----v-------------v-------------v----+
                          |       Compliance Verification       |
                          |       & Audit Trail (ECT DAG)       |
                          +-------------------------------------+
~~~
{: #fig-architecture title="Override Architecture"}

The Override Dispatcher receives the operator's signed override signal and routes it to target agents via the appropriate delivery mechanism. Each agent acknowledges the override with an ECT. The compliance verification layer monitors agent behavior to confirm the override was applied.

## Override Authority Levels

### Level 1: Advisory

An Advisory override is a suggestion for the agent to reconsider its current course of action. The agent MAY comply with an Advisory override. If the agent does not comply, it MUST acknowledge receipt and provide a reason for non-compliance.

Advisory overrides are appropriate when the operator wants to influence agent behavior without mandating a specific outcome.

### Level 2: Mandatory

A Mandatory override is a directive for the agent to change its behavior. The agent MUST comply with a Mandatory override. The agent MUST alter its behavior as specified in the override signal and confirm compliance.
Mandatory overrides are appropriate when the operator requires a specific behavioral change but the situation does not require immediate cessation of all activity.

### Level 3: Emergency

An Emergency override requires immediate halt of all autonomous agent activity. The agent MUST stop all autonomous actions immediately upon receipt. The agent MUST NOT initiate any new actions until explicitly released by an authorized operator. This is the kill switch.

Emergency overrides are appropriate in safety-critical situations where continued autonomous operation poses unacceptable risk.

The agent MUST process Emergency overrides within 1 second of receipt. The override processing path MUST be independent of the agent's main processing loop.

### Authority Delegation and Chain of Command

Override authority is derived from ACP-DAG-HITL policy. The policy defines which operator roles are authorized for each override level:

- Level 1 (Advisory): Any operator with `advisory_override` role
- Level 2 (Mandatory): Operators with `mandatory_override` role
- Level 3 (Emergency): Operators with `emergency_override` role

An operator with a higher-level role implicitly holds all lower-level roles. Authority delegation (one operator authorizing another to act on their behalf) MUST be recorded as an ECT and MUST be time-bounded.

## Override Scope

### Single Agent Override

Targets a specific agent identified by its agent identifier (e.g., a SPIFFE ID). The override signal contains a single `target` value.

### Agent Group Override

Targets a set of agents identified by a tag or label. The override signal contains a `target_group` value that matches agents sharing a common label (e.g., `group:firewall-agents`).

### Workflow-Wide Override

Targets all agents participating in a specific workflow DAG. The override signal contains a `target_workflow` value referencing the workflow identifier.

### Domain-Wide Override

Targets all agents within an administrative domain.
The override signal contains `target_domain` set to `"*"` or a specific domain identifier.

## Override Signal Format

Override signals are JSON Web Tokens (JWTs) {{RFC7519}} signed by the override authority using JSON Web Signature (JWS) {{RFC7515}}.

The JWT payload MUST contain the following claims:

~~~json
{
  "jti": "urn:uuid:f47ac10b-58cc-4372-a567-0e02b2c3d479",
  "iss": "spiffe://example.com/human/alice",
  "iat": 1741042800,
  "override_level": 3,
  "override_scope": {
    "type": "single",
    "target": "spiffe://example.com/agent/firewall-mgr"
  },
  "override_action": "stop",
  "override_reason": "Agent blocking legitimate traffic",
  "override_expiry": 1741046400,
  "nonce": "a3f8b2c1e9d74506"
}
~~~
{: #fig-signal title="Override Signal JWT Payload"}

Claim definitions:

`override_level`:
: Integer 1-3. MUST be present. Specifies the override authority level.

`override_scope`:
: Object. MUST be present. Contains `type` (one of `single`, `group`, `workflow`, `domain`) and the corresponding target identifier.

`override_action`:
: String. MUST be present. The action the agent should take. Values include `reconsider`, `change_behavior`, `stop`, `restrict`, and `resume`.

`override_reason`:
: String. MUST be present. Human-readable explanation for the override.

`override_expiry`:
: Integer (Unix timestamp) or null. If set, the override expires automatically at this time and the agent resumes its prior mode. If null, the override persists until explicitly lifted.

`nonce`:
: String. MUST be present. A random value to prevent replay attacks.

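
The claim rules above lend themselves to a structural check that an agent could run after verifying the JWS signature. The following Python sketch is illustrative only: the function name `validate_override_payload` and its error strings are hypothetical, signature verification is assumed to have happened elsewhere, and the mapping from scope type to target field follows the `target`, `target_group`, `target_workflow`, and `target_domain` names used in the Override Scope section.

```python
from typing import Any, Dict, List

# Claims this document marks as "MUST be present".
REQUIRED_CLAIMS = ("override_level", "override_scope", "override_action",
                   "override_reason", "nonce")

# Scope type -> name of the field carrying the target identifier.
SCOPE_TARGET_KEYS = {
    "single": "target",
    "group": "target_group",
    "workflow": "target_workflow",
    "domain": "target_domain",
}


def validate_override_payload(payload: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the payload is well formed.

    Hypothetical helper: it checks structure only and does not verify
    the signature, the operator's authority, or replay conditions.
    """
    errors = [f"missing claim: {c}" for c in REQUIRED_CLAIMS if c not in payload]

    if "override_level" in payload:
        level = payload["override_level"]
        # type() rather than isinstance() so that booleans are not accepted.
        if type(level) is not int or level not in (1, 2, 3):
            errors.append("override_level must be an integer 1-3")

    if "override_scope" in payload:
        scope = payload["override_scope"]
        if not isinstance(scope, dict):
            errors.append("override_scope must be an object")
        else:
            scope_type = scope.get("type")
            key = SCOPE_TARGET_KEYS.get(scope_type) if isinstance(scope_type, str) else None
            if key is None:
                errors.append("override_scope.type must be single, group, workflow, or domain")
            elif not scope.get(key):
                errors.append(f"override_scope.{key} is required for this scope type")

    for claim in ("override_action", "override_reason", "nonce"):
        if claim in payload and not (isinstance(payload[claim], str) and payload[claim]):
            errors.append(f"{claim} must be a non-empty string")

    expiry = payload.get("override_expiry")
    if expiry is not None and type(expiry) is not int:
        errors.append("override_expiry must be an integer timestamp or null")

    return errors
```

Because the list of `override_action` values is introduced with "Values include", the sketch deliberately does not reject unknown action strings; a deployment that treats the list as closed would add that check.
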

### Delivery Mechanisms

#### Push (Webhook)

The override dispatcher sends the signed override signal as an HTTP POST {{RFC9110}} to the agent's override endpoint:

~~~
POST /.well-known/agent-override HTTP/1.1
Host: agent.example.com
Content-Type: application/jose
Authorization: Bearer
~~~
{: #fig-push title="Push Delivery"}

#### Pull (Polling Endpoint)

Agents that cannot receive inbound connections MAY poll for pending overrides:

~~~
GET /.well-known/agent-override/pending HTTP/1.1
Host: override-service.example.com
Authorization: Bearer
~~~
{: #fig-pull title="Pull Delivery"}

The polling interval SHOULD NOT exceed 10 seconds. Agents that accept Emergency overrides via pull delivery MUST poll at least every 5 seconds.

#### Broadcast

For domain-wide or group overrides, the dispatcher MAY use a broadcast mechanism. The dispatcher fans out the override signal to all matching agents and collects acknowledgments.

~~~
POST /override/broadcast HTTP/1.1
Host: override-service.example.com
Content-Type: application/jose
~~~
{: #fig-broadcast title="Broadcast Delivery"}

## Override Endpoint Discovery

Agents MUST advertise their override endpoint at the well-known URI `/.well-known/agent-override` per {{!RFC8615}}.
A GET request to `/.well-known/agent-override` MUST return the agent's override capabilities:

~~~json
{
  "agent_id": "spiffe://example.com/agent/firewall-mgr",
  "supported_levels": [1, 2, 3],
  "delivery_mechanisms": ["push", "pull"],
  "max_response_time_ms": 1000,
  "status_endpoint": "/.well-known/agent-override/status",
  "protocol_version": "1.0"
}
~~~
{: #fig-discovery title="Override Capability Advertisement"}

# Compliance and Verification

## Acknowledgment Protocol

### Override Receipt Acknowledgment

Upon receiving an override signal, the agent MUST respond with an acknowledgment within the following timeframes:

- Level 1 (Advisory): 5 seconds
- Level 2 (Mandatory): 2 seconds
- Level 3 (Emergency): 1 second

The acknowledgment is an ECT with `exec_act` set to the appropriate override acknowledgment value:

~~~json
{
  "exec_act": "override_ack",
  "par": [""],
  "ext": {
    "override.status": "received",
    "override.level": 3,
    "override.prior_state": "autonomous",
    "override.effective_at": "2026-03-06T12:00:00.123Z"
  }
}
~~~
{: #fig-ack title="Override Receipt Acknowledgment ECT"}

### Compliance Confirmation

After the agent has changed its behavior in response to the override, it MUST emit a compliance confirmation ECT:

~~~json
{
  "exec_act": "override_complied",
  "par": [""],
  "ext": {
    "override.status": "complied",
    "override.current_state": "stopped",
    "override.actions_terminated": 3,
    "override.evidence": "All autonomous tasks halted"
  }
}
~~~
{: #fig-compliance title="Compliance Confirmation ECT"}

### Non-Compliance Reporting and Escalation

For Level 1 (Advisory) overrides, the agent MAY decline to comply. In this case, the agent MUST emit a non-compliance ECT:

~~~json
{
  "exec_act": "override_declined",
  "par": [""],
  "ext": {
    "override.status": "declined",
    "override.reason": "Action is within policy bounds",
    "override.level": 1
  }
}
~~~
{: #fig-noncompliance title="Non-Compliance ECT (Advisory Only)"}

For Level 2 and Level 3 overrides, the agent MUST NOT decline.
If the agent cannot fully comply (e.g., due to hardware limitations), it MUST report partial compliance with a description of what could not be done. The override dispatcher MUST escalate partial compliance to the operator.

## Compliance Verification

### Behavioral Verification Post-Override

After an agent acknowledges an override, the compliance verification system SHOULD monitor the agent's subsequent behavior to confirm the override was actually applied. Verification methods include:

- Observing that the agent's ECT emissions cease (for Level 3)
- Checking that subsequent ECTs contain only permitted actions (for Level 2 with restrictions)
- Querying the agent's status endpoint

### Timeout and Retry Semantics

If the agent does not acknowledge within the required timeframe:

1. The dispatcher MUST retry the override signal once after 2 seconds.
2. If no acknowledgment is received after the retry, the dispatcher MUST escalate to the operator.
3. For Level 3 (Emergency) overrides, the dispatcher SHOULD attempt alternative delivery mechanisms (e.g., switching from push to broadcast).
4. If all delivery attempts fail, the graceful degradation policy applies (see {{graceful-degradation}}).

## Graceful Degradation {#graceful-degradation}

### Unreachable Override Target

When the override target agent is unreachable, the system MUST:

1. Log an ECT with `exec_act`: `"override_delivery_failed"` documenting the failure.
2. Notify the operator of the delivery failure.
3. Attempt delivery via alternative mechanisms.

### Failsafe Defaults

Agents MUST implement a dead man's switch: if the agent loses contact with the override service for a configurable duration (default: 90 seconds), the agent MUST enter a failsafe state equivalent to Level 2 (Mandatory) with restricted operations.

The failsafe policy is configured in the agent's ACP-DAG-HITL policy and MUST specify one of:

- `safe_pause`: Enter Level 2 with read-only operations permitted.
- `full_stop`: Enter Level 3 equivalent (cease all actions).
- `continue_logged`: Continue operating but emit warning ECTs at elevated frequency. This option is only permitted at HITL intensity I0 or I1.

### Proxy Override for Offline Agents

When an agent is offline, the override dispatcher MAY apply the override to the agent's proxy or orchestrator. The proxy MUST:

1. Queue the override signal for delivery when the agent reconnects.
2. Prevent new tasks from being dispatched to the offline agent.
3. Emit an ECT recording the proxy override action.

When the agent reconnects, the proxy MUST deliver the queued override signal. The agent MUST process it as if it were received in real time, applying the override level and action specified.

# Integration with ACP-DAG-HITL and ECT

## Override Authorization via ACP Policy

Override authority is governed by ACP-DAG-HITL policy tokens {{I-D.nennemann-agent-dag-hitl-safety}}. The policy token specifies:

- Which operator roles are authorized for each override level.
- Which agents or agent groups each role may override.
- Escalation chains when primary operators are unavailable.

The override dispatcher MUST verify the operator's JWT against the ACP policy before routing the override signal. An override signal from an unauthorized operator MUST be rejected with HTTP 403 and logged as a security event.

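
The authorization decision described above can be sketched in Python as follows. This is a minimal illustration, not the ACP-DAG-HITL policy format: the `OperatorGrant` class, its `allowed_targets` field, and the `is_authorized` function are hypothetical names. The role names and the rule that a higher-level role implies all lower-level roles come from the Authority Delegation and Chain of Command section.

```python
from dataclasses import dataclass, field
from typing import Set

# Role names from this document, ranked by the override level they unlock.
ROLE_RANK = {
    "advisory_override": 1,
    "mandatory_override": 2,
    "emergency_override": 3,
}


@dataclass
class OperatorGrant:
    """Hypothetical in-memory view of what the policy grants one operator."""
    roles: Set[str] = field(default_factory=set)
    # Agent identifiers or group labels the operator may override (assumed shape).
    allowed_targets: Set[str] = field(default_factory=set)


def is_authorized(grant: OperatorGrant, override_level: int, target: str) -> bool:
    """Return True if the operator may issue this override.

    A False result corresponds to the dispatcher rejecting the signal
    with HTTP 403 and logging a security event.
    """
    if override_level not in (1, 2, 3):
        return False
    # A higher-level role implicitly holds all lower-level roles.
    highest = max((ROLE_RANK.get(role, 0) for role in grant.roles), default=0)
    return highest >= override_level and target in grant.allowed_targets
```

For example, an operator holding only `emergency_override` passes the check for a Level 1 override against an allowed target, while an operator holding only `advisory_override` fails it for Level 2.
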

## Override Events as ECT Nodes

Every override interaction produces ECT nodes {{I-D.nennemann-wimse-ect}} that are linked into the workflow DAG:

| Event | `exec_act` value |
|-------|------------------|
| Advisory override issued | `override_advisory` |
| Mandatory override issued | `override_mandatory` |
| Emergency override issued | `override_emergency` |
| Override acknowledged | `override_ack` |
| Override complied | `override_complied` |
| Override declined (Advisory only) | `override_declined` |
| Override delivery failed | `override_delivery_failed` |
| Override lifted | `override_lifted` |
| Override expired | `override_expired` |
{: #fig-ect-actions title="Override ECT exec_act Values"}

Each override ECT references the triggering override signal's `jti` via the `par` claim, maintaining the causal chain in the DAG.

## Override Audit Trail

The sequence of override ECTs provides a complete, tamper-evident audit trail:

1. The operator issues an override (override ECT with operator identity, reason, and level).
2. The agent acknowledges (ack ECT linked to override ECT).
3. The agent confirms compliance (compliance ECT linked to ack ECT).
4. Optionally, the operator lifts the override (lift ECT linked to override ECT).

At AEM assurance level L3, all override ECTs MUST be committed to the immutable audit ledger.

# Security Considerations

## Unauthorized Override Attempts

Override signals that fail authentication or authorization MUST be rejected. The agent MUST NOT alter its behavior in response to an unsigned or improperly signed override signal. All rejected override attempts MUST be logged with the source identity (if available) and the reason for rejection.

## Replay Protection for Override Signals

Agents MUST reject override signals with:

- An `iat` claim more than 30 seconds in the past.
- A `jti` that matches a previously processed override signal.
- A missing or invalid `nonce` claim.

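
The rejection rules listed above, combined with a cache of processed `jti` values, can be sketched as a small Python class. The class name `ReplayGuard`, the injectable `clock` parameter, and the returned reason strings are hypothetical, and rejecting a signal whose `jti` is absent is an assumption of this sketch rather than a stated requirement. The sketch keeps each `jti` for 300 seconds, which is one way to meet a five-minute minimum retention.

```python
import time
from typing import Any, Callable, Dict, Optional

MAX_IAT_AGE_SECONDS = 30      # reject signals issued more than 30 s ago
JTI_RETENTION_SECONDS = 300   # remember processed jti values for 5 minutes


class ReplayGuard:
    """Hypothetical in-memory replay check for verified override payloads."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        self._seen: Dict[str, float] = {}  # jti -> time it was first processed

    def check(self, payload: Dict[str, Any]) -> Optional[str]:
        """Return a rejection reason, or None if the signal may be processed."""
        now = self._clock()
        # Evict cache entries that are older than the retention window.
        self._seen = {j: t for j, t in self._seen.items()
                      if now - t <= JTI_RETENTION_SECONDS}

        iat = payload.get("iat")
        if not isinstance(iat, (int, float)) or now - iat > MAX_IAT_AGE_SECONDS:
            return "iat missing or more than 30 seconds in the past"

        nonce = payload.get("nonce")
        if not isinstance(nonce, str) or not nonce:
            return "nonce missing or invalid"

        jti = payload.get("jti")
        if not isinstance(jti, str) or not jti:
            return "jti missing"  # assumption: treat an absent jti as invalid
        if jti in self._seen:
            return "jti already processed"

        self._seen[jti] = now
        return None
```

Passing the clock as a parameter keeps the check deterministic in tests; a production agent would also need to bound the cache size and share it across processes that serve the same override endpoint.
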

Agents MUST maintain a cache of recently processed `jti` values for at least 5 minutes to detect replays.

## Override Signal Tampering

Override signals are signed JWTs. Agents MUST verify the signature against the operator's public key (as registered in ACP-DAG-HITL policy) before processing. Agents MUST reject signals with invalid or expired signatures.

## Denial-of-Service via Override Flooding

To prevent abuse, agents SHOULD implement rate limiting on the override endpoint:

- Level 1 (Advisory): Maximum 10 signals per minute per operator.
- Level 2 (Mandatory): Maximum 5 signals per minute per operator.
- Level 3 (Emergency): No rate limit (to ensure emergency overrides are never blocked), but agents MUST log high-frequency Emergency overrides as potential abuse.

The override endpoint SHOULD be served on a separate port or network interface from the agent's main API to ensure availability during overload conditions.

## Authority Impersonation

Agents MUST verify override authority by:

1. Validating the operator JWT signature against trusted keys.
2. Confirming the operator's role matches the required role for the override level.
3. Verifying the operator is authorized to override the specific target agent(s) per ACP policy.

Deployments SHOULD implement multi-operator approval for Level 3 (Emergency) overrides affecting domain-wide scope, requiring two independent operator JWTs.

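
The multi-operator recommendation above can be illustrated with a short Python sketch. The function name `has_required_approvals` is hypothetical, and the sketch assumes that "independent" means the approving JWTs carry distinct `iss` values and that each JWT has already passed the three verification steps listed in this section.

```python
from typing import Iterable


def has_required_approvals(override_level: int, scope_type: str,
                           verified_issuers: Iterable[str]) -> bool:
    """Check whether enough distinct operators approved an override.

    `verified_issuers` holds the `iss` claim of every operator JWT that
    already passed signature, role, and target checks (assumed input).
    """
    distinct = set(verified_issuers)
    # Domain-wide Emergency overrides need two independent operators;
    # every other override needs at least one.
    required = 2 if (override_level == 3 and scope_type == "domain") else 1
    return len(distinct) >= required
```

Counting distinct issuers rather than distinct tokens prevents one operator from satisfying the requirement by signing the same override twice.
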

# IANA Considerations

## Well-Known URI Registration

This document requests registration of the following well-known URI suffix per {{!RFC8615}}:

| URI Suffix | Description |
|------------|-------------|
| `agent-override` | Agent override endpoint for receiving override signals, querying capabilities, and reporting status |
{: #fig-wellknown title="Well-Known URI Registration"}

## Override exec_act Values

This document requests registration of the following `exec_act` values in the ECT Action Type Registry:

| Value | Description | Reference |
|-------|-------------|-----------|
| `override_advisory` | Advisory override signal issued | This document |
| `override_mandatory` | Mandatory override signal issued | This document |
| `override_emergency` | Emergency override signal issued | This document |
| `override_ack` | Agent acknowledgment of override | This document |
| `override_complied` | Agent confirmed compliance | This document |
| `override_declined` | Agent declined advisory override | This document |
| `override_delivery_failed` | Override delivery failure | This document |
| `override_lifted` | Override explicitly lifted | This document |
| `override_expired` | Override expired by TTL | This document |
{: #fig-iana-actions title="Override exec_act Value Registrations"}

## Override JWT Claims

This document requests registration of the following JWT claims in the IANA JSON Web Token Claims registry:

| Claim Name | Description | Reference |
|------------|-------------|-----------|
| `override_level` | Override authority level (1-3) | This document |
| `override_scope` | Target scope of the override | This document |
| `override_action` | Directed action for the agent | This document |
| `override_reason` | Human-readable override justification | This document |
| `override_expiry` | Override expiration timestamp | This document |
{: #fig-iana-claims title="Override JWT Claim Registrations"}

--- back

# Acknowledgments
{:numbered="false"}

This document absorbs
and supersedes the override protocol aspects of the Human Emergency Override Protocol (HEOP) and the HITL Primitives specification. The override level design is inspired by industrial safety systems (IEC 62061, ISO 13849). The protocol integrates with the Agent Context Policy Token {{I-D.nennemann-agent-dag-hitl-safety}} for authorization and the Execution Context Token {{I-D.nennemann-wimse-ect}} for audit.