Internet-Draft
nmrg
Intended status: standards-track
March 2026
Expires: September 05, 2026

Real-Time Agent Rollback Protocol (RARP) for Autonomous Network Operations
draft-agent-ecosystem-agent-rollback-protocol-00

Abstract

Autonomous agents in network operations environments require the ability to quickly and safely roll back actions when incorrect decisions are made. While existing protocols enable agent communication and coordination, no standardized mechanism exists for distributed rollback operations across heterogeneous agent systems. This document specifies the Real-Time Agent Rollback Protocol (RARP), which provides coordinated rollback mechanisms for autonomous network agents. RARP defines checkpoint creation, rollback initiation procedures, state consistency verification, and cross-domain rollback coordination through agent gateways. The protocol integrates with existing agent communication frameworks and supports both immediate rollback for safety-critical scenarios and coordinated rollback for complex distributed operations. RARP is intended to support production deployment of autonomous network operations by providing the safety mechanisms needed to reverse agent decisions across distributed systems.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. This document is intended to have standards-track status. Distribution of this memo is unlimited.

Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

Table of Contents

1. Introduction
2. Terminology
3. Problem Statement
4. RARP Architecture and Components
5. Checkpoint Creation and Management
6. Rollback Initiation and Coordination
7. Integration with Existing Agent Protocols
8. Security Considerations
9. IANA Considerations
10. References

1. Introduction

The proliferation of autonomous agents in network operations has introduced unprecedented capabilities for self-healing, optimization, and adaptive management across complex distributed systems. As described in [draft-chuyi-nmrg-ai-agent-network], AI-powered agents can now perform sophisticated reasoning and decision-making across previously isolated network management domains.
However, the autonomous nature of these systems introduces a critical challenge: when agents make incorrect decisions or encounter unexpected conditions, there is no standardized mechanism to safely and efficiently reverse their actions across distributed environments.

Current agent communication frameworks, including those specified in [draft-fu-nmop-agent-communication-framework] and [draft-li-dmsc-macp], provide robust mechanisms for agent coordination and message exchange but do not address the fundamental requirement for transaction-like rollback capabilities. Traditional network management protocols such as NETCONF [RFC6241] include rollback mechanisms for configuration changes (for example, confirmed commit and rollback-on-error), but these operate within single administrative domains and cannot coordinate complex rollback operations across heterogeneous agent systems spanning multiple domains and protocol layers.

The Real-Time Agent Rollback Protocol (RARP) addresses this gap by providing a standardized framework for coordinated rollback operations in autonomous network environments. RARP builds upon existing agent communication protocols and extends the cross-domain collaboration mechanisms outlined in [draft-han-rtgwg-agent-gateway-intercomm-framework] to enable rollback coordination through gateway intermediaries. The protocol supports both immediate rollback for safety-critical scenarios where agent actions must be reversed without delay, and coordinated rollback for complex distributed operations requiring multi-agent consensus and state synchronization.

The architecture defined in this document integrates with existing agent controller coordination mechanisms [draft-jadoon-nmrg-agentic-ai-autonomous-networks] while introducing specialized rollback coordinators and checkpoint managers that operate alongside current agent communication infrastructure.
RARP leverages established security frameworks including TLS 1.3 [RFC8446] and OAuth 2.0 [RFC6749] to ensure authenticated and authorized rollback operations across administrative boundaries. By providing these safety mechanisms, RARP enables the production deployment of autonomous network operations with the confidence that agent decisions can be safely reversed when necessary.

This specification defines the protocol semantics, message formats using JSON [RFC8259] encoding, and integration patterns necessary for implementing RARP across diverse agent ecosystems.

2. Terminology

This document uses terminology consistent with existing agent communication and network management protocols. The following terms are defined for use throughout this specification:

Agent: An autonomous software entity capable of making decisions and performing actions in network operations environments, as defined in [draft-fu-nmop-agent-communication-framework]. Agents operate with varying degrees of autonomy and may collaborate through standardized communication protocols.

Agent Gateway: A protocol intermediary that enables communication and coordination between agents operating in different administrative domains or using different communication protocols, as specified in [draft-han-rtgwg-agent-gateway-intercomm-framework].
Agent gateways provide protocol translation and policy enforcement for cross-domain agent interactions.

Agent Transaction: A coordinated set of actions performed by one or more agents that can be treated as an atomic unit for rollback purposes. Agent transactions may span multiple network devices, protocol domains, or administrative boundaries and maintain consistency properties across distributed operations.

Checkpoint: A persistent snapshot of agent state and network configuration that serves as a potential rollback target. Checkpoints contain sufficient information to restore agents and affected network elements to a previously known consistent state.

Checkpoint Consistency: The property that all agents participating in a rollback point have synchronized their state at the same logical time. Consistency verification ensures that rollback operations restore the system to a coherent state across all participating entities.

Checkpoint Manager: A system component responsible for creating, storing, validating, and managing rollback checkpoints. Checkpoint managers coordinate with agents to capture state snapshots and maintain checkpoint metadata required for rollback operations.

Coordination State: The current status of multi-agent collaboration activities, including pending transactions, active rollback operations, and inter-agent dependencies. Coordination states are maintained by rollback coordinators to ensure proper sequencing of rollback operations.

Cross-Domain Rollback: A rollback operation that spans multiple administrative or protocol domains requiring gateway-mediated coordination. Cross-domain rollbacks involve additional complexity for authentication, authorization, and state synchronization across domain boundaries.

Coordinated Rollback: A rollback operation that requires multi-agent coordination and consensus before execution.
Coordinated rollbacks involve explicit agreement protocols to ensure all affected agents participate in the rollback operation and reach consistent post-rollback states.

Immediate Rollback: A rollback operation initiated without coordination delays for safety-critical scenarios. Immediate rollbacks prioritize rapid response over coordination completeness and are typically used when network safety or security is at immediate risk.

Rollback Coordinator: An entity responsible for orchestrating rollback operations across multiple agents and domains. Rollback coordinators implement the consensus and coordination protocols required for distributed rollback operations and may operate in hierarchical configurations for scalability.

Rollback Point: A consistent state snapshot across distributed agents from which rollback operations can be initiated. Rollback points represent verified consistent states that can be safely restored through coordinated agent actions.

3. Problem Statement

The deployment of autonomous agents in network operations environments introduces fundamental challenges in ensuring operational safety through reliable rollback mechanisms. Current agent communication protocols, including those specified in [draft-fu-nmop-agent-communication-framework] and [draft-han-rtgwg-agent-gateway-intercomm-framework], provide sophisticated mechanisms for agent coordination and cross-domain collaboration but lack standardized approaches for distributed rollback operations. When autonomous agents make incorrect decisions or encounter unexpected failure conditions, the ability to quickly and consistently revert to a known-good state becomes critical for maintaining network stability and service availability.

State consistency across distributed agent systems presents the most significant challenge in implementing effective rollback mechanisms.
Unlike traditional centralized systems where rollback operations can be performed atomically, autonomous network agents operate across multiple administrative domains, protocol layers, and time scales as described in [draft-jadoon-nmrg-agentic-ai-autonomous-networks]. Each agent maintains its own local state and interacts with network infrastructure through different interfaces, including NETCONF [RFC6241], RESTful APIs, and proprietary management protocols. Ensuring that all participating agents can synchronously return to a consistent checkpoint state requires sophisticated coordination mechanisms that current agent communication frameworks do not provide. The distributed nature of these systems means that network partitions, communication delays, and partial failures can result in inconsistent rollback states where some agents successfully revert while others remain in post-action states.

Cross-domain coordination introduces additional complexity as agents operating in different administrative domains must coordinate rollback operations through gateway intermediaries. The agent gateway framework specified in [draft-han-rtgwg-agent-gateway-intercomm-framework] enables cross-domain agent collaboration but does not address the specific requirements for propagating rollback requests, maintaining checkpoint consistency across domain boundaries, or handling authorization and security constraints in multi-domain rollback scenarios. Different domains may have varying rollback policies, checkpoint retention requirements, and security constraints that must be negotiated and enforced during cross-domain rollback operations. Furthermore, the hierarchical nature of network operations means that rollback decisions made at higher levels may cascade to multiple lower-level domains, requiring sophisticated dependency tracking and coordination protocols.

Timing constraints in network operations environments create additional challenges for rollback protocol design.
Safety-critical scenarios, such as security incidents or cascading failures, require immediate rollback capabilities that cannot wait for full distributed coordination to complete. However, immediate rollback operations risk creating inconsistent states if not all participating agents can execute the rollback synchronously. Conversely, complex distributed operations may require coordinated rollback procedures that involve extensive negotiation and validation phases, but network conditions may change during these coordination periods, potentially invalidating the target rollback state. Current agent communication protocols lack mechanisms for expressing these timing constraints and do not provide differentiated handling for immediate versus coordinated rollback scenarios.

Existing agent communication frameworks also lack adequate mechanisms for rollback-specific concerns including checkpoint metadata management, rollback authorization, and audit trail generation. The multi-agent coordination protocols specified in [draft-li-dmsc-macp] provide general coordination primitives but do not address the specific state management requirements for maintaining consistent checkpoint data across distributed systems. Additionally, current protocols do not define standardized approaches for validating checkpoint integrity, handling rollback conflicts when multiple agents attempt simultaneous rollback operations, or providing the detailed audit capabilities required for post-rollback analysis and compliance reporting in production network environments.

4. RARP Architecture and Components

The Real-Time Agent Rollback Protocol architecture is designed to integrate seamlessly with existing autonomous agent infrastructures while providing coordinated rollback capabilities across distributed network operations environments.
The architecture follows a layered approach that separates rollback coordination logic from agent-specific implementations, enabling deployment across heterogeneous agent systems. RARP components leverage existing agent communication frameworks defined in [draft-fu-nmop-agent-communication-framework] and integrate with agent gateway mechanisms specified in [draft-han-rtgwg-agent-gateway-intercomm-framework] to provide cross-domain rollback coordination capabilities.

The core RARP architecture consists of three primary component types: Rollback Coordinators, Checkpoint Managers, and Agent Rollback Interfaces. Rollback Coordinators serve as the orchestration layer for rollback operations and MUST implement coordination protocols for both immediate and coordinated rollback scenarios. These coordinators maintain awareness of agent relationships, transaction boundaries, and rollback dependencies across the distributed system. Checkpoint Managers handle the creation, storage, validation, and retrieval of rollback points, implementing consistency verification procedures to ensure distributed state coherence. Agent Rollback Interfaces provide the integration layer between RARP components and existing agent systems, translating rollback operations into agent-specific state restoration procedures while maintaining compatibility with established agent communication protocols.

RARP supports both hierarchical and distributed deployment models to accommodate varying network topologies and administrative requirements. In hierarchical deployments, a primary Rollback Coordinator oversees subordinate coordinators within each administrative domain, providing centralized rollback decision-making while delegating local coordination to domain-specific components.
This model aligns with the centralized agent controller coordination patterns described in [draft-jadoon-nmrg-agentic-ai-autonomous-networks] and enables efficient rollback operations across large-scale autonomous network deployments. Distributed deployments eliminate single points of failure by implementing peer-to-peer coordination among Rollback Coordinators, using consensus mechanisms to ensure consistent rollback decisions across all participating domains.

Integration with existing agent gateway infrastructure enables RARP to operate across heterogeneous agent systems without requiring modifications to established communication protocols. Agent gateways specified in [draft-han-rtgwg-agent-gateway-intercomm-framework] are extended with RARP capability negotiation and rollback message translation functions, allowing rollback coordination between agents using different communication frameworks. The architecture maintains protocol compatibility by implementing rollback operations as extensions to existing agent collaboration protocols rather than replacing established communication mechanisms. This approach allows RARP to be deployed incrementally in production environments without disrupting existing agent operations.

The RARP architecture incorporates checkpoint consistency verification mechanisms that operate independently of agent-specific state representations. Checkpoint Managers implement distributed timestamp synchronization and state validation procedures to ensure that rollback points represent consistent distributed states across all participating agents. The architecture supports integration with AI Agent Network systems as described in [draft-chuyi-nmrg-ai-agent-network] by providing rollback interfaces that can reverse the actions resulting from automated reasoning and decision-making performed by large language model-based agents.
Component communication within the RARP architecture utilizes secure transport mechanisms including TLS 1.3 [RFC8446] and QUIC [RFC9000] to ensure rollback coordination messages are protected against tampering and unauthorized access during transmission between distributed components.

5. Checkpoint Creation and Management

Checkpoint creation in RARP enables autonomous agents to establish consistent state snapshots that serve as restoration points for rollback operations. Agents MUST implement checkpoint creation capabilities that capture both local state information and coordination metadata necessary for distributed rollback operations. The checkpoint creation process involves state serialization, metadata generation, and consistency coordination with peer agents participating in the same logical transaction scope. Agents SHOULD create checkpoints at natural transaction boundaries and MAY create additional checkpoints based on risk assessment algorithms or external triggers.

The checkpoint data structure MUST include agent state information, transaction identifiers, temporal consistency markers, and dependency relationships with other agents as specified in [draft-han-rtgwg-agent-gateway-intercomm-framework]. Checkpoint metadata MUST conform to the JSON format specified in [RFC8259] and include fields for checkpoint identifier, creation timestamp, agent identifier, transaction scope, dependency list, and integrity verification data. Cross-domain checkpoints MUST additionally include gateway coordination information and domain-specific authorization tokens as defined in [draft-fu-nmop-agent-communication-framework]. The checkpoint identifier MUST be globally unique and SHOULD incorporate both temporal and spatial components to ensure uniqueness across distributed deployments.

Checkpoint storage mechanisms MUST provide durability guarantees appropriate for the operational context and SHOULD implement redundancy strategies to prevent single points of failure.
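
The checkpoint metadata fields listed above can be illustrated with a minimal sketch. This document does not define concrete field names, so every key below (checkpoint_id, created_at_ms, and so on), the identifier layout, and the choice of SHA-256 as integrity verification data are assumptions made for illustration only.

```python
import hashlib
import json
import time
import uuid


def _canonical_bytes(state):
    # Deterministic serialization so the same state always yields the same digest.
    return json.dumps(state, sort_keys=True, separators=(",", ":")).encode("utf-8")


def build_checkpoint_metadata(agent_id, transaction_scope, dependencies, state):
    """Build a JSON-serializable metadata object (hypothetical field names)."""
    created_ms = int(time.time() * 1000)
    return {
        # Agent id (spatial) + timestamp (temporal) + random suffix for uniqueness.
        "checkpoint_id": f"{agent_id}:{created_ms}:{uuid.uuid4().hex}",
        "created_at_ms": created_ms,
        "agent_id": agent_id,
        "transaction_scope": transaction_scope,
        "dependencies": sorted(dependencies),
        "integrity": {
            "alg": "sha-256",
            "digest": hashlib.sha256(_canonical_bytes(state)).hexdigest(),
        },
    }


def verify_checkpoint(metadata, state):
    """Return True if the stored digest matches the given agent state."""
    digest = hashlib.sha256(_canonical_bytes(state)).hexdigest()
    return digest == metadata["integrity"]["digest"]
```

A plain digest only detects accidental corruption; a deployment that needs to detect deliberate modification would presumably use a keyed MAC or a signature instead.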
Agents MAY utilize local storage, distributed storage systems, or centralized checkpoint repositories depending on deployment constraints and consistency requirements. Storage implementations MUST support atomic write operations and SHOULD provide integrity verification through cryptographic mechanisms, for example a digest or signature over the stored checkpoint. TLS 1.3 [RFC8446] protects checkpoint data in transit but does not by itself protect data at rest. Cross-domain checkpoint storage MUST implement access control mechanisms that respect administrative boundaries while enabling authorized rollback operations.

Checkpoint consistency verification ensures that distributed checkpoints represent a globally consistent state across all participating agents. The consistency verification process MUST implement logical clock synchronization or vector clock mechanisms to establish temporal relationships between distributed checkpoints. Agents MUST validate checkpoint consistency before committing checkpoint data and SHOULD implement timeout mechanisms to handle non-responsive participants. For cross-domain scenarios, consistency verification MUST account for network partitions and administrative policy constraints that may affect coordination capabilities.

Checkpoint lifecycle management encompasses creation, validation, storage, retrieval, and cleanup operations across the distributed agent system. Agents MUST implement checkpoint retention policies that balance storage costs with rollback capability requirements and SHOULD provide configuration mechanisms for policy customization. Checkpoint cleanup operations MUST respect dependency relationships and transaction boundaries to prevent premature deletion of required rollback data. The checkpoint manager component SHOULD implement background processes for checkpoint optimization, compression, and garbage collection to maintain system performance over extended operational periods.

6. Rollback Initiation and Coordination

Rollback operations in RARP are initiated through a well-defined trigger and coordination mechanism that ensures consistent state recovery across distributed agent systems. Rollback initiation can occur through multiple pathways: explicit administrative commands, automated safety triggers when agents detect anomalous conditions, or cascade triggers when dependent agent operations fail. The protocol defines two primary rollback modes: immediate rollback for safety-critical scenarios where rapid state recovery is essential, and coordinated rollback for complex distributed operations requiring multi-agent consensus. All rollback operations MUST specify a target rollback point identifier and include sufficient context information to enable receiving agents to validate the rollback request against their local checkpoint metadata.

The coordination messaging framework builds upon the Cross-Domain Agent Collaboration Protocol [draft-han-rtgwg-agent-gateway-intercomm-framework] to enable rollback operations across heterogeneous agent systems and administrative boundaries. When a rollback coordinator receives a rollback initiation request, it MUST first validate the requesting entity's authorization and verify that the target rollback point exists across all participating agents. The coordinator then broadcasts a rollback preparation message to all agents within the rollback scope, allowing each agent to perform local consistency checks and report any conflicts or dependencies that might prevent successful rollback. This two-phase approach ensures that rollback operations only proceed when all participating agents can successfully return to the specified rollback point without creating inconsistent intermediate states.

Immediate rollback scenarios bypass the standard coordination phase when safety-critical conditions are detected, such as security breaches or network failures that require rapid remediation.
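
The agent-side check performed on receipt of a rollback preparation message, as described earlier in this section, might look roughly like the sketch below. The RollbackRequest and LocalCheckpointStore types, their field names, and the "confirm"/"abort" result strings are hypothetical; this document does not define them.

```python
from dataclasses import dataclass, field


@dataclass
class RollbackRequest:
    rollback_point_id: str         # target rollback point identifier
    transaction_scope: str         # context used to validate the request
    mode: str = "coordinated"      # "coordinated" or "immediate"


@dataclass
class LocalCheckpointStore:
    # Maps rollback point id -> transaction scope recorded at checkpoint time.
    checkpoints: dict = field(default_factory=dict)

    def prepare(self, request):
        """Validate a request locally; return ("confirm", None) or ("abort", reason)."""
        if request.mode not in ("coordinated", "immediate"):
            return ("abort", "unknown rollback mode")
        scope = self.checkpoints.get(request.rollback_point_id)
        if scope is None:
            return ("abort", "rollback point not found")
        if scope != request.transaction_scope:
            return ("abort", "transaction scope mismatch")
        return ("confirm", None)
```

For example, an agent whose store holds {"rp-1": "txn-42"} would confirm a request for ("rp-1", "txn-42") and abort one for an unknown rollback point, giving the coordinator the per-agent result it needs before deciding whether to proceed.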
In immediate rollback mode, the rollback coordinator MUST issue rollback execution commands directly to all participating agents without waiting for preparation confirmations, accepting the risk of temporary inconsistency in favor of rapid recovery. Agents receiving immediate rollback commands SHALL prioritize rollback execution over normal operations and SHOULD complete rollback within the time bounds specified in the rollback request. The protocol defines fallback procedures for handling agents that cannot complete immediate rollback operations, including isolation mechanisms to prevent inconsistent agents from affecting the recovered system state.

Coordinated rollback operations involve a more complex multi-phase protocol that ensures consistency across distributed agent systems through explicit consensus mechanisms. Following the preparation phase, agents that successfully validate the rollback request send confirmation messages to the rollback coordinator, while agents that detect conflicts or missing checkpoint data send abort messages with detailed error information. The coordinator implements a configurable consensus policy that determines whether to proceed with rollback based on the responses received: strict consensus requires all agents to confirm, while majority consensus allows rollback to proceed if a sufficient percentage of agents confirm readiness. If consensus is achieved, the coordinator broadcasts commit messages triggering simultaneous rollback execution; if consensus fails, the coordinator issues abort messages and logs the rollback attempt for administrative review.

Conflict resolution mechanisms address scenarios where multiple concurrent rollback requests or overlapping rollback scopes create coordination challenges. The protocol employs a priority-based conflict resolution system where rollback requests include priority levels, timestamps, and scope identifiers that enable coordinators to determine precedence when conflicts occur.
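
The configurable consensus policy described above for coordinated rollback can be sketched as a small decision function. The function name, the policy labels "strict" and "majority", and the default threshold of 0.5 are illustrative assumptions; this document only states that majority consensus requires "a sufficient percentage" of confirmations.

```python
def decide(votes, policy="strict", threshold=0.5):
    """Return "commit" or "abort" from prepare-phase responses.

    votes maps agent id -> "confirm" or "abort". The caller is assumed to
    record agents that timed out as "abort" before calling this function.
    """
    if not votes:
        return "abort"  # nothing confirmed, so never commit on an empty scope
    confirms = sum(1 for vote in votes.values() if vote == "confirm")
    if policy == "strict":
        # Strict consensus: every participating agent must confirm.
        return "commit" if confirms == len(votes) else "abort"
    if policy == "majority":
        # Majority consensus: the confirming fraction must exceed the threshold.
        return "commit" if confirms / len(votes) > threshold else "abort"
    raise ValueError(f"unknown consensus policy: {policy}")
```

With two confirmations and one abort, the strict policy aborts, the majority policy with the default threshold commits, and a majority policy with a threshold of 0.75 aborts.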
Higher-priority rollback operations, such as security-related rollbacks, automatically supersede lower-priority operations, while rollback requests with overlapping scope are serialized based on timestamp ordering. Cross-domain rollback conflicts are resolved through gateway-mediated negotiation procedures that leverage the agent controller coordination mechanisms defined in [draft-jadoon-nmrg-agentic-ai-autonomous-networks] to ensure consistent rollback decisions across administrative boundaries.

The protocol includes error handling and recovery procedures for rollback coordination failures, recognizing that rollback operations themselves may encounter system failures or network partitions. When rollback coordination fails due to network issues or coordinator failures, backup coordinators automatically assume responsibility for completing the rollback operation using persistent coordination state stored during the initial phases. Partial rollback failures, where some agents successfully roll back while others fail, trigger automatic reconciliation procedures that either retry the failed rollback operations or initiate compensating actions to restore system consistency. All rollback coordination activities are logged with sufficient detail to enable post-incident analysis and continuous improvement of rollback procedures in production autonomous network operations environments.

7. Integration with Existing Agent Protocols

RARP is designed to integrate seamlessly with existing agent communication frameworks and protocols, leveraging established mechanisms while extending them with rollback-specific capabilities. The protocol operates as an overlay service that can be bound to various underlying agent communication protocols, including those defined in [draft-fu-nmop-agent-communication-framework] and [draft-li-dmsc-macp].
Integration is achieved through protocol-specific binding specifications that map RARP operations to the message formats and coordination mechanisms of the underlying framework. This approach allows RARP to be deployed incrementally without requiring wholesale replacement of existing agent infrastructure.

For cross-domain scenarios, RARP extends the gateway mechanisms defined in [draft-han-rtgwg-agent-gateway-intercomm-framework] to support rollback coordination across administrative boundaries. Agent gateways MUST implement RARP-specific message translation and state synchronization functions when serving as intermediaries for cross-domain rollback operations. The gateway extensions include rollback capability negotiation during agent discovery, checkpoint metadata translation between domains, and coordination of distributed rollback timing. Gateways SHOULD maintain rollback context for active cross-domain agent transactions and MUST participate in checkpoint consistency verification procedures when coordinating multi-domain rollbacks.

RARP bindings for common transport protocols are defined to ensure broad compatibility with existing deployments. For NETCONF-based agent communication [RFC6241], RARP operations are encapsulated within custom RPC operations that extend the base protocol capabilities. HTTP/2 and HTTP/3 bindings utilize JSON-encoded messages [RFC8259] for rollback coordination, with TLS 1.3 [RFC8446] providing transport security; HTTP/3 runs over QUIC [RFC9000], which itself integrates TLS 1.3. WebSocket connections MAY be used for real-time rollback notifications in environments requiring low-latency coordination. Each binding specification defines the mapping between RARP primitive operations and the specific message formats and error handling mechanisms of the underlying protocol.

The integration architecture supports both centralized and distributed coordination models as described in [draft-jadoon-nmrg-agentic-ai-autonomous-networks].
In centralized deployments, a single rollback coordinator interfaces with existing agent controllers to provide system-wide rollback capabilities. Distributed deployments utilize peer-to-peer coordination among agents while maintaining compatibility with hierarchical agent architectures. RARP implementations MUST support capability advertisement through existing agent discovery mechanisms, allowing agents to negotiate rollback support and identify compatible rollback coordinators during system initialization.

Authentication and authorization for RARP operations leverage existing agent security frameworks where possible. OAuth 2.0 [RFC6749] tokens MAY be used for cross-domain authorization when integrating with web-based agent platforms. The protocol defines extension points for integrating with domain-specific authentication mechanisms while maintaining consistent rollback authorization policies. Implementations SHOULD reuse existing agent identity management infrastructure to minimize operational complexity and ensure consistent security policies across normal operations and rollback scenarios.

8. Security Considerations

The rollback capabilities provided by RARP introduce several security considerations that must be addressed to ensure safe deployment in production autonomous network environments. Rollback operations inherently involve state manipulation and coordination across distributed systems, creating potential attack vectors that could be exploited to disrupt network operations or gain unauthorized access to sensitive network state information. The cross-domain nature of RARP operations, as described in [draft-han-rtgwg-agent-gateway-intercomm-framework], further amplifies these security concerns by introducing trust boundaries and protocol translation points where security policies may differ.
Authorization and access control for rollback operations MUST be implemented using strong authentication mechanisms consistent with [RFC8446] for transport-layer security and [RFC6749] for authorization delegation across domains. Each rollback coordinator and participating agent MUST authenticate its identity before initiating or participating in rollback operations. The protocol MUST enforce role-based access control where only authorized entities can initiate rollback operations for specific network domains or agent systems. Cross-domain rollback operations MUST validate authorization chains through gateway intermediaries, ensuring that rollback requests are properly authenticated at each administrative boundary. Emergency or immediate rollback operations SHOULD maintain security requirements while providing expedited authorization paths for safety-critical scenarios.

Comprehensive audit trails MUST be maintained for all rollback operations to ensure accountability and enable forensic analysis of network incidents. The audit system MUST record rollback initiation events, participating agents, checkpoint identifiers, authorization decisions, and completion status using tamper-resistant logging mechanisms. These audit records MUST be synchronized across participating domains and stored with sufficient integrity protection to prevent unauthorized modification. The audit trail format SHOULD be compatible with existing network management audit systems and MUST include sufficient detail to reconstruct the sequence of events leading to and following rollback operations.

Protection against malicious rollback attacks requires careful consideration of potential attack vectors including replay attacks, unauthorized rollback initiation, and checkpoint poisoning. The protocol MUST implement sequence numbers and timestamps to prevent replay of rollback messages, with verification of message freshness using techniques consistent with [RFC9000].
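
As an illustration of the replay protection required above, the following minimal Python sketch combines a per-sender sequence number check with a timestamp freshness window. It is not part of RARP: the class name ReplayGuard, the 30-second window, and the per-sender keying are assumptions made for the example, and the sketch presumes that each message has already been authenticated, since unauthenticated sequence numbers and timestamps could simply be forged.

```python
import time
from dataclasses import dataclass, field


@dataclass
class ReplayGuard:
    """Hypothetical helper: remembers the highest sequence number per sender."""

    max_age_seconds: float = 30.0  # assumed freshness window, not set by RARP
    last_seq: dict = field(default_factory=dict)

    def accept(self, sender, seq, timestamp, now=None):
        """Return True only for a fresh message with a strictly newer sequence."""
        now = time.time() if now is None else now
        # Reject messages whose timestamp falls outside the freshness window.
        if abs(now - timestamp) > self.max_age_seconds:
            return False
        # Reject replayed or reordered messages from the same sender.
        if seq <= self.last_seq.get(sender, -1):
            return False
        self.last_seq[sender] = seq
        return True
```

A strictly increasing check is the simplest policy. A deployment that must tolerate reordered messages would need a sliding window of recently seen sequence numbers instead.
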
Rollback coordinators MUST validate checkpoint integrity before executing rollback operations and SHOULD implement rate limiting to prevent denial-of-service attacks through excessive rollback requests. The protocol MUST detect and mitigate attempts to roll back to compromised or maliciously modified checkpoints through cryptographic verification of checkpoint contents and metadata.

Cross-domain security implications require special consideration for trust establishment and security policy coordination between administrative domains. Gateway entities facilitating cross-domain rollback MUST enforce security policy translation and ensure that rollback operations comply with the security requirements of all participating domains. The protocol MUST support security policy negotiation to establish common security parameters for cross-domain rollback operations while maintaining the security standards of the most restrictive participating domain. Inter-domain rollback operations SHOULD implement additional verification steps and MAY require human authorization for operations that could significantly impact network stability across domain boundaries.

9. IANA Considerations

This document requests the creation of several new IANA registries for the Real-Time Agent Rollback Protocol (RARP) and the registration of initial values. The registries are necessary to ensure consistent implementation and interoperability of RARP across different autonomous agent systems and administrative domains. These registries support the protocol's integration with existing agent communication frameworks as defined in [draft-fu-nmop-agent-communication-framework] and cross-domain coordination mechanisms specified in [draft-han-rtgwg-agent-gateway-intercomm-framework].

IANA is requested to create a new registry group titled "Real-Time Agent Rollback Protocol (RARP) Parameters" with four sub-registries.
The "RARP Message Types" registry MUST contain 16-bit unsigned integer values from 0 to 65535, with values 0-255 reserved for IANA allocation and values 256-65535 assigned under the First Come First Served policy of [RFC8126]. Initial registrations MUST include: ROLLBACKREQUEST (1), ROLLBACKRESPONSE (2), CHECKPOINTCREATE (3), CHECKPOINTVALIDATE (4), COORDINATIONINIT (5), and COORDINATIONCOMPLETE (6). Each registration requires a message type name, numeric value, description, and reference to this specification or subsequent extensions.

The "RARP Error Codes" registry SHALL use 16-bit unsigned integer values with similar allocation policies. Initial error code registrations MUST include: CHECKPOINTNOTFOUND (1001), INSUFFICIENTPERMISSIONS (1002), ROLLBACKCONFLICT (1003), CROSSDOMAINFAILURE (1004), STATEINCONSISTENT (1005), and COORDINATIONTIMEOUT (1006).

The "RARP Capability Identifiers" registry uses string-based, dot-separated identifiers modeled on the reverse DNS naming convention to prevent namespace collisions. Initial capability identifiers SHOULD include "rollback.immediate", "rollback.coordinated", "checkpoint.distributed", and "integration.gateway" to support the core protocol functionality and integration patterns described in this specification.

The "RARP Agent Transaction Types" registry supports the classification and coordination of rollback operations across heterogeneous agent systems. This registry uses string-based identifiers and MUST include initial registrations for "network.configuration", "routing.policy", "security.rule", and "service.deployment" to align with common network operations use cases.

Registration procedures for all RARP registries MUST require specification of the parameter name, value, description, security considerations if applicable, and reference document.
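
The initial registrations listed above can be collected into a small lookup table. The Python sketch below is illustrative only: the class and function names (MessageType, ErrorCode, parse_message_type) and the two set names are hypothetical, while the numeric values and strings are the ones requested in this section.

```python
from enum import IntEnum


class MessageType(IntEnum):
    """Initial entries of the "RARP Message Types" registry."""
    ROLLBACKREQUEST = 1
    ROLLBACKRESPONSE = 2
    CHECKPOINTCREATE = 3
    CHECKPOINTVALIDATE = 4
    COORDINATIONINIT = 5
    COORDINATIONCOMPLETE = 6


class ErrorCode(IntEnum):
    """Initial entries of the "RARP Error Codes" registry."""
    CHECKPOINTNOTFOUND = 1001
    INSUFFICIENTPERMISSIONS = 1002
    ROLLBACKCONFLICT = 1003
    CROSSDOMAINFAILURE = 1004
    STATEINCONSISTENT = 1005
    COORDINATIONTIMEOUT = 1006


# Initial string identifiers from the two string-based registries.
CAPABILITY_IDENTIFIERS = frozenset({
    "rollback.immediate", "rollback.coordinated",
    "checkpoint.distributed", "integration.gateway",
})
TRANSACTION_TYPES = frozenset({
    "network.configuration", "routing.policy",
    "security.rule", "service.deployment",
})


def parse_message_type(value: int) -> MessageType:
    """Map a 16-bit value to a registered message type (hypothetical helper)."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("message type outside the 16-bit range")
    # IntEnum lookup raises ValueError for values with no registration.
    return MessageType(value)
```

Treating an unregistered value as an error is one possible policy; an implementation that needs forward compatibility might instead pass unknown values through to an extension handler.
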
Registrants SHOULD provide interoperability considerations when the parameter affects cross-domain operations or integration with existing protocols such as NETCONF [RFC6241] or agent gateway frameworks.

Registrations in the IANA allocation ranges MUST be subject to expert review, with designated experts evaluating technical soundness, potential conflicts with existing registrations, and alignment with RARP architectural principles. The expert review process SHALL consider the impact on cross-domain rollback coordination and compatibility with existing agent communication protocols. Registry updates affecting security-sensitive parameters such as authorization capabilities or cross-domain coordination mechanisms require additional security review to ensure consistency with the security considerations outlined in Section 8 of this specification and general security practices for autonomous network operations.

10. References

10.1. Normative References

[RFC2119] RFC 2119
[RFC6241] RFC 6241
[RFC8126] RFC 8126
[RFC8174] RFC 8174
[RFC8259] RFC 8259
[draft-fu-nmop-agent-communication-framework] draft-fu-nmop-agent-communication-framework
[draft-han-rtgwg-agent-gateway-intercomm-framework] draft-han-rtgwg-agent-gateway-intercomm-framework
[draft-li-dmsc-macp] draft-li-dmsc-macp

10.2. Informative References

[RFC6749] RFC 6749
[RFC8446] RFC 8446
[RFC9000] RFC 9000
[draft-chuyi-nmrg-ai-agent-network] draft-chuyi-nmrg-ai-agent-network
[draft-cui-ai-agent-discovery-invocation] draft-cui-ai-agent-discovery-invocation
[draft-cui-nmrg-llm-benchmark] draft-cui-nmrg-llm-benchmark
[draft-jadoon-nmrg-agentic-ai-autonomous-networks] draft-jadoon-nmrg-agentic-ai-autonomous-networks
[draft-vandoulas-aidp] draft-vandoulas-aidp
[draft-wang-nmrg-magent-im] draft-wang-nmrg-magent-im
[draft-yue-anima-agent-recovery-networks] draft-yue-anima-agent-recovery-networks

Author's Address

Generated by IETF Draft Analyzer
Family: agent-ecosystem
2026-03-04