feat: proposal intake pipeline with AI-powered generation on /proposals/new

Add full proposal system: DB schema (proposals + proposal_gaps tables),
CLI `ietf intake` command, and web UI with Quick Generate on /proposals/new.
The new page merges AI intake (paste URL/text → Haiku generates multiple
proposals auto-linked to gaps) with manual form entry. Generated proposals
are clickable cards that fill the editor below for refinement.

Uses claude_model_cheap (Haiku) for cost-efficient web intake. Includes
CaML-inspired draft proposals from arXiv:2503.18813 analysis.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 03:15:11 +01:00
parent ae5e5f8cbf
commit 5ec7410b89
20 changed files with 3316 additions and 2 deletions


@@ -0,0 +1,268 @@
---
title: "Data Provenance Tracking Protocol for AI Agent Communications"
draft_name: draft-nennemann-ai-agent-provenance-00
intended_wg: SECDISPATCH or WIMSE
status: outline
gaps_addressed: [84, 88, 93]
camel_sections: [5.3, 5.4]
date: 2026-03-09
---
# Data Provenance Tracking Protocol for AI Agent Communications
## 1. Problem Statement
When AI agents process data through multi-step tool-calling pipelines, the **origin and transformation history** of each piece of data is lost. This creates three critical problems:
1. **No explainability** (Gap #84): when an agent makes a decision, there is no standard way to trace *which data influenced it* and *where that data came from* in real time
2. **Incompatible audit trails** (Gap #88): different agent platforms log decisions in incompatible formats, making cross-system forensics impossible
3. **Privacy leakage** (Gap #93): without provenance tracking, agents cannot enforce data handling policies — private training data, user interactions, and proprietary algorithms may leak through tool calls
CaMeL (the system described in arXiv:2503.18813) demonstrates that tracking provenance at the **individual value level** (not just the message level) is both feasible and essential for security. Every variable in CaMeL's interpreter carries metadata about its sources and allowed readers.
## 2. Scope
This document defines:
1. A **provenance record format** for tracking data origin and transformation chains
2. A **provenance propagation protocol** for maintaining provenance across agent boundaries
3. A **provenance query interface** for real-time explainability
4. **Privacy constraints** on provenance metadata itself
## 3. Provenance Model
### 3.1 Provenance Record
Every data value in an agent system carries a provenance record:
```json
{
"prov:id": "prov-8c3a2d",
"prov:value_ref": "val-email-body",
"prov:origin": {
"type": "tool_output",
"tool": "read_email",
"invocation_id": "inv-4f2a1b",
"agent_id": "agent-a@org1.example",
"timestamp": "2026-03-09T14:30:00Z",
"inner_sources": [
{
"type": "external_entity",
"identifier": "sender:bob@example.com",
"trust_level": "untrusted"
}
]
},
"prov:transformations": [
{
"type": "llm_extraction",
"model_role": "quarantined",
"operation": "extract_email_address",
"input_provenance": ["prov-7b2a1c"],
"timestamp": "2026-03-09T14:30:01Z"
}
],
"prov:classification": {
"trust_level": "untrusted",
"sensitivity": "pii",
"readers": ["user", "bob@example.com"]
}
}
```
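As a minimal sketch of how an implementation might consume this record, the Python below parses the JSON shape above into typed objects and checks the `readers` list. The class names, `parse_record`, and `may_read` are illustrative assumptions; the draft does not define an API.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Classification:
    trust_level: str
    sensitivity: str
    readers: list[str] = field(default_factory=list)


@dataclass
class ProvenanceRecord:
    prov_id: str
    value_ref: str
    origin: dict
    transformations: list[dict]
    classification: Classification


def parse_record(raw: str) -> ProvenanceRecord:
    """Parse the section 3.1 JSON shape; raises KeyError if a required field is missing."""
    doc = json.loads(raw)
    cls = doc["prov:classification"]
    return ProvenanceRecord(
        prov_id=doc["prov:id"],
        value_ref=doc["prov:value_ref"],
        origin=doc["prov:origin"],
        # Assumption: a record without transformations is treated as an empty chain.
        transformations=doc.get("prov:transformations", []),
        classification=Classification(
            trust_level=cls["trust_level"],
            sensitivity=cls["sensitivity"],
            readers=list(cls.get("readers", [])),
        ),
    )


def may_read(record: ProvenanceRecord, principal: str) -> bool:
    """Hypothetical policy check: only principals listed in `readers` may see the value."""
    return principal in record.classification.readers
```

With the example record, `may_read(record, "user")` would hold while an unlisted principal would be refused.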
### 3.2 Origin Types
| Origin Type | Description | Trust Default |
|-------------|-------------|---------------|
| `user_input` | Directly from the authenticated user's query | trusted |
| `tool_output` | Returned by a tool invocation | depends on tool |
| `llm_generation` | Generated by an LLM (privileged P-LLM or quarantined Q-LLM) | depends on role |
| `literal` | Hardcoded in the execution plan | trusted |
| `external_entity` | Inner source within tool data (e.g., email sender) | untrusted |
| `derived` | Computed from other values | min(input trust levels) |
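The defaults in this table could be resolved along the following lines. This is a sketch under stated assumptions: the `TOOL_TRUST` and `ROLE_TRUST` registries and their entries are hypothetical (the draft only says "depends on tool" and "depends on role"), and unknown tools or roles fall back to `untrusted`.

```python
# Hypothetical registries; the draft does not say how tool or role trust is configured.
TOOL_TRUST = {"read_email": "untrusted", "get_calendar": "trusted"}
ROLE_TRUST = {"privileged": "trusted", "quarantined": "untrusted"}

# Assumed two-level ordering, lowest first.
TRUST_RANK = {"untrusted": 0, "trusted": 1}


def default_trust(origin_type, *, tool=None, model_role=None, input_trust=()):
    """Return the default trust level for an origin type, per the table in 3.2."""
    if origin_type in ("user_input", "literal"):
        return "trusted"
    if origin_type == "external_entity":
        return "untrusted"
    if origin_type == "tool_output":
        return TOOL_TRUST.get(tool, "untrusted")
    if origin_type == "llm_generation":
        return ROLE_TRUST.get(model_role, "untrusted")
    if origin_type == "derived":
        if not input_trust:
            return "untrusted"  # conservative choice when inputs are unknown
        return min(input_trust, key=TRUST_RANK.__getitem__)
    raise ValueError(f"unknown origin type: {origin_type}")
```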
### 3.3 Transformation Types
| Transform Type | Description | Provenance Effect |
|---------------|-------------|-------------------|
| `llm_extraction` | Q-LLM parses unstructured → structured | inherits all input provenance |
| `computation` | Deterministic operation (concat, filter) | union of input provenance |
| `aggregation` | Multiple values combined | union of all input provenance |
| `user_approval` | User explicitly approved a value | upgrades trust to "user_approved" |
| `redaction` | Sensitive content removed | may relax the sensitivity classification; trust level is unchanged |
## 4. Propagation Protocol
### 4.1 Intra-Agent Propagation
Within a single agent, the execution engine (interpreter) maintains provenance automatically:
```
val_a = tool_1() → prov: {origin: tool_1}
val_b = tool_2() → prov: {origin: tool_2}
val_c = extract(val_a) → prov: {origin: tool_1, transform: extraction}
val_d = combine(val_b, val_c) → prov: {origin: [tool_1, tool_2], transform: computation}
```
**Rule**: derived values inherit the **union** of all input provenances and the **minimum** trust level.
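The rule can be sketched in a few lines of Python. The `Prov` class, the helper names, and the two-level trust ordering are assumptions made for illustration, and the trust levels assigned to `tool_1` and `tool_2` are invented for the example.

```python
from dataclasses import dataclass

# Assumed ordering, lowest first; the draft does not fix a full trust lattice.
TRUST_RANK = {"untrusted": 0, "trusted": 1}


@dataclass(frozen=True)
class Prov:
    origins: frozenset
    trust: str
    transforms: tuple = ()


def from_tool(tool: str, trust: str) -> Prov:
    """Provenance of a value returned directly by a tool."""
    return Prov(origins=frozenset({tool}), trust=trust)


def derive(transform: str, *inputs: Prov) -> Prov:
    """Union of input origins, minimum input trust, transform appended to the history."""
    if not inputs:
        raise ValueError("a derived value needs at least one input")
    origins = frozenset().union(*(p.origins for p in inputs))
    trust = min((p.trust for p in inputs), key=TRUST_RANK.__getitem__)
    history = tuple(t for p in inputs for t in p.transforms) + (transform,)
    return Prov(origins=origins, trust=trust, transforms=history)


# Mirrors the four-line trace above.
val_a = from_tool("tool_1", "untrusted")
val_b = from_tool("tool_2", "trusted")
val_c = derive("extraction", val_a)
val_d = derive("computation", val_b, val_c)
```

Here `val_d` ends up with both tools as origins and the lower of the two trust levels, because one of its inputs traces back to an untrusted tool.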
### 4.2 Inter-Agent Propagation
When data crosses agent boundaries (via A2A, HTTP, message queues):
```
Agent A Agent B
┌──────────┐ ┌──────────┐
│ val_d │ │ │
│ prov: { │ ──── message ────► │ val_e │
│ A's │ with provenance │ prov: { │
│ chain │ header/metadata │ A's chain + │
│ } │ │ hop record │
└──────────┘ │ } │
└──────────┘
```
Provenance headers in inter-agent messages:
```http
POST /agent-b/task HTTP/1.1
Content-Type: application/json
X-Agent-Provenance: eyJwcm92OmlkIjoicHJvdi04YzNhMmQi... (base64-encoded provenance chain)
X-Agent-Provenance-Signature: <signed by agent A>
```
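A sender and receiver might produce and check these two headers roughly as follows. This is only a sketch: HMAC-SHA256 with a shared key stands in for the signature scheme, which the draft leaves unspecified, and a real deployment would more likely use an asymmetric signature bound to agent A's identity. The function names are hypothetical.

```python
import base64
import hashlib
import hmac
import json


def encode_provenance(chain, key: bytes) -> dict:
    """Serialize the chain, base64-encode it, and attach a MAC over the encoded token."""
    payload = json.dumps(chain, separators=(",", ":")).encode("utf-8")
    token = base64.b64encode(payload).decode("ascii")
    # Assumption: HMAC as a placeholder for "signed by agent A".
    signature = hmac.new(key, token.encode("ascii"), hashlib.sha256).hexdigest()
    return {
        "X-Agent-Provenance": token,
        "X-Agent-Provenance-Signature": signature,
    }


def decode_provenance(headers: dict, key: bytes):
    """Verify the MAC first, then decode; raises ValueError on a mismatch."""
    token = headers["X-Agent-Provenance"]
    expected = hmac.new(key, token.encode("ascii"), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, headers["X-Agent-Provenance-Signature"]):
        raise ValueError("provenance signature mismatch")
    return json.loads(base64.b64decode(token))
```

Verifying before decoding means a receiver never parses a chain it cannot attribute to the sender.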
Or as a structured field in A2A messages:
```json
{
"a2a:message": { ... },
"a2a:provenance": {
"chain": [ ... ],
"hop": {
"agent_id": "agent-a@org1.example",
"timestamp": "2026-03-09T14:30:02Z",
"attestation": "<signature>"
}
}
}
```
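One way a forwarding agent could maintain this structure is sketched below. Two assumptions are made that the draft does not state: the previous hop is folded into `chain` before the new hop is written (matching "A's chain + hop record" in the diagram), and the attestation is an HMAC over the chain, agent ID, and timestamp as a stand-in for a real signature. `forward` and `attest` are hypothetical names.

```python
import hashlib
import hmac
import json
from datetime import datetime, timezone


def attest(chain, agent_id: str, timestamp: str, key: bytes) -> str:
    """Placeholder attestation: HMAC-SHA256 over a canonical JSON encoding."""
    body = json.dumps(
        {"chain": chain, "agent_id": agent_id, "timestamp": timestamp},
        sort_keys=True,
        separators=(",", ":"),
    ).encode("utf-8")
    return hmac.new(key, body, hashlib.sha256).hexdigest()


def forward(message: dict, agent_id: str, key: bytes, timestamp: str = None) -> dict:
    """Return a copy of the message with this agent's hop record attached."""
    ts = timestamp or datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    previous = message.get("a2a:provenance", {})
    chain = list(previous.get("chain", []))
    if "hop" in previous:
        # Assumption: the upstream hop becomes part of the chain at each boundary.
        chain.append({"type": "hop", **previous["hop"]})
    hop = {
        "agent_id": agent_id,
        "timestamp": ts,
        "attestation": attest(chain, agent_id, ts, key),
    }
    return {**message, "a2a:provenance": {"chain": chain, "hop": hop}}
```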
### 4.3 Provenance Compaction
For long chains, provenance can be compacted:
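1. **Hash-based compaction**: replace the full chain with a Merkle tree root plus the most recent N entries, so that omitted entries can still be verified against the root when retrieved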
2. **Trust boundary summarization**: when crossing org boundaries, summarize internal provenance as a single attested record
3. **TTL-based pruning**: provenance entries older than a configurable TTL are archived (reference retained, detail available on request)
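The first option could look roughly like the sketch below, which commits to the whole chain with a Merkle root and keeps only the newest entries inline. The tree construction (SHA-256, domain-separated leaf and node hashes, unpaired nodes promoted unchanged) and the output field names are assumptions; the draft does not fix them.

```python
import hashlib
import json


def _leaf(entry) -> bytes:
    # The 0x00 / 0x01 prefixes keep leaf hashes distinct from interior node hashes.
    data = json.dumps(entry, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(b"\x00" + data).digest()


def merkle_root(entries) -> str:
    """Hex Merkle root over the entries, in chain order."""
    if not entries:
        return hashlib.sha256(b"").hexdigest()
    level = [_leaf(e) for e in entries]
    while len(level) > 1:
        nxt = [
            hashlib.sha256(b"\x01" + level[i] + level[i + 1]).digest()
            for i in range(0, len(level) - 1, 2)
        ]
        if len(level) % 2:
            nxt.append(level[-1])  # promote the unpaired node to the next level
        level = nxt
    return level[0].hex()


def compact(chain, keep_last: int) -> dict:
    """Keep the newest `keep_last` entries inline and commit to the rest via the root."""
    if len(chain) <= keep_last:
        return {"compacted": False, "entries": list(chain)}
    return {
        "compacted": True,
        "merkle_root": merkle_root(chain),
        "total_entries": len(chain),
        "entries": list(chain[len(chain) - keep_last:]),
    }
```

A verifier that later obtains the archived entries can recompute the root and compare it with the compacted record.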
## 5. Real-Time Provenance Query
*Directly addresses Gap #84: Real-time AI agent explainability protocols.*
### 5.1 Query Interface
Any participant (user, operator, peer agent) can query provenance:
```json
{
"query:type": "explain_value",
"query:value_ref": "val-d",
"query:depth": "full",
"query:format": "graph"
}
```
Response:
```json
{
"explain:value_ref": "val-d",
"explain:summary": "Email address extracted from meeting notes retrieved from cloud storage, combined with user-specified recipient name",
"explain:graph": {
"nodes": [
{"id": "user_input", "trust": "trusted", "content_hint": "user query"},
{"id": "tool_1:search_notes", "trust": "tool", "content_hint": "meeting notes"},
{"id": "q_llm:extract", "trust": "untrusted", "content_hint": "extracted email"}
],
"edges": [
{"from": "tool_1:search_notes", "to": "q_llm:extract"},
{"from": "q_llm:extract", "to": "val-d"},
{"from": "user_input", "to": "val-d"}
]
},
"explain:trust_assessment": "UNTRUSTED — depends on quarantined LLM extraction from tool output",
"explain:timestamp": "2026-03-09T14:30:05Z"
}
```
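A responder might assemble such a graph by walking input links backwards from the queried value. The sketch below assumes a simplified in-memory store that maps each value reference to its trust level and direct inputs; that store shape and the function name `explain_value` are hypothetical, and the trust labels in the usage example are invented.

```python
def explain_value(value_ref: str, records: dict) -> dict:
    """Walk input links from value_ref and return nodes, edges, and a trust verdict.

    `records` maps a reference to {"trust": str, "inputs": [ref, ...]}.
    """
    nodes, edges = [], []
    seen, stack = set(), [value_ref]
    while stack:
        ref = stack.pop()
        if ref in seen:
            continue  # shared ancestors are visited once
        seen.add(ref)
        rec = records[ref]
        nodes.append({"id": ref, "trust": rec["trust"]})
        for src in rec.get("inputs", []):
            edges.append({"from": src, "to": ref})
            stack.append(src)
    untrusted = sorted(n["id"] for n in nodes if n["trust"] == "untrusted")
    if untrusted:
        assessment = "UNTRUSTED: depends on " + ", ".join(untrusted)
    else:
        assessment = "TRUSTED"
    return {
        "explain:value_ref": value_ref,
        "explain:graph": {"nodes": nodes, "edges": edges},
        "explain:trust_assessment": assessment,
    }


# Usage with a store shaped like the example response.
records = {
    "user_input": {"trust": "trusted", "inputs": []},
    "tool_1:search_notes": {"trust": "trusted", "inputs": []},
    "q_llm:extract": {"trust": "untrusted", "inputs": ["tool_1:search_notes"]},
    "val-d": {"trust": "untrusted", "inputs": ["q_llm:extract", "user_input"]},
}
result = explain_value("val-d", records)
```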
### 5.2 Streaming Provenance
For long-running agent tasks, provenance can be streamed:
- SSE (Server-Sent Events) or WebSocket connection
- Each tool invocation emits a provenance event
- Operators see the dependency graph build in real time
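For the SSE option, each tool invocation could be framed as one event, as in this sketch. The `id:`, `event:`, and `data:` field syntax is standard Server-Sent Events; the event name `provenance` and the payload shape are assumptions.

```python
import json


def sse_event(event: str, data: dict, event_id=None) -> str:
    """Format one Server-Sent Events frame; a blank line terminates the event."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    lines.append(f"event: {event}")
    # json.dumps without indent yields a single line, so one data field suffices.
    lines.append(f"data: {json.dumps(data)}")
    return "\n".join(lines) + "\n\n"


def provenance_stream(invocations):
    """Yield one SSE frame per tool invocation, numbered so clients can resume."""
    for seq, invocation in enumerate(invocations, start=1):
        yield sse_event("provenance", invocation, event_id=seq)
```

The sequence number in `id:` lets a reconnecting client send `Last-Event-ID` and pick up where the graph left off.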
## 6. Privacy-Preserving Provenance
*Addresses Gap #93: Privacy-preserving agent-to-agent communication.*
### 6.1 The Provenance Privacy Paradox
Provenance metadata can itself leak sensitive information:
- Knowing *which tools were called* reveals the user's intent
- Knowing *inner sources* (e.g., email senders) reveals the user's contacts
- The transformation chain reveals the agent's reasoning process
### 6.2 Privacy Controls
1. **Selective disclosure**: agents can share provenance summaries (trust level, origin type) without full chains
2. **Zero-knowledge trust**: "this value is trusted" attested by a trusted third party, without revealing the full provenance
3. **Provenance redaction**: when crossing privacy boundaries, inner sources are replaced with attestations
4. **Need-to-know**: provenance detail levels based on the requester's authorization
```json
{
"prov:origin": {
"type": "attested",
"attestor": "org1.example",
"trust_level": "trusted",
"detail": "redacted — contact org1.example for full provenance"
}
}
```
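Controls 1, 3, and 4 could be combined in a single disclosure function, sketched here. The three detail levels (`full`, `summary`, `attested`) and the function name are assumptions; how a requester's authorization maps to a level is left open.

```python
import copy


def disclose(record: dict, detail: str, attestor: str) -> dict:
    """Return the view of a provenance record appropriate for the given detail level."""
    trust = record["prov:classification"]["trust_level"]
    if detail == "full":
        return copy.deepcopy(record)
    if detail == "summary":
        # Selective disclosure: origin type and trust level only, no inner sources.
        return {
            "prov:origin": {
                "type": record["prov:origin"]["type"],
                "trust_level": trust,
            }
        }
    if detail == "attested":
        # Redaction: the origin is replaced by an attestation from the owning org.
        return {
            "prov:origin": {
                "type": "attested",
                "attestor": attestor,
                "trust_level": trust,
                "detail": f"redacted: contact {attestor} for full provenance",
            }
        }
    raise ValueError(f"unknown detail level: {detail}")
```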
## 7. Relationship to ECT
Execution Context Tokens (draft-nennemann-wimse-ect) record *what happened* in a DAG of signed tokens. Provenance tracking records *where data came from*. They are complementary:
| Aspect | ECT | This Draft |
|--------|-----|-----------|
| **Tracks** | Task execution events | Data origin and flow |
| **Granularity** | Per-task | Per-value |
| **Format** | JWT with DAG links | JSON provenance records |
| **Purpose** | Audit "what was done" | Explain "why this data" |
Integration: ECT claims can reference provenance records, and provenance records can link to ECT task IDs.
## 8. Security Considerations
- Provenance records must be integrity-protected (signed by the producing agent)
- Provenance forgery (claiming a higher trust level) must be detectable via attestation chains
- Provenance metadata size can be significant — compaction mechanisms are essential
- Timing information in provenance can leak operational patterns
## 9. Open Questions
1. **Standard vocabulary**: should provenance types be extensible or fixed?
2. **Cross-standard alignment**: how does this relate to the W3C PROV family (the PROV-DM data model and the PROV-O ontology)?
3. **Storage**: who is responsible for storing provenance long-term? Each agent? A shared ledger?
4. **Legal implications**: does provenance tracking create liability for organizations that produce it?
## 10. References
- Debenedetti et al. "Defeating Prompt Injections by Design." arXiv:2503.18813, 2025.
- Denning. "A Lattice Model of Secure Information Flow." Communications of the ACM, 1976.
- W3C. "PROV-DM: The PROV Data Model." W3C Recommendation, 2013.
- draft-nennemann-wimse-ect (Execution Context Tokens)
- draft-ietf-wimse-arch (WIMSE architecture)