feat: proposal intake pipeline with AI-powered generation on /proposals/new

Add full proposal system: DB schema (proposals + proposal_gaps tables), CLI `ietf intake` command, and web UI with Quick Generate on /proposals/new. The new page merges AI intake (paste URL/text → Haiku generates multiple proposals auto-linked to gaps) with manual form entry. Generated proposals are clickable cards that fill the editor below for refinement. Uses claude_model_cheap (Haiku) for cost-efficient web intake. Includes CaMeL-inspired draft proposals from arXiv:2503.18813 analysis.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 03:15:11 +01:00
parent ae5e5f8cbf
commit 5ec7410b89
20 changed files with 3316 additions and 2 deletions
--- a/data/reports/draft-proposals/camel-inspired/03-data-provenance-tracking.md
+++ b/data/reports/draft-proposals/camel-inspired/03-data-provenance-tracking.md
@@ -0,0 +1,268 @@
+---
+title: "Data Provenance Tracking Protocol for AI Agent Communications"
+draft_name: draft-nennemann-ai-agent-provenance-00
+intended_wg: SECDISPATCH or WIMSE
+status: outline
+gaps_addressed: [84, 88, 93]
+camel_sections: [5.3, 5.4]
+date: 2026-03-09
+---
+
+# Data Provenance Tracking Protocol for AI Agent Communications
+
+## 1. Problem Statement
+
+When AI agents process data through multi-step tool-calling pipelines, the **origin and transformation history** of each piece of data is lost. This creates three critical problems:
+
+1. **No explainability** (Gap #84): when an agent makes a decision, there is no standard way to trace *which data influenced it* and *where that data came from* in real time
+2. **Incompatible audit trails** (Gap #88): different agent platforms log decisions in incompatible formats, making cross-system forensics impossible
+3. **Privacy leakage** (Gap #93): without provenance tracking, agents cannot enforce data handling policies — private training data, user interactions, and proprietary algorithms may leak through tool calls
+
+CaMeL shows that tracking provenance at the **individual value level** (not just the message level) is feasible and lets an interpreter enforce security policies before every tool call. Every variable in CaMeL's interpreter carries metadata (capabilities) recording its sources and allowed readers.
+
+## 2. Scope
+
+This document defines:
+
+1. A **provenance record format** for tracking data origin and transformation chains
+2. A **provenance propagation protocol** for maintaining provenance across agent boundaries
+3. A **provenance query interface** for real-time explainability
+4. **Privacy constraints** on provenance metadata itself
+
+## 3. Provenance Model
+
+### 3.1 Provenance Record
+
+Every data value in an agent system carries a provenance record:
+
+```json
+{
+  "prov:id": "prov-8c3a2d",
+  "prov:value_ref": "val-email-body",
+  "prov:origin": {
+    "type": "tool_output",
+    "tool": "read_email",
+    "invocation_id": "inv-4f2a1b",
+    "agent_id": "agent-a@org1.example",
+    "timestamp": "2026-03-09T14:30:00Z",
+    "inner_sources": [
+      {
+        "type": "external_entity",
+        "identifier": "sender:bob@example.com",
+        "trust_level": "untrusted"
+      }
+    ]
+  },
+  "prov:transformations": [
+    {
+      "type": "llm_extraction",
+      "model_role": "quarantined",
+      "operation": "extract_email_address",
+      "input_provenance": ["prov-7b2a1c"],
+      "timestamp": "2026-03-09T14:30:01Z"
+    }
+  ],
+  "prov:classification": {
+    "trust_level": "untrusted",
+    "sensitivity": "pii",
+    "readers": ["user", "bob@example.com"]
+  }
+}
+```
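A minimal Python sketch of how an implementation might load a record like the one above into typed objects. The class and attribute names (`ProvenanceRecord`, `InnerSource`, `Transformation`) are illustrative assumptions. Only the `prov:`-prefixed keys come from the example.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field


@dataclass
class InnerSource:
    type: str
    identifier: str
    trust_level: str


@dataclass
class Transformation:
    type: str
    operation: str
    input_provenance: list[str]
    timestamp: str
    model_role: str | None = None  # set for LLM-based transformations


@dataclass
class ProvenanceRecord:
    prov_id: str
    value_ref: str
    origin: dict
    inner_sources: list[InnerSource]
    transformations: list[Transformation]
    trust_level: str
    sensitivity: str
    readers: list[str] = field(default_factory=list)

    @classmethod
    def from_json(cls, text: str) -> ProvenanceRecord:
        raw = json.loads(text)
        origin = dict(raw["prov:origin"])
        # Inner sources are split out of the origin object into typed entries;
        # the remaining origin fields (tool, invocation_id, ...) stay open-ended.
        inner = [InnerSource(**s) for s in origin.pop("inner_sources", [])]
        transforms = [Transformation(**t) for t in raw.get("prov:transformations", [])]
        classification = raw["prov:classification"]
        return cls(
            prov_id=raw["prov:id"],
            value_ref=raw["prov:value_ref"],
            origin=origin,
            inner_sources=inner,
            transformations=transforms,
            trust_level=classification["trust_level"],
            sensitivity=classification["sensitivity"],
            readers=list(classification.get("readers", [])),
        )
```

Leaving the non-`inner_sources` origin fields as a plain dict keeps the sketch usable while the vocabulary question under Open Questions remains unsettled.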
+
+### 3.2 Origin Types
+
+| Origin Type | Description | Trust Default |
+|-------------|-------------|---------------|
+| `user_input` | Directly from the authenticated user's query | trusted |
+| `tool_output` | Returned by a tool invocation | depends on tool |
+| `llm_generation` | Generated by an LLM (privileged P-LLM or quarantined Q-LLM) | depends on role |
+| `literal` | Hardcoded in the execution plan | trusted |
+| `external_entity` | Inner source within tool data (e.g., email sender) | untrusted |
+| `derived` | Computed from other values | min(input trust levels) |
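The table's defaults can be read as a small lookup function. This sketch makes three assumptions. First, trust is a two-level ordering (`UNTRUSTED` < `TRUSTED`). Second, the per-tool and per-role tables (`TOOL_TRUST`, `ROLE_TRUST`) are hypothetical deployment policy that the draft does not define. Third, the tool names in those tables are hypothetical.

```python
from enum import IntEnum


class Trust(IntEnum):
    # Assumed two-level ordering: UNTRUSTED < TRUSTED, so min() picks the weaker level.
    UNTRUSTED = 0
    TRUSTED = 1


# Hypothetical deployment policy for the "depends on tool" / "depends on role" rows.
TOOL_TRUST = {"read_email": Trust.UNTRUSTED, "get_user_profile": Trust.TRUSTED}
ROLE_TRUST = {"privileged": Trust.TRUSTED, "quarantined": Trust.UNTRUSTED}


def default_trust(origin_type, *, tool=None, model_role=None, inputs=()):
    """Return the default trust level for a value, following the origin-type table."""
    if origin_type in ("user_input", "literal"):
        return Trust.TRUSTED
    if origin_type == "external_entity":
        return Trust.UNTRUSTED
    if origin_type == "tool_output":
        return TOOL_TRUST.get(tool, Trust.UNTRUSTED)  # unknown tools fail closed
    if origin_type == "llm_generation":
        return ROLE_TRUST.get(model_role, Trust.UNTRUSTED)
    if origin_type == "derived":
        if not inputs:
            raise ValueError("a derived value needs at least one input")
        return min(inputs)
    raise ValueError(f"unknown origin type: {origin_type!r}")
```

Falling back to `UNTRUSTED` for unknown tools and roles is a design choice of the sketch. It fails closed rather than open.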
+
+### 3.3 Transformation Types
+
+| Transform Type | Description | Provenance Effect |
+|---------------|-------------|-------------------|
+| `llm_extraction` | Q-LLM parses unstructured → structured | inherits all input provenance |
+| `computation` | Deterministic operation (concat, filter) | union of input provenance |
+| `aggregation` | Multiple values combined | union of all input provenance |
+| `user_approval` | User explicitly approved a value | upgrades trust to "user_approved" |
+| `redaction` | Sensitive content removed | may lower the sensitivity classification |
+
+## 4. Propagation Protocol
+
+### 4.1 Intra-Agent Propagation
+
+Within a single agent, the execution engine (interpreter) maintains provenance automatically:
+
+```
+val_a = tool_1()                 → prov: {origin: tool_1}
+val_b = tool_2()                 → prov: {origin: tool_2}
+val_c = extract(val_a)           → prov: {origin: tool_1, transform: [extraction]}
+val_d = combine(val_b, val_c)    → prov: {origin: [tool_1, tool_2], transform: [extraction, computation]}
+```
+
+**Rule**: derived values inherit the **union** of all input provenances and the **minimum** trust level.
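The rule can be sketched as a wrapper type in Python that replays the four-line trace above. `Tracked`, `from_tool`, and `derive` are hypothetical names. Trust is again assumed to be a two-level ordering, so `min()` yields the weakest input.

```python
from dataclasses import dataclass
from enum import IntEnum


class Trust(IntEnum):
    # Assumed ordering: a lower value is less trusted, so min() gives the weakest input.
    UNTRUSTED = 0
    TRUSTED = 1


@dataclass(frozen=True)
class Tracked:
    value: object
    origins: frozenset
    transforms: tuple
    trust: Trust


def from_tool(tool: str, value: object, trust: Trust) -> Tracked:
    """Wrap a raw tool result with a fresh provenance record."""
    return Tracked(value, frozenset({tool}), (), trust)


def derive(fn, transform: str, *inputs: Tracked) -> Tracked:
    """Apply fn to the raw values; union the provenance and take the minimum trust."""
    if not inputs:
        raise ValueError("a derived value needs at least one input")
    origins = frozenset().union(*(i.origins for i in inputs))
    history = []
    for i in inputs:
        for step in i.transforms:
            if step not in history:  # keep each step once, in first-seen order
                history.append(step)
    history.append(transform)
    trust = min(i.trust for i in inputs)
    return Tracked(fn(*(i.value for i in inputs)), origins, tuple(history), trust)


# Replaying the trace: tool_1 returns untrusted data, tool_2 (hypothetically) trusted data.
val_a = from_tool("tool_1", "notes: bob@example.com", Trust.UNTRUSTED)
val_b = from_tool("tool_2", "Meeting at 10", Trust.TRUSTED)
val_c = derive(lambda s: s.split(": ")[1], "extraction", val_a)
val_d = derive(lambda b, c: f"{b} -> {c}", "computation", val_b, val_c)
print(sorted(val_d.origins), val_d.transforms, val_d.trust.name)
# prints: ['tool_1', 'tool_2'] ('extraction', 'computation') UNTRUSTED
```

Recording each transformation step once per name is a simplification. An implementation that needs exact histories would keep the full dependency DAG instead.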
+
+### 4.2 Inter-Agent Propagation
+
+When data crosses agent boundaries (via A2A, HTTP, message queues):
+
+```
+Agent A                              Agent B
+┌──────────┐                        ┌────────────────┐
+│ val_d    │                        │ val_e          │
+│ prov: {  │ ──── message ────►     │ prov: {        │
+│   A's    │   with provenance      │   A's chain +  │
+│   chain  │   header/metadata      │   hop record   │
+│ }        │                        │ }              │
+└──────────┘                        └────────────────┘
+```
+
+Provenance headers in inter-agent messages carry the base64-encoded provenance chain and agent A's signature over it. The header names are provisional; RFC 6648 discourages new `X-`-prefixed fields.
+
+```http
+POST /agent-b/task HTTP/1.1
+Content-Type: application/json
+X-Agent-Provenance: eyJwcm92OmlkIjoicHJvdi04YzNhMmQi...
+X-Agent-Provenance-Signature: <signed by agent A>
+```
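One way to produce and check these headers, sketched in Python with only the standard library. The sketch makes two assumptions:

- The provenance is serialized as compact, key-sorted JSON and then base64url-encoded.
- HMAC-SHA256 with a shared key stands in for agent A's signature. A cross-organization deployment would more likely use an asymmetric scheme such as JWS, so that receivers need no shared secret.

```python
import base64
import hashlib
import hmac
import json

# Assumption: a shared HMAC key stands in for agent A's signing key.
DEMO_KEY = b"demo-key-for-agent-a"


def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def encode_provenance_headers(provenance: dict, key: bytes = DEMO_KEY) -> dict:
    """Serialize as compact key-sorted JSON, base64url-encode, and MAC the header value."""
    payload = json.dumps(provenance, sort_keys=True, separators=(",", ":")).encode("utf-8")
    encoded = b64url(payload)
    tag = hmac.new(key, encoded.encode("ascii"), hashlib.sha256).digest()
    return {"X-Agent-Provenance": encoded, "X-Agent-Provenance-Signature": b64url(tag)}


def decode_provenance_headers(headers: dict, key: bytes = DEMO_KEY) -> dict:
    """Check the MAC before parsing; raise ValueError if it does not match.

    Uses exact dict keys for brevity; real HTTP header names are case-insensitive.
    """
    encoded = headers["X-Agent-Provenance"]
    expected = hmac.new(key, encoded.encode("ascii"), hashlib.sha256).digest()
    given = b64url_decode(headers["X-Agent-Provenance-Signature"])
    if not hmac.compare_digest(expected, given):
        raise ValueError("provenance signature mismatch")
    return json.loads(b64url_decode(encoded))
```

Encoding a record that begins with `{"prov:id": "prov-8c3a2d"` this way yields a header value starting with `eyJwcm92OmlkIjoicHJvdi04YzNhMmQi`, the prefix shown in the example.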
+
+Or as a structured field in A2A messages:
+
+```json
+{
+  "a2a:message": { ... },
+  "a2a:provenance": {
+    "chain": [ ... ],
+    "hop": {
+      "agent_id": "agent-a@org1.example",
+      "timestamp": "2026-03-09T14:30:02Z",
+      "attestation": "<signature>"
+    }
+  }
+}
+```
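A sketch of what an intermediate agent might do when forwarding such a message: move the previous `hop` into `chain`, then attach its own hop record. Several details are assumptions:

- the rotation rule itself;
- what the attestation covers (the chain plus this agent's id and timestamp);
- `sign`, which is a caller-supplied function;
- `demo_sign`, which is a placeholder hash and not a real signature.

```python
import copy
import hashlib
import json
from datetime import datetime, timezone


def canonical(obj) -> bytes:
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def forward_with_hop(envelope: dict, agent_id: str, sign) -> dict:
    """Return a copy of an A2A-style envelope carrying this agent's hop record.

    The previous hop (if any) is appended to the chain; the new hop's
    attestation covers the chain, this agent's id, and the timestamp.
    `sign` is a callable taking bytes and returning a string.
    """
    out = copy.deepcopy(envelope)  # never mutate the message we received
    prov = out.setdefault("a2a:provenance", {})
    chain = prov.setdefault("chain", [])
    if "hop" in prov:
        chain.append(prov.pop("hop"))
    ts = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    signed_part = canonical({"chain": chain, "agent_id": agent_id, "timestamp": ts})
    prov["hop"] = {"agent_id": agent_id, "timestamp": ts, "attestation": sign(signed_part)}
    return out


def demo_sign(data: bytes) -> str:
    # Placeholder only: a bare hash is NOT a signature; swap in a real signer.
    return hashlib.sha256(data).hexdigest()
```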
+
+### 4.3 Provenance Compaction
+
+For long chains, provenance can be compacted:
+
+1. **Merkle summarization**: replace the full chain with a Merkle tree root computed over all entries, plus the most recent N entries in full
+2. **Trust boundary summarization**: when crossing org boundaries, summarize internal provenance as a single attested record
+3. **TTL-based pruning**: provenance entries older than a configurable TTL are archived (reference retained, detail available on request)
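The first strategy can be sketched with a binary Merkle tree over canonical JSON encodings of the chain entries. The 0x00/0x01 leaf and node prefixes are borrowed from Certificate Transparency (RFC 6962) for domain separation. Promoting an odd node unchanged is a simplification that does not reproduce RFC 6962's tree shape, so treat the whole function as illustrative.

```python
import hashlib
import json


def leaf_hash(entry: dict) -> bytes:
    # Canonical JSON so that equal entries always hash the same way.
    data = json.dumps(entry, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(b"\x00" + data).digest()


def merkle_root(leaves: list) -> bytes:
    """Binary Merkle root; an odd node at any level is promoted unchanged."""
    if not leaves:
        return hashlib.sha256(b"").digest()
    level = leaves
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            nxt.append(hashlib.sha256(b"\x01" + level[i] + level[i + 1]).digest())
        if len(level) % 2 == 1:
            nxt.append(level[-1])
        level = nxt
    return level[0]


def compact(chain: list, keep_last: int) -> dict:
    """Replace a long chain with a Merkle root over all entries plus the last N in full."""
    root = merkle_root([leaf_hash(e) for e in chain])
    return {
        "merkle_root": root.hex(),
        "total_entries": len(chain),
        "recent": chain[-keep_last:] if keep_last > 0 else [],
    }
```

A verifier that retrieves the archived full chain can recompute the root and compare it with `merkle_root`. The `recent` entries stay readable without that step.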
+
+## 5. Real-Time Provenance Query
+
+*Directly addresses Gap #84: Real-time AI agent explainability protocols.*
+
+### 5.1 Query Interface
+
+Any participant (user, operator, peer agent) can query provenance:
+
+```json
+{
+  "query:type": "explain_value",
+  "query:value_ref": "val-d",
+  "query:depth": "full",
+  "query:format": "graph"
+}
+```
+
+Response:
+
+```json
+{
+  "explain:value_ref": "val-d",
+  "explain:summary": "Email address extracted from meeting notes retrieved from cloud storage, combined with user-specified recipient name",
+  "explain:graph": {
+    "nodes": [
+      {"id": "user_input", "trust": "trusted", "content_hint": "user query"},
+      {"id": "tool_1:search_notes", "trust": "tool", "content_hint": "meeting notes"},
+      {"id": "q_llm:extract", "trust": "untrusted", "content_hint": "extracted email"},
+      {"id": "val-d", "trust": "untrusted", "content_hint": "combined value"}
+    ],
+    "edges": [
+      {"from": "user_input", "to": "val-d"},
+      {"from": "tool_1:search_notes", "to": "q_llm:extract"},
+      {"from": "q_llm:extract", "to": "val-d"}
+    ]
+  },
+  "explain:trust_assessment": "UNTRUSTED: depends on quarantined LLM extraction from tool output",
+  "explain:timestamp": "2026-03-09T14:30:05Z"
+}
+```
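A responder could assemble `explain:graph` by walking provenance links backwards from the queried value. In this sketch:

- `store` is a hypothetical flattened index mapping each node id to its trust, content hint, and input ids. It is not the wire format.
- `max_depth=None` stands for `"query:depth": "full"`.
- The overall assessment follows the minimum-trust rule from Section 4.1.

```python
from __future__ import annotations

from collections import deque


def explain(value_ref: str, store: dict, max_depth: int | None = None) -> dict:
    """Walk input links backwards from value_ref and build a nodes/edges graph.

    `store` maps node id -> {"trust": str, "content_hint": str, "inputs": [ids]}.
    max_depth=None walks the full history.
    """
    nodes: dict = {}
    edges: list = []
    queue = deque([(value_ref, 0)])
    while queue:
        node_id, depth = queue.popleft()
        if node_id in nodes:
            continue  # already visited through another path
        rec = store[node_id]
        nodes[node_id] = {"id": node_id, "trust": rec["trust"], "content_hint": rec["content_hint"]}
        if max_depth is not None and depth >= max_depth:
            continue
        for parent in rec.get("inputs", []):
            edges.append({"from": parent, "to": node_id})
            queue.append((parent, depth + 1))
    # Minimum-trust rule: one untrusted node in the visited graph makes the value untrusted.
    untrusted = any(n["trust"] == "untrusted" for n in nodes.values())
    return {
        "explain:value_ref": value_ref,
        "explain:graph": {"nodes": list(nodes.values()), "edges": edges},
        "explain:trust_assessment": "UNTRUSTED" if untrusted else "TRUSTED",
    }
```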
+
+### 5.2 Streaming Provenance
+
+For long-running agent tasks, provenance can be streamed:
+
+- SSE (Server-Sent Events) or WebSocket connection
+- Each tool invocation emits a provenance event
+- Operators see the dependency graph build in real time
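Each provenance event maps naturally onto a Server-Sent Events message. The sketch below assumes a hypothetical `provenance` event type and emits one event per record.

```python
import json


def sse_event(record: dict, event_id: int, event_type: str = "provenance") -> str:
    """Format one provenance record as a Server-Sent Events message.

    json.dumps without indent yields a single line (newlines inside strings are
    escaped), so one `data:` field is enough; a blank line ends the event.
    """
    payload = json.dumps(record, separators=(",", ":"))
    return f"id: {event_id}\nevent: {event_type}\ndata: {payload}\n\n"


def stream(records):
    """Yield one SSE message per tool invocation's provenance record."""
    for i, rec in enumerate(records, start=1):
        yield sse_event(rec, i)
```

Numbering events with `id:` lets a reconnecting client resume where it left off via the standard `Last-Event-ID` request header.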
+
+## 6. Privacy-Preserving Provenance
+
+*Addresses Gap #93: Privacy-preserving agent-to-agent communication.*
+
+### 6.1 The Provenance Privacy Paradox
+
+Provenance metadata can itself leak sensitive information:
+
+- Knowing *which tools were called* reveals the user's intent
+- Knowing *inner sources* (e.g., email senders) reveals the user's contacts
+- The transformation chain reveals the agent's reasoning process
+
+### 6.2 Privacy Controls
+
+1. **Selective disclosure**: agents can share provenance summaries (trust level, origin type) without full chains
+2. **Zero-knowledge trust**: "this value is trusted" attested by a trusted third party, without revealing the full provenance
+3. **Provenance redaction**: when crossing privacy boundaries, inner sources are replaced with attestations
+4. **Need-to-know**: provenance detail levels based on the requester's authorization
+
+```json
+{
+  "prov:origin": {
+    "type": "attested",
+    "attestor": "org1.example",
+    "trust_level": "trusted",
+    "detail": "redacted — contact org1.example for full provenance"
+  }
+}
+```
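The selective disclosure and redaction controls above can be sketched as a function that trims a record to the requester's authorization level. The three level names and their ordering are assumptions. A real attested origin would also carry the attestor's signature, which this sketch omits.

```python
import copy

# Hypothetical disclosure levels, from least to most detail.
LEVELS = {"summary": 0, "redacted": 1, "full": 2}


def disclose(record: dict, level: str, attestor: str) -> dict:
    """Return a copy of `record` reduced to what the requester's level allows."""
    rank = LEVELS[level]
    if rank == LEVELS["full"]:
        return copy.deepcopy(record)
    trust = record["prov:classification"]["trust_level"]
    if rank == LEVELS["summary"]:
        # Selective disclosure: only the trust level and origin type survive.
        return {
            "prov:id": record["prov:id"],
            "prov:origin": {"type": record["prov:origin"]["type"]},
            "prov:classification": {"trust_level": trust},
        }
    # Redaction: the origin (including inner sources) becomes an attested stub, and
    # fields that could reveal contacts or reasoning are dropped.
    out = copy.deepcopy(record)
    out["prov:origin"] = {
        "type": "attested",
        "attestor": attestor,
        "trust_level": trust,
        "detail": f"redacted: contact {attestor} for full provenance",
    }
    out.pop("prov:transformations", None)
    out["prov:classification"].pop("readers", None)
    return out
```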
+
+## 7. Relationship to ECT
+
+Execution Context Tokens (draft-nennemann-wimse-ect) record *what happened* in a DAG of signed tokens. Provenance tracking records *where data came from*. They are complementary:
+
+| Aspect | ECT | This Draft |
+|--------|-----|-----------|
+| **Tracks** | Task execution events | Data origin and flow |
+| **Granularity** | Per-task | Per-value |
+| **Format** | JWT with DAG links | JSON provenance records |
+| **Purpose** | Audit "what was done" | Explain "why this data" |
+
+Integration: ECT claims can reference provenance records, and provenance records can link to ECT task IDs.
+
+## 8. Security Considerations
+
+- Provenance records must be integrity-protected (signed by the producing agent)
+- Provenance forgery (claiming a higher trust level) must be detectable via attestation chains
+- Provenance metadata can grow large, which makes the compaction mechanisms of Section 4.3 essential
+- Timing information in provenance can leak operational patterns
+
+## 9. Open Questions
+
+1. **Standard vocabulary**: should provenance types be extensible or fixed?
+2. **Cross-standard alignment**: how does this relate to W3C PROV (the PROV-DM data model and its PROV-O ontology)?
+3. **Storage**: who is responsible for storing provenance long-term? Each agent? A shared ledger?
+4. **Legal implications**: does provenance tracking create liability for organizations that produce it?
+
+## 10. References
+
+- Debenedetti et al. "Defeating Prompt Injections by Design." arXiv:2503.18813, 2025.
+- Denning. "A lattice model of secure information flow." CACM, 1976.
+- W3C. "PROV-DM: The PROV Data Model." W3C Recommendation, 2013.
+- "Execution Context Tokens." Internet-Draft draft-nennemann-wimse-ect.
+- "Workload Identity in a Multi System Environment (WIMSE) Architecture." Internet-Draft draft-ietf-wimse-arch.