| title | draft_name | intended_wg | status | gaps_addressed | camel_sections | date |
|---|---|---|---|---|---|---|
| Data Provenance Tracking Protocol for AI Agent Communications | draft-nennemann-ai-agent-provenance-00 | SECDISPATCH or WIMSE | outline | | | 2026-03-09 |
Data Provenance Tracking Protocol for AI Agent Communications
1. Problem Statement
When AI agents process data through multi-step tool-calling pipelines, the origin and transformation history of each piece of data is lost. This creates three critical problems:
- No explainability (Gap #84): when an agent makes a decision, there is no standard way to trace which data influenced it and where that data came from in real time
- Incompatible audit trails (Gap #88): different agent platforms log decisions in incompatible formats, making cross-system forensics impossible
- Privacy leakage (Gap #93): without provenance tracking, agents cannot enforce data handling policies — private training data, user interactions, and proprietary algorithms may leak through tool calls
CaMeL demonstrates that tracking provenance at the level of individual values (not just messages) is both feasible and essential for security. Every variable in CaMeL's interpreter carries metadata about its sources and allowed readers.
2. Scope
This document defines:
- A provenance record format for tracking data origin and transformation chains
- A provenance propagation protocol for maintaining provenance across agent boundaries
- A provenance query interface for real-time explainability
- Privacy constraints on provenance metadata itself
3. Provenance Model
3.1 Provenance Record
Every data value in an agent system carries a provenance record:
{
  "prov:id": "prov-8c3a2d",
  "prov:value_ref": "val-email-body",
  "prov:origin": {
    "type": "tool_output",
    "tool": "read_email",
    "invocation_id": "inv-4f2a1b",
    "agent_id": "agent-a@org1.example",
    "timestamp": "2026-03-09T14:30:00Z",
    "inner_sources": [
      {
        "type": "external_entity",
        "identifier": "sender:bob@example.com",
        "trust_level": "untrusted"
      }
    ]
  },
  "prov:transformations": [
    {
      "type": "llm_extraction",
      "model_role": "quarantined",
      "operation": "extract_email_address",
      "input_provenance": ["prov-7b2a1c"],
      "timestamp": "2026-03-09T14:30:01Z"
    }
  ],
  "prov:classification": {
    "trust_level": "untrusted",
    "sensitivity": "pii",
    "readers": ["user", "bob@example.com"]
  }
}
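One way to work with this record in code is to mirror it as typed structures. The following is a minimal Python sketch, not part of the draft: the class names `Origin` and `ProvenanceRecord` and the `to_dict` helper are hypothetical, and only the field names are taken from the example record above.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class Origin:
    type: str
    agent_id: str
    timestamp: str
    tool: Optional[str] = None
    invocation_id: Optional[str] = None
    inner_sources: list = field(default_factory=list)


@dataclass
class ProvenanceRecord:
    id: str
    value_ref: str
    origin: Origin
    transformations: list = field(default_factory=list)
    classification: dict = field(default_factory=dict)

    def to_dict(self) -> dict:
        # Top-level keys carry the "prov:" prefix used in the record format.
        return {f"prov:{key}": value for key, value in asdict(self).items()}


record = ProvenanceRecord(
    id="prov-8c3a2d",
    value_ref="val-email-body",
    origin=Origin(
        type="tool_output",
        agent_id="agent-a@org1.example",
        timestamp="2026-03-09T14:30:00Z",
        tool="read_email",
        invocation_id="inv-4f2a1b",
        inner_sources=[{
            "type": "external_entity",
            "identifier": "sender:bob@example.com",
            "trust_level": "untrusted",
        }],
    ),
    classification={"trust_level": "untrusted", "sensitivity": "pii"},
)

print(json.dumps(record.to_dict(), indent=2))
```

Keeping the `prov:` prefix out of the Python attribute names and adding it only at serialization time is a convenience of this sketch, not a requirement of the format.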
3.2 Origin Types
| Origin Type | Description | Trust Default |
|---|---|---|
| user_input | Directly from the authenticated user's query | trusted |
| tool_output | Returned by a tool invocation | depends on tool |
| llm_generation | Generated by an LLM (P-LLM or Q-LLM) | depends on role |
| literal | Hardcoded in the execution plan | trusted |
| external_entity | Inner source within tool data (e.g., email sender) | untrusted |
| derived | Computed from other values | min(input trust levels) |
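The trust defaults in the table can be read as a small lookup plus one rule for derived values. The sketch below makes two assumptions that this draft does not specify: a two-level ordering in which `untrusted` ranks below `trusted`, and a fallback to `untrusted` when no tool or role policy is supplied. The names `default_trust`, `TRUST_ORDER`, and `FIXED_DEFAULTS` are illustrative only.

```python
# Assumed ordering: lower number means less trusted.
TRUST_ORDER = {"untrusted": 0, "trusted": 1}

# Origin types whose default does not depend on context.
FIXED_DEFAULTS = {
    "user_input": "trusted",
    "literal": "trusted",
    "external_entity": "untrusted",
}


def default_trust(origin_type, context_trust=None, input_trusts=()):
    """Return the default trust level for a value of the given origin type."""
    if origin_type in FIXED_DEFAULTS:
        return FIXED_DEFAULTS[origin_type]
    if origin_type in ("tool_output", "llm_generation"):
        # "Depends on tool" / "depends on role": the caller supplies the
        # policy result; falling back to "untrusted" is an assumption.
        return context_trust or "untrusted"
    if origin_type == "derived":
        if not input_trusts:
            raise ValueError("a derived value needs at least one input")
        # min(input trust levels) under the assumed ordering.
        return min(input_trusts, key=TRUST_ORDER.__getitem__)
    raise ValueError(f"unknown origin type: {origin_type}")


print(default_trust("derived", input_trusts=["trusted", "untrusted"]))
```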
3.3 Transformation Types
| Transform Type | Description | Provenance Effect |
|---|---|---|
| llm_extraction | Q-LLM parses unstructured → structured | inherits all input provenance |
| computation | Deterministic operation (concat, filter) | union of input provenance |
| aggregation | Multiple values combined | union of all input provenance |
| user_approval | User explicitly approved a value | upgrades trust to "user_approved" |
| redaction | Sensitive content removed | may upgrade trust classification |
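As an illustration of how these effects might be applied to a record, the sketch below appends one transformation entry and adjusts the classification. It is a hedged example rather than a normative algorithm: the function name `apply_transformation` is hypothetical, and because the table only says redaction "may" upgrade trust, this sketch leaves the trust level unchanged for redaction and defers that decision to local policy.

```python
from datetime import datetime, timezone

KNOWN_TRANSFORMS = {
    "llm_extraction", "computation", "aggregation", "user_approval", "redaction",
}


def apply_transformation(record, transform_type, operation, input_ids, now=None):
    """Return a copy of `record` with one more transformation entry."""
    if transform_type not in KNOWN_TRANSFORMS:
        raise ValueError(f"unknown transform type: {transform_type}")
    moment = now or datetime.now(timezone.utc)
    entry = {
        "type": transform_type,
        "operation": operation,
        "input_provenance": list(input_ids),
        "timestamp": moment.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
    updated = dict(record)
    updated["prov:transformations"] = record.get("prov:transformations", []) + [entry]
    if transform_type == "user_approval":
        # Explicit user approval upgrades the trust level.
        updated["prov:classification"] = {
            **record.get("prov:classification", {}),
            "trust_level": "user_approved",
        }
    return updated


original = {
    "prov:id": "prov-8c3a2d",
    "prov:transformations": [],
    "prov:classification": {"trust_level": "untrusted", "sensitivity": "pii"},
}
approved = apply_transformation(
    original, "user_approval", "confirm_recipient", ["prov-8c3a2d"],
    now=datetime(2026, 3, 9, 14, 30, 1, tzinfo=timezone.utc),
)
print(approved["prov:classification"]["trust_level"])
```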
4. Propagation Protocol
4.1 Intra-Agent Propagation
Within a single agent, the execution engine (interpreter) maintains provenance automatically:
val_a = tool_1()                 → prov: {origin: tool_1}
val_b = tool_2()                 → prov: {origin: tool_2}
val_c = extract(val_a)           → prov: {origin: tool_1, transform: extraction}
val_d = combine(val_b, val_c)    → prov: {origin: [tool_1, tool_2], transform: computation}
Rule: derived values inherit the union of all input provenances and the minimum trust level.
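The rule can be sketched as a small wrapper type that carries provenance alongside each value. This is an illustrative Python model, not the interpreter described by the draft: the names `Tracked`, `from_tool`, and `derive` are hypothetical, the two-level trust ordering is an assumption, and the trust levels assigned to `tool_1` and `tool_2` are made up for the example.

```python
from dataclasses import dataclass

# Assumed ordering: lower number means less trusted.
TRUST_ORDER = {"untrusted": 0, "trusted": 1}


@dataclass(frozen=True)
class Tracked:
    value: object
    origins: frozenset
    transforms: tuple = ()
    trust: str = "untrusted"


def from_tool(tool_name, value, trust):
    """Wrap a tool result with its origin and the tool's trust level."""
    return Tracked(value, frozenset({tool_name}), (), trust)


def derive(transform, func, *inputs):
    """Apply func to the raw values and propagate provenance to the result."""
    # Union of all input provenances.
    origins = frozenset().union(*(item.origins for item in inputs))
    # Minimum trust level across inputs.
    trust = min((item.trust for item in inputs), key=TRUST_ORDER.__getitem__)
    transforms = tuple(t for item in inputs for t in item.transforms) + (transform,)
    return Tracked(func(*(item.value for item in inputs)), origins, transforms, trust)


val_a = from_tool("tool_1", "Contact: bob@example.com", "untrusted")
val_b = from_tool("tool_2", "Bob", "trusted")
val_c = derive("extraction", lambda text: text.split(": ")[1], val_a)
val_d = derive("computation", lambda name, addr: f"{name} <{addr}>", val_b, val_c)

print(val_d.value, sorted(val_d.origins), val_d.trust)
```

Because `val_d` depends on the untrusted output of `tool_1`, it ends up untrusted even though `tool_2` is trusted in this example.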
4.2 Inter-Agent Propagation
When data crosses agent boundaries (via A2A, HTTP, message queues):
Agent A                             Agent B
┌──────────┐                        ┌───────────────┐
│ val_d    │                        │               │
│ prov: {  │  ──── message ────►    │ val_e         │
│   A's    │  with provenance       │ prov: {       │
│   chain  │  header/metadata       │   A's chain + │
│ }        │                        │   hop record  │
└──────────┘                        │ }             │
                                    └───────────────┘
Provenance headers in inter-agent messages:
POST /agent-b/task HTTP/1.1
Content-Type: application/json
X-Agent-Provenance: eyJwcm92OmlkIjoicHJvdi04YzNhMmQi... (base64-encoded provenance chain)
X-Agent-Provenance-Signature: <signed by agent A>
Or as a structured field in A2A messages:
{
  "a2a:message": { ... },
  "a2a:provenance": {
    "chain": [ ... ],
    "hop": {
      "agent_id": "agent-a@org1.example",
      "timestamp": "2026-03-09T14:30:02Z",
      "attestation": "<signature>"
    }
  }
}
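For the header form shown earlier in this section, the sender needs to serialize, encode, and sign the chain, and the receiver needs to verify it before trusting it. The sketch below is one possible shape under stated assumptions: the draft does not fix a signature scheme, so HMAC-SHA256 with a shared key stands in for "signed by agent A" (a cross-organization deployment would more plausibly use asymmetric signatures), and the function names are hypothetical.

```python
import base64
import hashlib
import hmac
import json


def encode_provenance_headers(chain, key):
    """Build the two provenance headers for an outgoing request."""
    payload = json.dumps(chain, separators=(",", ":"), sort_keys=True).encode("utf-8")
    encoded = base64.b64encode(payload).decode("ascii")
    # Assumption: HMAC-SHA256 over the encoded header value.
    signature = hmac.new(key, encoded.encode("ascii"), hashlib.sha256).hexdigest()
    return {
        "X-Agent-Provenance": encoded,
        "X-Agent-Provenance-Signature": signature,
    }


def decode_provenance_headers(headers, key):
    """Verify the signature, then return the decoded provenance chain."""
    encoded = headers["X-Agent-Provenance"]
    expected = hmac.new(key, encoded.encode("ascii"), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, headers["X-Agent-Provenance-Signature"]):
        raise ValueError("provenance signature mismatch")
    return json.loads(base64.b64decode(encoded))


shared_key = b"demo-key-not-for-production"
headers = encode_provenance_headers([{"prov:id": "prov-8c3a2d"}], shared_key)
print(decode_provenance_headers(headers, shared_key))
```

Verifying before decoding means a receiver never parses a chain it cannot attribute to the sender.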
4.3 Provenance Compaction
For long chains, provenance can be compacted:
- Hash chaining: replace full chain with Merkle tree root + most recent N entries
- Trust boundary summarization: when crossing org boundaries, summarize internal provenance as a single attested record
- TTL-based pruning: provenance entries older than a configurable TTL are archived (reference retained, detail available on request)
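The hash-chaining option can be sketched as follows. Several details are assumptions of this example rather than part of the draft: the root is computed over the full chain, leaves are SHA-256 hashes of canonical JSON, an odd level duplicates its last node, and the names `merkle_root` and `compact_chain` and the output field names are hypothetical.

```python
import hashlib
import json


def _leaf_hash(entry):
    canonical = json.dumps(entry, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).digest()


def merkle_root(entries):
    """Return the hex Merkle root over a non-empty list of chain entries."""
    if not entries:
        raise ValueError("cannot compute a root over an empty chain")
    level = [_leaf_hash(entry) for entry in entries]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last node on odd levels
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0].hex()


def compact_chain(chain, keep_recent=2):
    """Keep the newest entries in full and commit to the whole chain by hash."""
    if keep_recent < 1:
        raise ValueError("keep_recent must be at least 1")
    if len(chain) <= keep_recent:
        return {"compacted": False, "entries": list(chain)}
    return {
        "compacted": True,
        "merkle_root": merkle_root(chain),
        "omitted": len(chain) - keep_recent,
        "entries": list(chain[-keep_recent:]),
    }


chain = [{"prov:id": f"prov-{i}"} for i in range(5)]
print(compact_chain(chain, keep_recent=2))
```

A verifier holding the archived entries can recompute the root and compare it with the compacted record.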
5. Real-Time Provenance Query
Directly addresses Gap #84: Real-time AI agent explainability protocols.
5.1 Query Interface
Any participant (user, operator, peer agent) can query provenance:
{
  "query:type": "explain_value",
  "query:value_ref": "val-d",
  "query:depth": "full",
  "query:format": "graph"
}
Response:
{
  "explain:value_ref": "val-d",
  "explain:summary": "Email address extracted from meeting notes retrieved from cloud storage, combined with user-specified recipient name",
  "explain:graph": {
    "nodes": [
      {"id": "user_input", "trust": "trusted", "content_hint": "user query"},
      {"id": "tool_1:search_notes", "trust": "tool", "content_hint": "meeting notes"},
      {"id": "q_llm:extract", "trust": "untrusted", "content_hint": "extracted email"}
    ],
    "edges": [
      {"from": "tool_1:search_notes", "to": "q_llm:extract"},
      {"from": "q_llm:extract", "to": "val-d"}
    ]
  },
  "explain:trust_assessment": "UNTRUSTED: depends on quarantined LLM extraction from tool output",
  "explain:timestamp": "2026-03-09T14:30:05Z"
}
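A client receiving such a response might walk the graph to list everything a value depends on. The sketch below is a hypothetical consumer, assuming the `explain:graph` shape shown above; the function names `upstream_nodes` and `untrusted_ancestors` are not defined by this draft.

```python
from collections import deque


def upstream_nodes(graph, value_ref):
    """Return every node that value_ref depends on, nearest first."""
    parents = {}
    for edge in graph["edges"]:
        parents.setdefault(edge["to"], []).append(edge["from"])
    seen = []
    queue = deque([value_ref])
    while queue:
        node = queue.popleft()
        for parent in parents.get(node, []):
            if parent not in seen:
                seen.append(parent)
                queue.append(parent)
    return seen


def untrusted_ancestors(graph, value_ref):
    """Return the upstream nodes whose trust is marked "untrusted"."""
    trust = {node["id"]: node["trust"] for node in graph["nodes"]}
    return [n for n in upstream_nodes(graph, value_ref) if trust.get(n) == "untrusted"]


# Graph copied from the example response.
graph = {
    "nodes": [
        {"id": "user_input", "trust": "trusted"},
        {"id": "tool_1:search_notes", "trust": "tool"},
        {"id": "q_llm:extract", "trust": "untrusted"},
    ],
    "edges": [
        {"from": "tool_1:search_notes", "to": "q_llm:extract"},
        {"from": "q_llm:extract", "to": "val-d"},
    ],
}

print(upstream_nodes(graph, "val-d"))
print(untrusted_ancestors(graph, "val-d"))
```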
5.2 Streaming Provenance
For long-running agent tasks, provenance can be streamed:
- SSE (Server-Sent Events) or WebSocket connection
- Each tool invocation emits a provenance event
- Operators see the dependency graph build in real time
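For the SSE option, each provenance event could be framed as one server-sent event. The sketch below only shows the wire framing (an `event:` line, a `data:` line, and a blank line); the event name `provenance` and the payload fields are assumptions for illustration, and a real server would write these strings to a response with `Content-Type: text/event-stream`.

```python
import json


def format_sse(event, data):
    """Frame one JSON payload as a server-sent event."""
    payload = json.dumps(data, separators=(",", ":"))
    return f"event: {event}\ndata: {payload}\n\n"


def stream_provenance(invocations):
    """Yield one SSE frame per tool invocation, in order."""
    for invocation in invocations:
        yield format_sse("provenance", invocation)


invocations = [
    {"tool": "read_email", "prov_id": "prov-7b2a1c"},
    {"tool": "extract_email_address", "prov_id": "prov-8c3a2d"},
]
for frame in stream_provenance(invocations):
    print(frame, end="")
```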
6. Privacy-Preserving Provenance
Addresses Gap #93: Privacy-preserving agent-to-agent communication.
6.1 The Provenance Privacy Paradox
Provenance metadata can itself leak sensitive information:
- Knowing which tools were called reveals the user's intent
- Knowing inner sources (e.g., email senders) reveals the user's contacts
- The transformation chain reveals the agent's reasoning process
6.2 Privacy Controls
- Selective disclosure: agents can share provenance summaries (trust level, origin type) without full chains
- Zero-knowledge trust: "this value is trusted" attested by a trusted third party, without revealing the full provenance
- Provenance redaction: when crossing privacy boundaries, inner sources are replaced with attestations
- Need-to-know: provenance detail levels based on the requester's authorization
{
  "prov:origin": {
    "type": "attested",
    "attestor": "org1.example",
    "trust_level": "trusted",
    "detail": "redacted; contact org1.example for full provenance"
  }
}
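Provenance redaction at a boundary could look like the following sketch. It assumes the record shape from Section 3.1, copies the trust level from `prov:classification`, and also drops the transformation chain because Section 6.1 notes that it reveals the agent's reasoning; that last step and the function name `redact_for_boundary` are assumptions of this example, not requirements of the draft.

```python
import copy


def redact_for_boundary(record, attestor):
    """Replace origin details with an attestation from `attestor`."""
    trust = record.get("prov:classification", {}).get("trust_level", "untrusted")
    # Copy everything except the origin and the transformation chain.
    redacted = {
        key: copy.deepcopy(value)
        for key, value in record.items()
        if key not in ("prov:origin", "prov:transformations")
    }
    redacted["prov:origin"] = {
        "type": "attested",
        "attestor": attestor,
        "trust_level": trust,
        "detail": f"redacted; contact {attestor} for full provenance",
    }
    return redacted


full_record = {
    "prov:id": "prov-8c3a2d",
    "prov:origin": {
        "type": "tool_output",
        "tool": "read_email",
        "inner_sources": [{"identifier": "sender:bob@example.com"}],
    },
    "prov:transformations": [{"type": "llm_extraction"}],
    "prov:classification": {"trust_level": "untrusted", "sensitivity": "pii"},
}

print(redact_for_boundary(full_record, "org1.example"))
```

The input record is left untouched, so the producing agent can still answer authorized requests for the full chain.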
7. Relationship to ECT
Execution Context Tokens (draft-nennemann-wimse-ect) record what happened in a DAG of signed tokens. Provenance tracking records where data came from. They are complementary:
| Aspect | ECT | This Draft |
|---|---|---|
| Tracks | Task execution events | Data origin and flow |
| Granularity | Per-task | Per-value |
| Format | JWT with DAG links | JSON provenance records |
| Purpose | Audit "what was done" | Explain "why this data" |
Integration: ECT claims can reference provenance records, and provenance records can link to ECT task IDs.
8. Security Considerations
- Provenance records must be integrity-protected (signed by the producing agent)
- Provenance forgery (claiming a higher trust level) must be detectable via attestation chains
- Provenance metadata size can be significant — compaction mechanisms are essential
- Timing information in provenance can leak operational patterns
9. Open Questions
- Standard vocabulary: should provenance types be extensible or fixed?
- Cross-standard alignment: how does this relate to W3C PROV (provenance ontology)?
- Storage: who is responsible for storing provenance long-term? Each agent? A shared ledger?
- Legal implications: does provenance tracking create liability for organizations that produce it?
10. References
- Debenedetti et al. "Defeating Prompt Injections by Design." arXiv:2503.18813, 2025.
- Denning. "A lattice model of secure information flow." CACM, 1976.
- W3C PROV: Provenance Data Model. W3C Recommendation, 2013.
- draft-nennemann-wimse-ect (Execution Context Tokens)
- draft-ietf-wimse-arch (WIMSE architecture)