
Christian Nennemann 5ec7410b89 feat: proposal intake pipeline with AI-powered generation on /proposals/new

Add full proposal system: DB schema (proposals + proposal_gaps tables),
CLI `ietf intake` command, and web UI with Quick Generate on /proposals/new.
The new page merges AI intake (paste URL/text → Haiku generates multiple
proposals auto-linked to gaps) with manual form entry. Generated proposals
are clickable cards that fill the editor below for refinement.

Uses claude_model_cheap (Haiku) for cost-efficient web intake. Includes
CaML-inspired draft proposals from arXiv:2503.18813 analysis.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-03-09 03:15:11 +01:00


title, draft_name, intended_wg, status, gaps_addressed, camel_sections, date

Data Provenance Tracking Protocol for AI Agent Communications

draft-nennemann-ai-agent-provenance-00

SECDISPATCH or WIMSE

outline

5.3

5.4

2026-03-09

Data Provenance Tracking Protocol for AI Agent Communications

1. Problem Statement

When AI agents process data through multi-step tool-calling pipelines, the origin and transformation history of each piece of data is lost. This creates three critical problems:

No explainability (Gap #84): when an agent makes a decision, there is no standard way to trace, in real time, which data influenced it and where that data came from
Incompatible audit trails (Gap #88): different agent platforms log decisions in incompatible formats, making cross-system forensics impossible
Privacy leakage (Gap #93): without provenance tracking, agents cannot enforce data handling policies — private training data, user interactions, and proprietary algorithms may leak through tool calls

CaMeL demonstrates that tracking provenance at the individual value level (not just the message level) is both feasible and essential for security. Every variable in CaMeL's interpreter carries metadata about its sources and allowed readers.

2. Scope

This document defines:

A provenance record format for tracking data origin and transformation chains
A provenance propagation protocol for maintaining provenance across agent boundaries
A provenance query interface for real-time explainability
Privacy constraints on provenance metadata itself

3. Provenance Model

3.1 Provenance Record

Every data value in an agent system carries a provenance record:

{
  "prov:id": "prov-8c3a2d",
  "prov:value_ref": "val-email-body",
  "prov:origin": {
    "type": "tool_output",
    "tool": "read_email",
    "invocation_id": "inv-4f2a1b",
    "agent_id": "agent-a@org1.example",
    "timestamp": "2026-03-09T14:30:00Z",
    "inner_sources": [
      {
        "type": "external_entity",
        "identifier": "sender:bob@example.com",
        "trust_level": "untrusted"
      }
    ]
  },
  "prov:transformations": [
    {
      "type": "llm_extraction",
      "model_role": "quarantined",
      "operation": "extract_email_address",
      "input_provenance": ["prov-7b2a1c"],
      "timestamp": "2026-03-09T14:30:01Z"
    }
  ],
  "prov:classification": {
    "trust_level": "untrusted",
    "sensitivity": "pii",
    "readers": ["user", "bob@example.com"]
  }
}
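As an illustration of how a consumer might load such a record, the following is a minimal Python sketch. The class name `ProvenanceRecord`, the choice of which fields are required, and the `readable_by` helper are assumptions made for this example; the draft does not yet define a schema or a validation procedure.

```python
import json
from dataclasses import dataclass

# Assumption: these four top-level fields are mandatory. The draft does not
# say which fields are required, so adjust this to the eventual schema.
REQUIRED_FIELDS = ("prov:id", "prov:value_ref", "prov:origin", "prov:classification")


@dataclass(frozen=True)
class ProvenanceRecord:
    prov_id: str
    value_ref: str
    origin: dict
    transformations: tuple
    trust_level: str
    sensitivity: str
    readers: frozenset

    @classmethod
    def from_json(cls, text: str) -> "ProvenanceRecord":
        raw = json.loads(text)
        # Reject partial records early instead of propagating them.
        missing = [name for name in REQUIRED_FIELDS if name not in raw]
        if missing:
            raise ValueError(f"provenance record is missing {missing}")
        classification = raw["prov:classification"]
        return cls(
            prov_id=raw["prov:id"],
            value_ref=raw["prov:value_ref"],
            origin=raw["prov:origin"],
            # Assumption: a value with no transformations may omit the list.
            transformations=tuple(raw.get("prov:transformations", [])),
            trust_level=classification["trust_level"],
            sensitivity=classification["sensitivity"],
            readers=frozenset(classification["readers"]),
        )

    def readable_by(self, principal: str) -> bool:
        # Hypothetical helper: checks the "readers" list of the classification.
        return principal in self.readers
```

A policy engine could call `readable_by` before passing the referenced value to a tool whose output is visible to a given principal.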

3.2 Origin Types

Origin Type	Description	Trust Default
`user_input`	Directly from the authenticated user's query	trusted
`tool_output`	Returned by a tool invocation	depends on tool
`llm_generation`	Generated by an LLM (P-LLM or Q-LLM)	depends on role
`literal`	Hardcoded in the execution plan	trusted
`external_entity`	Inner source within tool data (e.g., email sender)	untrusted
`derived`	Computed from other values	min(input trust levels)

3.3 Transformation Types

Transform Type	Description	Provenance Effect
`llm_extraction`	Q-LLM parses unstructured → structured	inherits all input provenance
`computation`	Deterministic operation (concat, filter)	union of input provenance
`aggregation`	Multiple values combined	union of all input provenance
`user_approval`	User explicitly approved a value	upgrades trust to "user_approved"
`redaction`	Sensitive content removed	may upgrade trust classification

4. Propagation Protocol

4.1 Intra-Agent Propagation

Within a single agent, the execution engine (interpreter) maintains provenance automatically:

val_a = tool_1()               → prov: {origin: tool_1}
val_b = tool_2()               → prov: {origin: tool_2}
val_c = extract(val_a)         → prov: {origin: tool_1, transform: extraction}
val_d = combine(val_b, val_c)  → prov: {origin: [tool_1, tool_2], transform: computation}

Rule: derived values inherit the union of all input provenances and the minimum trust level.
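The rule can be sketched in Python as a thin wrapper around values. This is an illustrative sketch rather than the draft's interpreter: the `Tracked` class, the `derive` helper, and the two-level trust ordering in `TRUST_ORDER` are assumptions introduced here, since the draft does not define a total order over trust levels.

```python
from dataclasses import dataclass

# Assumption: a simple ordering, lowest trust first.
TRUST_ORDER = ["untrusted", "trusted"]


@dataclass(frozen=True)
class Tracked:
    value: object
    origins: frozenset
    transforms: frozenset = frozenset()
    trust: str = "untrusted"


def derive(transform, func, *inputs):
    """Apply func to the raw values and propagate provenance to the result."""
    if not inputs:
        raise ValueError("a derived value needs at least one input")
    # Union of all input provenances.
    origins = frozenset().union(*(i.origins for i in inputs))
    transforms = frozenset().union(*(i.transforms for i in inputs)) | {transform}
    # Minimum trust level across inputs.
    trust = min((i.trust for i in inputs), key=TRUST_ORDER.index)
    return Tracked(func(*(i.value for i in inputs)), origins, transforms, trust)


# Mirrors the val_a .. val_d example, with made-up values.
val_a = Tracked("contact: bob@example.com", frozenset({"tool_1"}), trust="untrusted")
val_b = Tracked("Bob", frozenset({"tool_2"}), trust="trusted")
val_c = derive("extraction", lambda s: s.split(": ")[1], val_a)
val_d = derive("computation", lambda b, c: f"{b} <{c}>", val_b, val_c)
```

Here `val_d` carries both `tool_1` and `tool_2` as origins and is untrusted, because one of its inputs was. In this sketch the transform set accumulates along the chain, so `val_d` records the earlier extraction as well as the final computation.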

4.2 Inter-Agent Propagation

When data crosses agent boundaries (via A2A, HTTP, message queues):

Agent A                              Agent B
┌──────────┐                        ┌────────────────┐
│ val_d    │                        │                │
│ prov: {  │ ──── message ────►     │ val_e          │
│   A's    │   with provenance      │ prov: {        │
│   chain  │   header/metadata      │   A's chain +  │
│ }        │                        │   hop record   │
└──────────┘                        │ }              │
                                    └────────────────┘

Provenance headers in inter-agent messages:

POST /agent-b/task HTTP/1.1
Content-Type: application/json
X-Agent-Provenance: eyJwcm92OmlkIjoicHJvdi04YzNhMmQi...  (base64-encoded provenance chain)
X-Agent-Provenance-Signature: <signed by agent A>

Or as a structured field in A2A messages:

{
  "a2a:message": { ... },
  "a2a:provenance": {
    "chain": [ ... ],
    "hop": {
      "agent_id": "agent-a@org1.example",
      "timestamp": "2026-03-09T14:30:02Z",
      "attestation": "<signature>"
    }
  }
}
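A sender and receiver for the header variant might look like the following Python sketch. The signature scheme is an assumption: the draft leaves it unspecified, and HMAC-SHA256 with a shared key is used here only to keep the example within the standard library. A cross-organization deployment would more plausibly use asymmetric signatures. The function names are hypothetical.

```python
import base64
import hashlib
import hmac
import json


def encode_provenance_headers(chain, key: bytes) -> dict:
    """Serialize the chain, base64-encode it, and attach a signature header."""
    payload = json.dumps(chain, separators=(",", ":")).encode("utf-8")
    encoded = base64.b64encode(payload).decode("ascii")
    # Assumption: the signature covers the encoded header value.
    signature = hmac.new(key, encoded.encode("ascii"), hashlib.sha256).hexdigest()
    return {
        "X-Agent-Provenance": encoded,
        "X-Agent-Provenance-Signature": signature,
    }


def decode_provenance_headers(headers: dict, key: bytes):
    """Verify the signature before trusting or parsing the chain."""
    encoded = headers["X-Agent-Provenance"]
    expected = hmac.new(key, encoded.encode("ascii"), hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(expected, headers["X-Agent-Provenance-Signature"]):
        raise ValueError("provenance signature mismatch")
    return json.loads(base64.b64decode(encoded))
```

The receiver verifies before decoding so that an unauthenticated chain is never parsed, let alone merged into its own provenance.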

4.3 Provenance Compaction

For long chains, provenance can be compacted:

Hash chaining: replace full chain with Merkle tree root + most recent N entries
Trust boundary summarization: when crossing org boundaries, summarize internal provenance as a single attested record
TTL-based pruning: provenance entries older than a configurable TTL are archived (reference retained, detail available on request)
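The first option could be sketched as follows in Python. Several details are assumptions, because the draft does not fix them: the root is computed over the full chain (so a holder of the archived entries can re-verify it), leaves are SHA-256 hashes of canonical JSON, an odd node is paired with itself, and the output field names `merkle_root`, `pruned_count`, and `recent` are hypothetical.

```python
import hashlib
import json


def _leaf_hash(entry: dict) -> bytes:
    # Canonical JSON so that key order does not change the hash.
    encoded = json.dumps(entry, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).digest()


def merkle_root(entries: list) -> str:
    """Return the hex Merkle root over the given provenance entries."""
    level = [_leaf_hash(e) for e in entries]
    if not level:
        return hashlib.sha256(b"").hexdigest()
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # pair an odd trailing node with itself
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0].hex()


def compact(chain: list, keep_recent: int) -> dict:
    """Keep the most recent entries in full; commit to the rest via the root."""
    if keep_recent < 0:
        raise ValueError("keep_recent must be non-negative")
    cut = max(len(chain) - keep_recent, 0)
    return {
        "merkle_root": merkle_root(chain),
        "pruned_count": cut,
        "recent": chain[cut:],
    }
```

A receiver that later obtains the archived entries can recompute `merkle_root` over the full chain and compare it with the value carried in the compacted record.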

5. Real-Time Provenance Query

Directly addresses Gap #84: Real-time AI agent explainability protocols.

5.1 Query Interface

Any participant (user, operator, peer agent) can query provenance:

{
  "query:type": "explain_value",
  "query:value_ref": "val-d",
  "query:depth": "full",
  "query:format": "graph"
}

Response:

{
  "explain:value_ref": "val-d",
  "explain:summary": "Email address extracted from meeting notes retrieved from cloud storage, combined with user-specified recipient name",
  "explain:graph": {
    "nodes": [
      {"id": "user_input", "trust": "trusted", "content_hint": "user query"},
      {"id": "tool_1:search_notes", "trust": "tool", "content_hint": "meeting notes"},
      {"id": "q_llm:extract", "trust": "untrusted", "content_hint": "extracted email"}
    ],
    "edges": [
      {"from": "tool_1:search_notes", "to": "q_llm:extract"},
      {"from": "q_llm:extract", "to": "val-d"},
      {"from": "user_input", "to": "val-d"}
    ]
  },
  "explain:trust_assessment": "UNTRUSTED — depends on quarantined LLM extraction from tool output",
  "explain:timestamp": "2026-03-09T14:30:05Z"
}
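One way a query handler could assemble such a response is to walk the dependency graph backwards from the requested value. The Python sketch below assumes a hypothetical in-memory `store` that maps each node id to its trust label and its direct inputs; the draft does not specify how provenance is stored, and the coarse TRUSTED/UNTRUSTED assessment is likewise an assumption.

```python
def explain_value(store: dict, value_ref: str) -> dict:
    """Collect every ancestor of value_ref and summarize its trust."""
    nodes, edges, seen = [], [], set()
    stack = [value_ref]
    while stack:
        ref = stack.pop()
        if ref in seen:
            continue  # shared ancestors are reported once
        seen.add(ref)
        entry = store[ref]
        nodes.append({"id": ref, "trust": entry["trust"]})
        for parent in entry["inputs"]:
            edges.append({"from": parent, "to": ref})
            stack.append(parent)
    # Assumption: any untrusted ancestor makes the value untrusted.
    tainted = any(n["trust"] == "untrusted" for n in nodes)
    return {
        "explain:value_ref": value_ref,
        "explain:graph": {"nodes": nodes, "edges": edges},
        "explain:trust_assessment": "UNTRUSTED" if tainted else "TRUSTED",
    }


# Hypothetical store mirroring the example response.
store = {
    "user_input": {"trust": "trusted", "inputs": []},
    "tool_1:search_notes": {"trust": "tool", "inputs": []},
    "q_llm:extract": {"trust": "untrusted", "inputs": ["tool_1:search_notes"]},
    "val-d": {"trust": "untrusted", "inputs": ["q_llm:extract", "user_input"]},
}
response = explain_value(store, "val-d")
```

A `query:depth` other than `full` could be supported by bounding the traversal depth, which this sketch omits.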

5.2 Streaming Provenance

For long-running agent tasks, provenance can be streamed:

SSE (Server-Sent Events) or WebSocket connection
Each tool invocation emits a provenance event
Operators see the dependency graph build in real time
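For the SSE variant, each provenance event could be framed as shown in this Python sketch. The event name `provenance`, the use of a running sequence number as the event id, and the function name are assumptions; only the `id:`/`event:`/`data:` line framing terminated by a blank line comes from the Server-Sent Events format itself.

```python
import json
from typing import Iterable, Iterator


def provenance_events(invocations: Iterable[dict]) -> Iterator[str]:
    """Yield one SSE-framed message per tool invocation."""
    for seq, invocation in enumerate(invocations, start=1):
        # json.dumps emits no raw newlines, so one data line is sufficient.
        data = json.dumps(invocation, separators=(",", ":"))
        yield f"id: {seq}\nevent: provenance\ndata: {data}\n\n"
```

A web framework would write each yielded string to a response with `Content-Type: text/event-stream`. The `id` field lets a reconnecting client resume from the last event it saw.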

6. Privacy-Preserving Provenance

Addresses Gap #93: Privacy-preserving agent-to-agent communication.

6.1 The Provenance Privacy Paradox

Provenance metadata can itself leak sensitive information:

Knowing which tools were called reveals the user's intent
Knowing inner sources (e.g., email senders) reveals the user's contacts
The transformation chain reveals the agent's reasoning process

6.2 Privacy Controls

Selective disclosure: agents can share provenance summaries (trust level, origin type) without full chains
Zero-knowledge trust: "this value is trusted" attested by a trusted third party, without revealing the full provenance
Provenance redaction: when crossing privacy boundaries, inner sources are replaced with attestations
Need-to-know: provenance detail levels based on the requester's authorization

{
  "prov:origin": {
    "type": "attested",
    "attestor": "org1.example",
    "trust_level": "trusted",
    "detail": "redacted — contact org1.example for full provenance"
  }
}
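Producing such a redacted record at a privacy boundary might look like the Python sketch below. The function name `redact_origin` is hypothetical, and the decisions to drop the transformation chain and to default to "untrusted" when no classification is present are assumptions made for this example.

```python
import copy


def redact_origin(record: dict, attestor: str) -> dict:
    """Return a copy of the record whose origin is replaced by an attestation."""
    redacted = copy.deepcopy(record)  # never mutate the sender's own record
    classification = record.get("prov:classification", {})
    redacted["prov:origin"] = {
        "type": "attested",
        "attestor": attestor,
        # Assumption: fall back to the least-privileged level if unknown.
        "trust_level": classification.get("trust_level", "untrusted"),
        "detail": f"redacted; contact {attestor} for full provenance",
    }
    # Assumption: the transformation chain is also withheld, since it can
    # reveal the agent's reasoning process.
    redacted.pop("prov:transformations", None)
    return redacted
```

The attestation still has to be signed by the attestor for the receiver to rely on the stated trust level; that step is omitted here.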

7. Relationship to ECT

Execution Context Tokens (draft-nennemann-wimse-ect) record what happened in a DAG of signed tokens. Provenance tracking records where data came from. They are complementary:

Aspect	ECT	This Draft
Tracks	Task execution events	Data origin and flow
Granularity	Per-task	Per-value
Format	JWT with DAG links	JSON provenance records
Purpose	Audit "what was done"	Explain "why this data"

Integration: ECT claims can reference provenance records, and provenance records can link to ECT task IDs.

8. Security Considerations

Provenance records must be integrity-protected (signed by the producing agent)
Provenance forgery (claiming a higher trust level) must be detectable via attestation chains
Provenance metadata size can be significant — compaction mechanisms are essential
Timing information in provenance can leak operational patterns

9. Open Questions

Standard vocabulary: should provenance types be extensible or fixed?
Cross-standard alignment: how does this relate to W3C PROV (provenance ontology)?
Storage: who is responsible for storing provenance long-term? Each agent? A shared ledger?
Legal implications: does provenance tracking create liability for organizations that produce it?

10. References

Debenedetti et al. "Defeating Prompt Injections by Design." arXiv:2503.18813, 2025.
Denning. "A Lattice Model of Secure Information Flow." Communications of the ACM, 1976.
W3C. "PROV-DM: The PROV Data Model." W3C Recommendation, 2013.
draft-nennemann-wimse-ect (Execution Context Tokens)
draft-ietf-wimse-arch (WIMSE architecture)
