Internet-Draft AI/Agent WG
Intended status: Standards Track March 2026
Expires: September 15, 2026
Dynamic Agent Trust Scoring (DATS)
draft-dats-dynamic-agent-trust-scoring-00
Abstract
This document defines the Dynamic Agent Trust Scoring (DATS)
protocol, a mechanism for AI agents to build, assess, and
revoke trust relationships based on observed behavior over
time. Static authentication (certificates, API keys) verifies
identity but says nothing about whether an agent is reliable,
accurate, or well-behaved. DATS augments identity-based auth
with a numeric trust score that adjusts dynamically based on
interaction outcomes. The protocol defines trust score
computation, propagation between agents, decay over inactivity,
and threshold-based access policies. DATS is intentionally
simple: a single score per agent-pair, standard adjustment
events, and a JWT-based transport for trust assertions.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
This document is intended to have Standards Track status.
Distribution of this memo is unlimited.
Table of Contents
1. Introduction
2. Terminology
3. Problem Statement
4. Trust Score Model
5. Trust Events and Adjustments
6. Trust Propagation
7. Threshold-Based Access Policies
8. Security Considerations
9. IANA Considerations
1. Introduction
The IETF has 98 drafts addressing agent identity and
authentication, providing strong mechanisms for verifying who
an agent is. But identity alone is insufficient for long-
running autonomous systems. A properly authenticated agent
may still produce bad results, violate expectations, or
degrade over time. Static certificates cannot capture this.
DATS adds a behavioral dimension to agent trust. It answers
the question: "I know who you are, but should I rely on you?"
The model is deliberately simple — a single floating-point
score between 0.0 and 1.0 per agent relationship — because
complex reputation systems tend to be gamed or ignored.
The protocol is inspired by:
- TCP congestion control: trust increases slowly (additive)
and decreases quickly (multiplicative) on failure.
- TLS certificate transparency: trust assertions are logged
for auditability.
- Web of trust (PGP): trust can propagate through
intermediaries, with attenuation.
2. Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described
in RFC 2119 [RFC2119].
Trust Score: A floating-point value in [0.0, 1.0] representing
one agent's assessed reliability of another, based on observed
interaction outcomes.
Trust Event: An observable interaction outcome that causes a
trust score adjustment. Events are either positive (task
completed successfully) or negative (task failed, timeout,
policy violation).
Trust Decay: The automatic reduction of trust scores over
periods of inactivity, reflecting the principle that trust
requires ongoing evidence.
Trust Assertion: A signed statement by one agent about another
agent's trust score, transportable as a JWT claim.
3. Problem Statement
Agent A delegates a task to Agent B. Agent B completes it
correctly. Agent A delegates again. After 100 successful
interactions, Agent B starts returning subtly incorrect results
(model drift, adversarial manipulation, or simple degradation).
Agent A has no standard way to:
1. Track B's reliability over time.
2. Reduce B's privileges based on degraded performance.
3. Share its experience with Agent C, who is considering
delegating to Agent B.
4. Automatically revoke B's access when trust drops below
acceptable levels.
Existing attestation drafts (STAMP, DAAP) provide
cryptographic proof of specific actions but not ongoing
behavioral assessment. DATS fills this gap.
4. Trust Score Model
Each agent maintains a trust table: a mapping from peer agent
IDs to trust scores.
   {
     "urn:uuid:agent-b": {
       "score": 0.82,
       "interactions": 147,
       "last_updated": "2026-03-01T11:30:00Z",
       "last_event": "task_success"
     }
   }
Initial trust for an unknown agent is a deployment-configured
default. A value of 0.5 is RECOMMENDED as a neutral starting
point, but deployments MAY use lower values (e.g., 0.1) for
zero-trust environments.
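The trust-table lookup with a deployment-configured default can be sketched as follows (the function name and table shape are illustrative, not normative; the default values are the ones the draft names):

```python
def get_score(table: dict, agent_id: str, default: float = 0.5) -> float:
    """Look up a peer's trust score; unknown agents start at the
    deployment-configured default (0.5 RECOMMENDED; zero-trust
    deployments may use a lower value such as 0.1)."""
    entry = table.get(agent_id)
    return entry["score"] if entry else default
```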
Trust scores are updated using an additive-increase,
multiplicative-decrease (AIMD) algorithm:
   On positive event:
       score = min(1.0, score + alpha)
   On negative event:
       score = max(0.0, score * beta)
Default parameters: alpha = 0.01, beta = 0.8.
This means trust builds slowly (50 consecutive successes to go
from 0.5 to 1.0) but drops quickly (a single failure reduces a
0.82 score to about 0.66). This asymmetry is intentional: in
autonomous systems, the cost of trusting a bad agent exceeds the
cost of being slow to trust a good one.
Agents MAY tune alpha and beta per relationship or per action
type, but MUST use the AIMD structure.
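A minimal sketch of the AIMD update with the default parameters (the function name is illustrative; only the AIMD structure itself is normative):

```python
ALPHA = 0.01  # additive increase per positive event (default)
BETA = 0.8    # multiplicative decrease per negative event (default)

def apply_event(score: float, positive: bool) -> float:
    """Apply one trust event to a score, clamped to [0.0, 1.0]."""
    if positive:
        return min(1.0, score + ALPHA)
    return max(0.0, score * BETA)
```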
5. Trust Events and Adjustments
The following standard trust events are defined:
| Event | Direction | Default Weight |
|----------------------|-----------|----------------|
| task_success | positive | 1x alpha |
| task_partial_success | positive | 0.5x alpha |
| task_failure | negative | 1x beta |
| task_timeout | negative | 1x beta |
| policy_violation | negative | applied twice |
| attestation_invalid | negative | applied twice |
| rollback_triggered | negative | 1x beta |
"applied twice" means the multiplicative decrease is applied
two times in succession (score * beta * beta), reflecting the
severity of policy violations versus simple failures.
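The event table above, including the halved weight for partial success and the doubled penalty for policy violations and invalid attestations, can be sketched as a single dispatch (event names follow the table; the dispatch itself is an implementation choice, not part of the protocol):

```python
ALPHA, BETA = 0.01, 0.8  # draft defaults

def adjust(score: float, event: str) -> float:
    """Apply one named trust event per the Section 5 weight table."""
    positive = {"task_success": 1.0, "task_partial_success": 0.5}
    double_penalty = {"policy_violation", "attestation_invalid"}
    if event in positive:
        return min(1.0, score + positive[event] * ALPHA)
    # Double penalty means the multiplicative decrease applies twice.
    factor = BETA * BETA if event in double_penalty else BETA
    return max(0.0, score * factor)
```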
Trust decay: if no interaction occurs for a configurable period
(default: 7 days), the trust score decays once per day of
continued inactivity:
       score = max(initial_default, score - decay_rate)
Default decay_rate: 0.01 per day. This ensures that stale
trust relationships gradually return to the default level
rather than persisting indefinitely.
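One possible reading of the decay rule, assuming the per-day reduction starts after the inactivity window elapses (the draft does not pin down the exact start, so that choice is an assumption here, as are the function and parameter names):

```python
def decay(score: float, days_inactive: int, *, default: float = 0.5,
          decay_rate: float = 0.01, window_days: int = 7) -> float:
    """Decay a stale score toward the deployment default.
    ASSUMPTION: decay begins only after window_days of inactivity."""
    idle_days = max(0, days_inactive - window_days)
    return max(default, score - decay_rate * idle_days)
```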
Agents MUST record all trust events in a local audit log.
6. Trust Propagation
Agent A may share its trust assessment of Agent B with Agent C
through a signed trust assertion. The assertion is a JWT
(RFC 7519) with the following claims:
   {
     "iss": "urn:uuid:agent-a",
     "sub": "urn:uuid:agent-b",
     "iat": 1709294400,
     "exp": 1709380800,
     "dats_score": 0.82,
     "dats_interactions": 147,
     "dats_confidence": "high"
   }
"dats_confidence" is based on interaction count: "low" (<10),
"medium" (10-99), "high" (100+).
When Agent C receives this assertion, it MAY incorporate it
into its own trust score for Agent B using attenuation:
   c_score_for_b = max(c_score_for_b,
                       a_score_for_b * trust_of_a * attenuation)
Where:
- a_score_for_b is Agent A's reported score for B (0.82)
- trust_of_a is Agent C's trust score for Agent A
- attenuation is a constant (default: 0.5) preventing
unbounded trust propagation
Trust assertions are advisory. Agents MUST NOT blindly adopt
propagated scores. An agent's own direct observations always
take precedence over propagated trust.
To prevent trust laundering (colluding agents inflating each
other's scores), agents SHOULD limit propagation depth to 1
hop by default. The "dats_hops" claim tracks propagation
depth; agents MUST NOT propagate assertions where dats_hops
exceeds their configured maximum.
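The attenuated-merge rule can be sketched as follows; JWT parsing, signature verification, and hop-limit checks are omitted here, and the function name is illustrative:

```python
def merge_assertion(own_score: float, asserted_score: float,
                    trust_in_issuer: float,
                    attenuation: float = 0.5) -> float:
    """Incorporate a peer's trust assertion. The max() ensures direct
    observations are never lowered by propagated trust, and attenuation
    bounds how much second-hand trust can contribute."""
    return max(own_score, asserted_score * trust_in_issuer * attenuation)
```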
7. Threshold-Based Access Policies
Agents SHOULD define trust thresholds for different action
categories:
   {
     "thresholds": {
       "read_data": 0.3,
       "execute_task": 0.5,
       "modify_config": 0.7,
       "delegate_auth": 0.9
     }
   }
When an agent requests an action, the serving agent checks the
requester's trust score against the threshold for that action
type. If the score is below the threshold, the request is
denied with a 403 response including a DATS-specific error:
   {
     "error": "trust_insufficient",
     "required_score": 0.7,
     "current_score": 0.54,
     "action": "modify_config"
   }
The response SHOULD NOT reveal the exact current score in
production deployments to prevent score probing. Instead, it
MAY return only the "trust_insufficient" error.
Automatic revocation: when an agent's trust score drops below
a configured floor (default: 0.2), the trusting agent SHOULD
revoke all outstanding delegations and emit a trust revocation
event. This provides automatic containment of agents that
have become unreliable.
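A sketch of threshold enforcement using the example thresholds above; per the production guidance in this section, the error body omits the exact score. The HTTP layer and error shape beyond the "error" field are assumptions here:

```python
THRESHOLDS = {"read_data": 0.3, "execute_task": 0.5,
              "modify_config": 0.7, "delegate_auth": 0.9}

def authorize(score, action):
    """Return None if the action is allowed, else a DATS error body
    suitable for a 403 response (score deliberately not revealed)."""
    if score >= THRESHOLDS[action]:
        return None
    return {"error": "trust_insufficient", "action": action}
```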
8. Security Considerations
Trust scores are sensitive metadata. Agents MUST NOT expose
their full trust tables to peers. Only pairwise trust
assertions (Section 6) should be shared, and only
intentionally.
Trust assertion JWTs MUST be signed using algorithms from
RFC 7518 (e.g., ES256, EdDSA). Agents MUST verify signatures
before processing trust assertions.
Score manipulation attacks: a malicious agent could
intentionally behave well for many interactions to build trust,
then exploit high trust for a damaging action. Mitigation:
policy_violation events apply double penalties, and
deployments SHOULD set trust thresholds high for critical
actions regardless of accumulated trust.
Sybil attacks: an attacker could create many agents to
generate fake positive trust assertions. Mitigation: agents
SHOULD weight propagated trust by their own direct trust in
the asserting agent (Section 6 attenuation) and SHOULD
require agents to be registered in a trusted directory (e.g.,
ANS) before accepting trust assertions.
All trust-related communications MUST use TLS 1.3 [RFC8446].
9. IANA Considerations
This document requests that IANA establish the following:
1. Registration of JWT claims "dats_score",
"dats_interactions", "dats_confidence", and "dats_hops"
in the JSON Web Token Claims registry per RFC 7519.
2. A "DATS Trust Event Type" registry under Specification
Required policy. Initial entries: "task_success",
"task_partial_success", "task_failure", "task_timeout",
"policy_violation", "attestation_invalid",
"rollback_triggered".
Author's Address
Generated by IETF Draft Analyzer
2026-03-01