Internet-Draft                                              AI/Agent WG
Intended status: Standards Track                             March 2026
Expires: September 15, 2026


Dynamic Agent Trust Scoring (DATS)
draft-dats-dynamic-agent-trust-scoring-00

Abstract

This document defines the Dynamic Agent Trust Scoring (DATS)
protocol, a mechanism for AI agents to build, assess, and
revoke trust relationships based on observed behavior over
time. Static authentication (certificates, API keys) verifies
identity but says nothing about whether an agent is reliable,
accurate, or well-behaved. DATS augments identity-based
authentication with a numeric trust score that adjusts
dynamically based on interaction outcomes. The protocol defines
trust score computation, propagation between agents, decay
during inactivity, and threshold-based access policies. DATS is
intentionally simple: a single score per agent pair, standard
adjustment events, and a JWT-based transport for trust
assertions.

Status of This Memo

This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.

This document is intended to have Standards Track status.
Distribution of this memo is unlimited.

Table of Contents

1. Introduction
2. Terminology
3. Problem Statement
4. Trust Score Model
5. Trust Events and Adjustments
6. Trust Propagation
7. Threshold-Based Access Policies
8. Security Considerations
9. IANA Considerations

1. Introduction

The IETF has 98 drafts addressing agent identity and
authentication, providing strong mechanisms for verifying who
an agent is. But identity alone is insufficient for
long-running autonomous systems. A properly authenticated agent
may still produce bad results, violate expectations, or
degrade over time. Static certificates cannot capture this.

DATS adds a behavioral dimension to agent trust. It answers
the question: "I know who you are, but should I rely on you?"
The model is deliberately simple (a single floating-point
score between 0.0 and 1.0 per agent relationship) because
complex reputation systems tend to be gamed or ignored.

The protocol is inspired by:
- TCP congestion control: trust increases slowly (additive)
  and decreases quickly (multiplicative) on failure.
- Certificate Transparency (as used for TLS certificates):
  trust assertions are logged for auditability.
- Web of trust (PGP): trust can propagate through
  intermediaries, with attenuation.

2. Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described
in RFC 2119 [RFC2119].

Trust Score: A floating-point value in [0.0, 1.0] representing
one agent's assessment of another agent's reliability, based on
observed interaction outcomes.

Trust Event: An observable interaction outcome that causes a
trust score adjustment. Events are either positive (task
completed successfully) or negative (task failed, timeout,
policy violation).

Trust Decay: The automatic reduction of trust scores over
periods of inactivity, reflecting the principle that trust
requires ongoing evidence.

Trust Assertion: A signed statement by one agent about another
agent's trust score, transportable as a JWT claim.

3. Problem Statement

Agent A delegates a task to Agent B. Agent B completes it
correctly. Agent A delegates again. After 100 successful
interactions, Agent B starts returning subtly incorrect results
(model drift, adversarial manipulation, or simple degradation).
Agent A has no standard way to:

1. Track B's reliability over time.
2. Reduce B's privileges based on degraded performance.
3. Share its experience with Agent C, who is considering
   delegating to Agent B.
4. Automatically revoke B's access when trust drops below
   acceptable levels.

Existing attestation drafts (STAMP, DAAP) provide
cryptographic proof of specific actions but not ongoing
behavioral assessment. DATS fills this gap.

4. Trust Score Model

Each agent maintains a trust table: a mapping from peer agent
IDs to trust scores.

{
  "urn:uuid:agent-b": {
    "score": 0.82,
    "interactions": 147,
    "last_updated": "2026-03-01T11:30:00Z",
    "last_event": "task_success"
  }
}

Initial trust for an unknown agent is a deployment-configured
default. A value of 0.5 is RECOMMENDED as a neutral starting
point, but deployments MAY use lower values (e.g., 0.1) for
zero-trust environments.

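As an illustration only, the trust table above could be held in
memory roughly as follows. This is a minimal sketch, not part of the
protocol: the names TrustEntry, TrustTable, entry, and record are
hypothetical, and the fields simply mirror the JSON example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class TrustEntry:
    # Fields mirror the JSON trust table example.
    score: float
    interactions: int = 0
    last_updated: Optional[datetime] = None
    last_event: Optional[str] = None


class TrustTable:
    """Maps peer agent IDs to TrustEntry records (hypothetical helper)."""

    def __init__(self, initial_default: float = 0.5):
        self.initial_default = initial_default
        self.entries = {}  # peer agent ID -> TrustEntry

    def entry(self, peer_id: str) -> TrustEntry:
        # Unknown peers start at the deployment-configured default.
        if peer_id not in self.entries:
            self.entries[peer_id] = TrustEntry(score=self.initial_default)
        return self.entries[peer_id]

    def record(self, peer_id: str, event: str, new_score: float,
               when: Optional[datetime] = None) -> TrustEntry:
        # Store the outcome of one interaction. Computing new_score is
        # left to the caller.
        e = self.entry(peer_id)
        e.score = new_score
        e.interactions += 1
        e.last_updated = when or datetime.now(timezone.utc)
        e.last_event = event
        return e
```

A zero-trust deployment would construct the table with
TrustTable(initial_default=0.1) so that unknown peers start low.
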
Trust scores are updated using an additive-increase,
multiplicative-decrease (AIMD) algorithm:

On positive event:
    score = min(1.0, score + alpha)

On negative event:
    score = max(0.0, score * beta)

Default parameters: alpha = 0.01, beta = 0.8.

This means trust builds slowly (50 successes to go from 0.5
to 1.0) but drops quickly (a single failure reduces a 0.82
score to about 0.66). This asymmetry is intentional: in
autonomous systems, the cost of trusting a bad agent exceeds
the cost of being slow to trust a good one.

Agents MAY tune alpha and beta per relationship or per action
type, but MUST use the AIMD structure.

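The AIMD update can be sketched in a few lines. The function names
below are hypothetical; the constants are the draft defaults. The
helper successes_to_reach is included only to check the arithmetic
on how slowly trust builds.

```python
ALPHA = 0.01  # additive increase per positive event (draft default)
BETA = 0.8    # multiplicative decrease per negative event (draft default)


def on_positive(score: float, alpha: float = ALPHA) -> float:
    """Additive increase, capped at 1.0."""
    return min(1.0, score + alpha)


def on_negative(score: float, beta: float = BETA) -> float:
    """Multiplicative decrease, floored at 0.0."""
    return max(0.0, score * beta)


def successes_to_reach(start: float, target: float,
                       alpha: float = ALPHA) -> int:
    """Count positive events needed to move from start to target."""
    score, count = start, 0
    while score < target - 1e-9:  # tolerance for float accumulation
        score = on_positive(score, alpha)
        count += 1
    return count
```

With the defaults, successes_to_reach(0.5, 1.0) returns 50, and
on_negative(0.82) gives approximately 0.656.
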
5. Trust Events and Adjustments

The following standard trust events are defined:

| Event                | Direction | Default Weight |
|----------------------|-----------|----------------|
| task_success         | positive  | 1x alpha       |
| task_partial_success | positive  | 0.5x alpha     |
| task_failure         | negative  | 1x beta        |
| task_timeout         | negative  | 1x beta        |
| policy_violation     | negative  | applied twice  |
| attestation_invalid  | negative  | applied twice  |
| rollback_triggered   | negative  | 1x beta        |

"applied twice" means the multiplicative decrease is applied
two times in succession (score * beta * beta), reflecting the
severity of policy violations and invalid attestations compared
with simple failures.

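One way to read the table is as a lookup from event name to a
direction and a weight. The sketch below assumes that a positive
weight scales alpha and that a negative weight is the number of
times beta is applied; the names EVENTS and apply_event are
hypothetical.

```python
ALPHA, BETA = 0.01, 0.8  # draft defaults

# Event name -> (direction, weight), transcribed from the table.
# Positive events: weight scales alpha.
# Negative events: weight is how many times beta is applied.
EVENTS = {
    "task_success": ("positive", 1.0),
    "task_partial_success": ("positive", 0.5),
    "task_failure": ("negative", 1),
    "task_timeout": ("negative", 1),
    "policy_violation": ("negative", 2),
    "attestation_invalid": ("negative", 2),
    "rollback_triggered": ("negative", 1),
}


def apply_event(score: float, event: str) -> float:
    """Return the adjusted score for one trust event."""
    direction, weight = EVENTS[event]  # KeyError for unknown events
    if direction == "positive":
        return min(1.0, score + weight * ALPHA)
    return max(0.0, score * BETA ** weight)
```

For a score of 0.82, a policy_violation yields 0.82 * 0.8 * 0.8,
which is approximately 0.525, compared with about 0.656 for a plain
task_failure.
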
Trust decay: if no interaction occurs for a configurable
period (default: 7 days), the trust score decays:

    score = max(initial_default, score - decay_rate)

Here initial_default is the deployment-configured initial trust
value from Section 4. Default decay_rate: 0.01 per day. This
ensures that stale trust relationships gradually return to the
default level rather than persisting indefinitely.

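The decay rule leaves some details open, so the following sketch
makes two assumptions that are not stated in the draft: decay
accrues once per whole day of inactivity beyond the inactivity
period, and scores already at or below initial_default are left
unchanged (applying the formula literally would raise them to
initial_default in a single step). The function name is
hypothetical.

```python
from datetime import datetime, timedelta, timezone

INITIAL_DEFAULT = 0.5                  # Section 4 recommended default
DECAY_RATE = 0.01                      # per day (draft default)
INACTIVITY_PERIOD = timedelta(days=7)  # draft default


def decayed_score(score: float, last_updated: datetime,
                  now: datetime) -> float:
    """Apply inactivity decay under the assumptions stated above."""
    idle = now - last_updated
    # Assumption: no decay inside the period or for low scores.
    if idle <= INACTIVITY_PERIOD or score <= INITIAL_DEFAULT:
        return score
    # Assumption: one decay step per whole day past the period.
    days_over = (idle - INACTIVITY_PERIOD).days
    return max(INITIAL_DEFAULT, score - DECAY_RATE * days_over)
```

Under these assumptions, a score of 0.82 that has been idle for 17
days decays by ten steps to about 0.72, and never falls below 0.5.
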
Agents MUST record all trust events in a local audit log.

6. Trust Propagation

Agent A may share its trust assessment of Agent B with Agent C
through a signed trust assertion. The assertion is a JWT
(RFC 7519) with the following claims:

{
  "iss": "urn:uuid:agent-a",
  "sub": "urn:uuid:agent-b",
  "iat": 1709294400,
  "exp": 1709380800,
  "dats_score": 0.82,
  "dats_interactions": 147,
  "dats_confidence": "high"
}

"dats_confidence" is based on interaction count: "low" (<10),
"medium" (10-99), "high" (100+).

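The claim set can be assembled as a plain dictionary before it is
handed to a JWT library for signing. The sketch below covers only
that assembly step; signing is deliberately omitted. The function
names are hypothetical, and the 24-hour lifetime is an assumption
taken from the difference between "exp" and "iat" in the example.

```python
import time


def confidence_label(interactions: int) -> str:
    """Map an interaction count to the dats_confidence value."""
    if interactions < 10:
        return "low"
    if interactions < 100:
        return "medium"
    return "high"


def build_claims(issuer: str, subject: str, score: float,
                 interactions: int, ttl_seconds: int = 86400,
                 now=None) -> dict:
    """Assemble unsigned trust assertion claims."""
    # ttl_seconds is an assumed lifetime, not a value set by the draft.
    issued_at = int(time.time()) if now is None else int(now)
    return {
        "iss": issuer,
        "sub": subject,
        "iat": issued_at,
        "exp": issued_at + ttl_seconds,
        "dats_score": score,
        "dats_interactions": interactions,
        "dats_confidence": confidence_label(interactions),
    }
```

Calling build_claims("urn:uuid:agent-a", "urn:uuid:agent-b", 0.82,
147, now=1709294400) reproduces the example claims shown earlier in
this section.
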
When Agent C receives this assertion, it MAY incorporate it
into its own trust score for Agent B using attenuation:

    c_score_for_b = max(c_score_for_b,
                        a_score_for_b * trust_of_a * attenuation)

Where:
- a_score_for_b is Agent A's reported score for B (0.82)
- trust_of_a is Agent C's trust score for Agent A
- attenuation is a constant (default: 0.5) preventing
  unbounded trust propagation

Trust assertions are advisory. Agents MUST NOT blindly adopt
propagated scores. An agent's own direct observations always
take precedence over propagated trust.

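The attenuation formula translates directly into code. In this
sketch the function name is hypothetical, and the value 0.9 used
for trust_of_a in the note that follows is an illustrative
assumption.

```python
ATTENUATION = 0.5  # draft default


def incorporate_assertion(c_score_for_b: float, a_score_for_b: float,
                          trust_of_a: float,
                          attenuation: float = ATTENUATION) -> float:
    """Return Agent C's score for B after considering A's assertion."""
    propagated = a_score_for_b * trust_of_a * attenuation
    # max() means an assertion can raise C's score but never lower it.
    return max(c_score_for_b, propagated)
```

With a_score_for_b = 0.82 and trust_of_a = 0.9, the propagated value
is 0.369. It replaces a current score of 0.1 but leaves a current
score of 0.5 untouched. Because both inputs are at most 1.0, the
propagated value cannot exceed the attenuation constant, so with the
defaults an assertion only lifts scores that are already below 0.5.
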
To prevent trust laundering (colluding agents inflating each
other's scores), agents SHOULD limit propagation depth to 1
hop by default. The "dats_hops" claim tracks propagation
depth; agents MUST NOT propagate assertions where dats_hops
exceeds their configured maximum.

7. Threshold-Based Access Policies

Agents SHOULD define trust thresholds for different action
categories:

{
  "thresholds": {
    "read_data": 0.3,
    "execute_task": 0.5,
    "modify_config": 0.7,
    "delegate_auth": 0.9
  }
}

When an agent requests an action, the serving agent checks the
requester's trust score against the threshold for that action
type. If the score is below the threshold, the request is
denied with an HTTP 403 (Forbidden) response including a
DATS-specific error:

{
  "error": "trust_insufficient",
  "required_score": 0.7,
  "current_score": 0.54,
  "action": "modify_config"
}

The response SHOULD NOT reveal the exact current score in
production deployments to prevent score probing. Instead, it
MAY return only the "trust_insufficient" error.

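A threshold check along these lines might look as follows. This is
a sketch under stated assumptions: the function name, the
reveal_scores flag, and the (allowed, error_body) return shape are
hypothetical, and the thresholds are copied from the example policy.

```python
THRESHOLDS = {
    "read_data": 0.3,
    "execute_task": 0.5,
    "modify_config": 0.7,
    "delegate_auth": 0.9,
}


def authorize(action: str, score: float, reveal_scores: bool = False):
    """Return (allowed, error_body) for a requested action."""
    required = THRESHOLDS[action]  # KeyError for unconfigured actions
    if score >= required:
        return True, None
    body = {"error": "trust_insufficient"}
    # Detailed fields are opt-in, so production deployments do not
    # leak the current score to a probing requester.
    if reveal_scores:
        body.update(required_score=required, current_score=score,
                    action=action)
    return False, body
```

By default the denial carries only the error string; setting
reveal_scores=True reproduces the fuller error body, which may be
useful in test environments.
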
Automatic revocation: when an agent's trust score drops below
a configured floor (default: 0.2), the trusting agent SHOULD
revoke all outstanding delegations and emit a trust revocation
event. This provides automatic containment of agents that
have become unreliable.

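The revocation rule can be sketched as a check run after every
score update. The draft does not define the shape of the revocation
event or how delegations are stored, so the dictionary layout and
the names below are hypothetical.

```python
REVOCATION_FLOOR = 0.2  # draft default


def check_revocation(peer_id: str, score: float, delegations: dict,
                     floor: float = REVOCATION_FLOOR):
    """Revoke a peer's delegations if its score is below the floor.

    delegations maps peer IDs to lists of delegation identifiers
    (a hypothetical storage layout). Returns a revocation event
    dict, or None when no revocation is needed.
    """
    if score >= floor:
        return None
    revoked = delegations.pop(peer_id, [])
    return {
        "event": "trust_revocation",  # hypothetical event name
        "peer": peer_id,
        "score": score,
        "revoked": revoked,
    }
```

With beta = 0.8, an agent starting at 0.5 crosses the 0.2 floor
after five consecutive task_failure events (0.5 * 0.8^4 is about
0.205, and 0.5 * 0.8^5 is about 0.164).
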
8. Security Considerations

Trust scores are sensitive metadata. Agents MUST NOT expose
their full trust tables to peers. Only pairwise trust
assertions (Section 6) should be shared, and only
intentionally.

Trust assertion JWTs MUST be signed using a registered JWS
algorithm (e.g., ES256 from RFC 7518 or EdDSA from RFC 8037).
Agents MUST verify signatures before processing trust
assertions.

Score manipulation attacks: a malicious agent could
intentionally behave well for many interactions to build trust,
then exploit high trust for a damaging action. Mitigation:
policy_violation events apply double penalties, and
deployments SHOULD set trust thresholds high for critical
actions regardless of accumulated trust.

Sybil attacks: an attacker could create many agents to
generate fake positive trust assertions. Mitigation: agents
SHOULD weight propagated trust by their own direct trust in
the asserting agent (Section 6 attenuation) and SHOULD
require agents to be registered in a trusted directory (e.g.,
ANS) before accepting trust assertions.

All trust-related communications MUST use TLS 1.3 [RFC8446].

9. IANA Considerations

This document requests that IANA take the following actions:

1. Register the JWT claims "dats_score",
   "dats_interactions", "dats_confidence", and "dats_hops"
   in the JSON Web Token Claims registry per RFC 7519.

2. Create a "DATS Trust Event Type" registry under the
   Specification Required policy. Initial entries:
   "task_success", "task_partial_success", "task_failure",
   "task_timeout", "policy_violation", "attestation_invalid",
   "rollback_triggered".

Author's Address

Generated by IETF Draft Analyzer
2026-03-01