Internet-Draft                                               AI/Agent WG
Intended status: Standards Track                              March 2026
Expires: September 15, 2026

                   Dynamic Agent Trust Scoring (DATS)
                draft-dats-dynamic-agent-trust-scoring-00

Abstract

   This document defines the Dynamic Agent Trust Scoring (DATS)
   protocol, a mechanism for AI agents to build, assess, and revoke
   trust relationships based on observed behavior over time.  Static
   authentication (certificates, API keys) verifies identity but says
   nothing about whether an agent is reliable, accurate, or
   well-behaved.  DATS augments identity-based authentication with a
   numeric trust score that adjusts dynamically based on interaction
   outcomes.  The protocol defines trust score computation, propagation
   between agents, decay over inactivity, and threshold-based access
   policies.  DATS is intentionally simple: a single score per
   agent-pair, standard adjustment events, and a JWT-based transport
   for trust assertions.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.  This document is intended to have
   Standards Track status.  Distribution of this memo is unlimited.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Problem Statement
   4.  Trust Score Model
   5.  Trust Events and Adjustments
   6.  Trust Propagation
   7.  Threshold-Based Access Policies
   8.  Security Considerations
   9.  IANA Considerations

1.  Introduction

   The IETF has 98 drafts addressing agent identity and authentication,
   providing strong mechanisms for verifying who an agent is.  But
   identity alone is insufficient for long-running autonomous systems.
   A properly authenticated agent may still produce bad results,
   violate expectations, or degrade over time.  Static certificates
   cannot capture this.

   DATS adds a behavioral dimension to agent trust.  It answers the
   question: "I know who you are, but should I rely on you?"
   The model is deliberately simple (a single floating-point score
   between 0.0 and 1.0 per agent relationship) because complex
   reputation systems tend to be gamed or ignored.

   The protocol is inspired by:

   -  TCP congestion control: trust increases slowly (additive) and
      decreases quickly (multiplicative) on failure.

   -  TLS Certificate Transparency: trust assertions are logged for
      auditability.

   -  Web of trust (PGP): trust can propagate through intermediaries,
      with attenuation.

2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

   Trust Score:  A floating-point value in [0.0, 1.0] representing one
      agent's assessed reliability of another, based on observed
      interaction outcomes.

   Trust Event:  An observable interaction outcome that causes a trust
      score adjustment.  Events are either positive (task completed
      successfully) or negative (task failed, timeout, policy
      violation).

   Trust Decay:  The automatic reduction of trust scores over periods
      of inactivity, reflecting the principle that trust requires
      ongoing evidence.

   Trust Assertion:  A signed statement by one agent about another
      agent's trust score, transportable as a JWT claim.

3.  Problem Statement

   Agent A delegates a task to Agent B.  Agent B completes it
   correctly.  Agent A delegates again.  After 100 successful
   interactions, Agent B starts returning subtly incorrect results
   (model drift, adversarial manipulation, or simple degradation).
   Agent A has no standard way to:

   1.  Track B's reliability over time.

   2.  Reduce B's privileges based on degraded performance.

   3.  Share its experience with Agent C, who is considering
       delegating to Agent B.

   4.  Automatically revoke B's access when trust drops below
       acceptable levels.

   Existing attestation drafts (STAMP, DAAP) provide cryptographic
   proof of specific actions but not ongoing behavioral assessment.
   DATS fills this gap.

4.
Trust Score Model

   Each agent maintains a trust table: a mapping from peer agent IDs to
   trust scores.

   {
     "urn:uuid:agent-b": {
       "score": 0.82,
       "interactions": 147,
       "last_updated": "2026-03-01T11:30:00Z",
       "last_event": "task_success"
     }
   }

   Initial trust for an unknown agent is a deployment-configured
   default.  A value of 0.5 is RECOMMENDED as a neutral starting point,
   but deployments MAY use lower values (e.g., 0.1) for zero-trust
   environments.

   Trust scores are updated using an additive-increase,
   multiplicative-decrease (AIMD) algorithm:

      On positive event:  score = min(1.0, score + alpha)
      On negative event:  score = max(0.0, score * beta)

   Default parameters: alpha = 0.01, beta = 0.8.  This means trust
   builds slowly (50 successes to go from 0.5 to 1.0) but drops
   quickly (a single failure reduces a 0.82 score to approximately
   0.66).  This asymmetry is intentional: in autonomous systems, the
   cost of trusting a bad agent exceeds the cost of being slow to
   trust a good one.

   Agents MAY tune alpha and beta per relationship or per action type,
   but MUST use the AIMD structure.

5.  Trust Events and Adjustments

   The following standard trust events are defined:

   | Event                | Direction | Default Weight |
   |----------------------|-----------|----------------|
   | task_success         | positive  | 1x alpha       |
   | task_partial_success | positive  | 0.5x alpha     |
   | task_failure         | negative  | 1x beta        |
   | task_timeout         | negative  | 1x beta        |
   | policy_violation     | negative  | applied twice  |
   | attestation_invalid  | negative  | applied twice  |
   | rollback_triggered   | negative  | 1x beta        |

   "Applied twice" means the multiplicative decrease is applied two
   times in succession (score * beta * beta), reflecting the severity
   of policy violations versus simple failures.

   Trust decay: if no interaction occurs for a configurable period
   (default: 7 days), the trust score decays:

      score = max(initial_default, score - decay_rate)

   Default decay_rate: 0.01 per day.  Decay applies only while the
   score is above the initial default; scores at or below the default
   are left unchanged.
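   As a non-normative illustration, the AIMD update, the event weights,
   and the decay rule above can be sketched as follows.  The function
   names and the TRUST_EVENTS table layout are not defined by this
   specification; the parameter values are the defaults from Sections 4
   and 5 (alpha = 0.01, beta = 0.8, decay_rate = 0.01 per day, initial
   default 0.5), and decay is assumed to accrue per day beyond the
   7-day grace period.

```python
# Non-normative sketch of the DATS score update rules (Sections 4-5).
# Names and structure are illustrative, not mandated by the draft.

ALPHA = 0.01           # additive increase per positive event
BETA = 0.8             # multiplicative decrease per negative event
INITIAL_DEFAULT = 0.5  # neutral starting score
DECAY_RATE = 0.01      # decay per day of inactivity
GRACE_DAYS = 7         # inactivity period before decay begins

# event -> (direction, alpha multiplier or number of beta applications)
TRUST_EVENTS = {
    "task_success":         ("positive", 1.0),
    "task_partial_success": ("positive", 0.5),
    "task_failure":         ("negative", 1),
    "task_timeout":         ("negative", 1),
    "policy_violation":     ("negative", 2),  # "applied twice"
    "attestation_invalid":  ("negative", 2),  # "applied twice"
    "rollback_triggered":   ("negative", 1),
}

def apply_event(score: float, event: str) -> float:
    """AIMD update: additive increase, multiplicative decrease."""
    direction, weight = TRUST_EVENTS[event]
    if direction == "positive":
        return min(1.0, score + weight * ALPHA)
    for _ in range(int(weight)):  # beta applied once or twice
        score = max(0.0, score * BETA)
    return score

def apply_decay(score: float, idle_days: float) -> float:
    """Linear decay toward the initial default after the grace period.

    Scores at or below the default are left unchanged."""
    if idle_days <= GRACE_DAYS or score <= INITIAL_DEFAULT:
        return score
    decayed = score - DECAY_RATE * (idle_days - GRACE_DAYS)
    return max(INITIAL_DEFAULT, decayed)
```

   For example, apply_event(0.82, "task_failure") yields 0.656, and 50
   consecutive task_success events raise a score from 0.5 to 1.0,
   matching the worked numbers in Section 4.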
   This ensures that stale trust relationships gradually return to the
   default level rather than persisting indefinitely.

   Agents MUST record all trust events in a local audit log.

6.  Trust Propagation

   Agent A may share its trust assessment of Agent B with Agent C
   through a signed trust assertion.  The assertion is a JWT (RFC
   7519) with the following claims:

   {
     "iss": "urn:uuid:agent-a",
     "sub": "urn:uuid:agent-b",
     "iat": 1772366400,
     "exp": 1772452800,
     "dats_score": 0.82,
     "dats_interactions": 147,
     "dats_confidence": "high"
   }

   "dats_confidence" is based on interaction count: "low" (fewer than
   10 interactions), "medium" (10-99), "high" (100 or more).

   When Agent C receives this assertion, it MAY incorporate it into
   its own trust score for Agent B using attenuation:

      c_score_for_b = max(c_score_for_b,
                          a_score_for_b * trust_of_a * attenuation)

   Where:

   -  a_score_for_b is Agent A's reported score for B (0.82 in the
      example above)

   -  trust_of_a is Agent C's trust score for Agent A

   -  attenuation is a constant (default: 0.5) preventing unbounded
      trust propagation

   Trust assertions are advisory.  Agents MUST NOT blindly adopt
   propagated scores.  An agent's own direct observations always take
   precedence over propagated trust.

   To prevent trust laundering (colluding agents inflating each
   other's scores), agents SHOULD limit propagation depth to 1 hop by
   default.  The "dats_hops" claim tracks propagation depth; agents
   MUST NOT propagate assertions whose dats_hops exceeds their
   configured maximum.

7.  Threshold-Based Access Policies

   Agents SHOULD define trust thresholds for different action
   categories:

   {
     "thresholds": {
       "read_data": 0.3,
       "execute_task": 0.5,
       "modify_config": 0.7,
       "delegate_auth": 0.9
     }
   }

   When an agent requests an action, the serving agent checks the
   requester's trust score against the threshold for that action type.
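   The attenuation rule (Section 6) and the threshold check above can
   likewise be sketched non-normatively.  The function names
   incorporate_assertion and check_access are illustrative; the
   threshold values follow the example policy above, and the
   reveal_score flag models the option of withholding the exact score
   from denial responses to limit score probing.

```python
# Non-normative sketch of trust propagation (Section 6) and
# threshold enforcement (Section 7).  Names are illustrative.

ATTENUATION = 0.5  # default propagation attenuation constant

def incorporate_assertion(own_score: float, asserted_score: float,
                          trust_in_asserter: float) -> float:
    """Blend a propagated assertion into a local score.

    Direct observation wins: the result never drops below the locally
    observed score (the max(...) rule in Section 6)."""
    propagated = asserted_score * trust_in_asserter * ATTENUATION
    return max(own_score, propagated)

THRESHOLDS = {
    "read_data": 0.3,
    "execute_task": 0.5,
    "modify_config": 0.7,
    "delegate_auth": 0.9,
}

def check_access(score: float, action: str, reveal_score: bool = False):
    """Return None if the action is allowed, else an error object
    suitable for a 403 response body.

    reveal_score=False omits the exact scores, limiting score
    probing in production deployments."""
    required = THRESHOLDS[action]
    if score >= required:
        return None
    error = {"error": "trust_insufficient", "action": action}
    if reveal_score:
        error.update({"required_score": required,
                      "current_score": score})
    return error
```

   For example, if Agent C trusts Agent A at 0.9 and receives A's
   assertion of 0.82 for B, the propagated component is
   0.82 * 0.9 * 0.5 = 0.369, adopted only if it exceeds C's own
   direct score for B.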
   If the score is below the threshold, the request is denied with a
   403 response including a DATS-specific error:

   {
     "error": "trust_insufficient",
     "required_score": 0.7,
     "current_score": 0.54,
     "action": "modify_config"
   }

   The response SHOULD NOT reveal the exact current score in
   production deployments, to prevent score probing.  Instead, it MAY
   return only the "trust_insufficient" error.

   Automatic revocation: when an agent's trust score drops below a
   configured floor (default: 0.2), the trusting agent SHOULD revoke
   all outstanding delegations and emit a trust revocation event.
   This provides automatic containment of agents that have become
   unreliable.

8.  Security Considerations

   Trust scores are sensitive metadata.  Agents MUST NOT expose their
   full trust tables to peers.  Only pairwise trust assertions
   (Section 6) should be shared, and only intentionally.

   Trust assertion JWTs MUST be signed using algorithms from RFC 7518
   or RFC 8037 (e.g., ES256, EdDSA).  Agents MUST verify signatures
   before processing trust assertions.

   Score manipulation attacks: a malicious agent could intentionally
   behave well for many interactions to build trust, then exploit its
   high trust for a damaging action.  Mitigation: policy_violation
   events apply double penalties, and deployments SHOULD set trust
   thresholds high for critical actions regardless of accumulated
   trust.

   Sybil attacks: an attacker could create many agents to generate
   fake positive trust assertions.  Mitigation: agents SHOULD weight
   propagated trust by their own direct trust in the asserting agent
   (Section 6 attenuation) and SHOULD require agents to be registered
   in a trusted directory (e.g., ANS) before accepting trust
   assertions.

   All trust-related communications MUST use TLS 1.3 [RFC8446].

9.  IANA Considerations

   This document requests that IANA establish the following:

   1.  Registration of the JWT claims "dats_score",
       "dats_interactions", "dats_confidence", and "dats_hops" in the
       JSON Web Token Claims registry per RFC 7519.

   2.
A "DATS Trust Event Type" registry under Specification Required policy. Initial entries: "task_success", "task_partial_success", "task_failure", "task_timeout", "policy_violation", "attestation_invalid", "rollback_triggered". Author's Address Generated by IETF Draft Analyzer 2026-03-01