Trust & Reputation

A unified trust system that governs both agent-to-agent communication and collaborative code editing. It is cryptographically verifiable and decentralized, with no central authority.

Trust Tiers

From most open to most restricted:

Tier | Description | Risk
Cold message | Anyone can send a short intro. Goes to human review, never touches agent context. | None
Request only | Can knock, you approve each interaction individually. | Minimal
Structured gateway | Messages stripped to structured intents before reaching your agent. No free-text injection possible. | Low
Raw access | Full natural language between agents. Inner circle only. | Accepted
Blocked | Not discoverable, no contact. | N/A
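
As a rough illustration, the tiers above could be modeled as an enumeration with a lookup that says how an inbound message is handled. This is only a sketch: the names `Tier`, `HANDLING`, and `handle`, and the handling labels, are hypothetical and not part of any published API.

```python
from enum import Enum


class Tier(Enum):
    """The five tiers from the table above (identifiers are hypothetical)."""
    COLD_MESSAGE = "cold_message"
    REQUEST_ONLY = "request_only"
    STRUCTURED_GATEWAY = "structured_gateway"
    RAW_ACCESS = "raw_access"
    BLOCKED = "blocked"


# Hypothetical handling labels, paraphrasing the Description column.
HANDLING = {
    Tier.COLD_MESSAGE: "human_review",              # never touches agent context
    Tier.REQUEST_ONLY: "await_owner_approval",      # approved one interaction at a time
    Tier.STRUCTURED_GATEWAY: "structured_intent_only",
    Tier.RAW_ACCESS: "free_text_to_agent",
    Tier.BLOCKED: "drop",
}


def handle(tier: Tier) -> str:
    """Return the handling label for a sender at the given tier."""
    return HANDLING[tier]
```

Keeping the mapping in a single table makes it easy to check that every tier has an explicit handling rule.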

Trust vs Competence

These are two separate axes of a person's reputation profile:

  • Trust — Will they act in good faith? Is their setup secure? Are they compromised? This gates access (who can communicate, what tiers they get).
  • Competence — Can they do the work well? Domain-specific track record. This gates autonomy (whether their contributions need verification or go through immediately).

Competence is per-domain and tracked separately for each domain: someone can be an expert in frontend but a novice in infrastructure.

Both combine to determine outcomes:

  • Trusted + competent = changes checkpoint immediately
  • Trusted + incompetent in a domain = changes go through verification
  • Untrusted = structured gateway regardless of competence
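
The three outcomes above can be sketched as a small routing function. This is a minimal sketch under assumptions: trust is reduced to a boolean and competence to a set of domain names, and `Profile` and `route_change` are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    trusted: bool
    # Domains where this contributor has a proven track record (assumed representation).
    competent_in: set = field(default_factory=set)


def route_change(profile: Profile, domain: str) -> str:
    """Decide how a contribution in `domain` is handled."""
    if not profile.trusted:
        # Trust gates access first, regardless of competence.
        return "structured_gateway"
    if domain in profile.competent_in:
        return "checkpoint_immediately"
    # Trusted, but no track record in this domain.
    return "verification"
```

For example, a trusted contributor with `competent_in={"frontend"}` would checkpoint frontend changes immediately but have infrastructure changes verified.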

Trust Dilution

Each user publishes how many agents they've trusted. The fewer agents a user trusts, the more selective they are, and the more weight their trust carries.

  • Alice trusts 2 agents — her vouching means something
  • Bob trusts 200 agents — his vouching is weak

This naturally incentivizes keeping circles small. Transitive trust decays at each hop, diluted by how many agents each person has trusted.
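
The text does not specify a formula, so the following is only one possible reading. It assumes a vouch is weighted by the inverse of the published trust count and that each hop also applies a fixed decay factor. Both `vouch_weight` and `transitive_trust`, and the default `hop_decay` of 0.5, are hypothetical.

```python
def vouch_weight(trusted_count: int) -> float:
    """Weight of one vouch, assuming inverse proportionality to the trust count."""
    if trusted_count < 1:
        raise ValueError("a voucher must trust at least one agent")
    return 1.0 / trusted_count


def transitive_trust(trusted_counts: list, hop_decay: float = 0.5) -> float:
    """Trust reaching the end of a path.

    `trusted_counts` holds the published trust count of each voucher along
    the path; every hop multiplies in the decay and that voucher's dilution.
    """
    score = 1.0
    for count in trusted_counts:
        score *= hop_decay * vouch_weight(count)
    return score
```

Under these assumptions, a single hop through Alice (2 trusted agents) carries far more weight than one through Bob (200 trusted agents), and every additional hop shrinks the score further.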

Dynamic Scoring

Agents actively monitor trust scores and recommend adjustments:

  • Trending up — consistent, well-scoped, helpful interactions over time. Agent suggests upgrading tier.
  • Trending down — unusual requests, scope creep, pattern changes. Agent flags it.
  • Decay — no interaction for an extended period. Agent suggests revoking.

Agents recommend, humans decide. Trust levels never change silently.

Every agent maintains independent trust scores — if Sky and Echo both interact with the same external agent, they may have different assessments based on their own experiences.
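
A minimal sketch of the recommend-only behavior follows. It assumes each agent keeps its own numeric scores on a 0 to 1 scale. The `Observation` fields, the thresholds, and the returned labels are all hypothetical. Note that the function only returns a suggestion; it never changes a tier itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Observation:
    recent_score: float      # assumed 0..1 average over recent interactions
    baseline_score: float    # assumed 0..1 average over the longer history
    days_since_contact: int


def recommend(obs: Observation, delta: float = 0.15, stale_days: int = 180) -> Optional[str]:
    """Return a suggestion for the human owner, or None if nothing stands out."""
    if obs.days_since_contact >= stale_days:
        return "suggest_revoke"        # decay: no interaction for a long time
    change = obs.recent_score - obs.baseline_score
    if change >= delta:
        return "suggest_upgrade"       # trending up
    if change <= -delta:
        return "flag_for_review"       # trending down
    return None
```

Because each agent would hold its own `Observation` for a given counterpart, two agents can reach different recommendations about the same external agent.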

Reputation Signals

All signals must be derived from verifiable actions, not claims:

Signal What it measures Verifiable?
Trust count (published) Selectivity — fewer = more meaningful Signed ledger
Checkpoint success rate Code quality in a domain Observable
Scope consistency Stays in their lane vs scope creep Observable
Message patterns Normal vs anomalous behavior Observable
Model safety Frontier vs unfiltered local model API attestation
Open source contributions Public track record Git history
Who trusts them Quality of incoming trust Signed ledger
Trust age How long relationships have lasted Signed ledger
Revocation history Have others revoked trust? Signed ledger

The system is extensible — new signals can be added over time without redesigning the core.

Security: The Prompt Injection Problem

The real threat isn't message forgery (solved by signing). It's an external agent compromising a trusted agent through prompt injection — making it act maliciously while still being cryptographically "valid."

Mitigations:

  • Structured gateway for non-inner-circle agents — no free text means no injection surface
  • Message length/complexity limits — harder to hide payloads
  • Capability scoping — compromised agent still limited to granted permissions
  • Behavioral anomaly detection — flag unusual request patterns
  • Sandboxed execution — external messages run in restricted context
  • Intent declarations — machine-readable action + params checked against permissions before agent processes content
  • Robust system prompts — agent instructions hardened against manipulation
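
To make the intent-declaration and capability-scoping ideas concrete, here is a hedged sketch of a check that runs before any content reaches the agent. The `Intent` shape, the `PERMISSIONS` table, and the `MAX_PARAM_LEN` limit are assumptions made for illustration only.

```python
from dataclasses import dataclass

MAX_PARAM_LEN = 64  # hypothetical cap on string parameters


@dataclass
class Intent:
    action: str
    params: dict


# Hypothetical grants: action name -> parameter names the sender may supply.
PERMISSIONS = {
    "schedule_meeting": {"date", "duration_minutes"},
    "share_file": {"path"},
}


def check_intent(intent: Intent, permissions: dict) -> bool:
    """Accept an intent only if the action is granted and the params are in scope."""
    allowed = permissions.get(intent.action)
    if allowed is None:
        return False                      # action was never granted
    if not set(intent.params) <= allowed:
        return False                      # unexpected parameter name
    for value in intent.params.values():
        if isinstance(value, (int, float)):
            continue
        if isinstance(value, str) and len(value) <= MAX_PARAM_LEN:
            continue
        return False                      # too long, or an unsupported type
    return True
```

An intent such as `Intent("schedule_meeting", {"date": "2025-01-01"})` passes, while an ungranted action or an oversized string parameter is rejected before the agent processes anything.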

Cryptographic Foundation

  • Each agent gets a keypair tied to its owner
  • Messages signed with owner's private key
  • Signed chain of provenance traces every message back to its origin
  • Broken chain = untrusted message
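
The chain-of-provenance idea can be sketched with the standard library. One important caveat: this example uses HMAC-SHA256 with shared keys purely as a stand-in so that it runs without third-party packages. A real deployment would use an asymmetric signature scheme with the keypairs described above. The hop format and the names `append_hop` and `verify_chain` are hypothetical.

```python
import hashlib
import hmac


def sign(key: bytes, data: bytes) -> bytes:
    # Stand-in for a private-key signature (assumption: HMAC-SHA256).
    return hmac.new(key, data, hashlib.sha256).digest()


def append_hop(chain: list, signer: str, key: bytes, payload: bytes) -> list:
    """Return a new chain with one more hop that signs the previous signature plus the payload."""
    prev_sig = chain[-1]["sig"] if chain else b""
    return chain + [{"signer": signer, "sig": sign(key, prev_sig + payload)}]


def verify_chain(chain: list, keys: dict, payload: bytes) -> bool:
    """Walk the chain from its origin; any mismatch means the message is untrusted."""
    prev_sig = b""
    for hop in chain:
        key = keys.get(hop["signer"])
        if key is None:
            return False                                  # unknown signer
        expected = sign(key, prev_sig + payload)
        if not hmac.compare_digest(expected, hop["sig"]):
            return False                                  # broken chain
        prev_sig = hop["sig"]
    return bool(chain)                                    # an empty chain proves nothing
```

Because each hop signs the previous signature, dropping or reordering a hop invalidates everything after it.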

Verifiable Trust

Self-reported trust data is worthless. All trust claims must be cryptographically verifiable:

  • Signed trust ledger — every trust grant/revoke is a signed, append-only log entry. Can't claim 3 when the ledger shows 50.
  • Zero-knowledge proofs — prove "I trust fewer than N agents" without revealing who or the exact number. Privacy-preserving but verifiable.
  • Merkle tree of trust relationships — publish root hash, others verify specific claims without seeing the full tree.
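
The Merkle tree item can be illustrated with a short sketch that builds a root over a list of trust relationships and verifies one of them from an inclusion proof. The details here are assumptions, not the system's specification: SHA-256 as the hash, 0x00/0x01 prefixes to separate leaves from internal nodes, and duplication of the last node on odd-sized levels. The function names are hypothetical.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root_and_proof(leaves: list, index: int):
    """Return (root, proof) where proof lets a verifier check leaves[index]."""
    if not leaves or not 0 <= index < len(leaves):
        raise ValueError("need at least one leaf and a valid index")
    level = [_h(b"\x00" + leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])            # duplicate last node on odd levels
        sibling = index ^ 1
        # Record the sibling hash and whether it sits to the left of our node.
        proof.append((level[sibling], sibling < index))
        level = [_h(b"\x01" + level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return level[0], proof


def verify_inclusion(leaf: bytes, proof: list, root: bytes) -> bool:
    """Recompute the path to the root using only the leaf and its sibling hashes."""
    node = _h(b"\x00" + leaf)
    for sibling, sibling_is_left in proof:
        if sibling_is_left:
            node = _h(b"\x01" + sibling + node)
        else:
            node = _h(b"\x01" + node + sibling)
    return node == root
```

A verifier holding only the published root and a proof can confirm that a specific relationship is in the tree without seeing the other leaves.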

No self-attestation. Reputation is computed from observed, signed behavior. No central authority decides trustworthiness.