Trust & Reputation

A unified trust system that governs both agent-to-agent communication and collaborative code editing. It is cryptographically verifiable and decentralized, with no central authority.

Trust Tiers

Ordered by how widely each tier is granted, from most open to most restricted:

Tier	Description	Risk
Cold message	Anyone can send a short intro. Goes to human review, never touches agent context.	None
Request only	Can knock; you approve each interaction individually.	Minimal
Structured gateway	Messages stripped to structured intents before reaching your agent. No free-text injection possible.	Low
Raw access	Full natural language between agents. Inner circle only.	Accepted
Blocked	Not discoverable, no contact.	N/A
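As a rough illustration of the table above, the tiers can be modeled as an enum with a routing function that decides where an incoming message lands. The identifier names and return strings below are hypothetical; only the five tiers and their behavior come from the table.

```python
from enum import Enum


class TrustTier(Enum):
    """The five tiers from the table (identifier names are illustrative)."""
    COLD_MESSAGE = "cold_message"
    REQUEST_ONLY = "request_only"
    STRUCTURED_GATEWAY = "structured_gateway"
    RAW_ACCESS = "raw_access"
    BLOCKED = "blocked"


def route_message(tier: TrustTier) -> str:
    """Return where a message from a sender at `tier` is delivered."""
    if tier is TrustTier.BLOCKED:
        return "drop"                     # not discoverable, no contact
    if tier is TrustTier.COLD_MESSAGE:
        return "human_review"             # never touches agent context
    if tier is TrustTier.REQUEST_ONLY:
        return "await_owner_approval"     # approved one interaction at a time
    if tier is TrustTier.STRUCTURED_GATEWAY:
        return "agent_structured_intent"  # free text is stripped first
    return "agent_raw_text"               # RAW_ACCESS: inner circle only
```

Only the last two destinations ever reach the agent's context, and only `RAW_ACCESS` delivers free text.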

Trust vs Competence

These are two separate axes of a person's reputation profile:

Trust: Will they act in good faith? Is their setup secure? Are they compromised? This gates access (who can communicate, and which tiers they get).
Competence: Can they do the work well? This is a domain-specific track record. It gates autonomy (whether their contributions need verification or go through immediately).

Competence is tracked separately for each domain. Someone can be an expert in frontend but a novice in infrastructure.

Both combine to determine outcomes:

Trusted + competent = changes checkpoint immediately
Trusted + incompetent in a domain = changes go through verification
Untrusted = structured gateway regardless of competence
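The three rules above can be sketched as a small decision function. The `Profile` type and the outcome strings are hypothetical, and treating a domain with no track record as "not competent" is an assumption the rules do not spell out.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Profile:
    trusted: bool
    # Per-domain competence, e.g. {"frontend": True, "infrastructure": False}.
    competent_in: Dict[str, bool] = field(default_factory=dict)


def outcome(profile: Profile, domain: str) -> str:
    """Combine the trust and competence axes for a change in `domain`."""
    if not profile.trusted:
        # Untrusted: structured gateway regardless of competence.
        return "structured_gateway"
    if profile.competent_in.get(domain, False):
        # Trusted and competent in this domain.
        return "checkpoint_immediately"
    # Trusted, but no proven competence here (assumed default for unknown domains).
    return "verification"
```

Trust is checked first, so a highly competent but untrusted contributor never bypasses the gateway.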

Trust Dilution

Each user publishes how many agents they've trusted. The fewer agents a user trusts, the more selective they are, and the more weight each of their trust grants carries.

Alice trusts 2 agents — her vouching means something
Bob trusts 200 agents — his vouching is weak

This naturally incentivizes keeping circles small. Transitive trust decays at each hop, diluted by how many agents each person has trusted.
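The text does not fix a formula, so the following is only a sketch under two assumptions: a vouch from a user who trusts n agents weighs 1/n, and every hop multiplies trust by a constant decay factor. Both the 1/n rule and the `hop_decay` value are illustrative choices.

```python
from typing import List


def vouch_weight(trusted_count: int) -> float:
    """Weight of one vouch from a user who trusts `trusted_count` agents (assumed 1/n)."""
    if trusted_count < 1:
        raise ValueError("a user who vouches trusts at least one agent")
    return 1.0 / trusted_count


def transitive_trust(trusted_counts: List[int], hop_decay: float = 0.5) -> float:
    """Trust that survives a path of vouches.

    trusted_counts[i] is the published trust count of the user vouching at hop i.
    Each hop is diluted by that user's count and by the assumed hop_decay.
    """
    score = 1.0
    for count in trusted_counts:
        score *= hop_decay * vouch_weight(count)
    return score
```

Under these assumptions Alice's vouch (2 trusted agents) weighs 0.5 and Bob's (200 trusted agents) weighs 0.005, a hundredfold difference, and any path that passes through Bob is diluted accordingly.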

Dynamic Scoring

Agents actively monitor trust scores and recommend adjustments:

Trending up: consistent, well-scoped, helpful interactions over time. The agent suggests upgrading the tier.
Trending down: unusual requests, scope creep, or changes in behavior patterns. The agent flags it.
Decay: no interaction for an extended period. The agent suggests revoking.

Agents recommend, humans decide. Trust levels never change silently.
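One way to picture the recommend-then-approve loop is the sketch below. The `Observation` fields, the 0 to 1 score scale, and the thresholds (`decay_days`, `delta`) are all hypothetical; the text only states the three trends and that a human makes the final call.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Observation:
    days_since_last_interaction: int
    recent_score: float    # mean interaction quality over a recent window (assumed 0..1)
    baseline_score: float  # mean interaction quality over the longer history


def recommend(obs: Observation, decay_days: int = 180, delta: float = 0.15) -> Optional[str]:
    """Return a recommendation for the human owner, or None if nothing changed."""
    if obs.days_since_last_interaction > decay_days:
        return "suggest_revoke"      # decay: no interaction for an extended period
    change = obs.recent_score - obs.baseline_score
    if change >= delta:
        return "suggest_upgrade"     # trending up
    if change <= -delta:
        return "flag_for_review"     # trending down
    return None


def apply_recommendation(current_tier: str, proposed_tier: str, human_approved: bool) -> str:
    """Trust levels never change silently: without approval the tier stays put."""
    return proposed_tier if human_approved else current_tier
```

The agent only ever calls `recommend`; the tier changes solely through `apply_recommendation` with an explicit human decision.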

Every agent maintains its own trust scores. If Sky and Echo both interact with the same external agent, they may reach different assessments based on their own experiences.

Reputation Signals

All signals must be derived from verifiable actions, not claims:

Signal	What it measures	Verified via
Trust count (published)	Selectivity — fewer = more meaningful	Signed ledger
Checkpoint success rate	Code quality in a domain	Observable
Scope consistency	Stays in their lane vs scope creep	Observable
Message patterns	Normal vs anomalous behavior	Observable
Model safety	Frontier vs unfiltered local model	API attestation
Open source contributions	Public track record	Git history
Who trusts them	Quality of incoming trust	Signed ledger
Trust age	How long relationships have lasted	Signed ledger
Revocation history	Have others revoked trust?	Signed ledger

The system is extensible — new signals can be added over time without redesigning the core.
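Extensibility of this kind is often achieved with a registry, so a new signal is one more registered function instead of a change to the core. The sketch below assumes that design; the `signal` decorator, the `evidence` dictionary, and its keys are hypothetical.

```python
from typing import Callable, Dict

# A signal maps recorded, verifiable evidence to a number.
SignalFn = Callable[[dict], float]
SIGNALS: Dict[str, SignalFn] = {}


def signal(name: str) -> Callable[[SignalFn], SignalFn]:
    """Decorator that registers a signal function under `name`."""
    def register(fn: SignalFn) -> SignalFn:
        SIGNALS[name] = fn
        return fn
    return register


@signal("checkpoint_success_rate")
def checkpoint_success_rate(evidence: dict) -> float:
    attempts = evidence.get("checkpoints_attempted", 0)
    return evidence.get("checkpoints_succeeded", 0) / attempts if attempts else 0.0


@signal("trust_age_days")
def trust_age_days(evidence: dict) -> float:
    return float(evidence.get("trust_age_days", 0))


def compute_signals(evidence: dict) -> Dict[str, float]:
    """Evaluate every registered signal; the core never names a signal directly."""
    return {name: fn(evidence) for name, fn in SIGNALS.items()}
```

Adding a signal such as revocation history would mean adding one more decorated function, with `compute_signals` left untouched.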

Security: The Prompt Injection Problem

The real threat isn't message forgery, which signing solves. It's an external agent compromising a trusted agent through prompt injection, so that the trusted agent acts maliciously while its messages remain cryptographically "valid."

Mitigations:

Structured gateway for non-inner-circle agents — no free text means no injection surface
Message length/complexity limits — harder to hide payloads
Capability scoping — compromised agent still limited to granted permissions
Behavioral anomaly detection — flag unusual request patterns
Sandboxed execution — external messages run in restricted context
Intent declarations — machine-readable action + params checked against permissions before agent processes content
Robust system prompts — agent instructions hardened against manipulation
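Several of these mitigations (intent declarations, capability scoping, length limits) can be combined into a single check that runs before the agent sees any content. The following is a minimal sketch; the `Intent` shape, the `PERMISSIONS` table, the sender ID, and the length limit are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical permission table: sender id -> action -> allowed parameter names.
PERMISSIONS = {
    "agent:echo": {"read_file": {"path"}, "propose_patch": {"path", "diff"}},
}

MAX_PARAM_LENGTH = 2000  # assumed limit; long values are a likely place to hide payloads


@dataclass
class Intent:
    """Machine-readable declaration: an action plus its parameters."""
    action: str
    params: dict = field(default_factory=dict)


def is_permitted(sender: str, intent: Intent) -> bool:
    """Check a declared intent against granted permissions before any processing."""
    allowed_params = PERMISSIONS.get(sender, {}).get(intent.action)
    if allowed_params is None:
        return False  # unknown sender, or action outside the granted scope
    if not set(intent.params) <= allowed_params:
        return False  # undeclared parameter
    return all(len(str(v)) <= MAX_PARAM_LENGTH for v in intent.params.values())
```

Because the check looks only at the declared action and parameter names, a compromised sender stays limited to what it was already granted.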

Cryptographic Foundation

Each agent gets a keypair tied to its owner
Messages signed with owner's private key
Signed chain of provenance traces every message back to its origin
Broken chain = untrusted message
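The chain check can be sketched with the standard library alone. One loud assumption: Python's standard library has no asymmetric signatures, so HMAC-SHA256 with a per-owner secret stands in here for a real signature scheme, where verification would use the owner's public key. The hop layout and function names are also hypothetical.

```python
import hashlib
import hmac
import json
from typing import Dict, List, Optional


def _digest(hop: dict) -> str:
    """Hash the signed fields of a hop (everything except the signature)."""
    payload = json.dumps({k: hop[k] for k in ("signer", "body", "prev")}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


def sign_hop(signer: str, key: bytes, body: str, prev: Optional[dict] = None) -> dict:
    """Create a hop linked to `prev`. HMAC is a stand-in for a real signature."""
    hop = {"signer": signer, "body": body, "prev": _digest(prev) if prev else ""}
    hop["sig"] = hmac.new(key, _digest(hop).encode(), hashlib.sha256).hexdigest()
    return hop


def verify_chain(chain: List[dict], keys: Dict[str, bytes]) -> bool:
    """Return True only if every hop is validly signed and linked to its predecessor."""
    prev_digest = ""
    for hop in chain:
        key = keys.get(hop["signer"])
        if key is None or hop["prev"] != prev_digest:
            return False  # unknown signer or broken link
        expected = hmac.new(key, _digest(hop).encode(), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, hop["sig"]):
            return False  # contents changed after signing
        prev_digest = _digest(hop)
    return True
```

A chain that starts mid-way, skips a hop, or contains an altered body fails verification, which corresponds to the "broken chain = untrusted message" rule.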

Verifiable Trust

Self-reported trust data is worthless. All trust claims must be cryptographically verifiable:

Signed trust ledger: every trust grant or revocation is a signed, append-only log entry. A user can't claim to trust 3 agents when the ledger shows 50.
Zero-knowledge proofs: prove "I trust fewer than N agents" without revealing who they are or the exact number. Privacy-preserving but verifiable.
Merkle tree of trust relationships: publish the root hash, and others can verify specific claims without seeing the full tree.
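To make the Merkle tree item concrete, here is a small sketch using SHA-256 from the standard library. The leaf encoding (`grant:alice->sky`), the 0x00/0x01 prefixes that separate leaf hashes from interior hashes, and the choice to duplicate the last node on odd levels are all assumptions made for illustration, not a specification of the system.

```python
import hashlib
from typing import List, Tuple


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def _next_level(level: List[bytes]) -> List[bytes]:
    return [_h(b"\x01" + level[i] + level[i + 1]) for i in range(0, len(level), 2)]


def merkle_root(leaves: List[bytes]) -> bytes:
    """Root hash to publish for a non-empty list of trust relationships."""
    if not leaves:
        raise ValueError("need at least one leaf")
    level = [_h(b"\x00" + leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last node on odd levels
        level = _next_level(level)
    return level[0]


def merkle_proof(leaves: List[bytes], index: int) -> List[Tuple[bytes, bool]]:
    """Sibling hashes from leaf to root, each paired with 'sibling is on the left'."""
    level = [_h(b"\x00" + leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        level = _next_level(level)
        index //= 2
    return proof


def verify_proof(leaf: bytes, proof: List[Tuple[bytes, bool]], root: bytes) -> bool:
    """Check one claimed relationship against the published root."""
    node = _h(b"\x00" + leaf)
    for sibling, sibling_is_left in proof:
        node = _h(b"\x01" + sibling + node) if sibling_is_left else _h(b"\x01" + node + sibling)
    return node == root
```

A verifier who holds only the root and one proof learns that a single relationship is in the tree, and sees only hashes of the rest.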

No self-attestation. Reputation is computed from observed, signed behavior. No central authority decides trustworthiness.