Trust Model: Tiered Autonomy by Blast Radius
TL;DR
Autonomy is earned per task class, not granted by default. Four tiers (T0–T3) based on blast radius. Enforcement is defense-in-depth: dispatch filter + Review-E gate + Kyverno admission. Promotion is measured (20 successful runs with zero rollbacks); demotion is immediate on any attributable rollback. T3 actions never auto-promote — humans co-sign irreversibility.
Core claim: an agent's autonomy is a function of (a) the change's blast radius, (b) the agent's measured track record on that change class, and (c) the reversibility of the outcome. Not of the agent's identity, the task's urgency, or the human's convenience.
The four tiers
| Tier | Name | Blast radius | Reversibility | Autonomy | Enforcing gates |
|---|---|---|---|---|---|
| T0 | Non-blast | Docs, tests, scaffolding, YAML linting | Trivial (git revert) | Full agent, no human | CI + Review-E |
| T1 | Contained | Single-service feature, test-covered refactor, bounded UI change | Fast (flag kill ~30s, rollback ~5m) | Full agent under canary | CI + Review-E + Flagger SLO gate + error-budget check |
| T2 | Cross-cutting | Multi-repo work, event schema change, new public API, architect-level interface | Slow (rollforward or multi-step rollback) | Agent plans, human approves interface, agent implements | CI + Review-E + human co-sign on interface + Kyverno two-attestor |
| T3 | Irreversible | Destructive DB migration, auth/authz code, payment logic, secret rotation, cluster-scope RBAC | None (data loss, security regression) | Human drives, agent assists | Human approval mandatory + Kyverno reject without human-OIDC attestation |
The tier sets the dispatch ceiling: an agent may attempt a task only when the task's tier is at or below the agent's ceiling for that task class. Running a T3 task through the T0 path is a policy violation blocked at admission.
Classification at intake
sequenceDiagram
participant U as User / GitHub Issue
participant S as Spec-E
participant CE as Conductor-E
participant POL as Policy Engine
participant D as Dispatcher
U->>S: Issue created
S->>S: Extract files, surfaces, effects
S->>POL: Classify blast radius
POL-->>S: tier T0 / T1 / T2 / T3
S->>U: Post clarifying questions if needed
U-->>S: Answers
S->>CE: Commit TaskSpec<br/>(tier, acceptance criteria,<br/>scope, test strategy)
CE->>D: Dispatch decision
alt tier == T0
D->>D: Any agent, any time
else tier == T1
D->>D: Agent + canary pipeline
else tier == T2
D->>D: Require human interface approval<br/>before dispatch
else tier == T3
D->>D: Require human co-sign + explicit<br/>"I drive" confirmation
end
Classification rules are encoded as a policy file (policy/blast-radius.yaml) in rig-gitops and evaluated by a deterministic classifier, with Spec-E as the fallback when the rules are ambiguous. Concrete rules:
T0 (Non-blast): all of
- No changes to code paths that execute in production; the change is confined to test files, docs, YAML linting rules, or GitHub Actions formatting
- No changes to dependencies
- No changes to agent prompts or the rig's own code
- Branch protection does not require human review
T1 (Contained): at least one of, and nothing from T2/T3
- Changes to a single service's code
- Dependency additions from an allowlisted registry with Socket.dev score >= threshold
- Refactors with existing test coverage >= 80% of changed lines
- UI changes not touching auth, payments, or user data
- The service has a defined SLO, a Flagger Canary, and a kill-switch feature flag
T2 (Cross-cutting): at least one of
- Changes spanning 2+ repositories
- Changes to Conductor-E event type definitions or the subscription registry
- New public HTTP API surface or CLI command
- Changes to the rig's own agent prompts or character files
- Changes to a shared library consumed by 2+ services
- Changes requiring new feature flags to be defined (not just flipped)
T3 (Irreversible): at least one of
- Destructive DB DDL (DROP, TRUNCATE, non-backward-compatible ALTER)
- Changes to authentication, authorization, or session handling
- Changes to payment processing, billing, or money-handling paths
- Changes to secret management or credential rotation logic
- Cluster-scope Kubernetes RBAC changes
- Changes to Kyverno policies themselves
- Changes to the attestation chain (Sigstore config, SLSA workflows)
- Production data migrations affecting >1M rows
Boundary cases route to Spec-E, which errs on the side of the higher tier.
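The rule lists above can be read as a "highest matching tier wins" function. The following is a minimal sketch of that reading, not the rig's classifier: `ChangeFacts` and all of its field names are hypothetical, and only a subset of the rules is modeled (for example, the T1 preconditions on SLOs, canaries, and kill-switch flags are left out).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeFacts:
    """Hypothetical summary of what a change touches, as extracted at intake."""
    repos_touched: int = 1
    touches_production_code: bool = False
    changes_dependencies: bool = False
    new_public_api: bool = False
    destructive_ddl: bool = False
    touches_auth_or_payments: bool = False
    rows_migrated: int = 0


def classify(facts: ChangeFacts) -> int:
    """Return the tier (0..3) for a change; the highest matching rule wins."""
    # T3: any irreversible surface.
    if (facts.destructive_ddl or facts.touches_auth_or_payments
            or facts.rows_migrated > 1_000_000):
        return 3
    # T2: any cross-cutting surface.
    if facts.repos_touched >= 2 or facts.new_public_api:
        return 2
    # T1: production code or dependency changes inside one service.
    if facts.touches_production_code or facts.changes_dependencies:
        return 1
    # T0: nothing above matched (docs, tests, scaffolding).
    return 0
```

Checking the tiers from T3 downward means a change that matches rules in several tiers always lands in the highest one, which mirrors the "errs on the side of the higher tier" behavior described for boundary cases.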
Promotion: how autonomy is earned
An agent's autonomy tier for a task class is stored in Conductor-E as a projection:
record AgentAutonomy(
    string AgentId,
    string TaskClass,         // e.g., "docs-update", "ui-change", "service-refactor"
    int CeilingTier,          // 0..3, maximum tier this agent can attempt for this task class
    int SuccessfulRuns,       // rolling 90-day count
    int Failures,             // includes rollback, human-rework, budget-overrun
    DateTimeOffset LastReset
);
Default ceiling is T0 for every (agent, task-class) pair. Promotion rules:
- T0 → T1: 20 consecutive successful T0 runs of that class, zero human-rework, zero rollbacks. Ceiling raises to T1 for that class.
- T1 → T2: 20 consecutive successful T1 runs, zero canary aborts, zero SLO-budget depletions attributable to the change. Ceiling raises to T2 for that class.
- T2 → T3: No automatic promotion. T3 tasks require human co-sign on every instance regardless of track record. The principle "humans at semantic boundaries" trumps accumulated trust.
Demotion rules:
- Any rollback attributable to the agent's work on that class: ceiling drops one tier immediately, cooldown 30 days before promotion eligibility resets.
- Model version change (e.g., Sonnet 4.6 → 4.7, or cross-vendor swap via LiteLLM fallback_models, see provider-portability.md; or any behavior-drift signal >30% on the canary suite): all ceilings reset to T0, promotion track record held in quarantine for human review.
- New class of task: ceiling starts at T0 for that class regardless of the agent's ceiling on other classes.
T3 never auto-promotes
T3 work (destructive DB, auth, payments, secret rotation, cluster RBAC, Kyverno policy changes) always requires human co-sign on every instance. Accumulated track record on lower tiers does not raise this ceiling. Irreversibility is a structural reason, not a trust metric.
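The promotion and demotion rules can be sketched as a small state machine for one (agent, task-class) pair. This is an illustration under stated assumptions rather than the Conductor-E projection: the `Autonomy` class and the three `record_*` functions are hypothetical names, the streak is assumed to count runs at the current ceiling, and the 30-day cooldown is assumed to block promotion until it expires.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

PROMOTION_STREAK = 20          # consecutive clean runs required by the rules above
COOLDOWN = timedelta(days=30)  # cooldown after an attributable rollback
MAX_AUTO_CEILING = 2           # T3 is never reached automatically


@dataclass
class Autonomy:
    ceiling: int = 0                           # default ceiling is T0
    streak: int = 0                            # consecutive clean runs at the current ceiling
    cooldown_until: Optional[datetime] = None  # no promotion before this instant


def record_success(a: Autonomy, now: datetime) -> None:
    """Count a clean run; promote one tier once the streak reaches the threshold."""
    a.streak += 1
    in_cooldown = a.cooldown_until is not None and now < a.cooldown_until
    if a.streak >= PROMOTION_STREAK and a.ceiling < MAX_AUTO_CEILING and not in_cooldown:
        a.ceiling += 1
        a.streak = 0


def record_rollback(a: Autonomy, now: datetime) -> None:
    """Drop one tier immediately (never below T0) and start the cooldown."""
    a.ceiling = max(0, a.ceiling - 1)
    a.streak = 0
    a.cooldown_until = now + COOLDOWN


def record_model_change(a: Autonomy) -> None:
    """A model version change resets the ceiling to T0."""
    a.ceiling = 0
    a.streak = 0
```

Capping automatic promotion at T2 encodes the rule that no track record, however long, raises a ceiling to T3.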
Promotion/demotion events are stored in the event log. Audit query: "show me every autonomy change for Dev-E in the last 90 days" is a replay.
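As a rough illustration of "the audit query is a replay", the sketch below filters an in-memory event list. The `AutonomyChanged` event shape and the `autonomy_changes` function are assumptions made for this example; the source does not define the real event schema or the query API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class AutonomyChanged:
    """Hypothetical event shape for a promotion or demotion."""
    agent_id: str
    task_class: str
    old_ceiling: int
    new_ceiling: int
    at: datetime


def autonomy_changes(log, agent_id, now, window=timedelta(days=90)):
    """Replay the log and keep one agent's changes inside the window, oldest first."""
    cutoff = now - window
    hits = [e for e in log if e.agent_id == agent_id and e.at >= cutoff]
    return sorted(hits, key=lambda e: e.at)
```

Calling `autonomy_changes(log, "dev-e", now)` would return every ceiling change recorded for that agent in the last 90 days, in chronological order.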
Technical enforcement
Tier enforcement happens in three layers, defense-in-depth:
Layer 1: Dispatch
Conductor-E's assignment endpoint (GET /api/assignments/next?agentId=X) filters by the agent's ceiling tier. An agent with T1 ceiling on ui-change cannot be assigned a T2-classified UI change. The filter is cheap: one JOIN against AgentAutonomy.
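The dispatch filter can be pictured as a lookup against per-class ceilings. The sketch below uses plain dictionaries in place of the AgentAutonomy join; `next_assignment` and the dictionary keys are hypothetical, and pairs with no recorded ceiling are assumed to default to T0.

```python
def next_assignment(agent_id, queue, ceilings):
    """Return the first queued task the agent may take, or None.

    queue: list of dicts with "id", "task_class", and "tier" (0..3), in priority order.
    ceilings: dict mapping (agent_id, task_class) to a ceiling tier.
    """
    for task in queue:
        # Unknown (agent, task-class) pairs fall back to the T0 default.
        ceiling = ceilings.get((agent_id, task["task_class"]), 0)
        if task["tier"] <= ceiling:
            return task
    return None
```

With a T1 ceiling on ui-change, an agent skips a queued T2 ui-change task and receives the next task at T1 or below.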
Layer 2: Review-E gate
Review-E's character prompt includes tier-specific review criteria. For a T2 change, Review-E explicitly looks for "interface approval attestation" in the PR metadata and blocks the review if absent. For T3, Review-E refuses to approve and routes to human.
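Review-E itself is prompt-driven, but the tier-specific outcomes described above reduce to a small decision table. The sketch below is one hedged reading of it; the function name, the returned strings, and the `interface_approval_attestation` metadata key are all hypothetical.

```python
def review_gate(tier, pr_metadata):
    """Decide how a PR proceeds through review, given its tier (0..3)."""
    if tier >= 3:
        # T3: the reviewing agent never approves; a human takes over.
        return "route-to-human"
    if tier == 2 and not pr_metadata.get("interface_approval_attestation"):
        # T2 without a recorded interface approval is blocked.
        return "block"
    return "review"
```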
Layer 3: Kyverno admission
The cluster-level final gate. A Kyverno ImageValidatingPolicy requires two attestations for any manifest targeting a namespace labeled blast-radius: t3:
apiVersion: policies.kyverno.io/v1
kind: ImageValidatingPolicy
metadata: { name: t3-human-cosign }
spec:
  validationActions: [Deny]
  matchConstraints:
    namespaceSelector:
      matchLabels: { blast-radius: t3 }
  attestors:
    - name: agent-identity
      cosign:
        keyless:
          identities:
            # The value is a pattern, so it goes in subjectRegExp rather than subject.
            - subjectRegExp: "https://github.com/dashecorp/.+/.github/workflows/release\\.ya?ml@.+"
              issuer: "https://token.actions.githubusercontent.com"
    - name: human-approval
      cosign:
        keyless:
          identities:
            - subject: "repo:dashecorp/prod-approvals:environment:t3-approve"
              issuer: "https://token.actions.githubusercontent.com"
  validations:
    - expression: "images.containers.map(i, verifyAttestationSignatures(i, attestations.slsa, [attestors.'agent-identity', attestors.'human-approval'])).all(e, e > 0)"
Translation: to land in a T3 namespace, the image must carry two valid Sigstore signatures, one from the agent's build workflow and one from a human-triggered approval workflow. No signature, no admission. No human can forget the rule because the rule is enforced by the cluster, not the reviewer.
The TaskSpec object
Every dispatched task carries a TaskSpec that encodes everything needed to determine tier, dispatch correctly, and evaluate outcome:
id: task-202604160042
repo: dashecorp/conductor-e
issue: 76
tier: T1
blast_radius:
  reason: "Changes single-service feature with existing test coverage"
  surfaces: ["src/Api/EventsEndpoint.cs", "tests/Api/EventsTests.cs"]
  evaluated_by: spec-e
  evaluated_at: "2026-04-16T16:30:00Z"
acceptance_criteria:
  - "POST /api/events accepts new event type X"
  - "MartenProjections.cs updates IssueStatus for type X"
  - "Integration test covers happy path and bad input"
test_strategy:
  required_coverage_delta: 0
  property_tests: true
non_goals:
  - "Do not change existing event type definitions"
  - "Do not modify Discord routing"
expected_effort_tokens: 80000  # budget guardrail
assigned_agent: dev-e-dotnet
ceiling_tier: T1  # the assigned agent's current ceiling for this task class
Spec-E authors this. Conductor-E validates on submission with a schema check and a check that the task's tier does not exceed the assigned agent's ceiling. Dispatch is gated on both.
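The two submission checks might look roughly like the sketch below, which operates on a TaskSpec already parsed into a dictionary. The required-field list is an assumption drawn from the example above, and `validate_task_spec` is a hypothetical name; the real schema may require more or fewer fields.

```python
REQUIRED_FIELDS = ("id", "repo", "tier", "acceptance_criteria",
                   "assigned_agent", "ceiling_tier")
TIERS = ("T0", "T1", "T2", "T3")


def validate_task_spec(spec: dict) -> list:
    """Return a list of validation errors; an empty list means the spec may dispatch."""
    # Schema check: every required field must be present.
    errors = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in spec]
    if errors:
        return errors
    for name in ("tier", "ceiling_tier"):
        if spec[name] not in TIERS:
            errors.append(f"{name} must be one of {', '.join(TIERS)}")
    # Ceiling check: the task's tier must not exceed the agent's ceiling.
    if not errors and TIERS.index(spec["tier"]) > TIERS.index(spec["ceiling_tier"]):
        errors.append("task tier exceeds the assigned agent's ceiling")
    return errors
```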
Escalation paths
When an agent encounters work it cannot complete within its tier:
graph LR
A[Agent working T1 task] -->|discovers need<br/>for T2 change| E[Emit EscalationRequired]
E --> C[Conductor-E]
C -->|route to human| H[#admin Discord<br/>+ @mention tier-owner]
H -->|approve scope expansion| AR[Record approval attestation]
AR -->|re-dispatch at T2| A2[Agent with T2 ceiling<br/>or new interface spec]
H -->|reject| R[Close task, open new tracking]
An agent cannot silently exceed its tier. Any discovery that a task is actually larger than its classification forces an escalation event, stored with the discovery context, routed per severity.
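A minimal sketch of the escalation event follows, assuming the agent can express what it discovered as a tier number. The `EscalationRequired` field names and the `check_scope` helper are hypothetical; only the event name comes from the diagram above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class EscalationRequired:
    """Hypothetical payload: what was classified, what was found, and why."""
    task_id: str
    agent_id: str
    classified_tier: int
    discovered_tier: int
    context: str
    at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def check_scope(task_id, agent_id, classified_tier, discovered_tier, context):
    """Return an EscalationRequired event if the work outgrew its tier, else None."""
    if discovered_tier > classified_tier:
        return EscalationRequired(task_id, agent_id, classified_tier,
                                  discovered_tier, context)
    return None
```

Returning an event instead of raising keeps the discovery context attached to the record that Conductor-E routes to a human.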
Human co-sign mechanics
T2 "interface approval" and T3 "I drive" mechanics:
- T2 interface approval: human reviews the proposed interface (event schema, API contract, CLI flags) in a dedicated interface-review issue. Approval records a GitHub Deployments API entry with the human's identity. Conductor-E reads the Deployments API as the attestation source. No approval event → Dispatcher refuses T2 work.
- T3 human-drives: human is the primary author (or explicit named sponsor) of the PR. Agent assists via commits on a sub-branch that the human merges into the main feature branch. Kyverno rejects any T3 image not carrying the human's Sigstore co-sign. The human's GitHub Actions approval workflow (.github/workflows/t3-approve.yaml) is the signing surface: it runs on workflow_dispatch with environment t3-approve and requires a protection rule "required reviewers = human".
This is the same technical pattern Google and GitHub use for production deploys; we just wire it to our blast-radius labels.
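On the dispatcher side, the co-sign mechanics amount to a per-tier precondition. The sketch below assumes approvals have already been read from their source (for T2, the Deployments API) into a set of strings; the approval names and `missing_approvals` are hypothetical labels for this example.

```python
# Approvals that must be on record before dispatch, per tier (names are illustrative).
REQUIRED_APPROVALS = {
    0: set(),
    1: set(),
    2: {"interface-approval"},
    3: {"human-cosign", "i-drive"},
}


def missing_approvals(tier, recorded):
    """Return the approvals still missing for this tier, sorted; empty means dispatch may proceed."""
    return sorted(REQUIRED_APPROVALS[tier] - set(recorded))
```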
When the tiers break down
Tiers are heuristics, not ground truth. Known failure modes and mitigations:
| Failure | Example | Mitigation |
|---|---|---|
| Spec-E underclassifies | T2 work marked T1 because Spec-E didn't realize it touched 2 repos | Daily reconciliation check: Spec-E re-evaluates all open tasks; disagreements with the original classification surface as events. |
| Agent discovers higher-tier work mid-task | T1 task turns out to need a schema change | StuckGuard detects the agent stalling on a T2-shaped problem; forced escalation. |
| Human mis-approves T3 under time pressure | Late-night hotfix signed off without review | Kyverno logs every T3 admission; daily post-hoc audit surfaces rushed approvals. |
| Novel task class with no track record | Agent has T2 ceiling on one service but is asked to work on a new one | Default back to T0 for the new class; human observer required for first 20 runs. |
| Classification rules themselves change | New product area added, new T3 criteria needed | Changes to policy/blast-radius.yaml are themselves T2 tasks — agent can propose, human approves. |
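The daily reconciliation check in the first row can be sketched as a re-classification pass over open tasks. The `reconcile` function and the task dictionary keys are hypothetical, and the classifier is passed in as a plain callable so the sketch does not assume how Spec-E is invoked.

```python
def reconcile(open_tasks, classify):
    """Re-classify open tasks and report disagreements.

    open_tasks: list of dicts with at least "id" and the recorded "tier" (0..3).
    classify: callable taking a task dict and returning a freshly computed tier.
    Returns a list of (task_id, recorded_tier, fresh_tier) tuples.
    """
    disagreements = []
    for task in open_tasks:
        fresh = classify(task)
        if fresh != task["tier"]:
            disagreements.append((task["id"], task["tier"], fresh))
    return disagreements
```

Each returned tuple would surface as an event; a fresh tier above the recorded one is the underclassification case the table describes.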
Why not flat permissions
Some systems give all agents uniform capabilities and rely on auditing. This is the "admin user" anti-pattern:
- Audit is reactive. By the time someone notices an agent deployed to prod without canary, it's too late.
- Uniform permissions mean a compromised agent (via prompt injection) inherits the full permission set.
- Flat models create pressure to lower the permission floor for "simple" tasks, eroding the ceiling for risky ones.
Tiered autonomy is the structural answer. Blast-radius-scoped permissions mean even a fully-compromised T0 agent cannot deploy a T3 change.
Why not full autonomy
Gas Town's GUPP (Gas Town Universal Propulsion Principle) takes the opposite stance: agents push directly to main, humans stay out of the loop, and the rig's job is to keep agents running. This works for specific domains (internal tool development where the cost of a bad commit is low) but breaks down for:
- Customer-facing services where outages have business cost
- Security-critical code where bugs have compliance cost
- Multi-tenant systems where blast radius crosses tenants
- Regulated environments where human attestation is required
Our rig is designed for hybrid human-agent teams shipping software that humans depend on. The trust model is the price of that target.
Evolving the model
The trust model is itself tiered-policy-managed (meta-T2). Changes to this document or the policy/blast-radius.yaml rules go through human review. Changes to Kyverno enforcement policies go through meta-T3 (themselves a T3 change to the enforcement of T3).
This recursion stops at the human layer: at some point, the rig must trust humans, and humans must trust each other via the usual social and legal mechanisms.
See also
- index.md — whitepaper master
- principles.md — the rules the trust model is answerable to
- security.md — Kyverno enforcement mechanics
- quality-and-evaluation.md — how track records are measured
- limitations.md — what the rig cannot do even at T3