Trust Model: Tiered Autonomy by Blast Radius

TL;DR

Autonomy is earned per task class, not granted by default. Four tiers (T0–T3) based on blast radius. Enforcement is defense-in-depth: dispatch filter + Review-E gate + Kyverno admission. Promotion is measured (20 consecutive successful runs with zero rollbacks); demotion is immediate on any attributable rollback. T3 actions never auto-promote: humans co-sign irreversibility.

Core claim: an agent's autonomy is a function of (a) the change's blast radius, (b) the agent's measured track record on that change class, and (c) the reversibility of the outcome. Not of the agent's identity, the task's urgency, or the human's convenience.

The four tiers

Tier	Name	Blast radius	Reversibility	Autonomy	Enforcing gates
T0	Non-blast	Docs, tests, scaffolding, YAML linting	Trivial (git revert)	Full agent, no human	CI + Review-E
T1	Contained	Single-service feature, test-covered refactor, bounded UI change	Fast (flag kill ~30s, rollback ~5m)	Full agent under canary	CI + Review-E + Flagger SLO gate + error-budget check
T2	Cross-cutting	Multi-repo work, event schema change, new public API, architect-level interface	Slow (rollforward or multi-step rollback)	Agent plans, human approves interface, agent implements	CI + Review-E + human co-sign on interface + Kyverno two-attestor
T3	Irreversible	Destructive DB migration, auth/authz code, payment logic, secret rotation, cluster-scope RBAC	None (data loss, security regression)	Human drives, agent assists	Human approval mandatory + Kyverno reject without human-OIDC attestation

The tier is the dispatch ceiling. An agent can attempt tasks classified at or below its ceiling tier, never above. Running a T3 task through the T0 path is a policy violation blocked at admission.

Classification at intake

sequenceDiagram
    participant U as User / GitHub Issue
    participant S as Spec-E
    participant CE as rig-conductor
    participant POL as Policy Engine
    participant D as Dispatcher

    U->>S: Issue created
    S->>S: Extract files, surfaces, effects
    S->>POL: Classify blast radius
    POL-->>S: tier T0 / T1 / T2 / T3
    S->>U: Post clarifying questions if needed
    U-->>S: Answers
    S->>CE: Commit TaskSpec<br/>(tier, acceptance criteria,<br/>scope, test strategy)
    CE->>D: Dispatch decision
    alt tier == T0
        D->>D: Any agent, any time
    else tier == T1
        D->>D: Agent + canary pipeline
    else tier == T2
        D->>D: Require human interface approval<br/>before dispatch
    else tier == T3
        D->>D: Require human co-sign + explicit<br/>"I drive" confirmation
    end

Classification rules are encoded as a policy file (policy/blast-radius.yaml) in rig-gitops and evaluated by a deterministic classifier, which falls back to Spec-E when the rules are ambiguous. Concrete rules:

T0 (Non-blast) - all of:

Changes confined to files that do not execute in production: test files, docs, YAML linting rules, GitHub Actions formatting
No changes to dependencies
No changes to agent prompts or the rig's own code
Branch protection does not require human review

T1 (Contained) - at least one of, and nothing from T2/T3:

Changes to a single service's code
Dependency additions from an allowlisted registry with Socket.dev score >= threshold
Refactors with existing test coverage >= 80% of changed lines
UI changes not touching auth, payments, or user data
The service has a defined SLO, a Flagger Canary, and a kill-switch feature flag

T2 (Cross-cutting) - at least one of:

Changes spanning 2+ repositories
Changes to rig-conductor event type definitions or the subscription registry
New public HTTP API surface or CLI command
Changes to the rig's own agent prompts or character files
Changes to a shared library consumed by 2+ services
Changes requiring new feature flags to be defined (not just flipped)

T3 (Irreversible) - at least one of:

Destructive DB DDL (DROP, TRUNCATE, non-backward-compatible ALTER)
Changes to authentication, authorization, or session handling
Changes to payment processing, billing, or money-handling paths
Changes to secret management or credential rotation logic
Cluster-scope Kubernetes RBAC changes
Changes to Kyverno policies themselves
Changes to the attestation chain (Sigstore config, SLSA workflows)
Production data migrations affecting >1M rows

Boundary cases route to Spec-E, which errs on the side of the higher tier.
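The rules above read as a highest-tier-wins lookup. Here is a minimal Python sketch of that shape, covering only a handful of the listed rules. The `ChangeFacts` fields are hypothetical names for illustration, not the schema of policy/blast-radius.yaml. Bumping an ambiguous case by one tier is an assumption about how "errs on the side of the higher tier" might be applied.

```python
from dataclasses import dataclass


@dataclass
class ChangeFacts:
    """Hypothetical intake summary; field names are assumptions for illustration."""
    repos_touched: int = 1
    touches_production_code: bool = False
    touches_auth_or_payments: bool = False
    destructive_ddl: bool = False
    new_public_api: bool = False
    changes_dependencies: bool = False
    ambiguous: bool = False  # the deterministic rules could not decide


def classify(facts: ChangeFacts) -> int:
    """Return the tier (0..3); the highest matching tier wins."""
    if facts.destructive_ddl or facts.touches_auth_or_payments:
        tier = 3
    elif facts.repos_touched >= 2 or facts.new_public_api:
        tier = 2
    elif facts.touches_production_code or facts.changes_dependencies:
        tier = 1
    else:
        tier = 0
    # Assumption: a boundary case is pushed one tier up, capped at T3.
    if facts.ambiguous:
        tier = min(tier + 1, 3)
    return tier
```

Checking T3 first means a change that matches several tiers always gets the most restrictive one.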

Promotion: how autonomy is earned

An agent's autonomy tier for a task class is stored in rig-conductor as a projection:

record AgentAutonomy(
    string AgentId,
    string TaskClass,       // e.g., "docs-update", "ui-change", "service-refactor"
    int CeilingTier,        // 0..3, maximum tier this agent can attempt for this task class
    int SuccessfulRuns,     // rolling 90-day count
    int Failures,           // includes rollback, human-rework, budget-overrun
    DateTimeOffset LastReset
);

Default ceiling is T0 for every (agent, task-class) pair. Promotion rules:

T0 → T1: 20 consecutive successful T0 runs of that class, zero human-rework, zero rollbacks. Ceiling raises to T1 for that class.
T1 → T2: 20 consecutive successful T1 runs, zero canary aborts, zero SLO-budget depletions attributable to the change. Ceiling raises to T2 for that class.
T2 → T3: No automatic promotion. T3 tasks require human co-sign on every instance regardless of track record. The principle "humans at semantic boundaries" trumps accumulated trust.
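The promotion rules amount to a small streak counter. The following is a sketch only. It assumes the caller records just the clean runs at the agent's current ceiling for one task class, and the `Track` type and `record_success` name are illustrative, not part of rig-conductor.

```python
from dataclasses import dataclass

PROMOTION_STREAK = 20  # consecutive clean runs required, per the rules above
MAX_AUTO_TIER = 2      # T3 is never reached automatically


@dataclass
class Track:
    ceiling: int = 0   # default ceiling is T0
    streak: int = 0    # consecutive clean runs at the current ceiling


def record_success(track: Track) -> Track:
    """Count one clean run and promote one tier once the streak is met."""
    streak = track.streak + 1
    if streak >= PROMOTION_STREAK and track.ceiling < MAX_AUTO_TIER:
        # The streak restarts at the new ceiling.
        return Track(ceiling=track.ceiling + 1, streak=0)
    return Track(ceiling=track.ceiling, streak=streak)
```

The cap at `MAX_AUTO_TIER` encodes the T2 → T3 rule: no number of clean runs moves the ceiling past T2.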

Demotion rules:

Any rollback attributable to the agent's work on that class: the ceiling drops one tier immediately, and a 30-day cooldown applies before the agent is eligible for promotion again.
Model version change (e.g., Sonnet 4.6 → 4.7, or a cross-vendor swap via LiteLLM fallback_models; see provider-portability.md) or any behavior-drift signal >30% on the canary suite: all ceilings reset to T0, and the promotion track record is held in quarantine for human review.
New class of task: ceiling starts at T0 for that class regardless of the agent's ceiling on other classes.
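The demotion rules can be sketched the same way. The `Ceiling` type and function names below are hypothetical. The sketch also assumes a quarantined track record blocks promotion until a human clears it, which the rules imply but do not spell out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

COOLDOWN = timedelta(days=30)


@dataclass
class Ceiling:
    tier: int
    promotion_blocked_until: Optional[datetime] = None
    quarantined: bool = False  # track record held for human review


def on_rollback(c: Ceiling, now: datetime) -> Ceiling:
    """Attributable rollback: drop one tier (floor T0) and start the cooldown."""
    return Ceiling(max(c.tier - 1, 0), now + COOLDOWN, c.quarantined)


def on_model_change(c: Ceiling) -> Ceiling:
    """Model swap or drift signal: reset to T0 and quarantine the track record."""
    return Ceiling(0, c.promotion_blocked_until, True)


def promotion_eligible(c: Ceiling, now: datetime) -> bool:
    """Promotion is possible only outside the cooldown and outside quarantine."""
    in_cooldown = (
        c.promotion_blocked_until is not None and now < c.promotion_blocked_until
    )
    return not in_cooldown and not c.quarantined
```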

T3 never auto-promotes

T3 work (destructive DB, auth, payments, secret rotation, cluster RBAC, Kyverno policy changes) always requires human co-sign on every instance. Accumulated track record on lower tiers does not raise this ceiling. Irreversibility is a structural reason, not a trust metric.

Promotion and demotion events are stored in the event log, so an audit query such as "show me every autonomy change for Dev-E in the last 90 days" is a replay of that log.
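As a rough illustration of that audit query, the sketch below filters an in-memory list of autonomy events. The tuple layout is an assumption made for the example; the real event log schema is not shown in this document.

```python
from datetime import datetime, timedelta


# Hypothetical event shape: (timestamp, agent_id, task_class, old_tier, new_tier).
def autonomy_changes(events, agent_id, now, window_days=90):
    """Replay the log in time order and keep one agent's changes inside the window."""
    cutoff = now - timedelta(days=window_days)
    return [
        e
        for e in sorted(events, key=lambda e: e[0])
        if e[1] == agent_id and e[0] >= cutoff
    ]
```

Because the result is derived purely from stored events, the same call with a different `now` reproduces the view as of any earlier date.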

Technical enforcement

Tier enforcement happens in three layers, defense-in-depth:

Layer 1: Dispatch

rig-conductor's assignment endpoint (GET /api/assignments/next?agentId=X) filters by the agent's ceiling tier. An agent with T1 ceiling on ui-change cannot be assigned a T2-classified UI change. The filter is cheap: one JOIN against AgentAutonomy.
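In code terms, the filter is a comparison between the task's tier and the agent's ceiling for that task class. The sketch below uses in-memory structures in place of the JOIN. The dictionary shapes and the `next_assignment` name are assumptions for illustration, not the endpoint's actual implementation.

```python
def next_assignment(agent_id, queue, ceilings):
    """Return the first queued task whose tier is within the agent's ceiling.

    queue: list of dicts with "id", "task_class", "tier" (oldest first).
    ceilings: {(agent_id, task_class): ceiling_tier}; missing pairs default to T0.
    """
    for task in queue:
        ceiling = ceilings.get((agent_id, task["task_class"]), 0)
        if task["tier"] <= ceiling:
            return task
    return None  # nothing in the queue is within this agent's ceilings
```

Defaulting a missing pair to 0 matches the rule that every new (agent, task-class) pair starts at T0.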

Layer 2: Review-E gate

Review-E's character prompt includes tier-specific review criteria. For a T2 change, Review-E explicitly looks for "interface approval attestation" in the PR metadata and blocks the review if absent. For T3, Review-E refuses to approve and routes to human.

Layer 3: Kyverno admission

The cluster-level final gate. For any manifest targeting a namespace labeled blast-radius: t3, a Kyverno ImageValidatingPolicy requires attestations from two attestors:

apiVersion: policies.kyverno.io/v1
kind: ImageValidatingPolicy
metadata: { name: t3-human-cosign }
spec:
  validationActions: [Deny]
  matchConstraints:
    namespaceSelector:
      matchLabels: { blast-radius: t3 }
  attestors:
  - name: agent-identity
    cosign:
      keyless:
        identities:
        - subjectRegExp: "https://github.com/dashecorp/.+/.github/workflows/release\\.ya?ml@.+"
          issuer: "https://token.actions.githubusercontent.com"
  - name: human-approval
    cosign:
      keyless:
        identities:
        - subject: "repo:dashecorp/prod-approvals:environment:t3-approve"
          issuer: "https://token.actions.githubusercontent.com"
  validations:
  - expression: "images.containers.map(i, verifyAttestationSignatures(i, attestations.slsa, [attestors.'agent-identity', attestors.'human-approval'])).all(e, e > 0)"

Translation: to land in a T3 namespace, the image must carry two valid Sigstore signatures: one from the agent's build workflow and one from a human-triggered approval workflow. Missing either signature means no admission. No human can forget the rule because the rule is enforced by the cluster, not the reviewer.
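The intent of the two-attestor rule can be modeled in a few lines. This is a simplified model of the decision only, not how Kyverno evaluates the policy. The `admit` function and its inputs are hypothetical; the attestor names are the ones used in the policy above.

```python
REQUIRED_T3_ATTESTORS = frozenset({"agent-identity", "human-approval"})


def admit(namespace_labels: dict, verified_attestors: set) -> bool:
    """Admit unless the namespace is T3 and a required attestor did not verify."""
    if namespace_labels.get("blast-radius") != "t3":
        return True  # this rule only applies to T3-labeled namespaces
    # Both attestors must have verified; one alone is not enough.
    return REQUIRED_T3_ATTESTORS <= set(verified_attestors)
```

The subset check is the important part: a single valid signature from either party must not be sufficient.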

The TaskSpec object

Every dispatched task carries a TaskSpec that encodes everything needed to determine tier, dispatch correctly, and evaluate outcome:

id: task-202604160042
repo: dashecorp/rig-conductor
issue: 76
tier: T1
blast_radius:
  reason: "Changes single-service feature with existing test coverage"
  surfaces: ["src/Api/EventsEndpoint.cs", "tests/Api/EventsTests.cs"]
  evaluated_by: spec-e
  evaluated_at: "2026-04-16T16:30:00Z"
acceptance_criteria:
  - "POST /api/events accepts new event type X"
  - "MartenProjections.cs updates IssueStatus for type X"
  - "Integration test covers happy path and bad input"
test_strategy:
  required_coverage_delta: 0
  property_tests: true
non_goals:
  - "Do not change existing event type definitions"
  - "Do not modify Discord routing"
expected_effort_tokens: 80000  # budget guardrail
assigned_agent: dev-e-dotnet
ceiling_tier: T1  # the assigned agent's current ceiling for this task class

Spec-E authors this. rig-conductor validates on submission (schema check + tier matches ceiling). Dispatch is gated on both.
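A minimal sketch of those two checks follows, assuming the TaskSpec has already been parsed into a dictionary. The required-field list is a guess based on the example above, not the actual schema, and `validate_task_spec` is a hypothetical name.

```python
TIERS = {"T0": 0, "T1": 1, "T2": 2, "T3": 3}

# Assumed minimum set of fields, taken from the example TaskSpec.
REQUIRED = ("id", "repo", "tier", "acceptance_criteria", "assigned_agent", "ceiling_tier")


def validate_task_spec(spec: dict) -> list:
    """Return a list of validation errors; an empty list means dispatchable."""
    errors = [f"missing field: {key}" for key in REQUIRED if key not in spec]
    if errors:
        return errors
    if spec["tier"] not in TIERS or spec["ceiling_tier"] not in TIERS:
        return ["unknown tier label"]
    # The task's tier must not exceed the assigned agent's ceiling.
    if TIERS[spec["tier"]] > TIERS[spec["ceiling_tier"]]:
        errors.append("task tier exceeds the assigned agent's ceiling")
    if not spec["acceptance_criteria"]:
        errors.append("acceptance_criteria must not be empty")
    return errors
```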

Escalation paths

When an agent encounters work it cannot complete within its tier:

graph LR
    A[Agent working T1 task] -->|discovers need<br/>for T2 change| E[Emit EscalationRequired]
    E --> C[rig-conductor]
    C -->|route to human| H["#admin Discord<br/>+ @mention tier-owner"]
    H -->|approve scope expansion| AR[Record approval attestation]
    AR -->|re-dispatch at T2| A2[Agent with T2 ceiling<br/>or new interface spec]
    H -->|reject| R[Close task, open new tracking]

An agent cannot silently exceed its tier. Any discovery that a task is actually larger than its classification forces an escalation event, stored with the discovery context, routed per severity.
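The escalation rule can be sketched as a check the agent runs whenever it re-assesses scope. The event name comes from the diagram above; the fields and the `check_scope` helper are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EscalationRequired:
    task_id: str
    classified_tier: int
    discovered_tier: int
    context: str  # what the agent found that triggered the escalation


def check_scope(
    task_id: str, classified_tier: int, discovered_tier: int, context: str
) -> Optional[EscalationRequired]:
    """Return an escalation event when the work exceeds its classification."""
    if discovered_tier > classified_tier:
        return EscalationRequired(task_id, classified_tier, discovered_tier, context)
    return None  # still within the classified tier; keep working
```

Storing the discovery context on the event keeps the reason for the escalation available to whoever handles it.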

Human co-sign mechanics

T2 "interface approval" and T3 "I drive" mechanics:

T2 interface approval: human reviews the proposed interface (event schema, API contract, CLI flags) in a dedicated interface-review issue. Approval records a GitHub Deployments API entry with the human's identity. rig-conductor reads the Deployments API as the attestation source. No approval event → Dispatcher refuses T2 work.
T3 human-drives: human is the primary author (or explicit named sponsor) of the PR. Agent assists via commits on a sub-branch that the human merges into the main feature branch. Kyverno rejects any T3 image not carrying the human's Sigstore co-sign. The human's GitHub Actions approval workflow (.github/workflows/t3-approve.yaml) is the signing surface — it runs on workflow_dispatch with environment t3-approve and requires a protection rule "required reviewers = human".

This is the same approval-gated pattern used for production deploys elsewhere, for example GitHub environments with required reviewers; we just wire it to our blast-radius labels.
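Put together, the dispatcher's human-gating decision looks roughly like the sketch below. The parameter names are hypothetical. In the real system the approval evidence would come from the GitHub Deployments API and the Sigstore co-sign rather than from function arguments.

```python
def may_dispatch(
    tier: int,
    interface_approved_by: str = None,
    human_driver: str = None,
    cosigned: bool = False,
) -> bool:
    """Decide whether a task may be dispatched given the recorded human approvals."""
    if tier <= 1:
        return True  # T0/T1 need no human approval at dispatch time
    if tier == 2:
        # T2: a human must have approved the interface first.
        return bool(interface_approved_by)
    # T3: a named human drives, and the co-sign must be present.
    return bool(human_driver) and cosigned
```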

When the tiers break down

Tiers are heuristics, not ground truth. Known failure modes and mitigations:

Failure	Example	Mitigation
Spec-E underclassifies	T2 work marked T1 because Spec-E didn't realize it touched 2 repos	Daily reconciliation check: Spec-E re-evaluates all open tasks; disagreements with the original classification surface as events.
Agent discovers higher-tier work mid-task	T1 task turns out to need a schema change	StuckGuard detects the agent stalling on a T2-shaped problem; forced escalation.
Human mis-approves T3 under time pressure	Late-night hotfix signed off without review	Kyverno logs every T3 admission; daily post-hoc audit surfaces rushed approvals.
Novel task class with no track record	Agent has T2 ceiling on one service but is asked to work on a new one	Default back to T0 for the new class; human observer required for first 20 runs.
Classification rules themselves change	New product area added, new T3 criteria needed	Changes to policy/blast-radius.yaml are themselves T2 tasks: an agent can propose, a human approves.
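The daily reconciliation check in the first row can be sketched as a comparison between each open task's stored tier and a fresh classification. The task shape and the `reclassify` callable are assumptions for illustration; in the rig this role is played by Spec-E.

```python
def reconcile(open_tasks, reclassify):
    """Re-evaluate open tasks and report every tier disagreement.

    open_tasks: iterable of dicts with at least "id" and "tier".
    reclassify: callable returning the freshly computed tier for a task.
    """
    disagreements = []
    for task in open_tasks:
        fresh = reclassify(task)
        if fresh != task["tier"]:
            disagreements.append(
                {"task_id": task["id"], "was": task["tier"], "now": fresh}
            )
    return disagreements
```

Each returned entry corresponds to one disagreement event to surface; an empty list means the original classifications still hold.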

Why not flat permissions

Some systems give all agents uniform capabilities and rely on auditing. This is the "admin user" anti-pattern:

Audit is reactive. By the time someone notices an agent deployed to prod without canary, it's too late.
Uniform permissions mean a compromised agent (via prompt injection) inherits the full permission set.
Flat models create pressure to lower the permission floor for "simple" tasks, eroding the ceiling for risky ones.

Tiered autonomy is the structural answer. Blast-radius-scoped permissions mean even a fully-compromised T0 agent cannot deploy a T3 change.

Why not full autonomy

Gas Town's GUPP (Gas Town Universal Propulsion Principle) takes the opposite stance: agents push direct to main, humans stay out of the loop, the rig's job is to keep agents running. This works for specific domains (internal tool development where the cost of a bad commit is low) but breaks down for:

Customer-facing services where outages have business cost
Security-critical code where bugs have compliance cost
Multi-tenant systems where blast radius crosses tenants
Regulated environments where human attestation is required

Our rig is designed for hybrid human-agent teams shipping software that humans depend on. The trust model is the price of that target.

Evolving the model

The trust model is itself managed under the tiers (meta-T2): changes to this document or to the policy/blast-radius.yaml rules go through human review. Changes to Kyverno enforcement policies are meta-T3: they are themselves T3 changes, because they alter how T3 is enforced.

This recursion stops at the human layer: at some point, the rig must trust humans, and humans must trust each other via the usual social and legal mechanisms.