Trust Model — Tiered Autonomy by Blast Radius

TL;DR

Autonomy is earned per task class, not granted by default. Tasks fall into four tiers (T0–T3) by blast radius. Enforcement is defense-in-depth: dispatch filter + Review-E gate + Kyverno admission. Promotion is measured (20 consecutive successful runs with zero rollbacks); demotion is immediate on any attributable rollback. T3 work is never auto-promoted: humans co-sign every irreversible change.

Core claim: an agent's autonomy is a function of (a) the change's blast radius, (b) the agent's measured track record on that change class, and (c) the reversibility of the outcome. It is not a function of the agent's identity, the task's urgency, or the human's convenience.

The four tiers

| Tier | Name | Blast radius | Reversibility | Autonomy | Enforcing gates |
|------|------|--------------|---------------|----------|-----------------|
| T0 | Non-blast | Docs, tests, scaffolding, YAML linting | Trivial (git revert) | Full agent, no human | CI + Review-E |
| T1 | Contained | Single-service feature, test-covered refactor, bounded UI change | Fast (flag kill ~30s, rollback ~5m) | Full agent under canary | CI + Review-E + Flagger SLO gate + error-budget check |
| T2 | Cross-cutting | Multi-repo work, event schema change, new public API, architect-level interface | Slow (rollforward or multi-step rollback) | Agent plans, human approves interface, agent implements | CI + Review-E + human co-sign on interface + Kyverno two-attestor |
| T3 | Irreversible | Destructive DB migration, auth/authz code, payment logic, secret rotation, cluster-scope RBAC | None (data loss, security regression) | Human drives, agent assists | Human approval mandatory + Kyverno reject without human-OIDC attestation |

The tier is the dispatch ceiling: an agent may attempt a task only if the task's tier is at or below the agent's ceiling for that task class. Running a T3 task through the T0 path is a policy violation and is blocked at admission.
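The tier table can be read as a lookup from tier to required gates, plus an ordering check for the ceiling. A minimal Python sketch follows; the gate labels and function names are illustrative assumptions, not identifiers used by the rig.

```python
from enum import IntEnum


class Tier(IntEnum):
    """Blast-radius tiers; a higher value means a larger blast radius."""
    T0 = 0
    T1 = 1
    T2 = 2
    T3 = 3


# Gate labels are illustrative shorthand for the "Enforcing gates" column.
REQUIRED_GATES = {
    Tier.T0: {"ci", "review-e"},
    Tier.T1: {"ci", "review-e", "flagger-slo-gate", "error-budget-check"},
    Tier.T2: {"ci", "review-e", "human-interface-cosign", "kyverno-two-attestor"},
    Tier.T3: {"human-approval", "kyverno-human-oidc-attestation"},
}


def missing_gates(task_tier: Tier, passed: set) -> set:
    """Return the gates a change at task_tier still has to pass."""
    return REQUIRED_GATES[task_tier] - passed


def may_attempt(agent_ceiling: Tier, task_tier: Tier) -> bool:
    """An agent may attempt a task only at or below its ceiling."""
    return task_tier <= agent_ceiling
```

Using an ordered enum keeps the "at or below" comparison a plain integer comparison, which matches the 0..3 encoding of CeilingTier.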

Classification at intake

sequenceDiagram
    participant U as User / GitHub Issue
    participant S as Spec-E
    participant CE as Conductor-E
    participant POL as Policy Engine
    participant D as Dispatcher

    U->>S: Issue created
    S->>S: Extract files, surfaces, effects
    S->>POL: Classify blast radius
    POL-->>S: tier T0 / T1 / T2 / T3
    S->>U: Post clarifying questions if needed
    U-->>S: Answers
    S->>CE: Commit TaskSpec<br/>(tier, acceptance criteria,<br/>scope, test strategy)
    CE->>D: Dispatch decision
    alt tier == T0
        D->>D: Any agent, any time
    else tier == T1
        D->>D: Agent + canary pipeline
    else tier == T2
        D->>D: Require human interface approval<br/>before dispatch
    else tier == T3
        D->>D: Require human co-sign + explicit<br/>"I drive" confirmation
    end

Classification rules are encoded as a policy file (policy/blast-radius.yaml) in rig-gitops and evaluated by a deterministic classifier; when the rules are ambiguous, the classifier falls back to Spec-E. Concrete rules:

T0 (Non-blast) — all of:

  • No changes to code paths that execute in production: only test files, docs, YAML linting rules, or GitHub Actions formatting are touched
  • No changes to dependencies
  • No changes to agent prompts or the rig's own code
  • Branch protection does not require human review

T1 (Contained) — at least one of, and nothing from T2/T3:

  • Changes to a single service's code
  • Dependency additions from an allowlisted registry with Socket.dev score >= threshold
  • Refactors with existing test coverage >= 80% of changed lines
  • UI changes not touching auth, payments, or user data
  • The service has a defined SLO, a Flagger Canary, and a kill-switch feature flag

T2 (Cross-cutting) — at least one of:

  • Changes spanning 2+ repositories
  • Changes to Conductor-E event type definitions or the subscription registry
  • New public HTTP API surface or CLI command
  • Changes to the rig's own agent prompts or character files
  • Changes to a shared library consumed by 2+ services
  • Changes requiring new feature flags to be defined (not just flipped)

T3 (Irreversible) — at least one of:

  • Destructive DB DDL (DROP, TRUNCATE, non-backward-compatible ALTER)
  • Changes to authentication, authorization, or session handling
  • Changes to payment processing, billing, or money-handling paths
  • Changes to secret management or credential rotation logic
  • Cluster-scope Kubernetes RBAC changes
  • Changes to Kyverno policies themselves
  • Changes to the attestation chain (Sigstore config, SLSA workflows)
  • Production data migrations affecting >1M rows

Boundary cases route to Spec-E, which errs on the side of the higher tier.
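One way to read the rules above is "check the highest tier first, and hand anything unsettled to Spec-E". The Python sketch below illustrates that ordering with a handful of the listed signals. The Change fields are hypothetical stand-ins for facts extracted at intake, not the schema of policy/blast-radius.yaml.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Change:
    """Hypothetical, pre-extracted facts about a change (not the real schema)."""
    repos_touched: int = 1
    touches_auth: bool = False
    destructive_ddl: bool = False
    new_public_api: bool = False
    touches_production_code: bool = False
    changes_dependencies: bool = False


def classify(change: Change) -> Optional[int]:
    """Return the tier (0..3), or None if these rules do not settle it.

    Tiers are checked from highest to lowest so that a single T3 signal
    outranks any number of lower-tier signals.
    """
    if change.destructive_ddl or change.touches_auth:
        return 3
    if change.repos_touched >= 2 or change.new_public_api:
        return 2
    if change.touches_production_code and change.repos_touched == 1:
        return 1
    if not change.touches_production_code and not change.changes_dependencies:
        return 0
    return None  # boundary case: hand over to Spec-E
```

Returning None instead of guessing keeps the deterministic part honest: the caller can route the task to Spec-E, which errs toward the higher tier.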

Promotion: how autonomy is earned

An agent's autonomy tier for a task class is stored in Conductor-E as a projection:

record AgentAutonomy(
    string AgentId,
    string TaskClass,       // e.g., "docs-update", "ui-change", "service-refactor"
    int CeilingTier,        // 0..3, maximum tier this agent can attempt for this task class
    int SuccessfulRuns,     // rolling 90-day count
    int Failures,           // includes rollback, human-rework, budget-overrun
    DateTimeOffset LastReset
);

Default ceiling is T0 for every (agent, task-class) pair. Promotion rules:

  • T0 → T1: 20 consecutive successful T0 runs of that class, zero human-rework, zero rollbacks. Ceiling raises to T1 for that class.
  • T1 → T2: 20 consecutive successful T1 runs, zero canary aborts, zero SLO-budget depletions attributable to the change. Ceiling raises to T2 for that class.
  • T2 → T3: No automatic promotion. T3 tasks require human co-sign on every instance regardless of track record. The principle "humans at semantic boundaries" trumps accumulated trust.
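The promotion rules can be sketched as a small state update. This Python sketch assumes the caller only reports runs that met every condition for the current tier (no rework, rollbacks, canary aborts, or budget depletions) and resets the streak itself on any failure. The function and constant names are illustrative.

```python
PROMOTION_STREAK = 20   # consecutive successful runs required by the rules above
MAX_AUTO_TIER = 2       # T3 is never reached automatically


def ceiling_after_success(ceiling: int, streak: int) -> tuple:
    """Record one clean run at the current ceiling tier.

    Returns (new_ceiling, new_streak). The streak restarts after a promotion
    because the next promotion needs 20 runs at the new tier.
    """
    streak += 1
    if streak >= PROMOTION_STREAK and ceiling < MAX_AUTO_TIER:
        return ceiling + 1, 0
    return ceiling, streak
```

Capping automatic promotion at T2 encodes the rule that no track record, however long, raises a ceiling to T3.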

Demotion rules:

  • Any rollback attributable to the agent's work on that class: ceiling drops one tier immediately, followed by a 30-day cooldown before the agent is eligible for promotion again.
  • Model version change (e.g., Sonnet 4.6 → 4.7, or a cross-vendor swap via LiteLLM fallback_models; see provider-portability.md) or any behavior-drift signal >30% on the canary suite: all ceilings reset to T0, and the promotion track record is held in quarantine for human review.
  • New class of task: ceiling starts at T0 for that class regardless of the agent's ceiling on other classes.
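The demotion rules can be sketched the same way. In the Python sketch below, Autonomy is a trimmed-down stand-in for the AgentAutonomy projection; eligible_from and quarantined are hypothetical fields added for illustration and are not part of the record shown earlier.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

COOLDOWN = timedelta(days=30)


@dataclass
class Autonomy:
    """Trimmed-down stand-in for one (agent, task-class) projection row."""
    ceiling: int
    streak: int = 0
    eligible_from: Optional[datetime] = None  # hypothetical field
    quarantined: bool = False                 # hypothetical field


def on_rollback(a: Autonomy, now: datetime) -> None:
    """Attributable rollback: drop one tier and start the 30-day cooldown."""
    a.ceiling = max(0, a.ceiling - 1)
    a.streak = 0
    a.eligible_from = now + COOLDOWN


def on_model_change(rows: list) -> None:
    """Model swap or drift signal: reset every ceiling and quarantine the record."""
    for a in rows:
        a.ceiling = 0
        a.quarantined = True  # the streak is kept, but held for human review
```

A new task class needs no handler in this sketch: a missing row simply means the default ceiling of T0.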

T3 never auto-promotes

T3 work (destructive DB, auth, payments, secret rotation, cluster RBAC, Kyverno policy changes) always requires human co-sign on every instance. Accumulated track record on lower tiers does not raise this ceiling. Irreversibility is a structural reason, not a trust metric.

Promotion and demotion events are stored in the event log, so an audit query such as "show me every autonomy change for Dev-E in the last 90 days" is just a replay of that log.
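As an illustration of the replay idea, the Python sketch below filters an in-memory list of events. The AutonomyChanged shape is an assumption made for this example; the real event schema is not shown in this document.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class AutonomyChanged:
    """Hypothetical event shape for a promotion or demotion."""
    agent_id: str
    task_class: str
    old_ceiling: int
    new_ceiling: int
    at: datetime


def autonomy_changes(events, agent_id, now, window=timedelta(days=90)):
    """Replay the log in order and keep one agent's changes inside the window."""
    cutoff = now - window
    return [e for e in events if e.agent_id == agent_id and e.at >= cutoff]
```

Because the log is append-only, the result preserves the order in which the changes happened.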

Technical enforcement

Tier enforcement happens in three layers, defense-in-depth:

Layer 1: Dispatch

Conductor-E's assignment endpoint (GET /api/assignments/next?agentId=X) filters by the agent's ceiling tier. An agent with T1 ceiling on ui-change cannot be assigned a T2-classified UI change. The filter is cheap: one JOIN against AgentAutonomy.
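The filter can be pictured as follows. This is a Python sketch over in-memory data rather than the actual endpoint or its SQL; the queue and ceilings shapes are assumptions made for illustration.

```python
def next_assignment(agent_id, queue, ceilings):
    """Return the first queued task the agent may take, or None.

    queue:    list of dicts with "id", "task_class" and "tier" (0..3), oldest first
    ceilings: dict mapping (agent_id, task_class) to a ceiling tier;
              a missing pair falls back to the default ceiling, T0
    """
    for task in queue:
        ceiling = ceilings.get((agent_id, task["task_class"]), 0)
        if task["tier"] <= ceiling:
            return task
    return None
```

For example, an agent with a T1 ceiling on ui-change skips a queued T2 ui-change task and receives the next task it is allowed to take.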

Layer 2: Review-E gate

Review-E's character prompt includes tier-specific review criteria. For a T2 change, Review-E explicitly looks for "interface approval attestation" in the PR metadata and blocks the review if absent. For T3, Review-E refuses to approve and routes to human.
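Review-E's gate is driven by a prompt, but its tier-specific decision can be summarised as a small function. In this Python sketch the metadata key interface_approval_attestation and the three return labels are hypothetical names chosen for illustration.

```python
def review_gate(tier: int, pr_metadata: dict) -> str:
    """Return "route-to-human", "block" or "review" for a PR at the given tier."""
    if tier >= 3:
        return "route-to-human"  # Review-E never approves T3 work
    if tier == 2 and not pr_metadata.get("interface_approval_attestation"):
        return "block"           # T2 needs the interface approval first
    return "review"              # continue with the tier-specific criteria
```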

Layer 3: Kyverno admission

The cluster-level final gate. Kyverno ImageValidatingPolicy requires, for any manifest targeting namespaces with a blast-radius: t3 label:

apiVersion: policies.kyverno.io/v1
kind: ImageValidatingPolicy
metadata: { name: t3-human-cosign }
spec:
  validationActions: [Deny]
  matchConstraints:
    namespaceSelector:
      matchLabels: { blast-radius: t3 }
  attestors:
  - name: agent-identity
    cosign:
      keyless:
        identities:
        - subject: "https://github.com/dashecorp/.+/.github/workflows/release\\.ya?ml@.+"
          issuer: "https://token.actions.githubusercontent.com"
  - name: human-approval
    cosign:
      keyless:
        identities:
        - subject: "repo:dashecorp/prod-approvals:environment:t3-approve"
          issuer: "https://token.actions.githubusercontent.com"
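  validations:
  # Each attestor is verified in its own call so that both signatures are required.
  # attestations.slsa is assumed to be declared under spec.attestations (not shown here).
  - expression: >-
      images.containers.map(i, verifyAttestationSignatures(i, attestations.slsa, [attestors.'agent-identity'])).all(e, e > 0)
      && images.containers.map(i, verifyAttestationSignatures(i, attestations.slsa, [attestors.'human-approval'])).all(e, e > 0)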

Translation: to land in a T3 namespace, the image must carry two valid Sigstore signatures: one from the agent's build workflow and one from a human-triggered approval workflow. No signature, no admission. No human can forget the rule, because the rule is enforced by the cluster, not the reviewer.

The TaskSpec object

Every dispatched task carries a TaskSpec that encodes everything needed to determine tier, dispatch correctly, and evaluate outcome:

id: task-202604160042
repo: dashecorp/conductor-e
issue: 76
tier: T1
blast_radius:
  reason: "Changes single-service feature with existing test coverage"
  surfaces: ["src/Api/EventsEndpoint.cs", "tests/Api/EventsTests.cs"]
  evaluated_by: spec-e
  evaluated_at: "2026-04-16T16:30:00Z"
acceptance_criteria:
  - "POST /api/events accepts new event type X"
  - "MartenProjections.cs updates IssueStatus for type X"
  - "Integration test covers happy path and bad input"
test_strategy:
  required_coverage_delta: 0
  property_tests: true
non_goals:
  - "Do not change existing event type definitions"
  - "Do not modify Discord routing"
expected_effort_tokens: 80000  # budget guardrail
assigned_agent: dev-e-dotnet
ceiling_tier: T1  # the assigned agent's current ceiling for this task class

Spec-E authors this. Conductor-E validates it on submission (a schema check, plus a check that the task's tier does not exceed the assigned agent's ceiling). Dispatch is gated on both checks.
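The two submission checks can be sketched as below. This Python sketch assumes the TaskSpec has already been parsed into a dict and treats a small subset of fields as required; the actual schema may require more.

```python
REQUIRED_FIELDS = {"id", "repo", "tier", "acceptance_criteria", "assigned_agent", "ceiling_tier"}
TIERS = ["T0", "T1", "T2", "T3"]


def validate_task_spec(spec: dict) -> list:
    """Return a list of problems; an empty list means the spec may be dispatched."""
    # Schema check (simplified): every required field must be present.
    problems = [f"missing field: {name}" for name in sorted(REQUIRED_FIELDS - set(spec))]
    if problems:
        return problems
    if spec["tier"] not in TIERS or spec["ceiling_tier"] not in TIERS:
        return ["tier and ceiling_tier must be one of T0..T3"]
    # Ceiling check: the task's tier must not exceed the agent's ceiling.
    if TIERS.index(spec["tier"]) > TIERS.index(spec["ceiling_tier"]):
        problems.append("task tier exceeds the assigned agent's ceiling")
    return problems
```

Note that a T0 task assigned to an agent with a T1 ceiling passes: the tier only has to be at or below the ceiling, not equal to it.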

Escalation paths

When an agent encounters work it cannot complete within its tier:

graph LR
    A[Agent working T1 task] -->|discovers need<br/>for T2 change| E[Emit EscalationRequired]
    E --> C[Conductor-E]
    C -->|route to human| H[#admin Discord<br/>+ @mention tier-owner]
    H -->|approve scope expansion| AR[Record approval attestation]
    AR -->|re-dispatch at T2| A2[Agent with T2 ceiling<br/>or new interface spec]
    H -->|reject| R[Close task, open new tracking]

An agent cannot silently exceed its tier. Any discovery that a task is actually larger than its classification forces an escalation event, stored with the discovery context, routed per severity.
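A minimal Python sketch of the escalation check follows. The payload fields are assumptions made for illustration; only the event name EscalationRequired comes from the diagram.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class EscalationRequired:
    """Hypothetical payload for the EscalationRequired event."""
    task_id: str
    classified_tier: int
    discovered_tier: int
    context: str  # what the agent found that exceeds the classification
    at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def check_scope(task_id: str, classified_tier: int, discovered_tier: int, context: str):
    """Return an escalation event if the work outgrew its tier, else None."""
    if discovered_tier > classified_tier:
        return EscalationRequired(task_id, classified_tier, discovered_tier, context)
    return None
```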

Human co-sign mechanics

T2 "interface approval" and T3 "I drive" mechanics:

  • T2 interface approval: human reviews the proposed interface (event schema, API contract, CLI flags) in a dedicated interface-review issue. Approval records a GitHub Deployments API entry with the human's identity. Conductor-E reads the Deployments API as the attestation source. No approval event → Dispatcher refuses T2 work.
  • T3 human-drives: human is the primary author (or explicit named sponsor) of the PR. Agent assists via commits on a sub-branch that the human merges into the main feature branch. Kyverno rejects any T3 image not carrying the human's Sigstore co-sign. The human's GitHub Actions approval workflow (.github/workflows/t3-approve.yaml) is the signing surface — it runs on workflow_dispatch with environment t3-approve and requires a protection rule "required reviewers = human".

This is the same technical pattern Google and GitHub use for production deploys; we just wire it to our blast-radius labels.

When the tiers break down

Tiers are heuristics, not ground truth. Known failure modes and mitigations:

| Failure | Example | Mitigation |
|---------|---------|------------|
| Spec-E underclassifies | T2 work marked T1 because Spec-E didn't realize it touched 2 repos | Daily reconciliation check: Spec-E re-evaluates all open tasks; disagreements with the original classification surface as events. |
| Agent discovers higher-tier work mid-task | T1 task turns out to need a schema change | StuckGuard detects the agent stalling on a T2-shaped problem; forced escalation. |
| Human mis-approves T3 under time pressure | Late-night hotfix signed off without review | Kyverno logs every T3 admission; daily post-hoc audit surfaces rushed approvals. |
| Novel task class with no track record | Agent has T2 ceiling on one service but is asked to work on a new one | Default back to T0 for the new class; human observer required for first 20 runs. |
| Classification rules themselves change | New product area added, new T3 criteria needed | Changes to policy/blast-radius.yaml are themselves T2 tasks: agent can propose, human approves. |
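The daily reconciliation check from the first row can be sketched as a pure function. In this Python sketch, reclassify stands in for a fresh Spec-E evaluation, and the shape of the disagreement records is an assumption.

```python
def reconcile(open_tasks, reclassify):
    """Re-evaluate open tasks and return one disagreement record per mismatch.

    open_tasks: iterable of dicts with "id" and "tier" (0..3)
    reclassify: callable taking a task dict and returning a fresh tier
    """
    disagreements = []
    for task in open_tasks:
        fresh = reclassify(task)
        if fresh != task["tier"]:
            disagreements.append(
                {"task_id": task["id"], "original": task["tier"], "reevaluated": fresh}
            )
    return disagreements
```

Each returned record would then be emitted as an event, so an underclassified task surfaces even if nobody was looking for it.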

Why not flat permissions

Some systems give all agents uniform capabilities and rely on auditing. This is the "admin user" anti-pattern:

  • Audit is reactive. By the time someone notices an agent deployed to prod without canary, it's too late.
  • Uniform permissions mean a compromised agent (via prompt injection) inherits the full permission set.
  • Flat models create pressure to lower the permission floor for "simple" tasks, eroding the ceiling for risky ones.

Tiered autonomy is the structural answer. Blast-radius-scoped permissions mean even a fully-compromised T0 agent cannot deploy a T3 change.

Why not full autonomy

Gastown's GUPP (Gas Town Universal Propulsion Principle) takes the opposite stance: agents push direct to main, humans stay out of the loop, the rig's job is to keep agents running. This works for specific domains (internal tool development where the cost of a bad commit is low) but breaks down for:

  • Customer-facing services where outages have business cost
  • Security-critical code where bugs have compliance cost
  • Multi-tenant systems where blast radius crosses tenants
  • Regulated environments where human attestation is required

Our rig is designed for hybrid human-agent teams shipping software that humans depend on. The trust model is the price of that target.

Evolving the model

The trust model is itself managed under the tiered policy (meta-T2): changes to this document or to the policy/blast-radius.yaml rules go through human review. Changes to Kyverno enforcement policies are meta-T3: a T3 change to the enforcement of T3 itself.

This recursion stops at the human layer: at some point, the rig must trust humans, and humans must trust each other via the usual social and legal mechanisms.

See also