Operator override channel

Why

Today operators occasionally have to break a chicken-and-egg halt by running ad-hoc GitHub commands directly — e.g. gh pr merge --admin — which bypasses branch protection without leaving a structured trace. That's wrong on two axes:

  1. The override path is undiscoverable — the operator has to guess which flag to use.
  2. The override leaves no audit event in the conductor projection, so "how often does the rig need a human" isn't measurable.

The override channel is a deliberate operator-facing surface with an explicit grammar and a typed OperatorOverride event for every bypass.

Design principles

  1. Overrides are first-class. A merge bypass without an event is a bug, not a feature.
  2. Operators are namespaced. Each override maps the operator's Discord identity to a conductor projection field, so we always record who overrode.
  3. Rationale is mandatory. reason="..." is required for every command. Empty rationale = command rejected at parse time.
  4. Frequency is a signal. If /override unblock-coi fires more than 2× / week, that's a P1 indicator that the CoI predicate is broken upstream.

Grammar

Discord #admin (gated by Discord role) — or POST /api/admin/overrides:

| Command | Action | Event payload |
| --- | --- | --- |
| `/override merge <repo>#<pr> reason="..."` | Admin-merge a PR despite gate failures | `OperatorOverride { Kind: Merge, Target: pr, Reason, Operator, MergeSha }` |
| `/override review-vote <repo>#<pr> approve\|request-changes reason="..."` | Cast a binding review on the operator's behalf when the agent abstained (rc#610) | `OperatorOverride { Kind: ReviewVote, Decision, ... }` |
| `/override redispatch <repo>#<n> reason="..."` | Re-emit ISSUE_ASSIGNED on a stuck issue (label-toggle) | `OperatorOverride { Kind: Redispatch, ... }` |
| `/override unblock-coi <repo>#<pr> reason="..."` | Apply coi-cleared label so Review-E re-dispatches with explicit clearance (rar#181) | `OperatorOverride { Kind: CoiCleared, ... }` |
| `/override close-zombie <repo>#<pr> reason="..."` | Close a stale PR with rationale | `OperatorOverride { Kind: CloseZombie, ... }` |

All commands:

  • Require an explicit reason="..." (single or double quotes).
  • Resolve Operator from Discord user id → GitHub login when known.
  • Append OperatorOverride to the repo-scoped sentinel stream {repo}!overrides.
  • Mirror the event onto the issue or PR stream so the per-issue Discord thread reflects the override.
  • Return a Discord ack (and HTTP eventId) so the operator has a traceable handle.
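
The grammar above is small enough to sketch as a single regular expression. The following is a minimal illustration, not the actual OperatorOverrideParser: the names `ParsedOverride` and `OverrideGrammarSketch`, the error strings, and the rule that a decision is accepted only on review-vote are assumptions made for this example.

```csharp
#nullable enable
using System;
using System.Text.RegularExpressions;

public enum OperatorOverrideKind { Merge, ReviewVote, Redispatch, CoiCleared, CloseZombie }

// Hypothetical parse result; the real parser's output type is not shown in this document.
public sealed record ParsedOverride(
    OperatorOverrideKind Kind, string Repo, int Target, string Reason, string? Decision);

public static class OverrideGrammarSketch
{
    // /override <verb> <repo>#<n> [approve|request-changes] reason="..." (either quote style).
    private static readonly Regex Pattern = new(
        @"^/override\s+(?<verb>[a-z-]+)\s+(?<repo>[^\s#]+)#(?<n>\d+)" +
        @"(?:\s+(?<decision>approve|request-changes))?" +
        @"\s+reason=(?<q>[""'])(?<reason>.*?)\k<q>\s*$",
        RegexOptions.CultureInvariant);

    public static bool TryParse(string input, out ParsedOverride? parsed, out string? error)
    {
        parsed = null;
        error = null;

        var m = Pattern.Match(input.Trim());
        if (!m.Success)
        {
            error = "expected: /override <verb> <repo>#<n> reason=\"...\"";
            return false;
        }

        // Map the command verb to the event kind listed in the grammar table.
        OperatorOverrideKind? kind = m.Groups["verb"].Value switch
        {
            "merge" => OperatorOverrideKind.Merge,
            "review-vote" => OperatorOverrideKind.ReviewVote,
            "redispatch" => OperatorOverrideKind.Redispatch,
            "unblock-coi" => OperatorOverrideKind.CoiCleared,
            "close-zombie" => OperatorOverrideKind.CloseZombie,
            _ => (OperatorOverrideKind?)null,
        };
        if (kind is null) { error = "unknown override verb"; return false; }

        // Rationale is mandatory: an empty or whitespace-only reason is rejected at parse time.
        var reason = m.Groups["reason"].Value;
        if (string.IsNullOrWhiteSpace(reason)) { error = "reason is mandatory"; return false; }

        // Assumption: a decision must accompany review-vote and is rejected on other verbs.
        var decision = m.Groups["decision"].Success ? m.Groups["decision"].Value : null;
        if ((kind == OperatorOverrideKind.ReviewVote) != (decision is not null))
        {
            error = "a decision is required for review-vote and not accepted elsewhere";
            return false;
        }

        if (!int.TryParse(m.Groups["n"].Value, out var target))
        {
            error = "target number out of range";
            return false;
        }

        parsed = new ParsedOverride(kind.Value, m.Groups["repo"].Value, target, reason.Trim(), decision);
        return true;
    }
}
```

Calling `OverrideGrammarSketch.TryParse` with `/override merge dashecorp/rig-conductor#42 reason="break halt rc#608"` yields a `Merge` result targeting 42, while the same command with `reason=''` is rejected before any action runs.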

HTTP API

Authorization (rc#612 re-review #1)

POST /api/admin/overrides is identity-gated server-side, not by the upstream proxy. The endpoint requires:

  1. X-Operator-Discord-Id header — non-empty. The body cannot supply this value; the server only trusts the transport-supplied header.
  2. The header value must be on the configured allowlist: OperatorOverrides:AllowedDiscordIds (k8s Secret/ConfigMap, never committed). Empty allowlist = endpoint disabled (fail-closed 403).

Failure modes:

| Status | Meaning |
| --- | --- |
| 401 | X-Operator-Discord-Id header missing or whitespace |
| 403 | Allowlist empty (override channel disabled) OR header value not on allowlist |

The recorded OperatorDiscordId on the emitted OperatorOverride event is always the verified header value — this is the audit-trail invariant that the per-operator frequency metric relies on.
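
The decision table above can be sketched as a pure function. This is an illustration rather than the actual OperatorOverrideAuthGuard: the class name `OverrideAuthSketch`, the `Check` signature, the choice to test the header before the allowlist, and exact-match comparison are all assumptions.

```csharp
#nullable enable
using System.Collections.Generic;

public static class OverrideAuthSketch
{
    public const int Proceed = 200, Unauthorized = 401, Forbidden = 403;

    // discordIdHeader is the transport-supplied X-Operator-Discord-Id value (never read from the body).
    // allowedDiscordIds stands in for the OperatorOverrides:AllowedDiscordIds setting.
    public static int Check(string? discordIdHeader, ISet<string> allowedDiscordIds)
    {
        // Missing or whitespace header: the caller has not identified itself.
        if (string.IsNullOrWhiteSpace(discordIdHeader)) return Unauthorized;

        // Fail closed: an empty allowlist disables the endpoint for everyone.
        if (allowedDiscordIds.Count == 0) return Forbidden;

        // Assumption: the header must match an allowlist entry exactly.
        return allowedDiscordIds.Contains(discordIdHeader) ? Proceed : Forbidden;
    }
}
```

Keeping the check free of I/O makes each row of the status table easy to unit-test, and the value that passes the check is the one to record as OperatorDiscordId on the event.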

Request

Two equivalent input shapes, a raw command string or a structured body; pick whichever your transport prefers:

POST /api/admin/overrides
X-Operator-Discord-Id: 1234567890
X-Operator-GitHub-Login: stig.johnny
Content-Type: application/json

{ "command": "/override merge dashecorp/rig-conductor#42 reason=\"break halt rc#608\"" }

POST /api/admin/overrides
X-Operator-Discord-Id: 1234567890
Content-Type: application/json

{
  "kind": "Merge",
  "repo": "dashecorp/rig-conductor",
  "target": 42,
  "reason": "break halt rc#608"
}

Both return:

{
  "ok": true,
  "kind": "Merge",
  "eventId": "dashecorp/rig-conductor:42:Merge:638509...",
  "ack": "✅ admin-merged `dashecorp/rig-conductor#42` by **stig.johnny** — sha `abc1234`. Reason: _break halt rc#608_. Event: `…`"
}

On failure (ok: false) the event is never emitted — a missing event always means the action did not happen. This is by design: operators should be able to trust the projection as the source of truth.
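
The emit-on-success-only rule can be sketched as follows. This is a simplified stand-in, not the actual OperatorOverrideService: `OverrideServiceSketch`, `OverrideResult`, the in-memory event list, and the `Func<bool>` action are hypothetical, and the event id format is inferred from the sample response above.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical result shape mirroring the ok/eventId fields of the HTTP response.
public sealed record OverrideResult(bool Ok, string? EventId, string? Error);

public sealed class OverrideServiceSketch
{
    // In-memory stand-in for the {repo}!overrides stream.
    private readonly List<string> _events = new();
    public IReadOnlyList<string> Events => _events;

    public OverrideResult Execute(string repo, int target, string kind, Func<bool> action)
    {
        bool succeeded;
        try
        {
            // Perform the underlying action first (merge, label toggle, and so on).
            succeeded = action();
        }
        catch (Exception ex)
        {
            // The action threw: report failure and emit nothing.
            return new OverrideResult(false, null, ex.Message);
        }

        if (!succeeded) return new OverrideResult(false, null, "action failed");

        // Only a successful action reaches this point, so every event implies a real bypass.
        var eventId = $"{repo}:{target}:{kind}:{DateTimeOffset.UtcNow.UtcTicks}";
        _events.Add(eventId);
        return new OverrideResult(true, eventId, null);
    }
}
```

Ordering the action before the append is what lets readers of the projection treat an absent event as "nothing happened".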

Discoverability

Operators discover the channel via:

  • BRAIN.md → "Operator overrides" section pointer.
  • docs/operator-overrides.md (this file), listed in the docs index.
  • Discord /help override (slash-command auto-help) when running through Discord.

Event schema

public enum OperatorOverrideKind { Merge, ReviewVote, Redispatch, CoiCleared, CloseZombie }

public record OperatorOverride(
    OperatorOverrideKind Kind,
    string Repo,
    int Target,                       // PR # for most, issue # for Redispatch
    string Reason,                    // mandatory, non-empty
    string OperatorDiscordId,         // canonical authority
    string? OperatorGitHubLogin,      // resolved when known
    string? Decision = null,          // "approve"|"request-changes" — ReviewVote only
    string? MergeSha = null,          // set on Kind=Merge after success
    DateTimeOffset At = default
);

Stream: {repo}!overrides (repo-scoped sentinel — mirrors {repo}!ci). Projection: OperatorOverrideRecord keyed on {Repo}:{Target}:{Kind}:{ticks} so replays are idempotent.
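
To illustrate why a key of that shape makes replays idempotent, here is a minimal sketch of a keyed read model. It is not the actual OperatorOverrideProjection: `OverrideEventSketch`, `OverrideProjectionSketch`, and the use of `At.UtcTicks` for the `{ticks}` component are assumptions.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Trimmed-down, hypothetical event carrying only the fields that form the key.
public sealed record OverrideEventSketch(string Repo, int Target, string Kind, DateTimeOffset At);

public sealed class OverrideProjectionSketch
{
    private readonly Dictionary<string, OverrideEventSketch> _records = new();

    public int Count => _records.Count;

    // {Repo}:{Target}:{Kind}:{ticks}; using UtcTicks for the last part is an assumption.
    public static string KeyFor(OverrideEventSketch e) =>
        $"{e.Repo}:{e.Target}:{e.Kind}:{e.At.UtcTicks}";

    // Upsert by key: applying the same event again overwrites the identical record.
    public void Apply(OverrideEventSketch e) => _records[KeyFor(e)] = e;
}
```

Because the key is derived entirely from event data, replaying the stream from any earlier position rebuilds the same set of records instead of duplicating them.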

Dashboard panel

The dashboard's Operator overrides (last 7 days) panel renders rows from GET /api/admin/overrides?days=7 — sortable by Kind, Operator, target repo. The same endpoint returns a perKind rollup that the dashboard surfaces as a per-week metric: count of overrides per kind per week.

If CoiCleared (or any other single kind) fires more than 2× / week, treat it as a P1 indicator that the upstream predicate is broken and file a meta/halt-classes investigation.
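
The per-kind rollup and the threshold check can be sketched with LINQ. This is only an illustration: `OverrideRollupSketch`, the tuple-shaped input, and the trailing-window interpretation of "per week" are assumptions, not the endpoint's actual implementation.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

public static class OverrideRollupSketch
{
    // Count overrides per kind inside the trailing window (default 7 days, as in ?days=7).
    public static Dictionary<string, int> PerKind(
        IEnumerable<(string Kind, DateTimeOffset At)> events, DateTimeOffset now, int days = 7)
    {
        var cutoff = now.AddDays(-days);
        return events
            .Where(e => e.At > cutoff && e.At <= now)
            .GroupBy(e => e.Kind)
            .ToDictionary(g => g.Key, g => g.Count());
    }

    // Kinds that fired more than `threshold` times in the window, sorted for stable output.
    public static List<string> OverThreshold(IReadOnlyDictionary<string, int> perKind, int threshold = 2) =>
        perKind.Where(kv => kv.Value > threshold)
               .Select(kv => kv.Key)
               .OrderBy(k => k, StringComparer.Ordinal)
               .ToList();
}
```

Three CoiCleared overrides inside the window would put `CoiCleared` in the `OverThreshold` result, which is the condition the P1 rule describes.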

Acceptance

  • [x] OperatorOverride event type added to ConductorE.Core.Domain.Events with Kind, Target, Reason, OperatorDiscordId, OperatorGitHubLogin.
  • [x] OperatorOverrideParser (pure) parses the Discord grammar; rationale is mandatory.
  • [x] OperatorOverrideService performs the underlying action via IOperatorOverrideActions and emits the audit event on success only.
  • [x] OperatorOverrideProjection + OperatorOverrideRecord read model.
  • [x] HTTP endpoint POST /api/admin/overrides accepts both raw command and structured shapes.
  • [x] HTTP endpoint GET /api/admin/overrides powers the dashboard panel + per-kind rollup.
  • [x] Dashboard panel "Operator overrides (last 7 days)" — sortable.
  • [x] Each command's success message includes the resulting event id.
  • [x] gh pr merge --admin still works (override channel is the documented preferred path).
  • [x] Server-side identity verification: OperatorOverrideAuthGuard checks the X-Operator-Discord-Id header against OperatorOverrides:AllowedDiscordIds, fail-closed when the allowlist is empty (rc#612 re-review #1).

Non-goals

  • Replacing branch protection with override-only flow — branch protection stays the default; overrides are bypasses.
  • Multi-operator quorum overrides — deferred to v2.
  • Replaying historical admin merges as overrides on startup — the projection resumes from the current event sequence.

Refs

  • 2026-04-30 incident: rc#608 chicken-and-egg merge required operator admin-merge.
  • rar#181 — CoI false-positive root cause that motivated unblock-coi.
  • rc#610 — re-review on COMMENTED-with-abstention that motivated review-vote.
  • rc#609 — halt detection that surfaces stuck issues for redispatch.