# Operator override channel

## Why
Today, operators occasionally have to break a chicken-and-egg halt by running ad-hoc
GitHub commands directly (e.g. `gh pr merge --admin`), which bypasses branch
protection without leaving a structured trace. That is wrong on two axes:

- The override path is undiscoverable: the operator has to guess which flag to use.
- The override leaves no audit event in the conductor projection, so "how often does the rig need a human?" isn't measurable.
The override channel is a deliberate, operator-facing surface with an explicit
grammar and a typed `OperatorOverride` event for every bypass.
## Design principles

- Overrides are first-class. A merge bypass without an event is a bug, not a feature.
- Operators are namespaced. The operator's Discord identity is stored as a field in the conductor projection, so we always record who overrode.
- Rationale is mandatory. `reason="..."` is required for every command; an empty rationale gets the command rejected at parse time.
- Frequency is a signal. If `/override unblock-coi` fires more than 2× per week, that's a P1 indicator that the CoI predicate is broken upstream.
## Grammar

Commands are issued in Discord `#admin` (gated by Discord role) or sent to `POST /api/admin/overrides`:
| Command | Action | Event payload |
|---|---|---|
| `/override merge <repo>#<pr> reason="..."` | Admin-merge a PR despite gate failures | `OperatorOverride { Kind: Merge, Target: pr, Reason, Operator, MergeSha }` |
| `/override review-vote <repo>#<pr> approve\|request-changes reason="..."` | Cast a binding review on the operator's behalf when the agent abstained (rc#610) | `OperatorOverride { Kind: ReviewVote, Decision, ... }` |
| `/override redispatch <repo>#<n> reason="..."` | Re-emit `ISSUE_ASSIGNED` on a stuck issue (label toggle) | `OperatorOverride { Kind: Redispatch, ... }` |
| `/override unblock-coi <repo>#<pr> reason="..."` | Apply the `coi-cleared` label so Review-E re-dispatches with explicit clearance (rar#181) | `OperatorOverride { Kind: CoiCleared, ... }` |
| `/override close-zombie <repo>#<pr> reason="..."` | Close a stale PR with rationale | `OperatorOverride { Kind: CloseZombie, ... }` |
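The grammar is regular enough to parse with one regular expression. Below is a minimal sketch of a pure parser, assuming the shape `/override <verb> <repo>#<n> [approve|request-changes] reason="..."` with single or double quotes; `OverrideKind`, `ParsedOverride`, and `OverrideGrammar.TryParse` are hypothetical names, not the actual `OperatorOverrideParser` API.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public enum OverrideKind { Merge, ReviewVote, Redispatch, CoiCleared, CloseZombie }

// Hypothetical parse result; mirrors a subset of the OperatorOverride event fields.
public sealed record ParsedOverride(OverrideKind Kind, string Repo, int Target, string Reason, string? Decision);

public static class OverrideGrammar
{
    // Verb → event kind, taken from the grammar table.
    private static readonly Dictionary<string, OverrideKind> Verbs = new()
    {
        ["merge"] = OverrideKind.Merge,
        ["review-vote"] = OverrideKind.ReviewVote,
        ["redispatch"] = OverrideKind.Redispatch,
        ["unblock-coi"] = OverrideKind.CoiCleared,
        ["close-zombie"] = OverrideKind.CloseZombie,
    };

    // verb, owner/name#number, optional decision, then reason="..." or reason='...'.
    private static readonly Regex Pattern = new(
        @"^/override\s+(?<verb>[a-z-]+)\s+(?<repo>[\w.-]+/[\w.-]+)#(?<n>\d+)" +
        @"(?:\s+(?<decision>approve|request-changes))?" +
        @"\s+reason=(?<q>[""'])(?<reason>.*?)\k<q>\s*$");

    public static bool TryParse(string input, out ParsedOverride? result, out string? error)
    {
        result = null;
        var m = Pattern.Match(input.Trim());
        if (!m.Success) { error = "unrecognized command or missing reason=\"...\""; return false; }
        if (!Verbs.TryGetValue(m.Groups["verb"].Value, out var kind)) { error = "unknown verb"; return false; }

        // Rationale is mandatory: empty or whitespace-only reasons are rejected here.
        var reason = m.Groups["reason"].Value.Trim();
        if (reason.Length == 0) { error = "reason is mandatory"; return false; }

        // A decision is required for review-vote and rejected for every other verb.
        var decision = m.Groups["decision"].Success ? m.Groups["decision"].Value : null;
        if ((kind == OverrideKind.ReviewVote) != (decision is not null)) { error = "decision mismatch"; return false; }

        result = new ParsedOverride(kind, m.Groups["repo"].Value, int.Parse(m.Groups["n"].Value), reason, decision);
        error = null;
        return true;
    }
}
```

With this shape, both `reason=""` and a missing `reason=` fail before any GitHub call is made, which is what "rejected at parse time" requires.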
All commands:

- Require an explicit `reason="..."` (single or double quotes).
- Resolve the operator's GitHub login from their Discord user id, when known.
- Append `OperatorOverride` to the repo-scoped sentinel stream `{repo}!overrides`.
- Mirror the event onto the issue or PR stream so the per-issue Discord thread reflects the override.
- Return a Discord ack (and an HTTP `eventId`) so the operator has a traceable handle.
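The append-then-mirror step can be sketched against a tiny in-memory store. This assumes the issue/PR stream is named `{repo}#{n}` and that the event id uses the `{repo}:{target}:{kind}:{ticks}` shape seen in the example HTTP response; `InMemoryEventStore`, `OverrideRouting`, and that mirror-stream name are hypothetical, not the conductor's real store API.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical minimal event store: stream name → ordered list of event ids.
public sealed class InMemoryEventStore
{
    public Dictionary<string, List<string>> Streams { get; } = new();

    public void Append(string stream, string eventId)
    {
        if (!Streams.TryGetValue(stream, out var events))
            Streams[stream] = events = new List<string>();
        events.Add(eventId);
    }
}

public static class OverrideRouting
{
    // Repo-scoped sentinel stream, following the {repo}!ci convention.
    public static string SentinelStream(string repo) => $"{repo}!overrides";

    // ASSUMPTION: per-issue/PR stream naming is illustrative only.
    public static string TargetStream(string repo, int target) => $"{repo}#{target}";

    // Appends to the sentinel stream, then mirrors the same event onto the
    // issue/PR stream. The returned id is what the Discord ack and HTTP body carry.
    public static string AppendAndMirror(InMemoryEventStore store, string repo, int target, string kind, DateTimeOffset at)
    {
        var eventId = $"{repo}:{target}:{kind}:{at.UtcTicks}";
        store.Append(SentinelStream(repo), eventId);
        store.Append(TargetStream(repo, target), eventId);
        return eventId;
    }
}
```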
## HTTP API

### Authorization (rc#612 re-review #1)

`POST /api/admin/overrides` is identity-gated server-side, not by the upstream
proxy. The endpoint requires:

- An `X-Operator-Discord-Id` header with a non-empty value. The body cannot supply this value; the server trusts only the transport-supplied header.
- The header value must be on the configured allowlist `OperatorOverrides:AllowedDiscordIds` (a k8s Secret/ConfigMap, never committed). An empty allowlist disables the endpoint (fail-closed 403).
Failure modes:

| Status | Meaning |
|---|---|
| `401` | `X-Operator-Discord-Id` header missing or whitespace |
| `403` | Allowlist empty (override channel disabled) or header value not on allowlist |
The `OperatorDiscordId` recorded on the emitted `OperatorOverride` event is
always the verified header value. This is the audit-trail invariant that the
per-operator frequency metric relies on.
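The authorization rules reduce to a small pure function. A minimal sketch, assuming the allowlist has already been loaded from `OperatorOverrides:AllowedDiscordIds` into a set and that the header check runs before the allowlist check; `AuthDecision` and `OverrideAuthSketch.Check` are hypothetical and not the actual `OperatorOverrideAuthGuard` signature.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical result: the HTTP status to return, plus the verified id on success.
public sealed record AuthDecision(int Status, string? VerifiedDiscordId);

public static class OverrideAuthSketch
{
    public static AuthDecision Check(string? headerValue, IReadOnlySet<string> allowedDiscordIds)
    {
        // 401: X-Operator-Discord-Id missing or whitespace-only.
        if (string.IsNullOrWhiteSpace(headerValue))
            return new AuthDecision(401, null);

        // 403: fail closed when the allowlist is empty (override channel disabled).
        if (allowedDiscordIds.Count == 0)
            return new AuthDecision(403, null);

        // 403: caller is not on the allowlist.
        if (!allowedDiscordIds.Contains(headerValue))
            return new AuthDecision(403, null);

        // Only this verified header value is written to OperatorDiscordId;
        // any operator id in the request body is ignored.
        return new AuthDecision(200, headerValue);
    }
}
```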
### Request

Two equivalent input shapes are accepted; pick whichever your transport prefers.

Raw command:

```http
POST /api/admin/overrides
X-Operator-Discord-Id: 1234567890
X-Operator-GitHub-Login: stig.johnny
Content-Type: application/json

{ "command": "/override merge dashecorp/rig-conductor#42 reason=\"break halt rc#608\"" }
```

Structured:

```http
POST /api/admin/overrides
X-Operator-Discord-Id: 1234567890
Content-Type: application/json

{
  "kind": "Merge",
  "repo": "dashecorp/rig-conductor",
  "target": 42,
  "reason": "break halt rc#608"
}
```
Both return:

```json
{
  "ok": true,
  "kind": "Merge",
  "eventId": "dashecorp/rig-conductor:42:Merge:638509...",
  "ack": "✅ admin-merged `dashecorp/rig-conductor#42` by **stig.johnny** — sha `abc1234`. Reason: _break halt rc#608_. Event: `…`"
}
```
On failure (`ok: false`) the event is never emitted, so a missing event always
means the action did not happen. This is by design: operators should be able to
trust the projection as the source of truth.
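The emit-on-success-only rule is easiest to see in the service's control flow. A minimal sketch for the merge case, assuming the underlying action reports success plus an optional merge SHA; `OverrideActionResult`, `MergeOverrideEvent`, `MergeOverrideSketch`, and the delegate standing in for `IOperatorOverrideActions` are all hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical outcome of the underlying GitHub action.
public sealed record OverrideActionResult(bool Ok, string? MergeSha = null, string? Error = null);

// Hypothetical audit payload; the real event is OperatorOverride.
public sealed record MergeOverrideEvent(string Repo, int Target, string Reason, string? MergeSha);

public sealed class MergeOverrideSketch
{
    // Stand-in for IOperatorOverrideActions: performs the admin merge.
    private readonly Func<string, int, OverrideActionResult> _adminMerge;
    // Appends the audit event; only ever called after the action succeeded.
    private readonly Action<MergeOverrideEvent> _emit;

    public MergeOverrideSketch(Func<string, int, OverrideActionResult> adminMerge, Action<MergeOverrideEvent> emit)
    {
        _adminMerge = adminMerge;
        _emit = emit;
    }

    public (bool Ok, string Message) Merge(string repo, int pr, string reason)
    {
        var result = _adminMerge(repo, pr);
        if (!result.Ok)
            // No event on failure, so a missing event always means the action did not happen.
            return (false, $"merge failed: {result.Error}");

        _emit(new MergeOverrideEvent(repo, pr, reason, result.MergeSha));
        return (true, $"admin-merged {repo}#{pr}, sha {result.MergeSha}");
    }
}
```

Ordering the action before the append is the design choice here: a crash between the two can lose an event for a completed action, but can never record an action that did not happen.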
## Discoverability

Operators discover the channel via:

- `BRAIN.md` → "Operator overrides" section pointer.
- `docs/operator-overrides.md` (this file), listed in the docs index.
- Discord `/help override` (slash-command auto-help) when running through Discord.
## Event schema

```csharp
using System;

public enum OperatorOverrideKind { Merge, ReviewVote, Redispatch, CoiCleared, CloseZombie }

public record OperatorOverride(
    OperatorOverrideKind Kind,
    string Repo,
    int Target,                   // PR # for most, issue # for Redispatch
    string Reason,                // mandatory, non-empty
    string OperatorDiscordId,     // canonical authority
    string? OperatorGitHubLogin,  // resolved when known
    string? Decision = null,      // "approve"|"request-changes" — ReviewVote only
    string? MergeSha = null,      // set on Kind=Merge after success
    DateTimeOffset At = default
);
```
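The schema comments imply per-kind invariants: `Reason` non-empty, `Decision` only on `ReviewVote`, `MergeSha` only on `Merge`. A sketch of a validator for them, assuming `Decision` is required on `ReviewVote` and that emitted `Merge` events always carry `MergeSha` (since they are emitted only after success). `OperatorOverrideInvariants.Validate` is hypothetical; the enum and record are restated so the block compiles on its own.

```csharp
using System;
using System.Collections.Generic;

public enum OperatorOverrideKind { Merge, ReviewVote, Redispatch, CoiCleared, CloseZombie }

public record OperatorOverride(
    OperatorOverrideKind Kind,
    string Repo,
    int Target,
    string Reason,
    string OperatorDiscordId,
    string? OperatorGitHubLogin,
    string? Decision = null,
    string? MergeSha = null,
    DateTimeOffset At = default
);

public static class OperatorOverrideInvariants
{
    // Returns every violated invariant; an empty list means the event is well-formed.
    public static List<string> Validate(OperatorOverride e)
    {
        var errors = new List<string>();
        if (string.IsNullOrWhiteSpace(e.Reason)) errors.Add("Reason is mandatory");
        if (string.IsNullOrWhiteSpace(e.OperatorDiscordId)) errors.Add("OperatorDiscordId is required");

        bool isVote = e.Kind == OperatorOverrideKind.ReviewVote;
        if (isVote && e.Decision is not ("approve" or "request-changes"))
            errors.Add("ReviewVote needs Decision approve|request-changes");
        if (!isVote && e.Decision is not null) errors.Add("Decision is ReviewVote-only");

        // ASSUMPTION: Merge events are only emitted after success, so they carry the SHA.
        bool isMerge = e.Kind == OperatorOverrideKind.Merge;
        if (isMerge && string.IsNullOrEmpty(e.MergeSha)) errors.Add("Merge needs MergeSha");
        if (!isMerge && e.MergeSha is not null) errors.Add("MergeSha is Merge-only");
        return errors;
    }
}
```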
Stream: `{repo}!overrides` (repo-scoped sentinel, mirroring `{repo}!ci`).

Projection: `OperatorOverrideRecord` keyed on `{Repo}:{Target}:{Kind}:{ticks}`
so replays are idempotent.
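Keying the read model on `{Repo}:{Target}:{Kind}:{ticks}` turns replay into an upsert rather than an append. A minimal in-memory sketch, assuming the ticks come from the event's `At` timestamp in UTC; `OverrideRecordSketch` and `OverrideProjectionSketch` are hypothetical stand-ins for `OperatorOverrideRecord` and `OperatorOverrideProjection`.

```csharp
using System;
using System.Collections.Generic;

public sealed record OverrideRecordSketch(string Repo, int Target, string Kind, string Reason, DateTimeOffset At)
{
    // Deterministic key: the same event always maps to the same record.
    public string Key => $"{Repo}:{Target}:{Kind}:{At.UtcTicks}";
}

public sealed class OverrideProjectionSketch
{
    private readonly Dictionary<string, OverrideRecordSketch> _records = new();

    public int Count => _records.Count;

    // Upsert by key, so re-applying an event during a replay is a no-op.
    public void Apply(OverrideRecordSketch record) => _records[record.Key] = record;
}
```

Two overrides of the same kind on the same target at different instants still produce distinct records, because their ticks differ.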
## Dashboard panel

The dashboard's "Operator overrides (last 7 days)" panel renders rows from
`GET /api/admin/overrides?days=7`, sortable by kind, operator, and target repo.
The same endpoint returns a `perKind` rollup that the dashboard surfaces as a
per-week metric: the count of overrides per kind per week.

If `coi-cleared` (or any single kind) exceeds 2× per week, treat it as a P1
indicator that the upstream predicate is broken and file a `meta/halt-classes`
investigation.
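The per-kind weekly rollup and the P1 threshold can be sketched as a grouping over override timestamps. This assumes ISO-8601 weeks (via `System.Globalization.ISOWeek`) and reads "more than 2× per week" as a count strictly greater than 2; `OverrideHit` and `WeeklyRollup` are hypothetical, and the real endpoint's `perKind` shape is not reproduced.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public sealed record OverrideHit(string Kind, DateTimeOffset At);

public static class WeeklyRollup
{
    // ASSUMPTION: "more than 2× per week" means a weekly count strictly above 2.
    public const int P1Threshold = 2;

    // Counts overrides per (kind, ISO year, ISO week), using UTC timestamps.
    public static Dictionary<(string Kind, int Year, int Week), int> PerKindPerWeek(IEnumerable<OverrideHit> hits) =>
        hits.GroupBy(h => (h.Kind, ISOWeek.GetYear(h.At.UtcDateTime), ISOWeek.GetWeekOfYear(h.At.UtcDateTime)))
            .ToDictionary(g => g.Key, g => g.Count());

    // Kind/week buckets that warrant a meta/halt-classes investigation.
    public static List<(string Kind, int Year, int Week)> P1Buckets(IEnumerable<OverrideHit> hits) =>
        PerKindPerWeek(hits).Where(kv => kv.Value > P1Threshold).Select(kv => kv.Key).ToList();
}
```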
## Acceptance

- [x] `OperatorOverride` event type added to `ConductorE.Core.Domain.Events` with `Kind`, `Target`, `Reason`, `OperatorDiscordId`, `OperatorGitHubLogin`.
- [x] `OperatorOverrideParser` (pure) parses the Discord grammar; rationale is mandatory.
- [x] `OperatorOverrideService` performs the underlying action via `IOperatorOverrideActions` and emits the audit event on success only.
- [x] `OperatorOverrideProjection` + `OperatorOverrideRecord` read model.
- [x] HTTP endpoint `POST /api/admin/overrides` accepts both raw command and structured shapes.
- [x] HTTP endpoint `GET /api/admin/overrides` powers the dashboard panel + per-kind rollup.
- [x] Dashboard panel "Operator overrides (last 7 days)", sortable.
- [x] Each command's success message includes the resulting event id.
- [x] `gh pr merge --admin` still works (the override channel is the documented preferred path).
- [x] Server-side identity verification: `OperatorOverrideAuthGuard` checks the `X-Operator-Discord-Id` header against `OperatorOverrides:AllowedDiscordIds`, fail-closed when the allowlist is empty (rc#612 re-review #1).
## Non-goals

- Replacing branch protection with an override-only flow: branch protection stays the default, and overrides are bypasses.
- Multi-operator quorum overrides: deferred to v2.
- Replaying historical admin merges as overrides on startup: the projection resumes from the current event sequence.
## Refs

- 2026-04-30 incident: rc#608 chicken-and-egg merge required an operator admin-merge.
- rar#181: CoI false-positive root cause that motivated `unblock-coi`.
- rc#610: re-review on `COMMENTED`-with-abstention that motivated `review-vote`.
- rc#609: halt detection that surfaces stuck issues for `redispatch`.