Quota-aware review dispatch
Tracking: rc#942.
The gap
The 2026-05-15 planner-driven dashe-website audit dispatched 9 review tasks in a burst. Both review-e pods (review-e claude + review-e-codex) were registered, but the conductor's dispatch sites hard-coded the agent id to review-e, sending every review to the claude stream. Because KEDA balances pod count by stream length and no work flowed to the codex stream, the codex pod scaled down. The claude pod burned its ChatGPT Team 5h primary quota at around PR 6, and the remaining reviews were stranded until the quota reset.
AgentQuotaReported already projected quotaFiveHourPct and quotaWeeklyPct per agent (post-rar#404 wire fix). The data was visible; nothing in the dispatch path consulted it.
What changed
A pure Core policy plus a thin Api adapter, used at the four review-dispatch sites listed under "Dispatch sites converted".
Core policy: QuotaAwareReviewRouter
src/ConductorE.Core/UseCases/QuotaAwareReviewRouter.cs.
    public static string? SelectReviewer(
        IReadOnlyList<AgentStatus> allAgents,
        DateTimeOffset? now = null);
Returns the agent id (review-e or review-e-codex) to dispatch to, or null when no candidate is eligible. Decision order:
| Step | Behavior |
|---|---|
| Filter | Keep only candidates with LastHeartbeat ≥ now - HeartbeatWindow (2 min) that are not exhausted per CodexQuotaGuard.IsExhausted (true when any non-null quota field is ≥ 100%). |
| Zero eligible | Return null — caller handles. |
| One eligible | Return that id. |
| Multiple eligible | OrderBy(QuotaFiveHourPct ?? 0).ThenBy(QuotaWeeklyPct ?? 0), then take the first: the candidate with the most headroom. The sort is stable, so ties resolve to review-e (preserves the legacy default). |
Null quota fields count as 0% — same inert-by-design rule as CodexQuotaGuard. The policy is safe to ship before codex CLI surfaces real values: in the all-null world it picks review-e, matching pre-rc#942 behavior.
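The decision order above can be sketched as follows. This is a minimal illustration, not the shipped implementation: the `AgentStatus` record shape (in particular the `AgentId` property name and `double?` quota types), the local `IsExhausted` helper, and the local `HeartbeatWindow` constant are assumptions standing in for the real `AgentStatus`, `CodexQuotaGuard.IsExhausted`, and `FallbackDispatcher.HeartbeatWindow`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the real AgentStatus; only the fields the policy reads.
public sealed record AgentStatus(
    string AgentId,
    DateTimeOffset LastHeartbeat,
    double? QuotaFiveHourPct = null,
    double? QuotaWeeklyPct = null);

public static class QuotaAwareReviewRouterSketch
{
    public const string ClaudeReviewerId = "review-e";
    public const string CodexReviewerId = "review-e-codex";

    // Candidate order matters: the stable sort resolves ties to the first entry.
    private static readonly string[] Candidates = { ClaudeReviewerId, CodexReviewerId };

    // Assumed local copy of the shared 2-minute liveness window.
    public static readonly TimeSpan HeartbeatWindow = TimeSpan.FromMinutes(2);

    // Assumed equivalent of CodexQuotaGuard.IsExhausted:
    // any non-null quota field at or above 100%. Null counts as 0%.
    private static bool IsExhausted(AgentStatus a) =>
        (a.QuotaFiveHourPct ?? 0) >= 100 || (a.QuotaWeeklyPct ?? 0) >= 100;

    public static string? SelectReviewer(
        IReadOnlyList<AgentStatus> allAgents,
        DateTimeOffset? now = null)
    {
        var cutoff = (now ?? DateTimeOffset.UtcNow) - HeartbeatWindow;

        return Candidates
            // Look up each candidate; a missing registration yields null.
            .Select(id => allAgents.FirstOrDefault(a => a.AgentId == id))
            // Filter: registered, live, and not exhausted.
            .Where(a => a is not null && a.LastHeartbeat >= cutoff && !IsExhausted(a))
            // Most headroom first; LINQ's OrderBy is a stable sort.
            .OrderBy(a => a!.QuotaFiveHourPct ?? 0)
            .ThenBy(a => a!.QuotaWeeklyPct ?? 0)
            .Select(a => a!.AgentId)
            // Null when no candidate survived the filter.
            .FirstOrDefault();
    }
}
```

Passing `now` explicitly keeps the function pure and makes the heartbeat cutoff deterministic in tests. With both quota fields null on both agents, the sort keys are equal and the stable ordering returns `review-e`.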
Api adapter: ReviewDispatchRouter
src/ConductorE.Api/Services/ReviewDispatchRouter.cs.
Reads all agents via the port, delegates to the pure policy, and returns the chosen id — or falls back to QuotaAwareReviewRouter.ClaudeReviewerId (review-e) when the policy returns null. The fallback preserves the legacy behavior so a saturated-or-offline reviewer fleet does not silently drop dispatches.
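A minimal sketch of the adapter shape, under stated assumptions: the `IAgentQuery.GetAllAsync` method name, the trimmed `AgentStatus` record, and the `FakeAgentQuery` body are hypothetical. The policy is passed in as a delegate only to keep the example self-contained; the text above describes the real adapter as delegating to `QuotaAwareReviewRouter.SelectReviewer` directly.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Trimmed hypothetical stand-in; the adapter itself never inspects the fields.
public sealed record AgentStatus(string AgentId);

// Hypothetical port shape; the real IAgentQuery method name is not shown here.
public interface IAgentQuery
{
    Task<IReadOnlyList<AgentStatus>> GetAllAsync();
}

public static class ReviewDispatchRouterSketch
{
    // Mirrors QuotaAwareReviewRouter.ClaudeReviewerId.
    public const string ClaudeReviewerId = "review-e";

    public static async Task<string> SelectAsync(
        IAgentQuery agentQuery,
        Func<IReadOnlyList<AgentStatus>, string?> policy)
    {
        // Read the fleet once, then hand the snapshot to the pure policy.
        var agents = await agentQuery.GetAllAsync();

        // A null result (nothing eligible) falls back to the legacy default
        // so the dispatch is not dropped.
        return policy(agents) ?? ClaudeReviewerId;
    }
}

// Hypothetical test double that counts how often the port is queried.
public sealed class FakeAgentQuery : IAgentQuery
{
    private readonly IReadOnlyList<AgentStatus> _agents;
    public int Calls { get; private set; }

    public FakeAgentQuery(IReadOnlyList<AgentStatus> agents) => _agents = agents;

    public Task<IReadOnlyList<AgentStatus>> GetAllAsync()
    {
        Calls++;
        return Task.FromResult(_agents);
    }
}
```

Keeping the fallback in the adapter rather than the policy lets the policy report "no eligible reviewer" honestly as null, while callers still always receive a dispatchable id.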
Dispatch sites converted
Four sites previously hard-coded "review-e":
| Site | Trigger |
|---|---|
| Program.cs:1079 | review_requested webhook |
| Program.cs:1442 | pull_request opened / ready_for_review / synchronize webhook |
| ReviewScanService.cs:182 | State-based reconciler dispatch for in_review PRs |
| ReviewScanService.cs:317 | GitHub-API scan for PRs the webhook missed |
Each site now resolves IAgentQuery, calls ReviewDispatchRouter.SelectAsync, and threads the selected id through both PublishAssignmentAsync and the REVIEW_ASSIGNED event emission (plus dashboard and console logging). The execution-log exclusivity check (CanDispatchAsync) now also tests against the selected id. This preserves the "don't dispatch to an agent that's still chewing on this issue" semantic when the selected reviewer flips between siblings across bursts.
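The per-site flow might look roughly like the sketch below. Every delegate is a hypothetical stand-in: the real signatures of `ReviewDispatchRouter.SelectAsync`, `CanDispatchAsync`, `PublishAssignmentAsync`, and the REVIEW_ASSIGNED emitter are not shown in this document, and running the exclusivity check before publishing is an assumption about ordering.

```csharp
using System;
using System.Threading.Tasks;

public static class DispatchSiteSketch
{
    // Returns the reviewer that received the work, or null when dispatch was skipped.
    public static async Task<string?> DispatchReviewAsync(
        string issueKey,
        Func<Task<string>> selectReviewer,               // stand-in for SelectAsync
        Func<string, string, Task<bool>> canDispatch,    // stand-in for CanDispatchAsync
        Func<string, string, Task> publishAssignment,    // stand-in for PublishAssignmentAsync
        Func<string, string, Task> emitReviewAssigned)   // stand-in for the REVIEW_ASSIGNED event
    {
        // Resolve the reviewer once and reuse the same id everywhere below.
        var selectedReviewer = await selectReviewer();

        // Exclusivity check runs against the selected id, not a literal "review-e".
        if (!await canDispatch(selectedReviewer, issueKey))
            return null;

        await publishAssignment(selectedReviewer, issueKey);
        await emitReviewAssigned(selectedReviewer, issueKey);
        return selectedReviewer;
    }
}
```

Threading one `selectedReviewer` value through all three calls avoids the case where the check, the stream write, and the event disagree about which sibling received the work.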
Test plan
| Tier | Coverage |
|---|---|
| Pure unit | 18 QuotaAwareReviewRouterTests covering: both-null-quota → claude, headroom-prefers-codex / claude, saturated-excluded, weekly-exhaustion-counts, heartbeat-liveness, both-offline → null, only-claude / only-codex registered, missing-from-list, clock injection. |
| Adapter unit | 5 ReviewDispatchRouterTests with FakeAgentQuery: policy returns codex/claude verbatim, policy null → claude fallback, empty list → claude fallback, port queried exactly once. |
| E2e | Implicit — the four converted dispatch sites have existing webhook + reconciler integration tests that exercise the full flow. The only behavior change at each site is selectedReviewer = await ReviewDispatchRouter.SelectAsync(agentQuery) replacing the literal "review-e". With no agents in the test stubs the policy returns null and the adapter falls back to review-e, so existing tests pass unchanged. A dedicated webhook→Redis e2e for the "codex picked because of more headroom" path is the marginal next slice; for this PR the policy + adapter tests cover the selection logic and the existing integration tests cover the dispatch I/O. Per the TDD/DDD hard rule: skipping the dedicated full-stack e2e here is documented and load-bearing only for the codex-selection path, not for any new I/O surface. |
Run locally:
    dotnet test tests/ConductorE.Core.Tests --filter QuotaAwareReviewRouter
    dotnet test tests/ConductorE.Api.Tests --filter ReviewDispatchRouter
Out of scope (follow-ups)
- ProviderExhausted emission for the all-saturated case. When the policy returns null the adapter falls back to review-e silently. A follow-up should emit PROVIDER_EXHAUSTED (the existing event type, see CodexQuotaGuard.Evaluate) so the operator dashboard surfaces the saturation event. Currently visible only via dispatch-skip log lines.
- KEDA per-provider scaling tuning. Per-provider streams already exist (assignments:review-e + assignments:review-e-codex), but the KEDA scaler config in rig-gitops was tuned for the pre-rc#942 always-route-to-claude world. With routing now data-driven, KEDA may need different per-stream thresholds to avoid one pod oscillating to 0 while the other is hot.
- Dev-E quota-aware dispatch. Same shape applies to dev-e-{stack} vs dev-e-{stack}-codex. Tracked separately; TierCodexRouter is the closest existing analogue and may want consolidation with QuotaAwareReviewRouter into a generic two-sibling selector.
- Re-review path consolidation. ReconciliationService.ReReviewDispatchAsync (rc#608/610/944) is another dispatch surface that emits review work and is not yet converted. The recovery paths historically used agentId = "review-e" as the discriminator for "any reviewer"; converting these is a one-line change once the discriminator semantics are unified.
Pairs with
- TierCodexRouter (rc#773): sibling pattern for dev-e tier-first routing.
- CodexQuotaGuard (rc#767): shares the IsExhausted definition and the inert-by-design null-quota rule.
- FallbackDispatcher.HeartbeatWindow: shared liveness constant across all dispatch guards.