Patterns & Principles
Architecture Patterns
1. Clean Architecture
Dependencies point inward only. The Core project has zero external dependencies — no Marten, no ASP.NET, no framework code.
| Layer | Project | Depends On |
|---|---|---|
| Domain | ConductorE.Core/Domain/ | Nothing |
| Ports | ConductorE.Core/Ports/ | Domain |
| Use Cases | ConductorE.Core/UseCases/ | Domain + Ports |
| Adapters | ConductorE.Api/Adapters/ | Ports + Marten |
| Controllers | ConductorE.Api/Program.cs | Use Cases + Ports |
If we swap PostgreSQL for another store, only the adapters change. Core stays untouched.
2. Event Sourcing
Every action emits an immutable event. Events are the source of truth — current state is derived from replaying events.
- Append-only: events are never modified or deleted
- Inline projections: read models (IssueStatus, AgentStatus) update synchronously with event appends
- String streams: repo#issueNumber for issues, agentId for agents
- Full audit trail: replay events to see exactly what happened, when, and by whom
We use Marten on PostgreSQL.
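The append-only and inline-projection ideas above can be sketched in a few lines. This is a language-neutral illustration in Python, not the project's Marten code: `Event`, `InMemoryEventStore`, and the `issue_status` dictionary are hypothetical names, and the projection here simply records the latest event type per stream.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)  # frozen: an event cannot be mutated once created
class Event:
    stream_id: str  # e.g. "repo#42" for an issue, "dev-e" for an agent
    type: str       # e.g. "WORK_STARTED"
    actor: str
    at: datetime


@dataclass
class InMemoryEventStore:
    """Append-only store with one inline projection (hypothetical sketch)."""
    _events: list = field(default_factory=list)
    issue_status: dict = field(default_factory=dict)  # read model

    def append(self, event: Event) -> None:
        self._events.append(event)
        # Inline projection: the read model changes in the same call as the append.
        self.issue_status[event.stream_id] = event.type

    def replay(self, stream_id: str) -> list:
        # Audit trail: events of one stream, in append order.
        return [e for e in self._events if e.stream_id == stream_id]


store = InMemoryEventStore()
now = datetime(2026, 5, 18, tzinfo=timezone.utc)
store.append(Event("repo#42", "WORK_STARTED", "dev-e", now))
store.append(Event("repo#42", "PR_CREATED", "dev-e", now))
```

The store exposes no update or delete method, so the only way to change the read model is to append another event.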
3. Ports & Adapters
Use cases depend on interfaces (ports), not implementations. Adapters implement ports and live in the outer layer.
This enables:
- Unit testing with FakeEventStore (no database needed)
- Swapping infrastructure without changing business logic
- Clear boundaries between domain and framework code
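A minimal sketch of the port/adapter split, written in Python for brevity rather than in the project's own language. The `EventStore` protocol stands in for the IEventStore port; the method names and the dictionary event shape are assumptions made for illustration only.

```python
from typing import Protocol


class EventStore(Protocol):
    """Port: the only thing the use case knows about persistence."""

    def append(self, stream_id: str, event: dict) -> None: ...


class FakeEventStore:
    """Test adapter: keeps events in a list, so no database is needed."""

    def __init__(self) -> None:
        self.appended = []

    def append(self, stream_id: str, event: dict) -> None:
        self.appended.append((stream_id, event))


class SubmitEvent:
    """Use case: maps the request to an event and appends it, nothing else."""

    def __init__(self, store: EventStore) -> None:
        self._store = store  # depends on the port, not on a concrete adapter

    def execute(self, repo: str, issue_number: int, event_type: str) -> None:
        stream_id = f"{repo}#{issue_number}"
        self._store.append(stream_id, {"type": event_type})


fake = FakeEventStore()
SubmitEvent(fake).execute("repo", 42, "WORK_STARTED")
```

A database-backed adapter would satisfy the same protocol, so the use case would not change when the store does.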
4. Adapter Pattern
Platform-specific concerns are abstracted behind unified interfaces. Rig Agent Runtime applies this to multi-platform messaging.
The same agent code works on Discord or Slack: change messaging.platform in config, with no code changes.
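One way to picture the config-driven switch is a small adapter registry. This is a hedged Python sketch, not the Rig Agent Runtime implementation: `Messenger`, `DiscordMessenger`, `SlackMessenger`, and `messenger_from_config` are hypothetical names, and the adapters return strings instead of calling any real platform API.

```python
from typing import Protocol


class Messenger(Protocol):
    """Unified interface the agent code talks to."""

    def send(self, channel: str, text: str) -> str: ...


class DiscordMessenger:
    def send(self, channel: str, text: str) -> str:
        # A real adapter would call the platform API here.
        return f"[discord #{channel}] {text}"


class SlackMessenger:
    def send(self, channel: str, text: str) -> str:
        return f"[slack #{channel}] {text}"


ADAPTERS = {"discord": DiscordMessenger, "slack": SlackMessenger}


def messenger_from_config(config: dict) -> Messenger:
    """Pick the adapter named by messaging.platform in the agent config."""
    platform = config["messaging"]["platform"]
    if platform not in ADAPTERS:
        raise ValueError(f"unsupported messaging platform: {platform!r}")
    return ADAPTERS[platform]()


messenger = messenger_from_config({"messaging": {"platform": "discord"}})
```

Agent code only ever calls `messenger.send(...)`, so switching platforms is a one-line config change.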
5. Configuration over Code
Agents are defined by character.json configuration, not by writing new code. Rig Agent Runtime is the shared runtime — one Docker image serves all agents.
{
"name": "rig-conductor",
"messaging": { "platform": "discord" },
"llm": { "model": "claude-haiku-4-5-20251001" },
"tools": [...]
}
New agent = new config file + optional backend API. No code changes to the runtime.
Design Principles
1. SOLID
| Principle | Application |
|---|---|
| Single Responsibility | Each use case does one thing. SubmitEvent maps and appends — nothing else. |
| Open/Closed | Add new event types by adding records to Domain, no existing code modified. |
| Liskov Substitution | FakeEventStore substitutes MartenEventStore in tests. |
| Interface Segregation | IEventStore, IIssueQuery, IAgentQuery — three focused interfaces, not one mega-interface. |
| Dependency Inversion | Use cases depend on IEventStore (abstraction), not MartenEventStore (implementation). |
2. YAGNI
Build only what's needed now. Don't add abstractions, features, or configurability for hypothetical future requirements.
- Three lines of similar code is better than a premature abstraction
- No feature flags or backwards-compatibility shims
- If it's not in the current issue, it doesn't go in the PR
3. TDD + DDD (hard rule)
Test-first and policy-first. Operator-set 2026-05-18.
For every behavior PR:
- Identify the domain invariant the change enforces ("review-e dispatches require an explicit prNumber"; "phantom IssueStatus rows have at least one event and none are IssueApproved"; etc.).
- Write the pure Core policy at src/ConductorE.Core/<Area>/Policies/<Name>Policy.cs or src/ConductorE.Core/Domain/<Name>.cs: plain functions over plain inputs, no DI, no I/O, clock injected.
- Write the policy unit tests FIRST in tests/ConductorE.Core.Tests, covering the contract and edge cases. They will fail (red).
- Implement the policy to make the tests pass (green).
- Then write the Api adapter (thin I/O shell calling the policy) and the e2e test that exercises the adapter end-to-end via ConductorEApiFactory or equivalent.
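The policy-first loop above can be illustrated with the first invariant quoted in the list. The sketch below is Python rather than the project's own language, and `Dispatch` and `validate_dispatch` are hypothetical names; it shows only the shape of a pure policy and the tests that would be written before it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Dispatch:
    agent_id: str
    pr_number: Optional[int]


def validate_dispatch(dispatch: Dispatch) -> Optional[str]:
    """Pure policy: return a rejection reason, or None when the dispatch is allowed."""
    if dispatch.agent_id == "review-e" and dispatch.pr_number is None:
        return "review-e dispatch requires an explicit prNumber"
    return None


# Tests written first: they are red until validate_dispatch is implemented.
def test_review_e_without_pr_number_is_rejected():
    assert validate_dispatch(Dispatch("review-e", None)) is not None


def test_review_e_with_pr_number_is_allowed():
    assert validate_dispatch(Dispatch("review-e", 7)) is None


def test_other_agents_do_not_need_a_pr_number():
    assert validate_dispatch(Dispatch("dev-e", None)) is None
```

Because the policy takes plain inputs and performs no I/O, the tests need no database, container, or HTTP host.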
Test layers:
- Unit (Core): pure domain logic, no infrastructure. Fast (<100ms).
- Projection-contract: events through POST /api/events → assert on the materialised projection shape. See IssueStatusProjectionContractTests (rc#1080). Pins read-model contracts watchers and dashboards rely on.
- Adapter unit: Api-side services with stubbed ports.
- E2e: full webhook → projection → watcher path via ConductorEApiFactory testcontainer.
PR body must state explicitly:
- "Policy in Core, tests written first" — for behavior PRs.
- "No-behavior refactor, no new tests needed" — for renames, mechanical refactors, or dependency bumps.
Run dotnet test before every push. CI runs both projects on every PR.
Past evidence the rule pays off: rc#1046 (review-e codex-crash watcher), rc#1071 (review-e-spurious-pr widening), rc#1075 (phantom-cleanup refactor), rc#1080 (projection-contract test layer) all shipped policy-first with the pure unit tests written before any production code. Skipping the discipline cost a 30-min production outage on 2026-05-18 (rar#456 prNumber shadow crash) when I shipped code-first then tests-after.
4. Separation of Concerns
The agent that produces a thing cannot approve that thing. This is structural, not cultural.
| Agent | Can Do | Cannot Do |
|---|---|---|
| Dev-E | Write code, create PRs | Approve its own PRs |
| Review-E | Review code, approve/reject | Write implementation code |
| rig-conductor | Assign work, escalate | Write code or review code |
Operational Rules
1. Fix Forward
When production breaks, fix forward. Never auto-rollback.
Production breaks → Agent attempts fix → If failed, reassign
→ If failed again → Escalate to CTO
→ CTO decides: fix forward or rollback (human decision only)
Rollback is never automatic. That's always a CTO decision.
2. Two Strikes Then Human
If an agent fails the same issue twice (two different attempts), escalate to human. No third automatic attempt.
| Strike | Action |
|---|---|
| 1st failure | AGENT_STUCK → reassign to different agent, fresh branch |
| 2nd failure | ESCALATED → post to Discord #admin, wait for human |
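The strike table reduces to a small pure function. This Python sketch assumes a hypothetical `next_action` helper that receives the number of failed attempts on an issue and returns the event name from the table; how failures are counted is not shown.

```python
def next_action(failure_count: int) -> str:
    """Map the number of failed attempts on one issue to the event to emit."""
    if failure_count < 1:
        raise ValueError("next_action is only defined after at least one failure")
    if failure_count == 1:
        # First strike: reassign to a different agent on a fresh branch.
        return "AGENT_STUCK"
    # Second strike (or later): stop automatic attempts and wait for a human.
    return "ESCALATED"
```

There is no branch that returns a third automatic attempt, which is the point of the rule.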
3. Diagram-First
Create C4 diagrams before coding complex systems. If the diagram is complex, the code will be complex — simplify the diagram first.
| Level | When to Create |
|---|---|
| L1 Context | Before starting a new system |
| L2 Containers | Before adding services or databases |
| L3 Components | Before refactoring internal architecture |
| L4 Flow | Before implementing complex interactions |
Use PlantUML for C4 diagrams. Mermaid for everything else (flows, state machines, timelines).
4. Event-Driven Coordination
Agents communicate through events, not direct calls. The event store is the shared nervous system.
Dev-E emits WORK_STARTED → Event Store → rig-conductor reads → assigns next
Dev-E emits PR_CREATED → Event Store → rig-conductor reads → monitors review
Review-E approves → GitHub → rig-conductor reads → auto-merge
No agent calls another agent directly. All coordination flows through events.
5. Three-path reconciler recovery
ReconciliationService runs every 5 min and, after each main reconciliation tick, runs three independent recovery paths. Each path detects a different stall pattern, applies a 30-min throttle to avoid a thundering herd, and emits RE_REVIEW_REQUESTED with a path-specific reason before re-publishing to signal:review-e.
| Path | Reason | Detects | Tracking |
|---|---|---|---|
| Abstention | prior_review_was_abstention | PR in in_review whose only review-e review is COMMENTED (no binding verdict) | rc#608 / rc#610 |
| Timeout | prior_review_timed_out | PR with ≥2 ReviewFailed(reason="timeout") events (review-e CLI completed but never posted a review) | rc#765 |
| Quota recovery | prior_provider_quota_recovered | PR in state=failed where the last AgentStuck.Reason matches a quota-saturation signature (codex 429, claude rate-cap) AND the agent's QuotaFiveHourPct has dropped below 80% | rc#944 / PR #1094 |
Pure policies in ConductorE.Core/Domain/ decide the recovery; the service is the thin I/O shell that walks streams, queries agents, and emits events.
Defensive guards shared across all three:
- COI guard — review-e-authored PRs are never re-dispatched to review-e (GitHub 422).
- Throttle — same 30-min window via AbstainedReviewReDispatchThrottle.
- Idempotency key — RE_REVIEW_REQUESTED:<repo>#<issue>:<pr>:<path-discriminator>:<minute> so a same-tick duplicate is deduped at the event-store layer.
See docs/2026-05-18-quota-recovery-reconciliation.md for the quota-recovery path details.
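The throttle and idempotency-key guards can be sketched as two pure functions with the clock passed in. This is an illustrative Python sketch, not the AbstainedReviewReDispatchThrottle code. Two points are assumptions: that `<minute>` in the key is the UTC timestamp truncated to the minute, and that a re-dispatch exactly 30 minutes after the previous one is allowed.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

THROTTLE_WINDOW = timedelta(minutes=30)


def may_redispatch(last_dispatch_at: Optional[datetime], now: datetime) -> bool:
    """Throttle: allow a re-dispatch only if none happened inside the window."""
    return last_dispatch_at is None or now - last_dispatch_at >= THROTTLE_WINDOW


def idempotency_key(repo: str, issue: int, pr: int, path: str, now: datetime) -> str:
    """Build RE_REVIEW_REQUESTED:<repo>#<issue>:<pr>:<path-discriminator>:<minute>."""
    # Assumption: <minute> is the UTC time truncated to the minute.
    minute = now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M")
    return f"RE_REVIEW_REQUESTED:{repo}#{issue}:{pr}:{path}:{minute}"


tick = datetime(2026, 5, 18, 12, 0, 30, tzinfo=timezone.utc)
key = idempotency_key("rig/repo", 10, 11, "abstention", tick)
```

Two emissions inside the same minute of the same tick produce the same key, so the second can be rejected as a duplicate by whatever layer enforces key uniqueness.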
Quota-aware dispatch (proactive)
ReconciliationService's quota recovery (above) is reactive — it salvages stalled PRs after one provider saturates. QuotaAwareReviewRouter is the proactive twin — at each review dispatch, pick the candidate (review-e vs review-e-codex) with the most quota headroom before the assignment lands on a stream.
Pure policy in ConductorE.Core/UseCases/QuotaAwareReviewRouter.cs. Thin adapter ReviewDispatchRouter.SelectAsync(IAgentQuery) used by every review-dispatch site (webhook + reconciler scan). Falls back to review-e when no candidate is alive + non-saturated, preserving the legacy default. See docs/2026-05-18-quota-aware-review-dispatch.md.
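A rough Python sketch of the selection rule follows, not the QuotaAwareReviewRouter source. It assumes QuotaFiveHourPct means the share of the five-hour quota already used, so lower is more headroom, and it reuses 80% as the saturation threshold. `Candidate` and `select_reviewer` are hypothetical names.

```python
from dataclasses import dataclass

SATURATION_PCT = 80.0  # assumption: at or above this usage a provider counts as saturated


@dataclass(frozen=True)
class Candidate:
    agent_id: str
    alive: bool
    quota_five_hour_pct: float  # assumed: percent of the five-hour quota already used


def select_reviewer(candidates: list, default: str = "review-e") -> str:
    """Pick the alive, non-saturated candidate with the most quota headroom."""
    eligible = [
        c for c in candidates
        if c.alive and c.quota_five_hour_pct < SATURATION_PCT
    ]
    if not eligible:
        # Legacy default when nobody is both alive and non-saturated.
        return default
    return min(eligible, key=lambda c: c.quota_five_hour_pct).agent_id


choice = select_reviewer([
    Candidate("review-e", alive=True, quota_five_hour_pct=70.0),
    Candidate("review-e-codex", alive=True, quota_five_hour_pct=20.0),
])
```

The function takes plain values and returns an agent id, which leaves the querying of agent status to the thin adapter.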
6. Stream-side reclamation
ReconciliationService (§5) handles state-level recovery — issues stuck in known bad states. StreamReclaimService is the sibling transport-level recovery — Redis-streams entries stuck in a consumer's PEL because the consumer pod went silent (crashed, OOMd, hung in a slow CLI) without XACKing.
Runs every 60 s. For each known agent's assignments:<agentId> stream:
| Step | Behavior |
|---|---|
| XPENDING | List pending entries across all consumers in the agents group. |
| Per-entry policy | StreamReclaimPolicy.ShouldReclaim: reclaim iff entry idle > 5 min AND the assigned consumer's agent has no heartbeat within 10 min. |
| Target selection | StreamReclaimPolicy.PickReclaimTarget: pick a consumer in the same group whose agent has a fresh heartbeat; prefer freshest among healthy candidates. |
| XCLAIM | Force-move the entry to the target consumer. The target's next XREADGROUP picks it up. |
Pairs with a detector watcher: StreamConsumerWithoutHeartbeatWatcher in the rc#947 SelfImprovementService framework. The detector files gap-analysis issues; this service takes action. Both stay active so a regression in either surface remains visible. Tracking: rc#959.
See docs/2026-05-18-stream-side-reaper.md for the full design + tuning knobs.
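The two policy steps in the table can be approximated as pure functions over plain values. The Python below is a sketch under stated assumptions, not StreamReclaimPolicy itself: `Consumer`, `is_healthy`, `should_reclaim`, and `pick_reclaim_target` are hypothetical names, and a "fresh" heartbeat is assumed to use the same 10-minute window as the reclaim check.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

MAX_IDLE = timedelta(minutes=5)
HEARTBEAT_WINDOW = timedelta(minutes=10)


@dataclass(frozen=True)
class Consumer:
    name: str
    last_heartbeat: Optional[datetime]  # None: the agent never reported a heartbeat


def is_healthy(consumer: Consumer, now: datetime) -> bool:
    hb = consumer.last_heartbeat
    return hb is not None and now - hb <= HEARTBEAT_WINDOW


def should_reclaim(idle: timedelta, owner: Consumer, now: datetime) -> bool:
    """Reclaim iff the entry is idle > 5 min AND its owner has no recent heartbeat."""
    return idle > MAX_IDLE and not is_healthy(owner, now)


def pick_reclaim_target(candidates: list, now: datetime) -> Optional[Consumer]:
    """Prefer the healthy consumer with the freshest heartbeat; None if nobody is healthy."""
    healthy = [c for c in candidates if is_healthy(c, now)]
    return max(healthy, key=lambda c: c.last_heartbeat, default=None)


now = datetime(2026, 5, 18, 12, 0, tzinfo=timezone.utc)
silent = Consumer("pod-a", now - timedelta(minutes=11))
steady = Consumer("pod-b", now - timedelta(minutes=9))
fresh = Consumer("pod-c", now - timedelta(minutes=1))
```

In this sketch a silent owner is never chosen as the target, because the same health check filters it out of the candidate list.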