
Patterns & Principles

Architecture Patterns

1. Clean Architecture

Dependencies point inward only. The Core project has zero external dependencies — no Marten, no ASP.NET, no framework code.

| Layer | Project | Depends On |
| --- | --- | --- |
| Domain | ConductorE.Core/Domain/ | Nothing |
| Ports | ConductorE.Core/Ports/ | Domain |
| Use Cases | ConductorE.Core/UseCases/ | Domain + Ports |
| Adapters | ConductorE.Api/Adapters/ | Ports + Marten |
| Controllers | ConductorE.Api/Program.cs | Use Cases + Ports |

If we swap PostgreSQL for another store, only the adapters change. Core stays untouched.
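One way to keep the inward-only rule honest is a small architecture check. The sketch below is in Python for brevity (the project itself is C#), and `find_violations` plus the `observed` scan result are hypothetical; only the allowed-dependency table comes from this page.

```python
# Allowed dependencies per layer, as listed in the layer table of this section.
ALLOWED_DEPENDENCIES = {
    "Domain": set(),
    "Ports": {"Domain"},
    "Use Cases": {"Domain", "Ports"},
    "Adapters": {"Ports", "Marten"},
    "Controllers": {"Use Cases", "Ports"},
}

def find_violations(actual: dict) -> list:
    """Return (layer, dependency) pairs that break the inward-only rule."""
    return sorted(
        (layer, dep)
        for layer, deps in actual.items()
        for dep in deps
        if dep not in ALLOWED_DEPENDENCIES[layer]
    )

# Hypothetical scan result: a use case that accidentally references Marten.
observed = {"Domain": set(), "Use Cases": {"Domain", "Ports", "Marten"}}
violations = find_violations(observed)
```

Here `violations` contains the single pair `("Use Cases", "Marten")`, which is the kind of leak the rule is meant to catch.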

2. Event Sourcing

Every action emits an immutable event. Events are the source of truth — current state is derived from replaying events.

  • Append-only: events are never modified or deleted
  • Inline projections: read models (IssueStatus, AgentStatus) update synchronously with event appends
  • String streams: repo#issueNumber for issues, agentId for agents
  • Full audit trail: replay events to see exactly what happened, when, and by whom

We use Marten on PostgreSQL.
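The properties above can be illustrated without Marten. This is a minimal Python sketch (the project itself is C#): `InMemoryEventStore`, `rebuild_status`, and the event-to-status mapping are hypothetical, and only the stream-key shape `repo#issueNumber` and the event names are taken from this page.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)  # frozen: an event cannot be modified after creation
class Event:
    type: str
    actor: str

# Hypothetical mapping from event type to the issue status it implies.
STATUS_BY_EVENT = {"WORK_STARTED": "in_progress", "PR_CREATED": "in_review"}

class InMemoryEventStore:
    def __init__(self):
        self._streams = defaultdict(list)  # stream id -> ordered events
        self.issue_status = {}             # inline read model

    def append(self, stream_id: str, event: Event) -> None:
        self._streams[stream_id].append(event)  # append-only: no update or delete API
        if event.type in STATUS_BY_EVENT:       # projection updated in the same call
            self.issue_status[stream_id] = STATUS_BY_EVENT[event.type]

    def replay(self, stream_id: str) -> tuple:
        return tuple(self._streams.get(stream_id, ()))

def rebuild_status(events) -> Optional[str]:
    """Derive current state purely by folding over the event history."""
    status = None
    for event in events:
        status = STATUS_BY_EVENT.get(event.type, status)
    return status

store = InMemoryEventStore()
store.append("rig-conductor#42", Event("WORK_STARTED", "dev-e"))
store.append("rig-conductor#42", Event("PR_CREATED", "dev-e"))
```

The inline read model and a full replay agree on the status, which is the sense in which events are the source of truth and the projection is derived.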

3. Ports & Adapters

Use cases depend on interfaces (ports), not implementations. Adapters implement ports and live in the outer layer.

Use Case → IEventStore (port) → MartenEventStore (adapter) → PostgreSQL

This enables:

  • Unit testing with FakeEventStore (no database needed)
  • Swapping infrastructure without changing business logic
  • Clear boundaries between domain and framework code
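As a rough illustration of the port/adapter split, here is a Python sketch (the project itself is C#). The names `EventStore`, `FakeEventStore`, and `SubmitEvent` echo the ones on this page, but the method signatures are assumptions made for the example.

```python
from typing import Protocol

class EventStore(Protocol):
    """The port: the only thing a use case knows about persistence."""
    def append(self, stream_id: str, event: dict) -> None: ...

class FakeEventStore:
    """Test adapter: records appends in memory, no database needed."""
    def __init__(self):
        self.appended = []

    def append(self, stream_id: str, event: dict) -> None:
        self.appended.append((stream_id, event))

class SubmitEvent:
    """Use case: maps the input to a stream id and appends."""
    def __init__(self, store: EventStore):
        self._store = store  # depends on the port, not on a concrete adapter

    def handle(self, repo: str, issue_number: int, event_type: str) -> None:
        self._store.append(f"{repo}#{issue_number}", {"type": event_type})

fake = FakeEventStore()
SubmitEvent(fake).handle("rig-conductor", 42, "WORK_STARTED")
```

A database-backed adapter would satisfy the same `append` contract, so `SubmitEvent` would not change when the store does.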

4. Adapter Pattern

Platform-specific concerns are abstracted behind unified interfaces. Applied in Rig Agent Runtime for multi-platform messaging:

Discord Adapter ─┐
                  ├─→ Message Handler (platform-agnostic) → Agent Loop
Slack Adapter ───┘

Same agent code works on Discord or Slack — change messaging.platform in config, no code changes.
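A minimal Python sketch of the same shape (the runtime's real code is not shown on this page). The `Message` type, the adapter classes, and the simplified payload fields are assumptions for illustration; real Discord and Slack payloads carry many more fields.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Message:
    """Platform-agnostic shape seen by the message handler."""
    channel: str
    author: str
    text: str

class DiscordAdapter:
    def to_message(self, raw: dict) -> Message:
        return Message(raw["channel_id"], raw["author"]["username"], raw["content"])

class SlackAdapter:
    def to_message(self, raw: dict) -> Message:
        return Message(raw["channel"], raw["user"], raw["text"])

# Selected by the messaging.platform config value.
ADAPTERS = {"discord": DiscordAdapter, "slack": SlackAdapter}

def handle(message: Message) -> str:
    # Placeholder for the agent loop: it never sees platform-specific fields.
    return f"{message.author}: {message.text}"

def receive(platform: str, raw: dict) -> str:
    adapter = ADAPTERS[platform]()
    return handle(adapter.to_message(raw))
```

Both platforms reach `handle` with the same `Message` shape, so adding a third platform would mean adding one adapter and one dictionary entry.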

5. Configuration over Code

Agents are defined by character.json configuration, not by writing new code. Rig Agent Runtime is the shared runtime — one Docker image serves all agents.

{
  "name": "rig-conductor",
  "messaging": { "platform": "discord" },
  "llm": { "model": "claude-haiku-4-5-20251001" },
  "tools": [...]
}

New agent = new config file + optional backend API. No code changes to the runtime.
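To make the idea concrete, here is a hedged Python sketch of a runtime reading such a file. `AgentConfig`, `load_agent_config`, and the supported-platform check are hypothetical, and the empty `tools` list is a placeholder rather than a real tool set.

```python
import json
from dataclasses import dataclass

SUPPORTED_PLATFORMS = {"discord", "slack"}

@dataclass(frozen=True)
class AgentConfig:
    name: str
    platform: str
    model: str
    tools: tuple

def load_agent_config(text: str) -> AgentConfig:
    """Parse character.json content and reject unknown platforms early."""
    raw = json.loads(text)
    platform = raw["messaging"]["platform"]
    if platform not in SUPPORTED_PLATFORMS:
        raise ValueError(f"unsupported messaging.platform: {platform}")
    return AgentConfig(raw["name"], platform, raw["llm"]["model"], tuple(raw.get("tools", [])))

# Same shape as the example config; "tools" is left empty as a placeholder.
CHARACTER_JSON = """
{
  "name": "rig-conductor",
  "messaging": { "platform": "discord" },
  "llm": { "model": "claude-haiku-4-5-20251001" },
  "tools": []
}
"""
config = load_agent_config(CHARACTER_JSON)
```

The runtime code is identical for every agent; only the text passed to `load_agent_config` differs.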

Design Principles

1. SOLID

| Principle | Application |
| --- | --- |
| Single Responsibility | Each use case does one thing. SubmitEvent maps and appends, nothing else. |
| Open/Closed | Add new event types by adding records to Domain, no existing code modified. |
| Liskov Substitution | FakeEventStore substitutes MartenEventStore in tests. |
| Interface Segregation | IEventStore, IIssueQuery, IAgentQuery: three focused interfaces, not one mega-interface. |
| Dependency Inversion | Use cases depend on IEventStore (abstraction), not MartenEventStore (implementation). |

2. YAGNI

Build only what's needed now. Don't add abstractions, features, or configurability for hypothetical future requirements.

  • Three lines of similar code is better than a premature abstraction
  • No feature flags or backwards-compatibility shims
  • If it's not in the current issue, it doesn't go in the PR

3. TDD + DDD (hard rule)

Test-first and policy-first. Operator-set 2026-05-18.

For every behavior PR:

  1. Identify the domain invariant the change enforces ("review-e dispatches require an explicit prNumber"; "phantom IssueStatus rows have at least one event and none are IssueApproved"; etc.).
  2. Write the pure Core policy at src/ConductorE.Core/<Area>/Policies/<Name>Policy.cs or src/ConductorE.Core/Domain/<Name>.cs — plain functions over plain inputs, no DI, no I/O, clock injected.
  3. Write the policy unit tests FIRST in tests/ConductorE.Core.Tests — covering the contract and edge cases. They will fail (red).
  4. Implement the policy to make the tests pass (green).
  5. Then write the Api adapter (thin I/O shell calling the policy) and the e2e test that exercises the adapter end-to-end via ConductorEApiFactory or equivalent.
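Steps 1 to 4 might look like the following for the first invariant quoted in step 1. This is a Python sketch, not the project's C# policy: `Decision`, `review_dispatch_policy`, and the reason string are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Decision:
    allowed: bool
    reason: str

def review_dispatch_policy(agent_id: str, pr_number: Optional[int]) -> Decision:
    """Invariant: review-e dispatches require an explicit prNumber.

    Plain function over plain inputs: no DI, no I/O.
    """
    if agent_id == "review-e" and pr_number is None:
        return Decision(False, "missing_pr_number")
    return Decision(True, "ok")

# Written first (step 3); these fail until the policy above exists (step 4).
def test_review_e_without_pr_number_is_rejected():
    assert review_dispatch_policy("review-e", None) == Decision(False, "missing_pr_number")

def test_review_e_with_pr_number_is_allowed():
    assert review_dispatch_policy("review-e", 7).allowed

def test_other_agents_are_unaffected():
    assert review_dispatch_policy("dev-e", None).allowed
```

Only after these pass would the adapter and the e2e test from step 5 be written, with the adapter doing nothing beyond calling the policy and performing I/O.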

Test layers:

  • Unit (Core) — pure domain logic, no infrastructure. Fast (<100ms).
  • Projection-contract — events through POST /api/events → assert on the materialised projection shape. See IssueStatusProjectionContractTests (rc#1080). Pins read-model contracts watchers and dashboards rely on.
  • Adapter unit — Api-side services with stubbed ports.
  • E2e — full webhook → projection → watcher path via ConductorEApiFactory testcontainer.

PR body must state explicitly:

  • "Policy in Core, tests written first" — for behavior PRs.
  • "No-behavior refactor, no new tests needed" — for renames, mechanical refactors, or dependency bumps.

Run dotnet test before every push. CI runs both projects on every PR.

Past evidence the rule pays off: rc#1046 (review-e codex-crash watcher), rc#1071 (review-e-spurious-pr widening), rc#1075 (phantom-cleanup refactor), rc#1080 (projection-contract test layer) all shipped policy-first with the pure unit tests written before any production code. Skipping the discipline cost a 30-min production outage on 2026-05-18 (rar#456 prNumber shadow crash) when I shipped code-first then tests-after.

4. Separation of Concerns

The agent that produces a thing cannot approve that thing. This is structural, not cultural.

| Agent | Can Do | Cannot Do |
| --- | --- | --- |
| Dev-E | Write code, create PRs | Approve its own PRs |
| Review-E | Review code, approve/reject | Write implementation code |
| rig-conductor | Assign work, escalate | Write code or review code |
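"Structural, not cultural" means the rule is enforced by a check rather than by convention. A minimal Python sketch, assuming hypothetical action names and a hypothetical `is_permitted` helper:

```python
from typing import Optional

# Hypothetical action names derived from the agent capabilities listed above.
ALLOWED_ACTIONS = {
    "dev-e": {"write_code", "create_pr"},
    "review-e": {"review_code", "approve", "reject"},
    "rig-conductor": {"assign_work", "escalate"},
}

def is_permitted(agent: str, action: str, artifact_author: Optional[str] = None) -> bool:
    """Return True only if the agent's role allows the action on this artifact."""
    if action not in ALLOWED_ACTIONS.get(agent, set()):
        return False
    # The producer of a thing can never approve that thing.
    if action == "approve" and agent == artifact_author:
        return False
    return True
```

Because the check runs on every action, no agent can approve its own output even if it is misconfigured to try.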

Operational Rules

1. Fix Forward

When production breaks, fix forward. Never auto-rollback.

Production breaks → Agent attempts fix → If failed, reassign
  → If failed again → Escalate to CTO
  → CTO decides: fix forward or rollback (human decision only)

Rollback is never automatic. That's always a CTO decision.

2. Two Strikes Then Human

If the same issue fails twice (two separate attempts, the second by a different agent), escalate to a human. No third automatic attempt.

| Strike | Action |
| --- | --- |
| 1st failure | AGENT_STUCK → reassign to different agent, fresh branch |
| 2nd failure | ESCALATED → post to Discord #admin, wait for human |
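The strike rule reduces to a small pure function. In this Python sketch, `next_action` is a hypothetical name; the two returned event names are the ones used above.

```python
def next_action(failure_count: int) -> str:
    """Map the number of failed attempts on one issue to the event to emit."""
    if failure_count < 1:
        raise ValueError("failure_count must be at least 1")
    if failure_count == 1:
        return "AGENT_STUCK"  # reassign to a different agent, fresh branch
    return "ESCALATED"        # post to Discord #admin and wait for a human
```

Any count of two or more maps to `ESCALATED`, so there is no code path that yields a third automatic attempt.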

3. Diagram-First

Create C4 diagrams before coding complex systems. If the diagram is complex, the code will be complex — simplify the diagram first.

| Level | When to Create |
| --- | --- |
| L1 Context | Before starting a new system |
| L2 Containers | Before adding services or databases |
| L3 Components | Before refactoring internal architecture |
| L4 Flow | Before implementing complex interactions |

Use PlantUML for C4 diagrams. Mermaid for everything else (flows, state machines, timelines).

4. Event-Driven Coordination

Agents communicate through events, not direct calls. The event store is the shared nervous system.

Dev-E emits WORK_STARTED → Event Store → rig-conductor reads → assigns next
Dev-E emits PR_CREATED → Event Store → rig-conductor reads → monitors review
Review-E approves → GitHub → rig-conductor reads → auto-merge

No agent calls another agent directly. All coordination flows through events.
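A small Python sketch of this decoupling, under stated assumptions: `EventLog` stands in for the event store, and the `Conductor` class with its reaction names is hypothetical. The point is only that the emitter and the reader share the log and never reference each other.

```python
class EventLog:
    """Stand-in for the event store: an ordered, append-only list."""
    def __init__(self):
        self._events = []

    def emit(self, event_type: str, issue: str) -> None:
        self._events.append({"type": event_type, "issue": issue})

    def read_from(self, offset: int) -> list:
        return self._events[offset:]

class Conductor:
    # Hypothetical reactions to the two Dev-E events named above.
    REACTIONS = {"WORK_STARTED": "assign_next", "PR_CREATED": "monitor_review"}

    def __init__(self, log: EventLog):
        self._log = log
        self._offset = 0   # position of the next unread event
        self.actions = []

    def poll(self) -> None:
        new_events = self._log.read_from(self._offset)
        self._offset += len(new_events)
        for event in new_events:
            reaction = self.REACTIONS.get(event["type"])
            if reaction:
                self.actions.append((reaction, event["issue"]))

log = EventLog()
conductor = Conductor(log)
log.emit("WORK_STARTED", "rig-conductor#42")  # emitted by Dev-E, which never calls the conductor
log.emit("PR_CREATED", "rig-conductor#42")
conductor.poll()
```

Tracking an offset means a second `poll()` with no new events produces no new actions.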

5. Three-path reconciler recovery

ReconciliationService runs every 5 min and, after each main reconciliation tick, executes three independent recovery paths. Each path detects a different stall pattern, applies a 30-min throttle to avoid a thundering herd, and emits RE_REVIEW_REQUESTED with a path-specific reason before re-publishing to signal:review-e.

| Path | Reason | Detects | Tracking |
| --- | --- | --- | --- |
| Abstention | prior_review_was_abstention | PR in in_review whose only review-e review is COMMENTED (no binding verdict) | rc#608 / rc#610 |
| Timeout | prior_review_timed_out | PR with ≥2 ReviewFailed(reason="timeout") events (review-e CLI completed but never posted a review) | rc#765 |
| Quota recovery | prior_provider_quota_recovered | PR in state=failed where the last AgentStuck.Reason matches a quota-saturation signature (codex 429, claude rate-cap) AND the agent's QuotaFiveHourPct has dropped below 80% | rc#944 / PR #1094 |

Pure policies in ConductorE.Core/Domain/ decide the recovery; the service is the thin I/O shell that walks streams, queries agents, and emits events.
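As an illustration of what such a pure policy could look like, here is a Python sketch of the quota-recovery decision. The function name and the signature substrings are assumptions; the real matching rules live in the project's C# code and are not shown on this page.

```python
from typing import Optional

# Hypothetical substrings standing in for the quota-saturation signatures.
QUOTA_SIGNATURES = ("429", "rate-cap")
RECOVERED_BELOW_PCT = 80.0

def should_recover_from_quota(
    pr_state: str,
    last_stuck_reason: Optional[str],
    quota_five_hour_pct: float,
) -> bool:
    """Decide whether a failed PR should be re-dispatched after quota recovery."""
    if pr_state != "failed":
        return False
    reason = (last_stuck_reason or "").lower()
    if not any(signature in reason for signature in QUOTA_SIGNATURES):
        return False  # the PR failed for some other reason
    return quota_five_hour_pct < RECOVERED_BELOW_PCT
```

The function takes only plain values, so the service's job is limited to gathering those values and emitting the event when the answer is true.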

Defensive guards shared across all three:

  • COI guard: review-e-authored PRs are never re-dispatched to review-e (GitHub 422).
  • Throttle: same 30-min window via AbstainedReviewReDispatchThrottle.
  • Idempotency key: RE_REVIEW_REQUESTED:<repo>#<issue>:<pr>:<path-discriminator>:<minute>, so a same-tick duplicate is deduped at the event-store layer.
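The throttle and idempotency guards can be sketched in Python as follows. The helper names and the `<minute>` encoding (`YYYYMMDDTHHMM` in UTC) are assumptions; this page only specifies the overall key layout and the 30-min window.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

THROTTLE_WINDOW = timedelta(minutes=30)

def is_throttled(last_dispatch: Optional[datetime], now: datetime) -> bool:
    """True if a re-dispatch happened within the last 30 minutes."""
    return last_dispatch is not None and now - last_dispatch < THROTTLE_WINDOW

def idempotency_key(repo: str, issue: int, pr: int, path: str, now: datetime) -> str:
    # Truncating to the minute makes two emits in the same tick share a key.
    minute = now.astimezone(timezone.utc).strftime("%Y%m%dT%H%M")
    return f"RE_REVIEW_REQUESTED:{repo}#{issue}:{pr}:{path}:{minute}"

tick = datetime(2026, 5, 18, 12, 0, 5, tzinfo=timezone.utc)
key_a = idempotency_key("rig-conductor", 42, 7, "timeout", tick)
key_b = idempotency_key("rig-conductor", 42, 7, "timeout", tick + timedelta(seconds=40))
```

Here `key_a` and `key_b` are equal, so a store that rejects duplicate keys would keep only one of the two events. The clock is passed in as `now`, which keeps both helpers deterministic under test.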

See docs/2026-05-18-quota-recovery-reconciliation.md for the quota-recovery path details.

Quota-aware dispatch (proactive)

ReconciliationService's quota recovery (above) is reactive: it salvages stalled PRs after a provider saturates. QuotaAwareReviewRouter is its proactive twin: at each review dispatch, it picks the candidate (review-e or review-e-codex) with the most quota headroom before the assignment lands on a stream.

The pure policy lives in ConductorE.Core/UseCases/QuotaAwareReviewRouter.cs. The thin adapter, ReviewDispatchRouter.SelectAsync(IAgentQuery), is used by every review-dispatch site (webhook and reconciler scan). It falls back to review-e when no candidate is both alive and non-saturated, preserving the legacy default. See docs/2026-05-18-quota-aware-review-dispatch.md.
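A hedged Python sketch of the routing decision. The `Candidate` shape, `select_reviewer`, and the 80% saturation cutoff are assumptions for illustration (the cutoff is borrowed from the recovery threshold in §5 and may differ in the real router).

```python
from dataclasses import dataclass

SATURATED_AT_PCT = 80.0      # assumed cutoff for "saturated"
DEFAULT_REVIEWER = "review-e"

@dataclass(frozen=True)
class Candidate:
    agent_id: str
    alive: bool
    quota_five_hour_pct: float  # share of the five-hour quota already used

def select_reviewer(candidates: list) -> str:
    """Pick the live, non-saturated candidate with the most quota headroom."""
    eligible = [
        c for c in candidates
        if c.alive and c.quota_five_hour_pct < SATURATED_AT_PCT
    ]
    if not eligible:
        return DEFAULT_REVIEWER  # legacy default when nobody qualifies
    # Lowest usage means most headroom.
    return min(eligible, key=lambda c: c.quota_five_hour_pct).agent_id
```

For example, with review-e at 70% and review-e-codex at 20%, the function returns `review-e-codex`; with both saturated or dead, it returns `review-e`.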

6. Stream-side reclamation

ReconciliationService (§5) handles state-level recovery: issues stuck in known bad states. StreamReclaimService is its transport-level sibling: it recovers Redis-streams entries stuck in a consumer's PEL (pending entries list) because the consumer pod went silent (crashed, OOM-killed, or hung in a slow CLI) without XACKing.

Runs every 60 s. For each known agent's assignments:<agentId> stream:

| Step | Behavior |
| --- | --- |
| XPENDING | List pending entries across all consumers in the agents group. |
| Per-entry policy | StreamReclaimPolicy.ShouldReclaim: reclaim iff entry idle > 5 min AND the assigned consumer's agent has no heartbeat within 10 min. |
| Target selection | StreamReclaimPolicy.PickReclaimTarget: pick a consumer in the same group whose agent has a fresh heartbeat; prefer freshest among healthy candidates. |
| XCLAIM | Force-move the entry to the target consumer. The target's next XREADGROUP picks it up. |
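The two policy steps can be sketched as pure functions. This is a Python approximation, not the project's StreamReclaimPolicy: the function signatures are hypothetical, and treating "fresh" as "heartbeat within the same 10-min window" is an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

IDLE_THRESHOLD = timedelta(minutes=5)
HEARTBEAT_WINDOW = timedelta(minutes=10)

def should_reclaim(entry_idle: timedelta, owner_heartbeat: Optional[datetime], now: datetime) -> bool:
    """Reclaim iff the entry is idle > 5 min AND its owner has no recent heartbeat."""
    owner_silent = owner_heartbeat is None or now - owner_heartbeat > HEARTBEAT_WINDOW
    return entry_idle > IDLE_THRESHOLD and owner_silent

def pick_reclaim_target(heartbeats: dict, now: datetime, exclude: str) -> Optional[str]:
    """Pick the consumer with the freshest heartbeat, skipping the current owner."""
    healthy = {
        consumer: seen
        for consumer, seen in heartbeats.items()
        if consumer != exclude and seen is not None and now - seen <= HEARTBEAT_WINDOW
    }
    if not healthy:
        return None  # nobody healthy: leave the entry pending for the next run
    return max(healthy, key=healthy.get)

now = datetime(2026, 5, 18, 12, 0, tzinfo=timezone.utc)
heartbeats = {
    "consumer-a": now - timedelta(minutes=11),  # silent owner
    "consumer-b": now - timedelta(minutes=3),
    "consumer-c": now - timedelta(minutes=1),
}
target = pick_reclaim_target(heartbeats, now, exclude="consumer-a")
```

With these inputs `target` is `consumer-c`, the freshest healthy consumer. The service would then issue XCLAIM for the entry; that I/O step is deliberately left out of the sketch.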

Pairs with a detector watcher: StreamConsumerWithoutHeartbeatWatcher in the rc#947 SelfImprovementService framework. The detector files gap-analysis issues; this service takes action. Both stay active so a regression in either surface remains visible. Tracking: rc#959.

See docs/2026-05-18-stream-side-reaper.md for the full design + tuning knobs.