Provider & Runtime Portability — Not Just Anthropic¶
TL;DR
The rig is vendor-neutral at four layers: coordination (Conductor-E events are provider-agnostic), gateway (LiteLLM routes to 100+ providers), instrumentation (OpenTelemetry GenAI semantic conventions), and instructions (AGENTS.md is a cross-tool standard read by Claude Code, Codex CLI, Gemini CLI, Cursor, and Aider). Claude Code + Anthropic is the default runtime for Dev-E / Review-E / Spec-E because it's what the team ships today, not because the architecture requires it. Any provider with an OpenAI-compatible API or a LiteLLM adapter works.
Why this document exists
Earlier drafts of the whitepaper referenced Claude, Anthropic, and Sonnet / Opus / Haiku as if they were the only option. They are not. The underlying architecture is multi-vendor by design — but the prose hid it. This doc corrects that framing and makes the portability story explicit. If you read "Claude Sonnet 4.6" anywhere else in the whitepaper, read it as "the configured default model, currently Sonnet 4.6" — not as a required pick.
The four portability layers¶
Each layer has been chosen specifically so no single vendor is load-bearing:
```mermaid
graph TB
    classDef layer fill:#e3f2fd,stroke:#1565c0,color:#000
    classDef vendor fill:#fff3e0,stroke:#e65100,color:#000
    classDef defense fill:#e8f5e9,stroke:#2e7d32,color:#000

    L1[1. Coordination layer<br/>Conductor-E events + projections]:::layer
    L2[2. Gateway layer<br/>LiteLLM proxy]:::layer
    L3[3. Instrumentation layer<br/>OpenTelemetry GenAI conventions]:::layer
    L4[4. Instruction layer<br/>AGENTS.md cross-tool standard]:::layer

    V1[Anthropic<br/>Claude Opus / Sonnet / Haiku]:::vendor
    V2[OpenAI<br/>GPT-5.2 / mini / o3]:::vendor
    V3[Google<br/>Gemini 3.1 Pro / Flash]:::vendor
    V4[Local<br/>Ollama: llama3.2 / custom]:::vendor
    V5[Aggregator<br/>OpenRouter]:::vendor

    D1[Events carry no vendor-specific shape]:::defense
    D2[LiteLLM virtual keys route per-model]:::defense
    D3[OTel traces portable across backends]:::defense
    D4[AGENTS.md works in Claude Code / Codex CLI / Gemini CLI / Cursor / Aider]:::defense

    L1 --> D1
    L2 --> D2
    L3 --> D3
    L4 --> D4
    D2 --> V1
    D2 --> V2
    D2 --> V3
    D2 --> V4
    D2 --> V5
```
Layer 1 — Coordination (Conductor-E events)¶
The event record types in ConductorE.Core.Domain.Events — IssueAssigned, WorkStarted, PrCreated, ReviewPassed, GuardBlocked, TokenUsage, and the rest — carry no provider-specific fields. TokenUsage records model and provider as plain strings ("anthropic/claude-sonnet-4-6", "openai/gpt-5.2", "google/gemini-3.1-pro", "ollama/llama3.2"), so projections compute per-provider totals without any schema change.
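A minimal sketch of that shape, written here as a Python dataclass mirroring the C# record described above (field names beyond model and provider are illustrative, not taken from the actual event schema):

```python
from dataclasses import dataclass

# Hypothetical Python mirror of the TokenUsage record in
# ConductorE.Core.Domain.Events. The point: model and provider are plain
# strings, never a vendor-specific enum or payload shape.
@dataclass(frozen=True)
class TokenUsage:
    agent: str           # e.g. "dev-e" (illustrative field)
    model: str           # e.g. "anthropic/claude-sonnet-4-6"
    provider: str        # e.g. "anthropic", "openai", "google", "ollama"
    input_tokens: int
    output_tokens: int

def per_provider_totals(events: list[TokenUsage]) -> dict[str, int]:
    """Projection sketch: per-provider token totals, no schema change needed."""
    totals: dict[str, int] = {}
    for e in events:
        totals[e.provider] = totals.get(e.provider, 0) + e.input_tokens + e.output_tokens
    return totals
```

Adding a new provider is just a new string value flowing through the same projection.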
Layer 2 — Gateway (LiteLLM)¶
LiteLLM is the single-process proxy in front of every agent. It speaks the OpenAI API wire format and translates to 100+ backends. From the agent's point of view, calling https://llm-proxy.rig.svc/v1/chat/completions looks like talking to OpenAI — the proxy decides where the request actually goes based on the virtual-key config.
Concrete LiteLLM config for a multi-vendor rig:
```yaml
# litellm-config.yaml (managed by Flux)
model_list:
  - model_name: sonnet-4-6
    litellm_params:
      model: anthropic/claude-sonnet-4-6
      api_key: os.environ/ANTHROPIC_API_KEY
  - model_name: opus-4-7
    litellm_params:
      model: anthropic/claude-opus-4-7
      api_key: os.environ/ANTHROPIC_API_KEY
  - model_name: haiku-4-5
    litellm_params:
      model: anthropic/claude-haiku-4-5
      api_key: os.environ/ANTHROPIC_API_KEY
  - model_name: gpt-5-2
    litellm_params:
      model: openai/gpt-5.2
      api_key: os.environ/OPENAI_API_KEY
  - model_name: gemini-3-1-pro
    litellm_params:
      model: gemini/gemini-3.1-pro
      api_key: os.environ/GEMINI_API_KEY
  - model_name: llama-local
    litellm_params:
      model: ollama/llama3.2
      api_base: http://ollama.rig.svc:11434

virtual_keys:
  # Primary path
  - key_alias: dev-e-primary
    models: [sonnet-4-6, opus-4-7, haiku-4-5]
    fallback_models: [gpt-5-2, gemini-3-1-pro]
    max_budget: 20.00
    budget_duration: 1d
  # Alternative agent class — default to OpenAI
  - key_alias: dev-e-gpt
    models: [gpt-5-2]
    fallback_models: [sonnet-4-6]
    max_budget: 20.00
    budget_duration: 1d
  # Local-first for cost-sensitive tasks
  - key_alias: spec-e-local
    models: [llama-local]
    fallback_models: [haiku-4-5]
    max_budget: 3.00
    budget_duration: 1d
```
The fallback_models list is the resilience story: if the primary provider 429s (rate limit) or 529s (overloaded), LiteLLM automatically retries on the next model. Agent code never handles it.
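From the agent's side, the call is plain OpenAI wire format regardless of where the proxy routes it. A stdlib-only sketch (the proxy URL is the in-cluster service from the text; the virtual-key value is a hypothetical example of the dev-e-primary alias above):

```python
import json
import urllib.request

# The agent POSTs OpenAI-format JSON to the LiteLLM proxy. Routing, fallback,
# and budget enforcement all happen server-side; nothing here names a vendor.
PROXY_URL = "https://llm-proxy.rig.svc/v1/chat/completions"

def build_request(model_name: str, prompt: str, virtual_key: str) -> urllib.request.Request:
    """OpenAI wire format: no provider-specific field anywhere in the payload."""
    body = json.dumps({
        "model": model_name,  # model_name from litellm-config.yaml, not a vendor id
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        PROXY_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {virtual_key}",  # LiteLLM virtual key
            "Content-Type": "application/json",
        },
    )

req = build_request("sonnet-4-6", "Summarize the failing test output.", "sk-dev-e-primary")
# urllib.request.urlopen(req) would send it; the proxy decides the real backend.
```

Swapping providers changes nothing in this code: only the model_name-to-backend mapping in the proxy config moves.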
Layer 3 — Instrumentation (OpenTelemetry GenAI conventions)¶
This is the single highest-leverage portability defense (tool-choices.md called it out). Instrument agent code with the OTel GenAI semantic conventions (gen_ai.system, gen_ai.request.model, gen_ai.usage.*) rather than vendor-specific SDK hooks. Spans emitted this way land in any OTel-compatible backend (Langfuse, Phoenix, Grafana Cloud, Helicone, Datadog, New Relic) without re-instrumenting.
Effect: switching observability backend is "point the OTel exporter at a new URL" — not "rewrite every trace call." Switching LLM provider is "change gen_ai.system from anthropic to openai" — the dashboard keeps working because the query is keyed on gen_ai.request.model, not the provider brand.
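The convention is concrete enough to sketch as a plain attribute set (a minimal illustration; real code would attach these as span attributes via the OpenTelemetry SDK, and the field list here is only the subset named above):

```python
# Sketch of the attributes an agent span carries under the OTel GenAI
# semantic conventions. Dashboards and queries key on these names, never on
# a vendor SDK's trace shape.
def genai_span_attributes(provider: str, model: str,
                          input_tokens: int, output_tokens: int) -> dict:
    return {
        "gen_ai.system": provider,              # "anthropic" today, "openai" after a swap
        "gen_ai.request.model": model,          # queries key on this, not the brand
        "gen_ai.usage.input_tokens": input_tokens,
        "gen_ai.usage.output_tokens": output_tokens,
    }
```

A provider migration changes the values flowing into this dict, not the keys, which is why the dashboards survive the swap.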
Layer 4 — Instructions (AGENTS.md cross-tool)¶
The AGENTS.md file in each repo is readable by:
| Runtime | Reads AGENTS.md? | Hooks system |
|---|---|---|
| Claude Code (Anthropic) | Yes (via CLAUDE.md import) | PreToolUse, PostToolUse, Stop, SessionStart, UserPromptSubmit |
| Codex CLI (OpenAI) | Yes (native AGENTS.md support) | .codex/hooks.json with PreToolUse/PostToolUse/SessionStart/Stop/UserPromptSubmit |
| Gemini CLI (Google) | Yes (via GEMINI.md or AGENTS.md import) | Native hooks system, similar contract |
| Cursor (Anysphere) | Yes (Cursor Rules can pull from AGENTS.md) | Cursor Rules + MCP |
| Aider | Yes (explicit read of AGENTS.md) | — (pair-mode single-shot, hooks less relevant) |
This is the reason AGENTS.md is in the whitepaper, not CLAUDE.md: the instruction contract is cross-runtime by design. Humans and agents read the same rules regardless of which tool the human is using that day.
Supported providers (shipping today)¶
| Provider | Models | Best fit | Caveats |
|---|---|---|---|
| Anthropic | Opus 4.7 / Sonnet 4.6 / Haiku 4.5 | Default for Dev-E, Review-E, Architect-E, Spec-E | Shared Max plan across agents — see cost-framework.md for rate-limit-aware dispatch |
| OpenAI | GPT-5.2 / GPT-5-mini / o3 | Secondary / fallback; strong on tool-use reliability | Separate pay-per-token account; different rate-limit model |
| Google | Gemini 3.1 Pro / Flash | Secondary; competitive on SWE-bench Pro (80.6% vs Sonnet 79.6%); large context window | Tool-use API differs subtly from Anthropic's |
| Ollama (local) | llama3.2, llama3.1, custom quantized (our ibuild-e:3b) | Spec-E clarifiers, pre-flight cost prediction, low-sensitivity classification | Quality ceiling lower; only feasible when task doesn't need large context |
| OpenRouter | Aggregates all of the above | Emergency fallback if a primary account is blocked | 5.5% markup on credit purchases — not cheapest steady-state |
Supported agent runtimes¶
Runtime is orthogonal to provider. Any supported runtime can talk to any supported provider via LiteLLM.
| Runtime | Vendor tie | Use case in the rig |
|---|---|---|
| Claude Code CLI | Anthropic (API only, models via LiteLLM) | Default for Dev-E, Review-E, Spec-E, Architect-E, repair-dispatch |
| Codex CLI | OpenAI (API only, models via LiteLLM) | Alternative Dev-E flavor; hooks similar to Claude Code's |
| Gemini CLI | Google (API only, models via LiteLLM) | Alternative for Architect-E (large context strengths) |
| Cursor | Anysphere (IDE-integrated) | Human pair-mode primary; agents via Cursor Cloud |
| Aider | None | Human pair-mode alternative, model-agnostic |
| Custom via Anthropic/OpenAI SDK | — | For bespoke agents where we control the loop |
The AGENTS.md + OpenTelemetry conventions combo means a single TaskSpec can dispatch to any runtime without the TaskSpec itself knowing which is used.
Per-agent-per-task-class model selection¶
Default mapping (configurable via HelmRelease values or human override in pair mode):
| Agent role | Default model | Alternative | Why |
|---|---|---|---|
| Spec-E (intake refinement) | Haiku 4.5 | Gemini Flash, llama3.2 local | Many small calls, cost-sensitive |
| Dev-E (issue-dispatch) | Sonnet 4.6 | GPT-5.2, Gemini 3.1 Pro | Default balance of capability and cost |
| Dev-E (repair-dispatch) | Sonnet 4.6 | Opus 4.7 for complex diagnosis | Same agent, sometimes needs bigger model for ambiguous traces |
| Review-E | Sonnet 4.6 | Opus 4.7 on T2/T3 PRs | Judgment sensitivity |
| Architect-E | Opus 4.7 | Gemini 3.1 Pro (long-context strengths) | High-stakes interface decisions |
| LLM-as-judge (quality sampling) | Opus 4.7 on Sonnet output | GPT-5.2 judging Claude (cross-family check) | Avoid one-provider confirmation bias |
None of these are hard-coded. Swap a row, redeploy the HelmRelease, and the agent shifts provider on the next scale-up.
Vendor rate-limit handling¶
Every major provider has a rate-limit story, none identical:
| Provider | Behaviors |
|---|---|
| Anthropic | Three simultaneous limits (5h rolling, 7d weekly, TPM/RPM). Dashboard shows one. 429 at 72% "reported" util is common |
| OpenAI | TPM + RPM per model, retry-after header reliable |
| Google | RPM + TPD, less predictable 429s during launches |
| Ollama local | CPU/GPU bound, not rate-limited but slow |
The LiteLLM proxy handles them uniformly: any 429/529 → fallback. Per-provider budget envelopes in Conductor-E projections make cost attribution visible: "Dev-E's Anthropic spend is tracking 80% of budget but OpenAI fallback is at 12% — maybe we're overusing primary."
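The kind of budget check such a projection surfaces can be sketched in a few lines (the spend and budget figures below are illustrative, not from the whitepaper's actual projection code; the $20/day envelopes mirror the virtual-key config above):

```python
# Sketch: fraction of each provider's daily budget envelope consumed,
# computed from per-provider spend as a Conductor-E projection might report it.
def budget_utilization(spend_by_provider: dict[str, float],
                       budget_by_provider: dict[str, float]) -> dict[str, float]:
    """Fraction of each provider's budget envelope consumed so far."""
    return {
        provider: spend_by_provider.get(provider, 0.0) / budget
        for provider, budget in budget_by_provider.items()
    }

util = budget_utilization(
    {"anthropic": 16.00, "openai": 2.40},   # illustrative spend-to-date
    {"anthropic": 20.00, "openai": 20.00},  # mirrors the 20.00/day envelopes
)
# Primary tracking 80% of budget, fallback at 12%: the signal from the text.
```

When the primary's utilization runs hot while the fallback stays idle, that is the cue to rebalance dispatch rather than wait for 429s.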
The lock-in assessment, honestly¶
From tool-choices.md, refined for the vendor angle:
- Anthropic specifically: HIGH lock-in. Claude prompts are not portable without rewriting; Anthropic's tool-use format differs from OpenAI's, which differs from Gemini's. Even with OTel conventions keeping traces portable, the prompt layer is vendor-specific: prompts deliberately written for Sonnet won't match OpenAI behavior out of the box.
- OpenAI lock-in: MEDIUM. GPT-style structured outputs, function-calling schemas, and prompt conventions are different enough that ports need testing. Less total lock-in than Anthropic because LiteLLM + OpenAI-format is the lingua franca.
- Google lock-in: MEDIUM-LOW for now. Small footprint in the rig. Gemini CLI hooks are less mature than Claude Code's.
- Local (Ollama) lock-in: NONE. The model is a file.
Net: the architecture is portable; the prompts are the sticky layer. Every agent's system prompt is versioned in git (per tool-choices.md and drift-detection.md), which means provider migration is a mechanical task: re-author system prompts against the target provider, A/B the eval suite against both, flip the HelmRelease model_name.
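The A/B step of that migration can be sketched as a pass-rate comparison across two model_names through the proxy (run_case is a hypothetical hook into the eval harness, stubbed in the test; nothing here is the actual harness API):

```python
# Sketch: run the same eval cases against two model_names and compare pass
# rates. Migration is safe to flip when parity is within tolerance.
def ab_compare(cases, run_case, model_a: str, model_b: str) -> dict[str, float]:
    """Pass rate per model_name; run_case(model, case) -> bool is harness-supplied."""
    rates: dict[str, float] = {}
    for model in (model_a, model_b):
        passed = sum(1 for case in cases if run_case(model, case))
        rates[model] = passed / len(cases)
    return rates
```

Because prompts are versioned in git, the target-provider prompt set can be evaluated on a branch before the HelmRelease flip.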
What this means for the whitepaper¶
Everywhere the whitepaper says "Claude Sonnet 4.6", read "the currently-configured default, Sonnet 4.6, swappable". Everywhere it says "Anthropic's API", read "the LLM provider (Anthropic by default, but the gateway is LiteLLM)".
Specific places that previously implied Anthropic-only:
- cost-framework.md — Anthropic-heavy examples; the LiteLLM config above shows the multi-vendor shape
- observability.md — Claude Code's native OTel support is highlighted; Codex CLI and Gemini CLI offer the same, and the shared OTel conventions make the difference irrelevant
- safety.md — CaMeL separation applies to any LLM; examples used Anthropic-specific CVEs because that's where the attacks happened, but the defense pattern is provider-agnostic
- security.md — egress allowlist includes api.anthropic.com; extend with api.openai.com, generativelanguage.googleapis.com, and the local Ollama endpoint when used
- development-process.md — team topology lists "Sonnet-backed" as a tag, not a requirement
Open questions for multi-vendor operation¶
Things the whitepaper doesn't yet resolve:
- Prompt portability testing cadence. When should we dual-run prompts across providers to check behavior parity? Today the eval harness only runs one model per agent config. A "cross-provider regression" step is not in the weekly cadence yet.
- CaMeL structured-output equivalence. Anthropic tool-use via Instructor is well-tested. OpenAI's structured outputs work differently. Gemini's differ again. The quarantine-plane extraction pattern is the same, but the per-provider polyfills aren't written.
- Prompt caching economics. Anthropic's cache hits are ~10% of input token cost. OpenAI's prompt caching (beta) has different discount curves. Gemini's context caching is separate again. The cost framework should route toward the provider with the cheapest cache economics for the hot path.
- Tool-use call reliability per provider. Claude Sonnet 4.6 is well-tested on the rig's specific tool set. GPT-5.2 and Gemini 3.1 Pro would need their own reliability baseline before we could hand them T2 work.
Each of these is an ADR candidate once we have operating data with more than one provider.
See also¶
- index.md — whitepaper master
- tool-choices.md — LiteLLM vs. alternatives
- tool-choices.md — Langfuse / Phoenix / etc., all OTel-compatible
- cost-framework.md — per-agent-per-model budget enforcement
- observability.md — OpenTelemetry GenAI conventions
- safety.md — CaMeL, vendor-neutral prompt-injection defense
- limitations.md — Anthropic listed as HIGH lock-in in the honest summary