Drift Detection — Model, Prompt, Code, Config¶
TL;DR
Systems drift in four independent channels: model (provider silent-changes under same version string), prompt (agent config changes break existing behavior), code (deployed ≠ git main), and config (manifests ≠ gitops source). Each channel needs its own detector, its own baseline, its own alert.
91% of production LLMs drift within 90 days
InsightFinder's 2025 study — the median detection lag without monitoring is 14–18 days. "It still says Sonnet 4.6 in the config but the behavior changed" is the single most common class of silent regression. We watch for it explicitly via a 20-prompt canary suite.
The four channels¶
| Channel | What drifts | Detection signal |
|---|---|---|
| Model drift | Same model string, different behavior from the provider | 20-prompt canary suite output-hash delta |
| Prompt drift | Agent system prompt changed; the change silently broke something | Golden-suite regression in CI |
| Code drift | Deployed code vs. main branch | Flux reconciliation + hash comparison |
| Config drift | Deployed manifests vs. gitops source | Flux + kube-diff |
Each channel needs its own detector, its own baseline, its own alert.
Channel 1: Model drift¶
The mechanism¶
Model providers (Anthropic, OpenAI, Google) have all shipped silent behavioral changes under stable version strings. Concrete examples:
- Anthropic: Sonnet 4.6 behavior changed in early 2026 following a rate-limit-fix deploy — a reasoning-quality dip was reported across multiple communities; Claude Instant and Claude 2 variants shifted similarly earlier
- OpenAI has repeatedly tweaked GPT-4 / GPT-5 series under the same API version names; community-reported regressions follow each silent update
- Google Gemini behavior has shifted under `gemini-3.1-pro` version strings between minor releases
The vendor API returns the pinned model string. The behavior shifts. Nothing in our config changes. Outputs regress. This is vendor-neutral: the 20-prompt canary suite runs per configured provider (via LiteLLM virtual-key routing — see provider-portability.md) and catches silent changes wherever they occur.
The detection¶
A 20-prompt canary suite runs nightly. Prompts chosen to cover:
- Deterministic structure tests (e.g., "rate these 3 Python refactors by readability")
- Refusal behavior ("I need you to help me delete production data")
- Reasoning-heavy tasks (well-defined multi-step problems)
- Tool-use tasks (call a known tool with known args)
- Edge-case prompts (empty input, ambiguous input)
Outputs are hashed and compared to the previous week's hash. Four signals:
- Output-hash delta rate — >30% of prompts produce different output vs. prior week
- Embedding drift — cosine distance between old and new output embeddings (via a fixed sentence-encoder)
- LLM-as-judge comparison — a bigger or cross-family model (default: Opus 4.7 on Sonnet output; GPT-5.2 on Claude output is the cross-family variant — see provider-portability.md) scores the current output against the baseline
- Refusal rate shift — unexpected change in refusal behavior on edge-case prompts
Any of these > threshold → ModelDriftDetected event. Severity: P2.
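The hash-delta signal reduces to a few lines. A minimal sketch in Python — the prompt IDs, sample outputs, and whitespace normalization are assumptions; the 0.30 threshold is the one stated above:

```python
import hashlib

def output_hash(text: str) -> str:
    # Normalize whitespace so trivial formatting changes don't register as drift.
    return hashlib.sha256(" ".join(text.split()).encode("utf-8")).hexdigest()

def hash_delta_rate(baseline_hashes: dict, current_outputs: dict) -> float:
    """Fraction of canary prompts whose hashed output differs from the baseline."""
    changed = sum(
        1 for prompt_id, text in current_outputs.items()
        if output_hash(text) != baseline_hashes.get(prompt_id)
    )
    return changed / len(current_outputs)

# Last week's run stores hashes; tonight's run supplies raw outputs.
baseline = {
    "p01": output_hash("42"),
    "p02": output_hash("I can't help with that."),
    "p03": output_hash("ok"),
}
tonight = {"p01": "42", "p02": "Sure, here's how to do it.", "p03": "ok"}

rate = hash_delta_rate(baseline, tonight)
drift = rate > 0.30  # the >30% threshold from the text
```

In the real suite the same prompt/output pairs would also feed the embedding-distance and LLM-as-judge signals; only the hashing step is shown here.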
Response¶
- Pause tier promotions — no agent gains autonomy during drift investigation
- Run the full eval suite (nightly harness + property tests) to quantify impact
- Compare affected task classes vs. others
- If widespread regression: rollback agents to a pinned prior model (via LiteLLM proxy routing to a specific `model_version` if the provider supports it; otherwise pin via API version header) or fail over to a cross-vendor alternative via `fallback_models` (see provider-portability.md)
- If localized to specific tasks: adjust prompts or switch model for affected classes
- Post-mortem: what changed, what caught it, what didn't
Limits¶
- Most providers (Anthropic, OpenAI, Google) do not reliably version-pin behavior — using an older model-string doesn't guarantee prior behavior
- Some drift is invisible to the canary suite (rare edge cases)
- The canary suite itself is a frozen snapshot; if prompts stop being representative, they stop catching relevant drift
Channel 2: Prompt drift¶
The mechanism¶
Agent system prompts evolve. A well-intentioned prompt tweak to fix behavior X breaks behavior Y. Without a golden suite, the regression isn't noticed until a user reports it.
The detection¶
Every change to an agent prompt triggers a CI job:
- Load the new prompt
- Replay the golden suite (20 tasks, each with expected outcome)
- Compare new-prompt results to old-prompt results from the baseline run
- Fail the PR if any task regresses
We adapt Braintrust's production-trace-to-eval-case pattern: weekly, scan Langfuse for traces that Review-E flagged as poor quality, surface them as candidate golden-suite additions, and have a human approve each one.
What counts as a regression¶
- Task went from passing to failing
- Task passing but latency > 2× baseline
- Task passing but token count > 2× baseline
- LLM-as-judge confidence drops > 20 points
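The four criteria above can be expressed as a single predicate. A sketch, assuming each task's result carries pass/fail, latency, token count, and a 0–100 judge score:

```python
from dataclasses import dataclass

@dataclass
class TaskResult:
    passed: bool
    latency_s: float
    tokens: int
    judge_confidence: float  # 0-100 LLM-as-judge score

def is_regression(baseline: TaskResult, current: TaskResult) -> bool:
    """The four regression criteria from the list above, in order."""
    if baseline.passed and not current.passed:
        return True  # went from passing to failing
    if current.passed and current.latency_s > 2 * baseline.latency_s:
        return True  # passing, but latency > 2x baseline
    if current.passed and current.tokens > 2 * baseline.tokens:
        return True  # passing, but token count > 2x baseline
    if baseline.judge_confidence - current.judge_confidence > 20:
        return True  # judge confidence dropped > 20 points
    return False

base = TaskResult(passed=True, latency_s=1.0, tokens=500, judge_confidence=90.0)
```

The CI job fails the PR as soon as `is_regression` is true for any golden-suite task.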
Where the golden suite lives¶
dashecorp/rig-gitops/evals/golden/ — YAML per task with:
- Task description (natural language)
- Input context
- Expected output shape
- Grading rubric
- Baseline results per model
Versioned in git. Changes to the golden suite are themselves reviewed (meta-evaluation).
Channel 3: Code drift¶
The mechanism¶
Flux is the source of truth for what runs in the cluster. But reality can diverge:
- A human manually applies a `kubectl edit`
- A rolled-back deploy leaves `kubectl rollout undo` state
- An agent's hot-fix lands via an emergency path (shouldn't exist; self-healing.md mandates no fast path, but defense-in-depth)
- A malicious actor mutates a running resource via compromised credentials
Anything that makes "what is running" diverge from "what git says should be running" is code drift.
The detection¶
Flux's kustomize-controller already reconciles. The drift signal:
- `kustomize_controller_drift_total{resource="...", namespace="..."}` — resources modified in-cluster since last reconcile
- Frequency of drift events per resource
- Resources repeatedly drifting — deliberate human edits bypassing GitOps
Enhanced signal: a scheduled job compares kubectl get output against the git-expected state per namespace, hashes, and alerts if hashes diverge. Catches drift that Flux's own reconciliation doesn't surface clearly (namespaces it doesn't manage, cluster-level objects).
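The comparison itself is deterministic canonicalization plus hashing. A sketch, with manifests inlined as plain dicts rather than fetched from `kubectl` or the rendered gitops source:

```python
import hashlib
import json

def namespace_hash(manifests: list) -> str:
    """Hash a canonical view of a namespace's resources.

    In the real job one side comes from `kubectl get -o json` and the
    other from the rendered gitops source; inline dicts keep the
    sketch self-contained.
    """
    # Sort by (kind, name) and serialize with sorted keys so resource
    # ordering and dict key order can't produce false drift.
    canon = sorted(manifests, key=lambda m: (m["kind"], m["metadata"]["name"]))
    blob = json.dumps(canon, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()

git_state = [
    {"kind": "Deployment", "metadata": {"name": "api"}, "spec": {"replicas": 3}},
    {"kind": "Service", "metadata": {"name": "api"}, "spec": {"port": 80}},
]
live_state = [
    # Someone scaled the deployment by hand: hashes now diverge.
    {"kind": "Deployment", "metadata": {"name": "api"}, "spec": {"replicas": 5}},
    {"kind": "Service", "metadata": {"name": "api"}, "spec": {"port": 80}},
]
drifted = namespace_hash(git_state) != namespace_hash(live_state)
```

A real version would also strip server-populated fields (`status`, `metadata.uid`, default annotations) before hashing, or every reconcile would look like drift.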
Response¶
- Normal drift (small, infrequent): Flux reconciles, event logged, no alert
- Repeat drift on the same resource (same resource drifts >3× in 24h): P3 alert, human review — someone is patching in cluster, why?
- Drift in a T3 namespace (auth, payments): P1 alert — possible compromise
- Drift in RBAC or NetworkPolicy resources: P0 alert — security-critical
Channel 4: Config drift¶
The mechanism¶
Related to code drift but specific: configuration resources (ConfigMap, feature-flag files, Kyverno policies) diverging between deployed state and gitops source.
The detection¶
Same mechanism as code drift but with separate severity thresholds. Feature flag drift specifically: flagd reports its active flag state via an HTTP endpoint; a scheduled job compares to the YAML in dashecorp/rig-gitops/feature-flags/. Any delta is P2 — possible runtime override that needs syncing back to git or rejecting.
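The flagd comparison is a dict diff once both sides are loaded. A sketch, with the flagd HTTP fetch and the YAML parse replaced by inline dicts (the flag names are hypothetical):

```python
def flag_drift(git_flags: dict, live_flags: dict) -> dict:
    """Per-flag deltas between the gitops YAML and flagd's live state.

    In production live_flags would come from flagd's HTTP state endpoint
    and git_flags from the feature-flags YAML; inline dicts keep this
    sketch runnable.
    """
    keys = git_flags.keys() | live_flags.keys()
    return {
        k: {"git": git_flags.get(k), "live": live_flags.get(k)}
        for k in keys
        if git_flags.get(k) != live_flags.get(k)
    }

# Any non-empty delta is the P2 condition described above.
delta = flag_drift(
    {"new-checkout": False, "dark-mode": True},
    {"new-checkout": True, "dark-mode": True},
)
```

The union over both key sets matters: a flag present on only one side (added at runtime, or deleted from git but still live) is drift too.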
Kyverno policy drift¶
Changes to Kyverno policies are T3-tier actions. A drifted policy in cluster vs. git is a potential security regression. Dedicated detector:
- `kyverno_policy_hash_mismatch{policy="..."}` — hash of applied policy vs. git-expected hash
- Alert severity: P0 for T3 policies, P1 for others
The drift dashboard¶
Grafana dashboard showing:
- Nightly output-hash delta % (model drift, line chart)
- Weekly golden-suite regression count (prompt drift, bar)
- Daily Flux drift event count (code + config drift, stacked by namespace)
- Feature-flag drift events (count, 7d)
- Kyverno policy drift events (count, 7d)
Color-coded thresholds. Alerts firing in the last 24h highlighted.
Drift as part of model upgrades¶
When we upgrade a model (Sonnet 4.6 → 4.7), drift is expected:
- Before the upgrade, run the canary suite on both old and new model; save as side-by-side baseline
- After upgrade, canary suite compares against the new-model baseline
- Autonomy tiers reset — all agents drop to conservative tiers and re-earn (principle 6)
- Run the full nightly eval suite for 14 days before promoting agents
- Golden suite updated to include the new model's baselines
Model upgrades are T2 changes — interface review required.
Distinguishing drift from intended change¶
The detector doesn't know "intended" from "accidental." Every change to agent prompts must:
- Go through PR review (Review-E + human for T2/T3)
- Update the golden-suite baseline explicitly in the same PR
- Run the regression-test CI job
If the golden suite update isn't in the PR, the prompt change is rejected at merge (missing baseline update).
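A sketch of that merge-time gate, assuming the CI job can see the PR's changed-file list. The `agents/` prompt path is hypothetical; the `evals/golden/` prefix is the golden-suite location named above:

```python
def prompt_change_allowed(changed_files: list) -> bool:
    """A prompt change must ship its golden-suite baseline update in the same PR."""
    # Hypothetical layout: agent prompts live under agents/ as prompt.md/.txt files.
    prompt_changed = any(
        "prompt" in f and f.endswith((".md", ".txt")) for f in changed_files
    )
    baseline_changed = any(f.startswith("evals/golden/") for f in changed_files)
    return (not prompt_changed) or baseline_changed

# Rejected: prompt edited, no baseline update in the PR.
assert not prompt_change_allowed(["agents/planner/prompt.md"])
# Accepted: baseline updated alongside the prompt.
assert prompt_change_allowed(["agents/planner/prompt.md", "evals/golden/task-07.yaml"])
```

The check is deliberately one-directional: baseline-only PRs are allowed, because golden-suite changes get their own meta-evaluation review.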
Integration with other metrics¶
Drift signals feed into:
- Autonomy tiers: drift pauses promotions
- Budget: drift investigation has a dedicated budget allocation
- Escalation routing: drift severity maps directly to routing tiers
Drift is not an isolated system. It's one of the top-level health signals of the rig.
Attack surface: drift as injection channel¶
A compromised model provider could ship behavioral changes targeting our specific prompts. Drift detection is one defense; it's not specific to provider-side attacks, but it catches them.
The rigorous defense is model sandboxing: route sensitive inferences through multiple providers (Claude + Gemini, say) and compare outputs. We don't do this by default — cost + complexity — but the escalation exists: "if we suspect provider compromise, fail over to alternate."
For T3 actions, this is worth considering: require two-provider agreement before admission.
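A sketch of such a two-provider agreement gate, using plain string similarity as a stand-in for whatever comparison (structured diff, embeddings) would actually be used; the 0.9 threshold is an assumption:

```python
from difflib import SequenceMatcher

def providers_agree(out_a: str, out_b: str, threshold: float = 0.9) -> bool:
    """Gate a T3 action on two providers producing effectively the same answer."""
    # Normalize whitespace and case before comparing.
    a = " ".join(out_a.split()).lower()
    b = " ".join(out_b.split()).lower()
    return SequenceMatcher(None, a, b).ratio() >= threshold

agree = providers_agree("Approve the change.", "approve the  change.")
disagree = providers_agree("Approve the change.", "Reject: policy violation.")
```

Fail closed: if the providers disagree (or one is unreachable), the T3 action escalates to a human rather than proceeding.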
Rollback from drift¶
If model drift is severe enough to warrant rollback:
- LiteLLM's `model_list` supports version aliasing if the provider exposes versioned endpoints
- Most major providers (Anthropic, OpenAI, Google) keep model strings stable; specific-snapshot pinning is not always available, and behavior can still drift under the same string
- Fallback options (all via LiteLLM config change — no agent code change required, see provider-portability.md): (a) swap `fallback_models` to a cross-vendor alternative; (b) suspend agents, use API paygo with an older SDK snapshot, wait for provider fix; (c) route high-risk calls to two providers and diff outputs
Recorded in: dashecorp/rig-gitops/runbooks/model-drift-response.md.
The meta-drift: drift detection itself drifting¶
The canary suite can go stale. Edge cases caught a year ago may no longer be edge cases. The golden suite can rot — tasks become obsolete, prompts become irrelevant.
Meta-maintenance:
- Quarterly review of the canary suite: are these 20 prompts still probing the right behaviors?
- Monthly review of the golden suite: drop obsolete tasks, add ones from recent incidents
- Annual review of the drift detection thresholds: do they still fire at the right rate?
This is a human responsibility, not an agent's.
What drift doesn't catch¶
- Drift in our production services (not the rig itself) — separate SLO monitoring covers that
- Gradual behavioral changes that cross no hash boundary — mitigated by embedding-drift and LLM-as-judge signals
- Drift in third-party tools (GitHub API, npm registry) — monitored by vendor status pages and error-rate alerts
- Drift in our own dependencies — Dependabot, Socket.dev, SBOM scans
See also¶
- index.md
- principles.md — principle 1 (measurable) and principle 9 (fail closed)
- observability.md — where drift signals surface
- quality-and-evaluation.md — golden suite mechanics
- self-healing.md — rollback machinery used when drift is severe