
Drift Detection — Model, Prompt, Code, Config

TL;DR

Systems drift in four independent channels: model (the provider silently changes behavior under the same version string), prompt (an agent config change breaks existing behavior), code (deployed ≠ git main), and config (manifests ≠ gitops source). Each channel needs its own detector, its own baseline, its own alert.

91% of production LLMs drift within 90 days

InsightFinder's 2025 study — median detection lag without monitoring is 14–18 days. "It still says Sonnet 4.6 in the config but the behavior changed" is the single most common silent regression class. We watch for it explicitly via a 20-prompt canary suite.

The four channels

| Channel | What drifts | Detection signal |
| --- | --- | --- |
| Model drift | Same model string, different behavior from the provider | 20-prompt canary suite output-hash delta |
| Prompt drift | Agent system prompt changed; nobody noticed it broke something | Golden-suite regression in CI |
| Code drift | Deployed code vs. main branch | Flux reconciliation + hash comparison |
| Config drift | Deployed manifests vs. gitops source | Flux + kube-diff |

Each channel needs its own detector, its own baseline, its own alert.

Channel 1: Model drift

The mechanism

Model providers (Anthropic, OpenAI, Google) have all shipped silent behavioral changes under stable version strings. Concrete examples:

  • Anthropic Sonnet 4.6 behavior changed in early 2026 following a rate-limit-fix deploy — reasoning quality dip reported across multiple communities; Claude Instant and Claude 2 variants shifted similarly earlier
  • OpenAI has repeatedly tweaked GPT-4 / GPT-5 series under the same API version names; community-reported regressions follow each silent update
  • Google Gemini behavior has shifted under gemini-3.1-pro version strings between minor releases

The vendor API returns the pinned model string. The behavior shifts. Nothing in our config changes. Outputs regress. This is vendor-neutral: the 20-prompt canary suite runs per configured provider (via LiteLLM virtual-key routing — see provider-portability.md) and catches silent changes wherever they occur.

The detection

A 20-prompt canary suite runs nightly. Prompts chosen to cover:

  • Deterministic structure tests (e.g., "rate these 3 Python refactors by readability")
  • Refusal behavior ("I need you to help me delete production data")
  • Reasoning-heavy tasks (well-defined multi-step problems)
  • Tool-use tasks (call a known tool with known args)
  • Edge-case prompts (empty input, ambiguous input)

Outputs are hashed and compared to the previous week's hashes. Four signals:

  1. Output-hash delta rate — >30% of prompts produce different output vs. prior week
  2. Embedding drift — cosine distance between old and new output embeddings (via a fixed sentence-encoder)
  3. LLM-as-judge comparison — a bigger or cross-family model (default: Opus 4.7 on Sonnet output; GPT-5.2 on Claude output is the cross-family variant — see provider-portability.md) scores the current output against the baseline
  4. Refusal rate shift — unexpected change in refusal behavior on edge-case prompts

Any of these > threshold → ModelDriftDetected event. Severity: P2.
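
The first two signals can be sketched in a few lines. This is a minimal illustration under assumptions, not the production detector: `output_hash`, `hash_delta_rate`, and `cosine_distance` are hypothetical names, and a real run would feed embeddings from the fixed sentence-encoder rather than the toy vectors shown here.

```python
import hashlib
import math

def output_hash(text: str) -> str:
    """Normalize and hash a canary output so trivial whitespace/case noise is ignored."""
    normalized = " ".join(text.split()).lower()
    return hashlib.sha256(normalized.encode()).hexdigest()

def hash_delta_rate(baseline: dict[str, str], current: dict[str, str]) -> float:
    """Signal 1: fraction of canary prompts whose output hash changed vs. prior week."""
    changed = sum(
        1 for prompt_id, text in current.items()
        if output_hash(text) != baseline.get(prompt_id)
    )
    return changed / len(current)

def cosine_distance(a: list[float], b: list[float]) -> float:
    """Signal 2: 1 - cosine similarity between old and new output embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)

# Baseline stores last week's hashes; current stores tonight's raw outputs.
baseline = {"p1": output_hash("The refactor in option B is most readable."),
            "p2": output_hash("I can't help with deleting production data.")}
current = {"p1": "The refactor in option B is most readable.",
           "p2": "Sure, here is how to delete production data."}  # refusal drifted

if hash_delta_rate(baseline, current) > 0.30:
    print("ModelDriftDetected: output-hash delta rate over threshold")
```

Exact-hash comparison is deliberately strict; that is why it is paired with the embedding and judge signals, which tolerate harmless rewording.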

Response

  1. Pause tier promotions — no agent gains autonomy during drift investigation
  2. Run the full eval suite (nightly harness + property tests) to quantify impact
  3. Compare affected task classes vs. others
  4. If widespread regression: rollback agents to a pinned prior model (via LiteLLM proxy routing to a specific model_version if provider supports; otherwise pin via API version header) or fail over to a cross-vendor alternative via fallback_models (see provider-portability.md)
  5. If localized to specific tasks: adjust prompts or switch model for affected classes
  6. Post-mortem: what changed, what caught it, what didn't

Limits

  • Most providers (Anthropic, OpenAI, Google) do not reliably version-pin behavior — using an older model-string doesn't guarantee prior behavior
  • Some drift is invisible to the canary suite (rare edge cases)
  • The canary suite itself is a frozen snapshot; if prompts stop being representative, they stop catching relevant drift

Channel 2: Prompt drift

The mechanism

Agent system prompts evolve. A well-intentioned prompt tweak to fix behavior X breaks behavior Y. Without a golden suite, the regression isn't noticed until a user reports it.

The detection

Every change to an agent prompt triggers a CI job:

  1. Load the new prompt
  2. Replay the golden suite (20 tasks, each with expected outcome)
  3. Compare new-prompt results to old-prompt results from the baseline run
  4. Fail the PR if any task regresses

We adapt Braintrust's production-trace-to-eval-case pattern: a weekly job scans Langfuse for traces that Review-E flagged as poor quality and proposes them as candidate golden-suite additions; a human approves each one.

What counts as a regression

  • Task went from passing to failing
  • Task passing but latency > 2× baseline
  • Task passing but token count > 2× baseline
  • LLM-as-judge confidence drops > 20 points
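
The four rules above can be expressed as one predicate that the CI job runs per task. A sketch under assumptions: `TaskResult` and `is_regression` are hypothetical names, and the field values here are illustrative.

```python
from dataclasses import dataclass

@dataclass
class TaskResult:
    passed: bool
    latency_s: float
    tokens: int
    judge_confidence: float  # LLM-as-judge score, 0-100

def is_regression(baseline: TaskResult, candidate: TaskResult) -> list[str]:
    """Return the regression rules the candidate run violates (empty list = no regression)."""
    reasons = []
    if baseline.passed and not candidate.passed:
        reasons.append("pass -> fail")
    if candidate.passed and candidate.latency_s > 2 * baseline.latency_s:
        reasons.append("latency > 2x baseline")
    if candidate.passed and candidate.tokens > 2 * baseline.tokens:
        reasons.append("tokens > 2x baseline")
    if baseline.judge_confidence - candidate.judge_confidence > 20:
        reasons.append("judge confidence dropped > 20 points")
    return reasons

base = TaskResult(passed=True, latency_s=3.0, tokens=900, judge_confidence=88)
new = TaskResult(passed=True, latency_s=7.5, tokens=950, judge_confidence=85)
print(is_regression(base, new))  # ['latency > 2x baseline']
```

Any non-empty result fails the PR.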

Where the golden suite lives

dashecorp/rig-gitops/evals/golden/ — one YAML file per task, containing:

  • Task description (natural language)
  • Input context
  • Expected output shape
  • Grading rubric
  • Baseline results per model

Versioned in git. Changes to the golden suite are themselves reviewed (meta-evaluation).
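
A hypothetical example of what one such task file might look like. The filename and field names are illustrative assumptions, not the actual schema:

```yaml
# evals/golden/refactor-ranking.yaml (illustrative sketch; field names assumed)
task: Rank three Python refactors of the same function by readability
input_context: |
  Three candidate refactors of parse_config(), pasted verbatim.
expected_output_shape: ordered list of exactly 3 entries, each with a one-line rationale
grading_rubric: |
  Pass if the ordering matches the baseline ordering and every rationale
  names a concrete readability property (naming, nesting, line length).
baselines:
  sonnet-4.6: {passed: true, latency_s: 4.1, tokens: 780, judge_confidence: 91}
```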

Channel 3: Code drift

The mechanism

Flux is the source of truth for what runs in the cluster. But reality can diverge:

  • A human manually applies a kubectl edit
  • A rolled-back deploy leaves kubectl rollout undo state
  • An agent's hot-fix lands via an emergency path (this path shouldn't exist, since self-healing.md mandates no fast path, but we detect for it as defense-in-depth)
  • A malicious actor mutates a running resource via compromised credentials

Anything that makes "what is running" diverge from "what git says should be running" is code drift.

The detection

Flux's kustomize-controller already reconciles. The drift signal:

  • kustomize_controller_drift_total{resource="...", namespace="..."} — resources modified in-cluster since last reconcile
  • Frequency of drift events per resource
  • Resources repeatedly drifting — deliberate human edits bypassing GitOps

Enhanced signal: a scheduled job compares kubectl get output against the git-expected state per namespace, hashes, and alerts if hashes diverge. Catches drift that Flux's own reconciliation doesn't surface clearly (namespaces it doesn't manage, cluster-level objects).
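
The hash-comparison core of that scheduled job can be sketched as follows. This is an assumption-laden illustration: `state_hash` is a hypothetical name, and in the real job the two resource lists would come from `kubectl get ... -o json` and from rendering the gitops source, not from the inline stand-ins used here.

```python
import hashlib
import json

def state_hash(resources: list[dict]) -> str:
    """Canonicalize a resource set (stable ordering, sorted keys) and hash it."""
    ordered = sorted(resources, key=lambda r: (r.get("kind", ""), r.get("name", "")))
    canonical = json.dumps(ordered, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()

# Inline stand-ins for live cluster state and rendered gitops state.
git_expected = [{"kind": "Deployment", "name": "api", "replicas": 3}]
live = [{"kind": "Deployment", "name": "api", "replicas": 5}]  # kubectl-edited in cluster

if state_hash(live) != state_hash(git_expected):
    print("code-drift alert: live state diverged from gitops source")
```

Canonicalization matters: without stable ordering and sorted keys, two identical states could hash differently and every run would false-alarm.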

Response

  • Normal drift (small, infrequent): Flux reconciles, event logged, no alert
  • Repeat drift on the same resource (same resource drifts >3× in 24h): P3 alert, human review — someone is patching in cluster, why?
  • Drift in a T3 namespace (auth, payments): P1 alert — possible compromise
  • Drift in RBAC or NetworkPolicy resources: P0 alert — security-critical

Channel 4: Config drift

The mechanism

Related to code drift but specific: configuration resources (ConfigMap, feature-flag files, Kyverno policies) diverging between deployed state and gitops source.

The detection

Same mechanism as code drift but with separate severity thresholds. Feature flag drift specifically: flagd reports its active flag state via an HTTP endpoint; a scheduled job compares to the YAML in dashecorp/rig-gitops/feature-flags/. Any delta is P2 — possible runtime override that needs syncing back to git or rejecting.
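
The flag comparison reduces to a dict diff. A minimal sketch, assuming `flag_drift` is a hypothetical name: in production one side would be parsed from the YAML in rig-gitops/feature-flags/ and the other fetched from flagd's HTTP endpoint, rather than the inline dicts shown here.

```python
def flag_drift(git_flags: dict[str, object], live_flags: dict[str, object]) -> dict:
    """Return {flag: (git_value, live_value)} for every flag whose live state differs."""
    keys = set(git_flags) | set(live_flags)
    return {k: (git_flags.get(k), live_flags.get(k))
            for k in keys if git_flags.get(k) != live_flags.get(k)}

# Inline stand-ins for the two sources of truth.
git_flags = {"new-checkout": False, "dark-mode": True}
live_flags = {"new-checkout": True, "dark-mode": True}  # runtime override

delta = flag_drift(git_flags, live_flags)
if delta:
    print(f"P2 config-drift: {delta}")  # {'new-checkout': (False, True)}
```

Using `set union` over both key sets also catches flags that exist on only one side (added at runtime, or deleted from git but still live).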

Kyverno policy drift

Changes to Kyverno policies are T3-tier actions. A drifted policy in cluster vs. git is a potential security regression. Dedicated detector:

  • kyverno_policy_hash_mismatch{policy="..."} — hash of applied policy vs. git-expected hash
  • Alert severity: P0 for T3 policies, P1 for others
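
A Prometheus alerting rule over that metric might look like the following sketch. The `tier` label and the rule/alert names are assumptions for illustration; only the metric name comes from the detector described above.

```yaml
groups:
  - name: kyverno-policy-drift  # illustrative; rule and label names assumed
    rules:
      - alert: KyvernoT3PolicyDrift
        expr: kyverno_policy_hash_mismatch{tier="t3"} > 0
        for: 0m  # fire immediately; security-critical
        labels:
          severity: P0
        annotations:
          summary: "T3 Kyverno policy {{ $labels.policy }} diverged from git"
```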

The drift dashboard

Grafana dashboard showing:

  • Nightly output-hash delta % (model drift, line chart)
  • Weekly golden-suite regression count (prompt drift, bar)
  • Daily Flux drift event count (code + config drift, stacked by namespace)
  • Feature-flag drift events (count, 7d)
  • Kyverno policy drift events (count, 7d)

Color-coded thresholds. Alerts firing in the last 24h highlighted.

Drift as part of model upgrades

When we upgrade a model (Sonnet 4.6 → 4.7), drift is expected:

  1. Before the upgrade, run the canary suite on both old and new model; save as side-by-side baseline
  2. After upgrade, canary suite compares against the new-model baseline
  3. Autonomy tiers reset — all agents drop to conservative tiers and re-earn (principle 6)
  4. Run the full nightly eval suite for 14 days before promoting agents
  5. Golden suite updated to include the new model's baselines

Model upgrades are T2 changes — interface review required.

Distinguishing drift from intended change

The detector doesn't know "intended" from "accidental." Every change to agent prompts must:

  1. Go through PR review (Review-E + human for T2/T3)
  2. Update the golden-suite baseline explicitly in the same PR
  3. Run the regression-test CI job

If the golden suite update isn't in the PR, the prompt change is rejected at merge (missing baseline update).
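
That merge gate is a simple check over the PR's changed-file list. A sketch under assumptions: the path prefixes (`agents/`, `evals/golden/`) and the function name are hypothetical, and in CI the file list would come from `git diff --name-only` against the base branch.

```python
def baseline_update_missing(changed_files: list[str]) -> bool:
    """True if a prompt file changed but no golden-suite baseline changed in the same PR."""
    prompt_changed = any(p.startswith("agents/") and p.endswith("prompt.md")
                         for p in changed_files)
    baseline_changed = any(p.startswith("evals/golden/") for p in changed_files)
    return prompt_changed and not baseline_changed

# In CI, changed_files would come from: git diff --name-only origin/main...HEAD
print(baseline_update_missing(["agents/review-e/prompt.md"]))           # True -> reject PR
print(baseline_update_missing(["agents/review-e/prompt.md",
                               "evals/golden/refactor-ranking.yaml"]))  # False -> OK
```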

Integration with other metrics

Drift signals feed into:

  • Autonomy tiers: drift pauses promotions
  • Budget: drift investigation has a dedicated budget allocation
  • Escalation routing: drift severity maps directly to routing tiers

Drift is not an isolated system. It's one of the top-level health signals of the rig.

Attack surface: drift as injection channel

A compromised model provider could ship behavioral changes targeting our specific prompts. Drift detection is one defense; it's not specific to provider-side attacks, but it catches them.

The rigorous defense is model sandboxing: route sensitive inferences through multiple providers (Claude + Gemini, say) and compare outputs. We don't do this by default — cost + complexity — but the escalation exists: "if we suspect provider compromise, fail over to alternate."

For T3 actions, this is worth considering: require two-provider agreement before admission.

Rollback from drift

If model drift is severe enough to warrant rollback:

  • LiteLLM's model_list supports version aliasing if the provider exposes versioned endpoints
  • Most major providers (Anthropic, OpenAI, Google) keep model strings stable; specific-snapshot-pinning is not always available, and behavior can still drift under the same string
  • Fallback options (all via LiteLLM config change — no agent code change required, see provider-portability.md): (a) swap fallback_models to a cross-vendor alternative; (b) suspend agents, use API paygo with an older SDK snapshot, wait for provider fix; (c) route high-risk calls to two providers and diff outputs

Recorded in: dashecorp/rig-gitops/runbooks/model-drift-response.md.

The meta-drift: drift detection itself drifting

The canary suite can go stale. Edge cases caught a year ago may no longer be edge cases. The golden suite can rot — tasks become obsolete, prompts become irrelevant.

Meta-maintenance:

  • Quarterly review of the canary suite: are these 20 prompts still probing the right behaviors?
  • Monthly review of the golden suite: drop obsolete tasks, add ones from recent incidents
  • Annual review of the drift detection thresholds: do they still fire at the right rate?

This is a human responsibility, not an agent's.

What drift doesn't catch

  • Drift in our production services (not the rig itself) — separate SLO monitoring covers that
  • Gradual behavioral changes that cross no hash boundary — mitigated by embedding-drift and LLM-as-judge signals
  • Drift in third-party tools (GitHub API, npm registry) — monitored by vendor status pages and error-rate alerts
  • Drift in our own dependencies — Dependabot, Socket.dev, SBOM scans

See also