Arxo

Remediation Playbook

Use this playbook to convert llm.* findings into concrete engineering work.

  1. Safety-critical paths
  2. Reliability and runtime controls
  3. Governance and rollout controls
  4. Cost and optimization

PII Leakage

  • Watch: llm.pii_leakage_risk
  • Fix: Add PII classification/redaction and explicit prompt field allowlists.
  • Validate: llm.pii_leakage_risk == 0
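
The allowlist-plus-redaction step can be sketched as below. The regex patterns, field names, and placeholder format are illustrative stand-ins, not a production PII classifier:

```python
import re

# Illustrative patterns only; a real deployment should use a dedicated
# PII classification service rather than ad-hoc regexes.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

# Explicit allowlist of record fields permitted to enter the prompt.
PROMPT_FIELD_ALLOWLIST = {"ticket_subject", "product_area"}

def redact(text):
    """Replace detected PII spans with typed placeholders."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[REDACTED_{label.upper()}]", text)
    return text

def build_prompt_fields(record):
    """Drop non-allowlisted fields, then redact what remains."""
    return {k: redact(str(v)) for k, v in record.items()
            if k in PROMPT_FIELD_ALLOWLIST}
```

Dropping fields before redaction matters: redaction catches patterns, while the allowlist guarantees entire categories of data never reach the prompt at all.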

Prompt Injection and Instruction Boundaries

  • Watch: llm.prompt_injection_surface, llm.instruction_boundary_violation
  • Fix: Separate system/user/tool channels; sanitize or delimit untrusted text.
  • Validate: both keys trend downward; confidence remains stable
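
One way to keep the channels separate is sketched below; the delimiter tags and message-dict shape are assumptions for illustration, not any specific provider's API:

```python
# Hypothetical delimiters for marking untrusted spans.
UNTRUSTED_OPEN = "<untrusted>"
UNTRUSTED_CLOSE = "</untrusted>"

def wrap_untrusted(text):
    """Delimit untrusted text, stripping embedded delimiter lookalikes
    so the input cannot fake its way out of the untrusted span."""
    cleaned = text.replace(UNTRUSTED_OPEN, "").replace(UNTRUSTED_CLOSE, "")
    return f"{UNTRUSTED_OPEN}\n{cleaned}\n{UNTRUSTED_CLOSE}"

def build_messages(system_prompt, user_text, tool_output=None):
    """Keep system, user, and tool content in separate channels; untrusted
    text is never concatenated into the system prompt."""
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": wrap_untrusted(user_text)},
    ]
    if tool_output is not None:
        messages.append({"role": "tool", "content": wrap_untrusted(tool_output)})
    return messages
```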

Insecure Output Handling

  • Watch: llm.insecure_output_handling, llm.structured_output_enforcement_gap
  • Fix: Enforce strict schema validation before output reaches SQL, shell, UI, or workflow actions.
  • Validate: lower output-handling gaps and fewer critical findings
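
A minimal runtime gate of that kind might look like the following, assuming a hypothetical workflow-action schema:

```python
import json

# Hypothetical contract for one workflow action; the real schema comes
# from the consuming system, never from the model.
ACTION_SCHEMA = {"action": str, "ticket_id": int}
ALLOWED_ACTIONS = {"close", "escalate"}

def parse_action(raw):
    """Strictly validate model output before it reaches any executor."""
    data = json.loads(raw)
    if set(data) != set(ACTION_SCHEMA):
        raise ValueError(f"unexpected keys: {sorted(set(data) ^ set(ACTION_SCHEMA))}")
    for key, expected in ACTION_SCHEMA.items():
        if not isinstance(data[key], expected):
            raise ValueError(f"{key} must be {expected.__name__}")
    if data["action"] not in ALLOWED_ACTIONS:
        raise ValueError(f"action {data['action']!r} not permitted")
    return data
```

Rejecting unknown keys and closed-world enumerations (rather than sanitizing free text) is what keeps model output from ever being interpreted as SQL or shell syntax downstream.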

Telemetry and System Prompt Leakage

  • Watch: llm.sensitive_info_in_telemetry, llm.system_prompt_leakage
  • Fix: Redact prompt/response payloads and never log sensitive instruction content.
  • Validate: both metrics trend toward 0
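
Redaction-by-construction can look like this: telemetry carries only lengths, the model id, and a digest for correlation; field names are illustrative:

```python
import hashlib

def telemetry_record(model, prompt, response):
    """Log only non-sensitive metadata: lengths, the model id, and a short
    digest for cross-request correlation, never the payload text itself."""
    return {
        "model": model,
        "prompt_sha256": hashlib.sha256(prompt.encode()).hexdigest()[:12],
        "prompt_chars": len(prompt),
        "response_chars": len(response),
    }
```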

Fallback, Rate Limits, Streaming, Idempotency

  • Watch: llm.fallback_absence, llm.rate_limit_absence, llm.streaming_risk, llm.cache_idempotency_gap
  • Fix: Add bounded retries with jitter, timeout budgets, fallback models, idempotency keys, and response caching.
  • Validate: resilience metrics trend down and incident retry/429 rates drop
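
The retry-and-fallback portion of that fix can be sketched as follows; per-call timeouts, idempotency keys, and caching are omitted for brevity, and all names are illustrative:

```python
import random
import time

def call_with_resilience(primary, fallback, max_retries=3,
                         base_delay=0.1, budget_s=5.0):
    """Bounded retries with exponential backoff and jitter, a total time
    budget, and a fallback when the primary model is exhausted."""
    deadline = time.monotonic() + budget_s
    for attempt in range(max_retries):
        try:
            return primary()
        except Exception:
            delay = base_delay * (2 ** attempt) * random.uniform(0.5, 1.5)
            if time.monotonic() + delay > deadline:
                break  # respect the overall budget rather than sleeping past it
            time.sleep(delay)
    return fallback()
```

Jitter spreads retries out so a provider outage does not turn every client into a synchronized thundering herd of 429s.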

Context Budgets, Consumption, and Cost

  • Watch: llm.context_budget_absence, llm.unbounded_consumption, llm.cost_tracking_gap
  • Fix: Enforce token budgets, retrieval limits (top_k, windowing), request quotas, and cost attribution.
  • Validate: lower context/cost/consumption gaps
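
A combined top_k and token-budget cap can be as small as this sketch, where whitespace splitting stands in for a real tokenizer:

```python
def budget_context(chunks, top_k=5, max_tokens=1000):
    """Cap retrieval at top_k chunks and stop before the token budget is
    exceeded. Chunks are assumed pre-sorted by relevance."""
    selected, used = [], 0
    for chunk in chunks[:top_k]:
        cost = len(chunk.split())  # crude stand-in for a tokenizer count
        if used + cost > max_tokens:
            break
        selected.append(chunk)
        used += cost
    return selected
```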

Observability

  • Watch: llm.observability_gap, llm.genai_otel_semconv_gap
  • Fix: Add traces/logs around LLM calls with model, usage, latency, and status fields.
  • Validate: lower observability and OTel gaps, better triage speed
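
A minimal wrapper is sketched below, using attribute names modeled on the OpenTelemetry GenAI semantic conventions (the real convention set is richer, and the call/result shapes here are assumptions):

```python
import time

def traced_call(call, model, log):
    """Record model, latency, token usage, and status around an LLM call.
    `log` stands in for a real tracer/span exporter."""
    span = {"gen_ai.request.model": model}
    start = time.monotonic()
    try:
        result = call()
        span["gen_ai.usage.output_tokens"] = result.get("output_tokens", 0)
        span["status"] = "ok"
        return result
    except Exception as exc:
        span["status"] = f"error:{type(exc).__name__}"
        raise
    finally:
        # The finally block guarantees latency and status are recorded
        # on both success and failure paths.
        span["latency_ms"] = round((time.monotonic() - start) * 1000, 2)
        log.append(span)
```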

Evaluation Harness and Quality Gates

  • Watch: llm.eval_harness_absence, llm.eval_presence_score, llm.eval_quality_score
  • Fix: Add golden datasets, adversarial and stochastic checks, and CI gating for critical flows.
  • Validate: absence decreases; quality sub-scores increase

Model Pinning and Prompt Governance

  • Watch: llm.model_version_unpinned, llm.prompt_hardcoding_score, llm.template_governance_gap
  • Fix: Pin model versions; move prompts to versioned templates with ownership and a changelog.
  • Validate: versioning/governance gaps decrease; prompt score increases
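
Pinning plus templating can be sketched as a registry lookup; the registry dict is a stand-in for templates kept in version control with an owner and changelog, and the model id is invented:

```python
# Hypothetical pinned model id; always an exact version, never "latest".
PINNED_MODEL = "example-model-2024-06-01"

# Hypothetical template registry keyed by (name, version).
TEMPLATES = {
    ("summarize", "1.2.0"): {
        "owner": "platform-team",
        "text": "Summarize the following text:\n{body}",
    },
}

def render(name, version, **params):
    """Resolve a versioned template and pair it with the pinned model,
    so no call site hardcodes either the prompt or the model string."""
    template = TEMPLATES[(name, version)]
    return PINNED_MODEL, template["text"].format(**params)
```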

Rollout Guardrails

  • Watch: llm.model_rollout_guardrail_gap
  • Fix: Implement shadow/canary rollout, objective eval gates, and automatic rollback triggers.
  • Validate: rollout guardrail gap trends toward 0
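
The promote/rollback decision reduces to explicit, objective thresholds; the metric names and cutoffs below are illustrative:

```python
def rollout_decision(canary, baseline, max_error_delta=0.01, min_eval_score=0.85):
    """Promote a canary model only when objective eval gates pass;
    any violated threshold triggers automatic rollback."""
    if canary["eval_score"] < min_eval_score:
        return "rollback"
    if canary["error_rate"] - baseline["error_rate"] > max_error_delta:
        return "rollback"
    return "promote"
```

Keeping the gate as pure data-in/decision-out makes it trivial to run against shadow traffic before any user sees the new model.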

MCP Authorization and Tool Contracts

  • Watch: llm.mcp_authz_gap, llm.mcp_tool_contract_gap, additive MCP keys (llm.mcp_oauth21_gap, llm.mcp_pkce_gap, llm.mcp_tool_output_schema_gap, etc.)
  • Fix: Require authN/authZ on MCP surfaces and strict runtime schema contracts for tool I/O.
  • Validate: aggregate and additive MCP gaps trend down
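
The two checks compose naturally: authorize first, then validate the tool's output against its contract. The token store, scope, and contract names below are hypothetical; a real deployment would back authorization with OAuth 2.1 + PKCE, as the additive keys suggest:

```python
# Hypothetical token store and tool contract, in-memory for illustration.
VALID_TOKENS = {"token-abc": {"scopes": {"tools:read"}}}
TOOL_CONTRACTS = {"lookup_order": {"required_output_keys": {"order_id", "status"}}}

def invoke_tool(token, tool, handler):
    """Require authN/authZ first, then enforce the tool's output schema
    contract at runtime before results flow back to the model."""
    principal = VALID_TOKENS.get(token)
    if principal is None or "tools:read" not in principal["scopes"]:
        raise PermissionError("unauthorized MCP call")
    output = handler()
    missing = TOOL_CONTRACTS[tool]["required_output_keys"] - set(output)
    if missing:
        raise ValueError(f"tool output violates contract, missing {sorted(missing)}")
    return output
```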

Supply Chain, Poisoning, and Retrieval Boundaries

  • Watch: llm.supply_chain_risk, llm.data_model_poisoning_exposure, llm.vector_embedding_weakness, llm.embedding_drift_risk
  • Fix: Pin/verify dependencies, validate ingestion pipelines, enforce tenant metadata filters, and align embedding/index versions.
  • Validate: risk keys trend toward 0
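
The tenant-filter and version-alignment part of that fix can be sketched as a hard pre-filter on the index; the index entries here are hypothetical:

```python
# Hypothetical index entries; each carries tenant and embedding-version
# metadata written at ingestion time.
INDEX = [
    {"tenant": "acme", "embedding_version": "v2", "text": "acme refund policy"},
    {"tenant": "globex", "embedding_version": "v2", "text": "globex SSO setup"},
    {"tenant": "acme", "embedding_version": "v1", "text": "stale acme doc"},
]

def retrieve(tenant, query_embedding_version):
    """Enforce tenant isolation and embedding/index version alignment
    before any similarity scoring runs."""
    return [entry["text"] for entry in INDEX
            if entry["tenant"] == tenant
            and entry["embedding_version"] == query_embedding_version]
```

Filtering on metadata before scoring (rather than post-filtering ranked results) is what makes cross-tenant leakage structurally impossible instead of merely unlikely.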

Data-Quality Check Before Tightening Gates


If these diagnostics are degraded, prioritize restoring analysis fidelity first:

  • llm.blast_radius_available = 0
  • llm.pii_taint_used = 0
  • high llm.call_sites_unresolved_count

Version note: this page is aligned with metric version 1.2.0.