Arxo

Remediation Playbook

Use this playbook to convert llm.* findings into concrete engineering work.

  1. Safety-critical paths
  2. Reliability and runtime controls
  3. Governance and rollout controls
  4. Cost and optimization

PII Leakage

  • Watch: llm.pii_leakage_risk
  • Fix: Add PII classification/redaction and explicit prompt field allowlists.
  • Validate: llm.pii_leakage_risk == 0
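
The allowlist-plus-redaction step can be sketched as below. The regex patterns, field names, and placeholder format are illustrative stand-ins, not a production PII classifier:

```python
import re

# Illustrative patterns only; a real deployment should use a dedicated
# PII classification service rather than ad-hoc regexes.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

# Explicit allowlist of record fields permitted to enter the prompt.
PROMPT_FIELD_ALLOWLIST = {"ticket_subject", "product_area"}

def redact(text):
    """Replace detected PII spans with typed placeholders."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[REDACTED_{label.upper()}]", text)
    return text

def build_prompt_fields(record):
    """Drop non-allowlisted fields, then redact what remains."""
    return {k: redact(str(v)) for k, v in record.items()
            if k in PROMPT_FIELD_ALLOWLIST}
```

Dropping fields before redaction matters: redaction catches patterns, while the allowlist guarantees entire categories of data never reach the prompt at all.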

Prompt Injection and Instruction Boundaries

  • Watch: llm.prompt_injection_surface, llm.instruction_boundary_violation
  • Fix: Separate system/user/tool channels; sanitize or delimit untrusted text.
  • Validate: both keys trend downward; confidence remains stable
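
One way to keep the channels separate is sketched below; the delimiter tags and message-dict shape are assumptions for illustration, not any specific provider's API:

```python
# Hypothetical delimiters for marking untrusted spans.
UNTRUSTED_OPEN = "<untrusted>"
UNTRUSTED_CLOSE = "</untrusted>"

def wrap_untrusted(text):
    """Delimit untrusted text, stripping embedded delimiter lookalikes
    so the input cannot fake its way out of the untrusted span."""
    cleaned = text.replace(UNTRUSTED_OPEN, "").replace(UNTRUSTED_CLOSE, "")
    return f"{UNTRUSTED_OPEN}\n{cleaned}\n{UNTRUSTED_CLOSE}"

def build_messages(system_prompt, user_text, tool_output=None):
    """Keep system, user, and tool content in separate channels; untrusted
    text is never concatenated into the system prompt."""
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": wrap_untrusted(user_text)},
    ]
    if tool_output is not None:
        messages.append({"role": "tool", "content": wrap_untrusted(tool_output)})
    return messages
```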

Insecure Output Handling

  • Watch: llm.insecure_output_handling, llm.structured_output_enforcement_gap
  • Fix: Enforce strict schema validation before output reaches SQL, shell, UI, or workflow actions.
  • Validate: lower output-handling gaps and fewer critical findings
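
A minimal runtime gate of that kind might look like the following, assuming a hypothetical workflow-action schema:

```python
import json

# Hypothetical contract for one workflow action; the real schema comes
# from the consuming system, never from the model.
ACTION_SCHEMA = {"action": str, "ticket_id": int}
ALLOWED_ACTIONS = {"close", "escalate"}

def parse_action(raw):
    """Strictly validate model output before it reaches any executor."""
    data = json.loads(raw)
    if set(data) != set(ACTION_SCHEMA):
        raise ValueError(f"unexpected keys: {sorted(set(data) ^ set(ACTION_SCHEMA))}")
    for key, expected in ACTION_SCHEMA.items():
        if not isinstance(data[key], expected):
            raise ValueError(f"{key} must be {expected.__name__}")
    if data["action"] not in ALLOWED_ACTIONS:
        raise ValueError(f"action {data['action']!r} not permitted")
    return data
```

Rejecting unknown keys and closed-world enumerations (rather than sanitizing free text) is what keeps model output from ever being interpreted as SQL or shell syntax downstream.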

Telemetry and System Prompt Leakage

  • Watch: llm.sensitive_info_in_telemetry, llm.system_prompt_leakage
  • Fix: Redact prompt/response payloads and never log sensitive instruction content.
  • Validate: both metrics trend toward 0
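
Redaction-by-construction can look like this: telemetry carries only lengths, the model id, and a digest for correlation; field names are illustrative:

```python
import hashlib

def telemetry_record(model, prompt, response):
    """Log only non-sensitive metadata: lengths, the model id, and a short
    digest for cross-request correlation, never the payload text itself."""
    return {
        "model": model,
        "prompt_sha256": hashlib.sha256(prompt.encode()).hexdigest()[:12],
        "prompt_chars": len(prompt),
        "response_chars": len(response),
    }
```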

Fallback, Rate Limits, Streaming, Idempotency

  • Watch: llm.fallback_absence, llm.rate_limit_absence, llm.streaming_risk, llm.cache_idempotency_gap
  • Fix: Add bounded retries with jitter, timeout budgets, fallback models, idempotency keys, and response caching.
  • Validate: resilience metrics trend down and incident retry/429 rates drop
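
The retry-and-fallback portion of that fix can be sketched as follows; per-call timeouts, idempotency keys, and caching are omitted for brevity, and all names are illustrative:

```python
import random
import time

def call_with_resilience(primary, fallback, max_retries=3,
                         base_delay=0.1, budget_s=5.0):
    """Bounded retries with exponential backoff and jitter, a total time
    budget, and a fallback when the primary model is exhausted."""
    deadline = time.monotonic() + budget_s
    for attempt in range(max_retries):
        try:
            return primary()
        except Exception:
            delay = base_delay * (2 ** attempt) * random.uniform(0.5, 1.5)
            if time.monotonic() + delay > deadline:
                break  # respect the overall budget rather than sleeping past it
            time.sleep(delay)
    return fallback()
```

Jitter spreads retries out so a provider outage does not turn every client into a synchronized thundering herd of 429s.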

Context Budgets, Consumption, and Cost

  • Watch: llm.context_budget_absence, llm.unbounded_consumption, llm.cost_tracking_gap
  • Fix: Enforce token budgets, retrieval limits (top_k, windowing), request quotas, and cost attribution.
  • Validate: lower context/cost/consumption gaps
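
A combined top_k and token-budget cap can be as small as this sketch, where whitespace splitting stands in for a real tokenizer:

```python
def budget_context(chunks, top_k=5, max_tokens=1000):
    """Cap retrieval at top_k chunks and stop before the token budget is
    exceeded. Chunks are assumed pre-sorted by relevance."""
    selected, used = [], 0
    for chunk in chunks[:top_k]:
        cost = len(chunk.split())  # crude stand-in for a tokenizer count
        if used + cost > max_tokens:
            break
        selected.append(chunk)
        used += cost
    return selected
```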

Observability

  • Watch: llm.observability_gap, llm.genai_otel_semconv_gap
  • Fix: Add traces/logs around LLM calls with model, usage, latency, and status fields.
  • Validate: lower observability and OTel gaps, better triage speed
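
A minimal wrapper is sketched below, using attribute names modeled on the OpenTelemetry GenAI semantic conventions (the real convention set is richer, and the call/result shapes here are assumptions):

```python
import time

def traced_call(call, model, log):
    """Record model, latency, token usage, and status around an LLM call.
    `log` stands in for a real tracer/span exporter."""
    span = {"gen_ai.request.model": model}
    start = time.monotonic()
    try:
        result = call()
        span["gen_ai.usage.output_tokens"] = result.get("output_tokens", 0)
        span["status"] = "ok"
        return result
    except Exception as exc:
        span["status"] = f"error:{type(exc).__name__}"
        raise
    finally:
        # The finally block guarantees latency and status are recorded
        # on both success and failure paths.
        span["latency_ms"] = round((time.monotonic() - start) * 1000, 2)
        log.append(span)
```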

Evaluation Harness and Quality Gates

  • Watch: llm.eval_harness_absence, llm.eval_presence_score, llm.eval_quality_score
  • Fix: Add golden datasets, adversarial and stochastic checks, and CI gating for critical flows.
  • Validate: absence decreases; quality sub-scores increase

Model Pinning and Prompt Governance

  • Watch: llm.model_version_unpinned, llm.prompt_hardcoding_score, llm.template_governance_gap
  • Fix: Pin model versions; move prompts to versioned templates with ownership and a changelog.
  • Validate: versioning/governance gaps decrease; prompt score increases
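
Pinning plus templating can be sketched as a registry lookup; the registry dict is a stand-in for templates kept in version control with an owner and changelog, and the model id is invented:

```python
# Hypothetical pinned model id; always an exact version, never "latest".
PINNED_MODEL = "example-model-2024-06-01"

# Hypothetical template registry keyed by (name, version).
TEMPLATES = {
    ("summarize", "1.2.0"): {
        "owner": "platform-team",
        "text": "Summarize the following text:\n{body}",
    },
}

def render(name, version, **params):
    """Resolve a versioned template and pair it with the pinned model,
    so no call site hardcodes either the prompt or the model string."""
    template = TEMPLATES[(name, version)]
    return PINNED_MODEL, template["text"].format(**params)
```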

Rollout Guardrails

  • Watch: llm.model_rollout_guardrail_gap
  • Fix: Implement shadow/canary rollout, objective eval gates, and automatic rollback triggers.
  • Validate: rollout guardrail gap trends toward 0
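
The promote/rollback decision reduces to explicit, objective thresholds; the metric names and cutoffs below are illustrative:

```python
def rollout_decision(canary, baseline, max_error_delta=0.01, min_eval_score=0.85):
    """Promote a canary model only when objective eval gates pass;
    any violated threshold triggers automatic rollback."""
    if canary["eval_score"] < min_eval_score:
        return "rollback"
    if canary["error_rate"] - baseline["error_rate"] > max_error_delta:
        return "rollback"
    return "promote"
```

Keeping the gate as pure data-in/decision-out makes it trivial to run against shadow traffic before any user sees the new model.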

MCP Authorization and Tool Contracts

  • Watch: llm.mcp_authz_gap, llm.mcp_tool_contract_gap, additive MCP keys (llm.mcp_oauth21_gap, llm.mcp_pkce_gap, llm.mcp_tool_output_schema_gap, etc.)
  • Fix: Require authN/authZ on MCP surfaces and strict runtime schema contracts for tool I/O.
  • Validate: aggregate and additive MCP gaps trend down
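
The two checks compose naturally: authorize first, then validate the tool's output against its contract. The token store, scope, and contract names below are hypothetical; a real deployment would back authorization with OAuth 2.1 + PKCE, as the additive keys suggest:

```python
# Hypothetical token store and tool contract, in-memory for illustration.
VALID_TOKENS = {"token-abc": {"scopes": {"tools:read"}}}
TOOL_CONTRACTS = {"lookup_order": {"required_output_keys": {"order_id", "status"}}}

def invoke_tool(token, tool, handler):
    """Require authN/authZ first, then enforce the tool's output schema
    contract at runtime before results flow back to the model."""
    principal = VALID_TOKENS.get(token)
    if principal is None or "tools:read" not in principal["scopes"]:
        raise PermissionError("unauthorized MCP call")
    output = handler()
    missing = TOOL_CONTRACTS[tool]["required_output_keys"] - set(output)
    if missing:
        raise ValueError(f"tool output violates contract, missing {sorted(missing)}")
    return output
```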

Supply Chain, Poisoning, and Retrieval Boundaries

  • Watch: llm.supply_chain_risk, llm.data_model_poisoning_exposure, llm.vector_embedding_weakness, llm.embedding_drift_risk
  • Fix: Pin/verify dependencies, validate ingestion pipelines, enforce tenant metadata filters, and align embedding/index versions.
  • Validate: risk keys trend toward 0
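
The tenant-filter and version-alignment part of that fix can be sketched as a hard pre-filter on the index; the index entries here are hypothetical:

```python
# Hypothetical index entries; each carries tenant and embedding-version
# metadata written at ingestion time.
INDEX = [
    {"tenant": "acme", "embedding_version": "v2", "text": "acme refund policy"},
    {"tenant": "globex", "embedding_version": "v2", "text": "globex SSO setup"},
    {"tenant": "acme", "embedding_version": "v1", "text": "stale acme doc"},
]

def retrieve(tenant, query_embedding_version):
    """Enforce tenant isolation and embedding/index version alignment
    before any similarity scoring runs."""
    return [entry["text"] for entry in INDEX
            if entry["tenant"] == tenant
            and entry["embedding_version"] == query_embedding_version]
```

Filtering on metadata before scoring (rather than post-filtering ranked results) is what makes cross-tenant leakage structurally impossible instead of merely unlikely.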

Data-Quality Check Before Tightening Gates


If these diagnostics are degraded, prioritize restoring analysis fidelity first:

  • llm.blast_radius_available = 0
  • llm.pii_taint_used = 0
  • high llm.call_sites_unresolved_count

Version note: this page is aligned with metric version 1.2.0.