Remediation Playbook
Remediation Playbook
Section titled “Remediation Playbook”Use this playbook to convert findings into concrete fixes.
Loop Guards
Section titled “Loop Guards”- Symptom metric:
agent_architecture.loop_guard_absence - Typical cause: Agent loop/invoke paths without
max_steps, iteration cap, or timeout budget. - Minimal fix: Add step budget and wall-clock timeout in orchestration entrypoints and propagate through nested loops.
- Validation check:
loop_guard_absencedecreases andeffective_step_budget_ratioincreases.
Memory Bounds
Section titled “Memory Bounds”- Symptom metric:
agent_architecture.memory_unbounded - Typical cause: Conversation/state memory without TTL, size cap, summarization, or retention policy.
- Minimal fix: Set token/window limits, TTL/eviction for state stores, and periodic summarization for long threads.
- Validation check:
memory_unboundeddecreases; limit scores (context_memory_limits_score,tool_state_limits_score,long_term_memory_retention_score) increase.
Tool Policy and Schema Validation
Section titled “Tool Policy and Schema Validation”- Symptom metric:
agent_architecture.tool_policy_absence,agent_architecture.schema_validation_gap,agent_architecture.tool_result_validation_gap - Typical cause: Unscoped tools and untyped tool inputs/outputs.
- Minimal fix: Add allowlists/scope constraints and enforce input/output schemas (including error shapes).
- Validation check: Gaps decrease,
scoped_tool_ratioand schema coverage metrics increase.
Retry Storm and Fanout Controls
Section titled “Retry Storm and Fanout Controls”- Symptom metric:
agent_architecture.retry_storm_risk,agent_architecture.fanout_control_absence,agent_architecture.deadlock_risk - Typical cause: Nested retries without backoff/jitter and unconstrained parallel fanout.
- Minimal fix: Add exponential backoff + jitter, cap retries, add concurrency limiter, and explicit join/barrier patterns.
- Validation check: Retry and concurrency risk metrics decrease.
Observability and Eval Harness
Section titled “Observability and Eval Harness”- Symptom metric:
agent_architecture.agent_observability_gap,agent_architecture.agent_eval_absence - Typical cause: Missing step-level traces and missing regression/eval trajectories.
- Minimal fix: Emit trace/span IDs for each agent step; add golden trajectory tests and adversarial/stochastic eval runs.
- Validation check:
step_trace_completeness_score,trajectory_eval_coverage,adversarial_eval_present, andstochastic_runs_presentimprove.
Instruction Boundaries and State Isolation
Section titled “Instruction Boundaries and State Isolation”- Symptom metric:
agent_architecture.instruction_boundary_violation,agent_architecture.state_isolation_risk,agent_architecture.idempotency_gap - Typical cause: Mixed trust boundaries (system/user/tool outputs), shared mutable state across sessions, non-idempotent side-effect tools.
- Minimal fix: Separate prompt roles explicitly, scope state by session/user, and enforce idempotency keys for side-effectful tools.
- Validation check: Boundary/isolation/idempotency risk metrics decrease and
overall_agent_healthtrends upward.
MCP Governance and Tool Execution Safety
Section titled “MCP Governance and Tool Execution Safety”- Symptom metric:
agent_architecture.mcp_auth_gap,agent_architecture.mcp_oauth_resource_binding_gap,agent_architecture.mcp_tool_annotation_gap,agent_architecture.mcp_structured_output_gap,agent_architecture.tool_sandbox_enforcement_gap,agent_architecture.tool_approval_bypass_risk - Typical cause: MCP surfaces without auth/resource binding, incomplete tool annotations/structured outputs, or process-capable tools without sandbox and approval gates.
- Minimal fix: Add explicit auth + resource/audience binding for MCP auth flows, annotate MCP tools with safety metadata, enforce structured output contracts, sandbox process-capable tools, and gate high-risk actions behind explicit approval.
- Validation check: MCP and tool execution gap metrics decrease; approval/sandbox coverage signals increase.
Durable Execution
Section titled “Durable Execution”- Symptom metric:
agent_architecture.checkpoint_durability_gap,agent_architecture.interrupt_resume_contract_gap - Typical cause: Multi-step runs without persisted checkpoints or resumable interrupt contracts.
- Minimal fix: Add checkpoint write/read boundaries around long-running steps and define explicit interrupt/resume semantics for workflow state transitions.
- Validation check: Durability and interrupt/resume gaps decrease, improving reliability posture.
OTel GenAI Coverage and Trace Eval
Section titled “OTel GenAI Coverage and Trace Eval”- Symptom metric:
agent_architecture.otel_genai_semconv_gap,agent_architecture.otel_genai_event_coverage_gap,agent_architecture.trace_eval_regression_risk - Typical cause: Missing semantic convention fields/events and weak trace-quality regression checks.
- Minimal fix: Emit required OTel GenAI semantic attributes/events per step and add trace-eval checks to CI with pass/fail thresholds.
- Validation check: OTel gap metrics and trace eval regression risk decrease;
trace_eval_coveragerises.
A2A Protocol and Handoff Safety
Section titled “A2A Protocol and Handoff Safety”- Symptom metric:
agent_architecture.a2a_agent_card_gap,agent_architecture.a2a_task_state_machine_gap,agent_architecture.a2a_webhook_auth_gap,agent_architecture.handoff_input_filter_gap,agent_architecture.guardrail_hook_absence,agent_architecture.handoff_cycle_risk - Typical cause: Missing A2A contracts, unauthenticated handoff webhooks, or handoff paths without filtering/guardrails and cycle control.
- Minimal fix: Define agent cards and task lifecycle contracts, authenticate handoff webhooks, filter handoff inputs, enforce guardrail hooks, and cap/inspect handoff recursion paths.
- Validation check: A2A/handoff risk metrics decrease and severity distribution shifts away from High/Critical.