Remediation Playbook

Use this playbook to convert findings into concrete fixes.

Loop Guards

Symptom metric: agent_architecture.loop_guard_absence
Typical cause: Agent loop/invoke paths without max_steps, iteration cap, or timeout budget.
Minimal fix: Add a step budget and a wall-clock timeout at orchestration entrypoints and propagate both through nested loops.
Validation check: loop_guard_absence decreases and effective_step_budget_ratio increases.
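
Example sketch: one way to express a shared step and time budget in Python. The names RunBudget, BudgetExceeded, and run_agent are hypothetical and not tied to any framework; the step function is assumed to return a (state, done) pair.

```python
import time
from dataclasses import dataclass, field


class BudgetExceeded(RuntimeError):
    """Raised when a run exhausts its step or time budget."""


@dataclass
class RunBudget:
    max_steps: int
    timeout_s: float
    steps_used: int = 0
    started_at: float = field(default_factory=time.monotonic)

    def charge(self) -> None:
        """Consume one step, or raise if either limit is already spent."""
        if self.steps_used >= self.max_steps:
            raise BudgetExceeded(f"step budget of {self.max_steps} exhausted")
        if time.monotonic() - self.started_at > self.timeout_s:
            raise BudgetExceeded(f"timeout of {self.timeout_s}s exceeded")
        self.steps_used += 1


def run_agent(step_fn, budget: RunBudget, state=None):
    """Call step_fn until it reports done; every iteration is charged."""
    while True:
        budget.charge()
        # The budget is handed to the step so nested loops can share it.
        state, done = step_fn(state, budget)
        if done:
            return state
```

Passing the same RunBudget into nested run_agent calls makes inner loops draw from the outer budget instead of starting a fresh one. The timeout here is only checked between steps, so a single blocking step still needs its own I/O timeout.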

Symptom metric: agent_architecture.memory_unbounded
Typical cause: Conversation/state memory without TTL, size cap, summarization, or retention policy.
Minimal fix: Set token/window limits, TTL/eviction for state stores, and periodic summarization for long threads.
Validation check: memory_unbounded decreases; limit scores (context_memory_limits_score, tool_state_limits_score, long_term_memory_retention_score) increase.
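
Example sketch: a conversation buffer with a size cap and TTL eviction, using only the standard library. BoundedMemory and its parameters are hypothetical names; summarization is left out and would replace evicted messages with a summary entry.

```python
import time
from collections import deque


class BoundedMemory:
    """Keeps at most max_messages entries, each for at most ttl_s seconds."""

    def __init__(self, max_messages: int, ttl_s: float, clock=time.monotonic):
        # deque(maxlen=...) drops the oldest entry once the cap is reached.
        self._items = deque(maxlen=max_messages)
        self._ttl_s = ttl_s
        self._clock = clock

    def add(self, message: str) -> None:
        self._items.append((self._clock(), message))

    def messages(self) -> list:
        """Evict expired entries, then return the surviving messages in order."""
        cutoff = self._clock() - self._ttl_s
        while self._items and self._items[0][0] < cutoff:
            self._items.popleft()
        return [message for _, message in self._items]
```

The clock is injectable so that eviction can be tested without sleeping.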

Symptom metric: agent_architecture.tool_policy_absence, agent_architecture.schema_validation_gap, agent_architecture.tool_result_validation_gap
Typical cause: Unscoped tools and untyped tool inputs/outputs.
Minimal fix: Add allowlists/scope constraints and enforce input/output schemas (including error shapes).
Validation check: Gaps decrease; scoped_tool_ratio and schema coverage metrics increase.
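
Example sketch: an allowlist check plus input/output validation around a tool call, with a fixed error shape. ToolSpec, ToolPolicyError, and call_tool are hypothetical names, and the type-map "schema" stands in for whatever schema library the project already uses.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Set


class ToolPolicyError(Exception):
    """Raised when a call is out of scope or its arguments fail validation."""


@dataclass(frozen=True)
class ToolSpec:
    fn: Callable[..., Any]
    input_schema: Dict[str, type]  # argument name -> required type
    output_type: type


def validate_args(schema: Dict[str, type], args: Dict[str, Any]) -> None:
    missing = schema.keys() - args.keys()
    extra = args.keys() - schema.keys()
    if missing or extra:
        raise ToolPolicyError(f"missing={sorted(missing)} extra={sorted(extra)}")
    for name, expected in schema.items():
        if not isinstance(args[name], expected):
            raise ToolPolicyError(f"{name} must be {expected.__name__}")


def call_tool(registry: Dict[str, ToolSpec], allowlist: Set[str],
              name: str, args: Dict[str, Any]) -> Dict[str, Any]:
    # Scope check first: the tool must be both registered and allowed.
    if name not in allowlist or name not in registry:
        raise ToolPolicyError(f"tool {name!r} is not allowed")
    spec = registry[name]
    validate_args(spec.input_schema, args)
    try:
        result = spec.fn(**args)
    except Exception as exc:  # tool failures are returned in a typed shape
        return {"ok": False, "error": {"type": type(exc).__name__, "message": str(exc)}}
    if not isinstance(result, spec.output_type):
        return {"ok": False, "error": {"type": "OutputSchemaError",
                                       "message": f"expected {spec.output_type.__name__}"}}
    return {"ok": True, "result": result}
```

Policy violations raise, while tool failures come back as data, so the agent loop can distinguish "not permitted" from "tried and failed".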

Symptom metric: agent_architecture.retry_storm_risk, agent_architecture.fanout_control_absence, agent_architecture.deadlock_risk
Typical cause: Nested retries without backoff/jitter and unconstrained parallel fanout.
Minimal fix: Add exponential backoff with jitter, cap retries, add a concurrency limiter, and use explicit join/barrier patterns.
Validation check: Retry and concurrency risk metrics decrease.
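
Example sketch: capped retries with exponential backoff and full jitter, plus a semaphore-bounded fanout joined with asyncio.gather. The function names and default values are hypothetical; production code would usually catch only errors known to be transient.

```python
import asyncio
import random


async def retry_with_backoff(op, *, max_attempts=4, base_delay_s=0.1,
                             max_delay_s=2.0, sleep=asyncio.sleep):
    """Await op(); on failure wait a jittered, exponentially growing delay."""
    for attempt in range(max_attempts):
        try:
            return await op()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # retry cap reached: surface the last error
            ceiling = min(max_delay_s, base_delay_s * 2 ** attempt)
            await sleep(random.uniform(0, ceiling))  # full jitter


async def bounded_fanout(ops, *, max_concurrency):
    """Run all ops with at most max_concurrency in flight, then join."""
    sem = asyncio.Semaphore(max_concurrency)

    async def guarded(op):
        async with sem:
            return await op()

    # gather is the explicit join point; results keep the order of ops.
    return await asyncio.gather(*(guarded(op) for op in ops))
```

Retrying at only one layer (for example, the outermost caller) avoids multiplying attempts when calls are nested.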

Symptom metric: agent_architecture.agent_observability_gap, agent_architecture.agent_eval_absence
Typical cause: Missing step-level traces and missing regression/eval trajectories.
Minimal fix: Emit trace/span IDs for each agent step; add golden trajectory tests and adversarial/stochastic eval runs.
Validation check: step_trace_completeness_score, trajectory_eval_coverage, adversarial_eval_present, and stochastic_runs_present improve.
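
Example sketch: recording a span per agent step and comparing the resulting trajectory with a golden one. RunTrace, StepSpan, and matches_golden are hypothetical names; a real setup would likely export spans through a tracing SDK rather than keep them in a list.

```python
import uuid
from dataclasses import dataclass, field
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class StepSpan:
    trace_id: str
    span_id: str
    parent_span_id: Optional[str]
    name: str


@dataclass
class RunTrace:
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    spans: List[StepSpan] = field(default_factory=list)

    def record_step(self, name: str, parent: Optional[StepSpan] = None) -> StepSpan:
        """Append a span for one step; all spans share the run's trace ID."""
        span = StepSpan(
            trace_id=self.trace_id,
            span_id=uuid.uuid4().hex[:16],
            parent_span_id=parent.span_id if parent else None,
            name=name,
        )
        self.spans.append(span)
        return span


def matches_golden(trace: RunTrace, golden: Sequence[str]) -> bool:
    """Golden-trajectory check: step names must match in order."""
    return [span.name for span in trace.spans] == list(golden)
```

For stochastic runs, the same check can be repeated over several seeds and reported as a pass rate instead of a single boolean.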

Symptom metric: agent_architecture.instruction_boundary_violation, agent_architecture.state_isolation_risk, agent_architecture.idempotency_gap
Typical cause: Mixed trust boundaries (system/user/tool outputs), shared mutable state across sessions, non-idempotent side-effect tools.
Minimal fix: Separate prompt roles explicitly, scope state by session/user, and enforce idempotency keys for side-effectful tools.
Validation check: Boundary/isolation/idempotency risk metrics decrease and overall_agent_health trends upward.
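
Example sketch: idempotency keys for side-effectful tools, with the session ID folded into the key so results are never shared across sessions. make_key and IdempotentExecutor are hypothetical names, and the in-memory dict stands in for a durable store. Prompt role separation is not shown.

```python
import hashlib
import json
from typing import Any, Callable, Dict


def make_key(session_id: str, tool_name: str, args: Dict[str, Any]) -> str:
    """Derive a stable key; sort_keys makes argument order irrelevant."""
    payload = json.dumps([session_id, tool_name, args], sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class IdempotentExecutor:
    def __init__(self) -> None:
        self._results: Dict[str, Any] = {}

    def run(self, session_id: str, tool_name: str,
            args: Dict[str, Any], fn: Callable[..., Any]) -> Any:
        """Execute fn once per (session, tool, args); replay the stored result after that."""
        key = make_key(session_id, tool_name, args)
        if key in self._results:
            return self._results[key]
        result = fn(**args)
        self._results[key] = result
        return result
```

A retried step that repeats the same call gets the stored result back instead of triggering the side effect a second time.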

Symptom metric: agent_architecture.mcp_auth_gap, agent_architecture.mcp_oauth_resource_binding_gap, agent_architecture.mcp_tool_annotation_gap, agent_architecture.mcp_structured_output_gap, agent_architecture.tool_sandbox_enforcement_gap, agent_architecture.tool_approval_bypass_risk
Typical cause: MCP surfaces without auth/resource binding, incomplete tool annotations/structured outputs, or process-capable tools without sandbox and approval gates.
Minimal fix: Add explicit auth + resource/audience binding for MCP auth flows, annotate MCP tools with safety metadata, enforce structured output contracts, sandbox process-capable tools, and gate high-risk actions behind explicit approval.
Validation check: MCP and tool execution gap metrics decrease; approval/sandbox coverage signals increase.
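
Example sketch: an approval gate driven by tool annotations. The readOnlyHint and destructiveHint keys follow the hint names in the MCP tool annotation schema; the defaults used here (not read-only, destructive) are an assumption to verify against the spec version in use. needs_approval, execute_tool, and the approve callback are hypothetical, and sandboxing and auth are not shown.

```python
from typing import Any, Callable, Dict


def needs_approval(annotations: Dict[str, Any]) -> bool:
    """Treat a tool as high-risk unless it is explicitly marked safe."""
    if annotations.get("readOnlyHint", False):
        return False
    # Missing metadata is treated as destructive (fail closed).
    return bool(annotations.get("destructiveHint", True))


def execute_tool(name: str, annotations: Dict[str, Any],
                 fn: Callable[[], Any], approve: Callable[[str], bool]) -> Any:
    """Run fn only after approval when the annotations call for it."""
    if needs_approval(annotations) and not approve(name):
        raise PermissionError(f"approval denied for tool {name!r}")
    return fn()
```

Annotations are hints supplied by the tool server, so a gate like this is best combined with a local allowlist rather than trusted on its own.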

Symptom metric: agent_architecture.checkpoint_durability_gap, agent_architecture.interrupt_resume_contract_gap
Typical cause: Multi-step runs without persisted checkpoints or resumable interrupt contracts.
Minimal fix: Add checkpoint write/read boundaries around long-running steps and define explicit interrupt/resume semantics for workflow state transitions.
Validation check: Durability and interrupt/resume gaps decrease, improving reliability posture.
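
Example sketch: a checkpoint written after every step, with resume from the last completed step. The JSON file layout, the function names, and the interrupt_after parameter (which simulates an interrupt) are all hypothetical; step outputs are assumed to be JSON-serializable.

```python
import json
import os
from pathlib import Path


def save_checkpoint(path: Path, state: dict) -> None:
    """Write to a temp file, then rename, so readers never see a partial file."""
    tmp = Path(str(path) + ".tmp")
    tmp.write_text(json.dumps(state))
    os.replace(tmp, path)


def load_checkpoint(path: Path) -> dict:
    if path.exists():
        return json.loads(path.read_text())
    return {"next_step": 0, "outputs": []}


def run_workflow(steps, path: Path, interrupt_after=None) -> dict:
    """Run steps in order, resuming from the checkpointed position."""
    state = load_checkpoint(path)
    for index in range(state["next_step"], len(steps)):
        state["outputs"].append(steps[index]())
        state["next_step"] = index + 1
        save_checkpoint(path, state)  # checkpoint boundary after each step
        if interrupt_after == index:
            return state  # simulated interrupt; a later call resumes here
    return state
```

A step that crashes before its checkpoint is written will run again on resume, so this pattern relies on steps being idempotent.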

Symptom metric: agent_architecture.otel_genai_semconv_gap, agent_architecture.otel_genai_event_coverage_gap, agent_architecture.trace_eval_regression_risk
Typical cause: Missing semantic convention fields/events and weak trace-quality regression checks.
Minimal fix: Emit required OTel GenAI semantic attributes/events per step and add trace-eval checks to CI with pass/fail thresholds.
Validation check: OTel gap metrics and trace eval regression risk decrease; trace_eval_coverage rises.
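
Example sketch: a CI check that fails when too few spans carry the expected GenAI attributes. The attribute names are taken from the OpenTelemetry GenAI semantic conventions, which are still evolving, so the required set should be checked against the pinned semconv version. The span representation (a list of attribute dicts), the function names, and the 0.95 threshold are assumptions.

```python
from typing import Any, Dict, Iterable, Sequence

REQUIRED_ATTRS = (
    "gen_ai.operation.name",
    "gen_ai.request.model",
    "gen_ai.usage.input_tokens",
    "gen_ai.usage.output_tokens",
)


def attribute_coverage(spans: Sequence[Dict[str, Any]],
                       required: Iterable[str] = REQUIRED_ATTRS) -> float:
    """Fraction of spans that carry every required attribute."""
    required = tuple(required)
    if not spans:
        return 0.0
    complete = sum(1 for attrs in spans if all(key in attrs for key in required))
    return complete / len(spans)


def trace_check_passes(spans: Sequence[Dict[str, Any]], threshold: float = 0.95) -> bool:
    """Pass/fail gate for CI: coverage must meet the threshold."""
    return attribute_coverage(spans) >= threshold
```

An empty trace counts as zero coverage, so a run that emits no spans fails the gate instead of passing vacuously.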

Symptom metric: agent_architecture.a2a_agent_card_gap, agent_architecture.a2a_task_state_machine_gap, agent_architecture.a2a_webhook_auth_gap, agent_architecture.handoff_input_filter_gap, agent_architecture.guardrail_hook_absence, agent_architecture.handoff_cycle_risk
Typical cause: Missing A2A contracts, unauthenticated handoff webhooks, or handoff paths without filtering/guardrails and cycle control.
Minimal fix: Define agent cards and task lifecycle contracts, authenticate handoff webhooks, filter handoff inputs, enforce guardrail hooks, and cap/inspect handoff recursion paths.
Validation check: A2A/handoff risk metrics decrease and severity distribution shifts away from High/Critical.
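
Example sketch: filtering handoff inputs and rejecting handoffs that revisit an agent or exceed a depth cap. HandoffRejected, filter_handoff_input, extend_handoff_path, and the default max_depth are hypothetical; agent cards, task state machines, and webhook auth are not shown.

```python
from typing import Any, Dict, Iterable, Tuple


class HandoffRejected(Exception):
    """Raised when a handoff would create a cycle or exceed the depth cap."""


def filter_handoff_input(payload: Dict[str, Any],
                         allowed_keys: Iterable[str]) -> Dict[str, Any]:
    """Forward only allowlisted fields to the receiving agent."""
    allowed = set(allowed_keys)
    return {key: value for key, value in payload.items() if key in allowed}


def extend_handoff_path(path: Tuple[str, ...], target: str,
                        max_depth: int = 4) -> Tuple[str, ...]:
    """Return the path with target appended, or raise if the handoff is unsafe."""
    if target in path:
        raise HandoffRejected("cycle: " + " -> ".join(path + (target,)))
    if len(path) >= max_depth:
        raise HandoffRejected(f"handoff depth cap of {max_depth} reached")
    return path + (target,)
```

Carrying the path tuple along with each handoff keeps the check stateless, and the rejected path in the error message can be logged for inspection.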