Agent Architecture
The `agent_architecture` metric evaluates agent and orchestration architecture in codebases that use tool-calling agents and multi-step workflows.
Last verified against engine metric version 3.0.0.
Overview and Why It Matters
Agent systems fail in ways that are different from classic service code:
- Unbounded loops and retries can trigger runaway cost and unstable behavior.
- Weak tool governance can expose dangerous capabilities.
- Missing observability and eval harnesses make regressions invisible.
- Poor coordination and state handling can cause deadlocks or cross-session leaks.
This metric provides actionable scores and findings so teams can gate these risks in CI.
What It Measures
Reliability
- Loop guard coverage and budget propagation
- Memory bounds and retention controls
- Retry/backoff behavior
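To make the reliability terms concrete, here is a minimal sketch of what a loop guard and a bounded retry with exponential backoff might look like in agent code. The names `run_agent_loop`, `call_with_backoff`, and `BudgetExceeded` are hypothetical and are not part of the metric or of any specific framework; the sketch only illustrates the kind of pattern the checks above are concerned with.

```python
import time
from typing import Callable, Optional


class BudgetExceeded(RuntimeError):
    """Raised when the loop runs out of steps or wall-clock time."""


def run_agent_loop(step: Callable[[int], Optional[str]],
                   max_steps: int = 8,
                   timeout_s: float = 30.0) -> str:
    """Call step(i) until it returns an answer or the budget is spent."""
    deadline = time.monotonic() + timeout_s
    for i in range(max_steps):  # hard cap on iterations (max-steps guard)
        if time.monotonic() > deadline:  # wall-clock guard
            raise BudgetExceeded(f"timeout after {i} steps")
        result = step(i)
        if result is not None:
            return result
    raise BudgetExceeded(f"no answer within {max_steps} steps")


def call_with_backoff(fn: Callable[[], str],
                      max_attempts: int = 3,
                      base_delay_s: float = 0.1,
                      sleep: Callable[[float], None] = time.sleep) -> str:
    """Retry fn a bounded number of times, doubling the delay each time."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # attempts exhausted: surface the last error
            sleep(base_delay_s * (2 ** attempt))
    raise ValueError("max_attempts must be at least 1")
```

Both helpers take their limits as parameters, which is one way to propagate a budget from the caller down into nested loops and tool calls.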
Governance
- Tool policy presence and scope controls
- Input/output schema validation
- Tool result validation
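As an illustration of the governance items, the sketch below combines a tool allowlist, a simple input schema check, and a result check in one wrapper. `TOOL_POLICY`, `validate_call`, `guarded_call`, and the tool names are all assumptions made for this example; real projects often use a schema library instead of bare type checks.

```python
from typing import Any, Callable, Dict

# Hypothetical policy: allowed tool name -> required argument names and types.
TOOL_POLICY: Dict[str, Dict[str, type]] = {
    "search_docs": {"query": str},
    "read_file": {"path": str},
}


class PolicyViolation(Exception):
    """Raised when a tool call falls outside the declared policy."""


def validate_call(tool: str, args: Dict[str, Any]) -> None:
    """Reject tools that are not allowlisted or whose arguments do not match."""
    schema = TOOL_POLICY.get(tool)
    if schema is None:
        raise PolicyViolation(f"tool not allowed: {tool}")
    if set(args) != set(schema):
        raise PolicyViolation(f"unexpected arguments for {tool}: {sorted(args)}")
    for name, expected in schema.items():
        if not isinstance(args[name], expected):
            raise PolicyViolation(f"{tool}.{name} must be {expected.__name__}")


def guarded_call(tool: str, args: Dict[str, Any],
                 registry: Dict[str, Callable[..., Any]]) -> str:
    """Validate the input, run the tool, then validate the result."""
    validate_call(tool, args)
    result = registry[tool](**args)
    if not isinstance(result, str):  # tool result validation
        raise PolicyViolation(f"{tool} returned {type(result).__name__}, expected str")
    return result
```

Routing every tool invocation through one wrapper like this keeps the policy, the input check, and the result check in a single place.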
Observability and Eval
- Step-level trace completeness
- Agent eval harness coverage
- Adversarial and stochastic run presence
Coordination and Concurrency
- Routing and planner/executor coordination risks
- Fanout/deadlock/callback depth risks
- Instruction boundary and state isolation risks
Composite Scores and Interpretation
| Metric | Range | Interpretation |
|---|---|---|
| `agent_architecture.agent_reliability_score` | 0..100 | Higher is better. Reliability posture across loop guards, memory, retries, observability, eval. |
| `agent_architecture.governance_readiness` | 0..100 | Higher is better. Governance posture across tool policy, schema validation, tool result validation. |
| `agent_architecture.overall_agent_health` | 0..1 | Uses the weaker axis: `min(agent_reliability_score/100, governance_readiness/100)`. |
Practical reading:
- High reliability + low governance: stable behavior but unsafe controls.
- Low reliability + high governance: safer controls but fragile operations.
- Both high: healthy, production-ready posture.
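The composite and the practical reading can be sketched in a few lines. `overall_agent_health` follows the formula given in the table; `practical_reading` is a hypothetical helper, and its `high` cutoff of 75 is an assumption for illustration, since the source does not define what counts as "high" or "low".

```python
def overall_agent_health(agent_reliability_score: float,
                         governance_readiness: float) -> float:
    """Weaker-axis composite in 0..1: the minimum of the two scaled scores."""
    return min(agent_reliability_score / 100, governance_readiness / 100)


def practical_reading(agent_reliability_score: float,
                      governance_readiness: float,
                      high: float = 75.0) -> str:
    """Map the two axis scores to a reading (the cutoff is an assumption)."""
    reliable = agent_reliability_score >= high
    governed = governance_readiness >= high
    if reliable and governed:
        return "healthy and production-ready posture"
    if reliable:
        return "stable behavior but unsafe controls"
    if governed:
        return "safer controls but fragile operations"
    return "weak on both axes"  # case not described in the source


print(overall_agent_health(90, 60))  # 0.6
print(practical_reading(90, 60))     # stable behavior but unsafe controls
```

Because the composite takes the minimum, raising the stronger axis never improves `overall_agent_health`; only the weaker axis moves it.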
Config Quick Reference

    metrics:
      - id: agent_architecture
        enabled: true
        config:
          require_loop_guards: true
          require_tool_policy: true
          require_eval_harness: true
          languages: ["python", "typescript", "rust"]
          eval_path_patterns: ["tests/agents", "evals", "agent_specs"]
          governance_weights:
            tool_policy: 0.40
            schema_validation: 0.35
            tool_result_validation: 0.25
          reliability_weights:
            loop_guard: 0.25
            memory: 0.20
            retry: 0.20
            observability: 0.20
            eval: 0.15

Config Semantics
- `languages`: extension-based filter over detected agent call sites. Empty means no filter.
- `eval_path_patterns`: extra directories to scan for eval/trace harnesses in addition to defaults.
- `governance_weights` and `reliability_weights`: normalized if non-zero; if a provided group sums to `0`, defaults are restored.
- `require_loop_guards`, `require_tool_policy`, `require_eval_harness`: when enabled, severity is escalated to `Critical` if the mapped risk score is `> 0.2`.
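The weight and escalation rules can be read as the following sketch. This is an interpretation of the semantics described above, not the engine's actual code: `normalize_weights` and `escalate` are hypothetical names, the default weights are assumed to match the quick-reference values, and the `"Warning"` severity label is invented for the example.

```python
from typing import Dict

# Assumed defaults, copied from the quick reference; the engine's real
# defaults may differ.
DEFAULT_GOVERNANCE_WEIGHTS: Dict[str, float] = {
    "tool_policy": 0.40,
    "schema_validation": 0.35,
    "tool_result_validation": 0.25,
}


def normalize_weights(provided: Dict[str, float],
                      defaults: Dict[str, float]) -> Dict[str, float]:
    """Scale a weight group to sum to 1, or restore defaults if it sums to 0."""
    total = sum(provided.values())
    if total == 0:
        return dict(defaults)
    return {name: weight / total for name, weight in provided.items()}


def escalate(severity: str, risk_score: float, required: bool,
             threshold: float = 0.2) -> str:
    """Escalate to Critical when the require_* flag is on and risk > threshold."""
    if required and risk_score > threshold:
        return "Critical"
    return severity


print(normalize_weights({"tool_policy": 2, "schema_validation": 2},
                        DEFAULT_GOVERNANCE_WEIGHTS))
print(escalate("Warning", 0.35, required=True))  # Critical
```

Note that the comparison is strict, so a risk score of exactly 0.2 does not escalate under this reading.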
Policy Quick Start

    metrics:
      - id: agent_architecture

    policy:
      invariants:
        - metric: agent_architecture.loop_guard_absence
          op: "<="
          value: 0.20
          message: "Agent loops must have max-steps or timeout guards"
        - metric: agent_architecture.governance_readiness
          op: ">="
          value: 80
          message: "Tool governance and schema controls must be strong"
        - metric: agent_architecture.agent_reliability_score
          op: ">="
          value: 75
          message: "Agent reliability baseline not met"
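To show how invariants of this shape could gate a CI run, here is a minimal sketch that evaluates each `op`/`value` pair against reported metric values. It is not the engine's evaluator: `check_invariants` and the sample values are hypothetical, and only the two operators that appear in the policy above are supported.

```python
import operator
from typing import Dict, List

# Only the operators used in the policy above; others are out of scope here.
OPS = {"<=": operator.le, ">=": operator.ge}

INVARIANTS = [
    {"metric": "agent_architecture.loop_guard_absence", "op": "<=", "value": 0.20,
     "message": "Agent loops must have max-steps or timeout guards"},
    {"metric": "agent_architecture.governance_readiness", "op": ">=", "value": 80,
     "message": "Tool governance and schema controls must be strong"},
    {"metric": "agent_architecture.agent_reliability_score", "op": ">=", "value": 75,
     "message": "Agent reliability baseline not met"},
]


def check_invariants(values: Dict[str, float],
                     invariants: List[dict]) -> List[str]:
    """Return the message of every invariant that the values violate."""
    failures = []
    for inv in invariants:
        actual = values[inv["metric"]]
        if not OPS[inv["op"]](actual, inv["value"]):
            failures.append(inv["message"])
    return failures


# Hypothetical metric values for one run.
sample = {
    "agent_architecture.loop_guard_absence": 0.35,
    "agent_architecture.governance_readiness": 85,
    "agent_architecture.agent_reliability_score": 70,
}
for message in check_invariants(sample, INVARIANTS):
    print("FAIL:", message)
```

A CI step could exit non-zero whenever the returned list is non-empty.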