
Agent Architecture

The agent_architecture metric evaluates agent and orchestration architecture in codebases that use tool-calling agents and multi-step workflows.

Last verified against engine metric version 3.0.0.

Agent systems fail in ways that classic service code does not:

  • Unbounded loops and retries can trigger runaway cost and unstable behavior.
  • Weak tool governance can expose dangerous capabilities.
  • Missing observability and eval harnesses make regressions invisible.
  • Poor coordination and state handling can cause deadlocks or cross-session leaks.
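
The first two failure modes are usually addressed with a loop guard and bounded retries. The sketch below shows what such controls can look like in Python. It is illustrative only: `Budget`, `with_backoff`, and `run_agent` are hypothetical names, and nothing here describes the exact patterns the engine detects. A single budget object carries both a step cap and a wall-clock timeout, and it is handed to every step so that sub-calls can respect it. Retries are capped and use exponential backoff with jitter.

```python
import random
import time
from dataclasses import dataclass, field


@dataclass
class Budget:
    """Hypothetical step and wall-clock budget, handed to every step (budget propagation)."""
    max_steps: int = 8
    timeout_s: float = 30.0
    started: float = field(default_factory=time.monotonic)

    def exhausted(self, steps_taken: int) -> bool:
        # Loop guard: stop on either the step cap or the wall-clock timeout.
        return (steps_taken >= self.max_steps
                or time.monotonic() - self.started >= self.timeout_s)


def with_backoff(fn, max_attempts=3, base_delay=0.1, max_delay=2.0,
                 retry_on=(TimeoutError, ConnectionError)):
    """Call fn, retrying transient errors with capped exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # retry budget spent: surface the error instead of looping forever
            delay = min(max_delay, base_delay * 2 ** attempt)
            time.sleep(delay * random.uniform(0.5, 1.0))


def run_agent(step_fn, budget: Budget):
    """Run step_fn(step, budget) until it returns (True, result) or the budget runs out."""
    steps = 0
    while not budget.exhausted(steps):
        done, result = with_backoff(lambda: step_fn(steps, budget))
        steps += 1
        if done:
            return result
    raise RuntimeError(f"loop guard stopped the agent after {steps} steps")
```

A step function that never reports completion, such as `lambda i, b: (False, None)`, stops with a `RuntimeError` once `max_steps` is reached. It does not spin indefinitely.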

This metric provides actionable scores and findings so teams can gate these risks in CI. It examines:

  • Loop guard coverage and budget propagation
  • Memory bounds and retention controls
  • Retry/backoff behavior
  • Tool policy presence and scope controls
  • Input/output schema validation
  • Tool result validation
  • Step-level trace completeness
  • Agent eval harness coverage
  • Adversarial and stochastic run presence
  • Routing and planner/executor coordination risks
  • Fanout/deadlock/callback depth risks
  • Instruction boundary and state isolation risks
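
The governance items in the list above are tool policy, input/output schema validation, and tool result validation. The following minimal Python sketch shows all three around a single tool call. `ToolSpec`, `PolicyError`, `call_tool`, and the example tools are hypothetical. Schema checking here is a plain type check using only the standard library; a real project might use a dedicated schema library instead.

```python
from dataclasses import dataclass


@dataclass
class ToolSpec:
    """Hypothetical policy entry for one tool."""
    arg_types: dict              # argument name -> expected Python type (input schema)
    max_result_chars: int = 10_000


class PolicyError(Exception):
    pass


def call_tool(registry: dict, policy: dict, name: str, args: dict) -> str:
    # Tool policy: anything not explicitly allowlisted is refused.
    spec = policy.get(name)
    if spec is None:
        raise PolicyError(f"tool {name!r} is not allowlisted")
    # Input schema validation: exact argument set and per-argument type.
    if set(args) != set(spec.arg_types):
        raise PolicyError(f"unexpected arguments for {name!r}: {sorted(args)}")
    for key, expected in spec.arg_types.items():
        if not isinstance(args[key], expected):
            raise PolicyError(f"{name}.{key} must be {expected.__name__}")
    result = registry[name](**args)
    # Tool result validation: check type and size before the result reaches the model.
    if not isinstance(result, str) or len(result) > spec.max_result_chars:
        raise PolicyError(f"tool {name!r} returned an invalid result")
    return result


registry = {
    "search": lambda query: f"results for {query}",
    "shell": lambda cmd: "(never reached)",
}
policy = {"search": ToolSpec(arg_types={"query": str})}
```

With this policy, `call_tool(registry, policy, "search", {"query": "arxo"})` succeeds. A call to `"shell"`, which is registered but not allowlisted, raises `PolicyError` before the tool runs.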

Key outputs:

| Metric | Range | Interpretation |
| --- | --- | --- |
| agent_architecture.agent_reliability_score | 0..100 | Higher is better. Reliability posture across loop guards, memory, retries, observability, and eval. |
| agent_architecture.governance_readiness | 0..100 | Higher is better. Governance posture across tool policy, schema validation, and tool result validation. |
| agent_architecture.overall_agent_health | 0..1 | Uses the weaker axis: min(agent_reliability_score/100, governance_readiness/100). |

Practical reading:

  • High reliability + low governance: stable behavior but unsafe controls.
  • Low reliability + high governance: safer controls but fragile operations.
  • Both high: healthy, production-ready posture.
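
The `overall_agent_health` formula and the practical reading above can be expressed in a few lines of Python. The health function follows the documented `min` of the two rescaled axes. The `high` cut-off in `reading` is a hypothetical threshold chosen for illustration, because the metric does not define what counts as "high". The fourth case (both axes low) is included only to make the function total.

```python
def overall_agent_health(reliability: float, governance: float) -> float:
    """Weaker axis wins: both 0..100 scores are rescaled to 0..1 and the minimum is kept."""
    return min(reliability / 100, governance / 100)


def reading(reliability: float, governance: float, high: float = 70.0) -> str:
    # `high` is a hypothetical cut-off; the metric itself does not define one.
    rel_ok, gov_ok = reliability >= high, governance >= high
    if rel_ok and gov_ok:
        return "healthy, production-ready"
    if rel_ok:
        return "stable behavior, unsafe controls"
    if gov_ok:
        return "safer controls, fragile operations"
    return "weak on both axes"
```

For example, a reliability score of 90 with governance readiness of 40 gives an overall health of 0.4. The strong reliability axis cannot compensate for weak governance.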

Configuration example:

```yaml
metrics:
  - id: agent_architecture
    enabled: true
    config:
      require_loop_guards: true
      require_tool_policy: true
      require_eval_harness: true
      languages: ["python", "typescript", "rust"]
      eval_path_patterns: ["tests/agents", "evals", "agent_specs"]
      governance_weights:
        tool_policy: 0.40
        schema_validation: 0.35
        tool_result_validation: 0.25
      reliability_weights:
        loop_guard: 0.25
        memory: 0.20
        retry: 0.20
        observability: 0.20
        eval: 0.15
```

  • languages: filters detected agent call sites by file extension. An empty list disables the filter.
  • eval_path_patterns: extra directories to scan for eval and trace harnesses, in addition to the defaults.
  • governance_weights and reliability_weights: each group is normalized when its weights sum to a non-zero value. If a provided group sums to 0, the defaults are restored.
  • require_loop_guards, require_tool_policy, require_eval_harness: when set to true, the severity of the related finding is escalated to Critical if its mapped risk score is greater than 0.2.
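
The weight-normalization and escalation rules in these notes can be sketched as follows. This is a reading of the notes, not the engine's code. Several points are assumptions: normalization is taken to mean rescaling the group to sum to 1, the example weights from the configuration above stand in for the defaults (the real defaults are not listed on this page), and `"Warning"` is a hypothetical name for the non-escalated severity.

```python
# Assumption: the example weights from the config above stand in for the defaults;
# the engine's actual defaults are not documented on this page.
DEFAULT_GOVERNANCE = {"tool_policy": 0.40, "schema_validation": 0.35,
                      "tool_result_validation": 0.25}


def normalize_weights(provided: dict, defaults: dict) -> dict:
    """Rescale a weight group to sum to 1; restore the defaults if it sums to 0."""
    total = sum(provided.values())
    if total == 0:
        return dict(defaults)
    return {key: weight / total for key, weight in provided.items()}


def severity(risk: float, required: bool, base: str = "Warning") -> str:
    """Escalate to Critical when the matching require_* flag is on and risk > 0.2."""
    if required and risk > 0.2:
        return "Critical"
    return base  # `base` is a hypothetical non-escalated severity
```

Under this reading, weights of 2, 1, and 1 become 0.5, 0.25, and 0.25. A risk score of exactly 0.2 is not escalated, because the rule uses a strict "greater than".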

Policy gate example:

```yaml
metrics:
  - id: agent_architecture
policy:
  invariants:
    - metric: agent_architecture.loop_guard_absence
      op: "<="
      value: 0.20
      message: "Agent loops must have max-steps or timeout guards"
    - metric: agent_architecture.governance_readiness
      op: ">="
      value: 80
      message: "Tool governance and schema controls must be strong"
    - metric: agent_architecture.agent_reliability_score
      op: ">="
      value: 75
      message: "Agent reliability baseline not met"
```
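
Each invariant compares one reported metric against a threshold and fails with its message when the comparison does not hold. The Python sketch below illustrates that evaluation. `check_invariants` is a hypothetical helper rather than part of Arxo. It supports only the two operators used in the example, and it treats a missing metric as a failure, which is a design choice for this sketch. The metric values in the usage note are illustrative, not real results.

```python
import operator

# Only the two comparison operators used in the policy example are supported here.
OPS = {"<=": operator.le, ">=": operator.ge}


def check_invariants(metrics: dict, invariants: list) -> list:
    """Return the message of every invariant the observed metrics violate."""
    failures = []
    for inv in invariants:
        observed = metrics.get(inv["metric"])
        # A missing metric counts as a failure (a choice made for this sketch).
        if observed is None or not OPS[inv["op"]](observed, inv["value"]):
            failures.append(inv["message"])
    return failures


invariants = [
    {"metric": "agent_architecture.loop_guard_absence", "op": "<=", "value": 0.20,
     "message": "Agent loops must have max-steps or timeout guards"},
    {"metric": "agent_architecture.governance_readiness", "op": ">=", "value": 80,
     "message": "Tool governance and schema controls must be strong"},
    {"metric": "agent_architecture.agent_reliability_score", "op": ">=", "value": 75,
     "message": "Agent reliability baseline not met"},
]
```

With illustrative values of 0.1 for `loop_guard_absence`, 72 for `governance_readiness`, and 81 for `agent_reliability_score`, only the governance invariant fails. A CI step would then exit non-zero and print that message.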