Agent Architecture

The agent_architecture metric evaluates agent and orchestration architecture in codebases that use tool-calling agents and multi-step workflows.

Last verified against engine metric version 3.0.0.

Overview and Why It Matters

Agent systems fail differently from classic service code:

Unbounded loops and retries can trigger runaway cost and unstable behavior.
Weak tool governance can expose dangerous capabilities.
Missing observability and eval harnesses make regressions invisible.
Poor coordination and state handling can cause deadlocks or cross-session leaks.

This metric provides actionable scores and findings so teams can gate these risks in CI.

What It Measures

Reliability

Loop guard coverage and budget propagation
Memory bounds and retention controls
Retry/backoff behavior
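The reliability checks are easier to picture with a concrete loop. The sketch below is a minimal Python illustration of the concepts only, not the pattern set the detector actually matches (this page does not document those rules). `run_agent` bounds a hypothetical `step_fn` by a step count and a wall-clock deadline, passes the remaining time budget into each step, caps memory with a fixed-length `deque`, and retries failed steps with exponential backoff. All names and limits (`MAX_STEPS`, `TIMEOUT_S`, `MEMORY_LIMIT`, `step_fn`) are hypothetical.

```python
import time
from collections import deque

# Hypothetical limits; the metric does not prescribe specific values.
MAX_STEPS = 20        # step budget (loop guard)
TIMEOUT_S = 30.0      # wall-clock budget (loop guard)
MEMORY_LIMIT = 50     # maximum retained messages (memory bound)


def call_with_backoff(fn, retries=3, base_delay=0.1, sleep=time.sleep):
    """Call fn, retrying with exponential backoff; re-raise after the last attempt."""
    for attempt in range(retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == retries:
                raise
            sleep(base_delay * (2 ** attempt))  # 0.1s, 0.2s, 0.4s, ...


def run_agent(step_fn, max_steps=MAX_STEPS, timeout_s=TIMEOUT_S):
    """Run a hypothetical step_fn(memory, remaining_s) -> (done, message) under budgets."""
    memory = deque(maxlen=MEMORY_LIMIT)  # oldest messages are evicted automatically
    deadline = time.monotonic() + timeout_s
    for step in range(max_steps):
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return "timeout", step
        # Budget propagation: each step is told how much time is left.
        done, message = call_with_backoff(lambda: step_fn(memory, remaining))
        memory.append(message)
        if done:
            return "done", step + 1
    return "max_steps", max_steps
```

Every exit path is bounded: the loop ends on completion, on the step budget, or on the deadline, and the `deque` never holds more than `MEMORY_LIMIT` entries regardless of run length.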

Governance

Tool policy presence and scope controls
Input/output schema validation
Tool result validation
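The three governance checks can be illustrated as layers around a single tool call. This is a minimal sketch, assuming a hypothetical `ToolPolicy` allowlist keyed by tool name and scope and a deliberately simple key-and-type schema instead of a real schema library; it does not reflect how the engine detects these controls.

```python
from dataclasses import dataclass, field


@dataclass
class ToolPolicy:
    """Hypothetical allowlist: tool name -> scopes the agent may use it with."""
    allowed: dict = field(default_factory=dict)

    def check(self, tool, scope):
        if scope not in self.allowed.get(tool, set()):
            raise PermissionError(f"tool {tool!r} not allowed with scope {scope!r}")


def validate(payload, schema):
    """Minimal schema check: each required key is present with the expected type."""
    for key, expected_type in schema.items():
        if key not in payload:
            raise ValueError(f"missing field {key!r}")
        if not isinstance(payload[key], expected_type):
            raise TypeError(f"field {key!r} must be {expected_type.__name__}")
    return payload


def call_tool(policy, tool, scope, fn, args, input_schema, output_schema):
    policy.check(tool, scope)               # tool policy and scope control
    validate(args, input_schema)            # input schema validation
    result = fn(**args)
    return validate(result, output_schema)  # tool result validation
```

Validating the result as well as the arguments matters because a tool's output is fed back into the agent's context; an unchecked result is an unchecked input to the next step.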

Observability and Eval

Step-level trace completeness
Agent eval harness coverage
Adversarial and stochastic run presence
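As a rough illustration of what step-level tracing and an eval harness with stochastic and adversarial runs look like, here is a small Python sketch. The agent signature `agent_step(prompt, rng)`, the trace record fields, and the case format are hypothetical choices for this example; the page does not specify which trace fields or harness layouts the metric recognizes.

```python
import random
import time


def traced_run(agent_step, prompts, rng):
    """Run the agent over prompts, recording one trace record per step."""
    trace = []
    for i, prompt in enumerate(prompts):
        start = time.perf_counter()
        output = agent_step(prompt, rng)
        trace.append({
            "step": i,
            "input": prompt,
            "output": output,
            "duration_s": time.perf_counter() - start,
        })
    return trace


def eval_harness(agent_step, cases, runs=5, seed=0):
    """Run each case `runs` times with a seeded RNG and report its pass rate."""
    rng = random.Random(seed)  # seeded so stochastic runs are reproducible
    return {
        name: sum(bool(check(agent_step(prompt, rng))) for _ in range(runs)) / runs
        for name, (prompt, check) in cases.items()
    }
```

Repeating each case several times surfaces nondeterministic failures that a single run would hide, and an adversarial case (for example a prompt-injection attempt) sits in the same harness as ordinary functional cases.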

Coordination and Concurrency

Routing and planner/executor coordination risks
Fanout/deadlock/callback depth risks
Instruction boundary and state isolation risks
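Several coordination risks come down to missing bounds. The following asyncio sketch shows three of them under stated assumptions: a concurrency cap on fanout, a per-call timeout so one stuck worker cannot block the whole plan indefinitely, and a limit on nested delegation depth, plus a store that keeps state per session. The constants, `fan_out`, and `SessionStore` are hypothetical; instruction-boundary handling is not covered here.

```python
import asyncio

MAX_FANOUT = 4        # hypothetical cap on concurrent sub-agent calls
CALL_TIMEOUT_S = 5.0  # hypothetical per-call timeout
MAX_DEPTH = 3         # hypothetical limit on nested delegation/callback depth


class SessionStore:
    """Keeps state per session id so one session cannot read another's memory."""

    def __init__(self):
        self._state = {}

    def get(self, session_id):
        return self._state.setdefault(session_id, {})


async def fan_out(worker, tasks, depth=0):
    """Run worker(task, depth + 1) for each task with bounded concurrency and time."""
    if depth > MAX_DEPTH:
        raise RecursionError("delegation depth limit exceeded")
    sem = asyncio.Semaphore(MAX_FANOUT)

    async def guarded(task):
        async with sem:  # at most MAX_FANOUT workers run at once
            return await asyncio.wait_for(worker(task, depth + 1), CALL_TIMEOUT_S)

    # gather preserves the order of tasks in its result list.
    return await asyncio.gather(*(guarded(t) for t in tasks))
```

A worker that itself delegates would call `fan_out(..., depth=depth)` with the depth it received, so the limit applies across the whole call tree rather than per level.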

Composite Scores and Interpretation

Metric	Range	Interpretation
`agent_architecture.agent_reliability_score`	`0..100`	Higher is better. Reliability posture across loop guards, memory, retries, observability, eval.
`agent_architecture.governance_readiness`	`0..100`	Higher is better. Governance posture across tool policy, schema validation, tool result validation.
`agent_architecture.overall_agent_health`	`0..1`	Uses the weaker axis: `min(agent_reliability_score/100, governance_readiness/100)`.
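The `overall_agent_health` formula in the table can be written out directly. This short Python function mirrors the stated definition; the range check is an addition for the sketch and is not documented engine behavior.

```python
def overall_agent_health(agent_reliability_score, governance_readiness):
    """Weaker-axis composite: min of the two 0..100 scores rescaled to 0..1."""
    for value in (agent_reliability_score, governance_readiness):
        if not 0 <= value <= 100:
            raise ValueError("scores must be in 0..100")
    return min(agent_reliability_score / 100, governance_readiness / 100)


print(overall_agent_health(90, 60))  # 0.6: governance is the weaker axis
```

Because the composite takes the minimum rather than an average, a strong score on one axis cannot mask a weak score on the other.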

Practical reading:

High reliability + low governance: stable behavior, but weak controls over what the agent can do.
Low reliability + high governance: safer controls, but fragile operations.
High on both: a healthy, production-ready posture.

Config Quick Reference

```yaml
metrics:
  - id: agent_architecture
    enabled: true
    config:
      require_loop_guards: true
      require_tool_policy: true
      require_eval_harness: true
      languages: ["python", "typescript", "rust"]
      eval_path_patterns: ["tests/agents", "evals", "agent_specs"]
      governance_weights:
        tool_policy: 0.40
        schema_validation: 0.35
        tool_result_validation: 0.25
      reliability_weights:
        loop_guard: 0.25
        memory: 0.20
        retry: 0.20
        observability: 0.20
        eval: 0.15
```

Config Semantics

languages: extension-based filter applied to detected agent call sites. An empty list disables the filter.
eval_path_patterns: additional directories to scan for eval and trace harnesses, on top of the defaults.
governance_weights and reliability_weights: each group is normalized to sum to 1 when its total is non-zero; if a provided group sums to 0, the default weights are used instead.
require_loop_guards, require_tool_policy, require_eval_harness: when enabled, the severity of the corresponding finding is escalated to Critical if its mapped risk score exceeds 0.2.
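The weight-normalization and escalation rules above can be sketched in a few lines of Python. This is an illustration under assumptions, not the engine's code: the default weights are assumed to equal the values in the quick-reference config (the page does not state the defaults explicitly), the pairing of each `require_*` flag with its risk score is left to the caller, and the non-Critical severity names are hypothetical.

```python
# Assumed defaults: taken from the quick-reference config, not confirmed as engine defaults.
DEFAULT_GOVERNANCE_WEIGHTS = {
    "tool_policy": 0.40, "schema_validation": 0.35, "tool_result_validation": 0.25,
}
DEFAULT_RELIABILITY_WEIGHTS = {
    "loop_guard": 0.25, "memory": 0.20, "retry": 0.20, "observability": 0.20, "eval": 0.15,
}


def normalize_weights(provided, defaults):
    """Scale a weight group to sum to 1; restore defaults if the group sums to 0."""
    total = sum(provided.values())
    if total == 0:
        return dict(defaults)
    return {name: weight / total for name, weight in provided.items()}


def escalate(severity, require_flag, risk_score):
    """Escalate to Critical when the require_* flag is on and the mapped risk exceeds 0.2."""
    if require_flag and risk_score > 0.2:
        return "Critical"
    return severity
```

Normalization means only the ratios between weights in a group matter: `{tool_policy: 2, schema_validation: 1, tool_result_validation: 1}` behaves like `0.5 / 0.25 / 0.25`. Note the strict comparison: a risk score of exactly 0.2 is not escalated.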

Policy Quick Start

```yaml
metrics:
  - id: agent_architecture

policy:
  invariants:
    - metric: agent_architecture.loop_guard_absence
      op: "<="
      value: 0.20
      message: "Agent loops must have max-steps or timeout guards"
    - metric: agent_architecture.governance_readiness
      op: ">="
      value: 80
      message: "Tool governance and schema controls must be strong"
    - metric: agent_architecture.agent_reliability_score
      op: ">="
      value: 75
      message: "Agent reliability baseline not met"
```
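To see how these invariants gate a run, here is a minimal Python sketch that evaluates the same three rules against a dictionary of reported metric values. Only the `<=` and `>=` operators used above are supported; treating a missing metric as a failure is an assumption of this sketch, not documented behavior.

```python
import operator

# Only the two operators used in the policy above.
OPS = {"<=": operator.le, ">=": operator.ge}

INVARIANTS = [
    {"metric": "agent_architecture.loop_guard_absence", "op": "<=", "value": 0.20,
     "message": "Agent loops must have max-steps or timeout guards"},
    {"metric": "agent_architecture.governance_readiness", "op": ">=", "value": 80,
     "message": "Tool governance and schema controls must be strong"},
    {"metric": "agent_architecture.agent_reliability_score", "op": ">=", "value": 75,
     "message": "Agent reliability baseline not met"},
]


def check_invariants(metrics, invariants=INVARIANTS):
    """Return one failure message per violated invariant (empty list means pass)."""
    failures = []
    for inv in invariants:
        actual = metrics.get(inv["metric"])
        if actual is None:  # assumption: a missing metric counts as a failure
            failures.append(f"{inv['metric']}: no value reported")
        elif not OPS[inv["op"]](actual, inv["value"]):
            failures.append(f"{inv['metric']}={actual}: {inv['message']}")
    return failures
```

In a CI step, a non-empty result would typically be printed and turned into a non-zero exit status to fail the build.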
