Configuration

Arxo is configured through a YAML file, passed on the command line with --config path/to/config.yaml.

Configuration Structure

data:
  language: auto # "typescript", "rust", "python", "java", or "auto"
  import_graph:
    group_by: folder # how to group nodes
    group_depth: 2 # depth for folder grouping
    exclude: # paths to exclude from analysis
      - target
      - node_modules
  # Optional: limit git history for faster runs (affects all git-based metrics)
  git_history:
    max_commits: 500 # default 10000; lower = faster
    since: "2024-01-01" # ISO8601 or YYYY-MM-DD; omit for all history
    until: "2025-01-01" # end of range; omit for "now"
  # Optional: runtime trace data, used by centrality (traffic-weighted), traffic_hotspot, critical_path, runtime_drift, sensitive_data_flow, and test_coverage
  telemetry:
    source_path: ./telemetry/traces.json # file or directory of .json files
    format: otel_json                    # otel_json | zipkin_json | jaeger_json
    service_name: my-service             # optional filter
    time_window:                         # optional time filter
      start: "2024-01-01T00:00:00Z"
      end: "2024-01-07T23:59:59Z"

metrics:
  - id: scc
    enabled: true
  - id: propagation_cost
    enabled: true
  # ... more metrics

policy:
  invariants:
    - metric: scc.max_cycle_size
      op: "<="
      value: 5
    - metric: scc.cycle_count
      op: "=="
      value: 0

report:
  format: console # console | json | html | snapshot
  file: report.html # optional; used for html, json, or snapshot output (console always writes to stdout)

Key Sections

Section	Purpose
`data`	Language, import graph options, exclusions, optional git history and telemetry
`data.git_history`	Limit git history: `max_commits` (default 10000), `since` / `until` (ISO8601 or YYYY-MM-DD). Speeds up all git-based metrics.
`data.telemetry`	Runtime trace data for runtime metrics. See Telemetry (Runtime Metrics).
`metrics`	Which metric plugins to run (`id`, `enabled`)
`policy`	Invariants: metric ID, operator (`<=`, `>=`, `==`, etc.), and value
`report`	Output format and optional output file path

Report Formats

Format	Use case	Output
`console`	Default; terminal, CI	stdout
`json`	CI, tooling, APIs	stdout or `report.file`
`html`	Human review, graphs	file (set `report.file`)
`snapshot`	Versioned summaries	YAML file (set `report.file`)

Set report.file to write html, json, or snapshot output to a path (e.g. report.html, report.json, snapshot.yaml). With the console format, or when report.file is not set, output goes to stdout.
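
As a minimal sketch of the rule above, the following report block sends JSON to a file instead of stdout. The file name is just the example path already mentioned; it is not a required name.

```yaml
report:
  format: json       # one of: console | json | html | snapshot
  file: report.json  # without this line, json output goes to stdout
```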

Telemetry (Runtime Metrics)

When using the Runtime preset or metrics like centrality, traffic_hotspot, critical_path, runtime_drift, sensitive_data_flow, or test_coverage, add a data.telemetry block to supply trace data:

Field	Required	Description
`source_path`	Yes	Path to trace file or directory of `.json` files (relative to project root)
`format`	No	Trace format. Default `otel_json`.
`service_name`	No	Filter traces by service name
`time_window`	No	Filter by time range (`start` and `end`, RFC3339)

Supported Trace Formats

Format	`format` value	Source
OTLP JSON	`otel_json`	OpenTelemetry exporters
Zipkin JSON v2	`zipkin_json`	Zipkin, OpenTelemetry→Zipkin exporter
Jaeger JSON	`jaeger_json`	Jaeger Query API export

For span-to-code mapping, ensure traces include code.filepath (or code.file_path) in span attributes/tags. See the Telemetry guide for details.
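
A minimal sketch of a telemetry setup for Zipkin traces, using only the fields from the table above. The directory path and service name are placeholders chosen for illustration, and traffic_hotspot stands in for any of the runtime metrics listed earlier.

```yaml
data:
  telemetry:
    source_path: ./telemetry/zipkin  # placeholder: directory of .json trace files
    format: zipkin_json              # overrides the default otel_json
    service_name: checkout           # placeholder: optional service filter

metrics:
  - id: traffic_hotspot              # one of the runtime metrics that reads telemetry
    enabled: true
```

Because time_window is omitted here, no time filter is applied to the traces.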

Example: Strict Cycle and Coupling Policy

data:
  import_graph:
    group_by: folder
    group_depth: 3

metrics:
  - id: scc
    enabled: true
  - id: propagation_cost
    enabled: true
  - id: centrality
    enabled: true

policy:
  invariants:
    - metric: scc.max_cycle_size
      op: "<="
      value: 5
    - metric: scc.cycle_count
      op: "=="
      value: 0
    - metric: propagation_cost.system.ratio
      op: "<="
      value: 0.12

Metric-Specific Options

Some metrics accept optional config to cap cost on large graphs:

propagation_cost:
- max_nodes (number). If the call graph has more nodes than this, function-level betweenness signals are skipped. Default 2000; increase it to compute these signals on larger call graphs.
centrality:
- max_nodes (number). If the import (or call) graph has more nodes than this, betweenness is skipped. Omit for no cap.
- use_edge_weights (bool). If false, betweenness uses unweighted BFS, which is roughly 2–3× faster. Default true.
- betweenness_sample_ratio (number, 0–1). If set (e.g. 0.2), betweenness is approximated by sampling that fraction of sources, for a speedup of about 5×.
core_periphery (HTML report only): these options reduce the size of the graph embedded in HTML reports; the metric itself still runs on the full graph.
- graph_max_nodes (number). If set (e.g. 500), only the top N nodes by reachability are included in the graph visualization. Omit to include all nodes.
- graph_edge_sample_rate (number, 0–1). If set (e.g. 0.2), only that fraction of edges between kept nodes are included (deterministic sample). Omit to include all edges.
package_metrics:
- stable_threshold (number, 0–1). Maximum instability considered stable. Default 0.30.
- unstable_threshold (number, 0–1). Minimum instability considered unstable. Default 0.70.
- cohesion_low_threshold (number, 0–1). Packages below this are counted as low cohesion. Default 0.20.
- zone_pain_abstractness_max (number, 0–1). Zone-of-pain abstractness limit. Default 0.30.
- zone_useless_abstractness_min (number, 0–1). Zone-of-uselessness abstractness minimum. Default 0.70.
- layer_order (string array). Optional ordered layer names for layer-level package summaries.
visibility:
- top_k (number). Max rows/items for visibility tables/top lists. Default 10.
- channels.temporal_mode (off | auto | force). Git-history channel behavior. Default auto.
- channels.runtime_mode (off | auto | force). Telemetry channel behavior. Default auto.
smells:
- top_k (number). Limit smell tables/lists and findings output size. Default 10.
- emit_findings (bool). Emit structured smell findings with evidence. Default true.
- channels.temporal_mode (off | auto | force). Temporal baseline channel behavior. Default auto.
- channels.cochange_mode (off | auto | force). Co-change channel behavior. Default auto.
- channels.sat_mode (off | auto | force). SAT overlap channel behavior. Default auto.
- temporal.baseline_report_path (string). Baseline report path for temporal comparison. Default .arxo/baselines/smells_prev_report.json.
- thresholds.* and risk_weights.*. Family detection thresholds and channel weighting; see Smells for the full contract.
modularity: controls community-detection behavior and optional function-level overlap analysis. Configure these options under config.modularity in the metric entry.
- algorithm (leiden | louvain). Community-detection algorithm. Default leiden.
- gamma_values (number array). Positive resolution values. Default [0.5, 1.0, 1.5, 2.0].
- directed, weighted (bool). Objective options. Both default true.
- include_call_graph. Emit modularity.function.* and cross-graph overlap keys when a call graph is available.
- stability_runs. Run extra seeded passes and emit stability keys (modularity.module.stability.*).
- emit_findings, findings_top_k. Findings controls.

Example:

metrics:
  - id: propagation_cost
    enabled: true
    config:
      max_nodes: 5000
  - id: centrality
    enabled: true
    config:
      use_edge_weights: false
      max_nodes: 5000
      betweenness_sample_ratio: 0.2 # optional: ~5x faster, approximate
  - id: core_periphery
    enabled: true
    config:
      graph_max_nodes: 500 # optional: smaller HTML report
      graph_edge_sample_rate: 0.2 # optional: 20% of edges in graph
  - id: package_metrics
    enabled: true
    config:
      stable_threshold: 0.30
      unstable_threshold: 0.70
      cohesion_low_threshold: 0.20
      zone_pain_abstractness_max: 0.30
      zone_useless_abstractness_min: 0.70
      layer_order: ["app", "domain", "infra"] # optional
  - id: visibility
    enabled: true
    config:
      top_k: 10
      channels:
        temporal_mode: auto
        runtime_mode: auto
  - id: modularity
    enabled: true
    config:
      modularity:
        gamma_values: [0.5, 1.0, 1.5, 2.0]
        include_call_graph: true
        stability_runs: 5
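
The example above does not include smells. The sketch below shows how its documented options might be written, assuming they sit directly under config in the same way as the visibility options; that nesting is an assumption, and the Smells page remains the authoritative contract. All values are the documented defaults except top_k and cochange_mode, which are changed only for illustration.

```yaml
metrics:
  - id: smells
    enabled: true
    config:                  # assumed layout, mirroring the visibility entry
      top_k: 20              # default 10
      emit_findings: true    # default true
      channels:
        temporal_mode: auto  # off | auto | force
        cochange_mode: "off" # off | auto | force; quoted so YAML does not read it as a boolean
        sat_mode: auto       # off | auto | force
      temporal:
        baseline_report_path: .arxo/baselines/smells_prev_report.json  # documented default
```

The quoting of "off" is a general YAML precaution: some parsers treat a bare off as false, so a string value is safer when a mode name is expected.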

Running with a Config File

arxo analyze --path /path/to/project --config config.yaml

For detailed metric descriptions and policy recommendations, see the Metrics section. For more on each output format, see Reports and Output Formats.