
Configuration

Arxo is configured via a YAML file. Pass it with --config path/to/config.yaml.

data:
  language: auto # "typescript", "rust", "python", "java", or "auto"
  import_graph:
    group_by: folder # how to group nodes
    group_depth: 2 # depth for folder grouping
    exclude: # paths to exclude from analysis
      - target
      - node_modules
  # Optional: limit git history for faster runs (affects all git-based metrics)
  git_history:
    max_commits: 500 # default 10000; lower = faster
    since: "2024-01-01" # ISO8601 or YYYY-MM-DD; omit for all history
    until: "2025-01-01" # end of range; omit for "now"
  # Optional: runtime trace data for centrality (traffic-weighted), traffic_hotspot, critical_path, runtime_drift, sensitive_data_flow, test_coverage
  telemetry:
    source_path: ./telemetry/traces.json # file or directory of .json files
    format: otel_json # otel_json | zipkin_json | jaeger_json
    service_name: my-service # optional filter
    time_window: # optional time filter
      start: "2024-01-01T00:00:00Z"
      end: "2024-01-07T23:59:59Z"
metrics:
  - id: scc
    enabled: true
  - id: propagation_cost
    enabled: true
  # ... more metrics
policy:
  invariants:
    - metric: scc.max_cycle_size
      op: "<="
      value: 5
    - metric: scc.cycle_count
      op: "=="
      value: 0
report:
  format: console # console | json | html | snapshot
  file: report.html # optional; for html, json, or snapshot output

| Section | Purpose |
| --- | --- |
| data | Language, import graph options, exclusions, optional git history and telemetry |
| data.git_history | Limit git history: max_commits (default 10000), since / until (ISO8601 or YYYY-MM-DD). Speeds up all git-based metrics. |
| data.telemetry | Runtime trace data for runtime metrics. See Telemetry (Runtime Metrics). |
| metrics | Which metric plugins to run (id, enabled) |
| policy | Invariants: metric ID, operator (<=, >=, ==, etc.), and value |
| report | Output format and optional output file path |

| Format | Use case | Output |
| --- | --- | --- |
| console | Default; terminal CI | stdout |
| json | CI, tooling, APIs | stdout or report.file |
| html | Human review, graphs | file (set report.file) |
| snapshot | Versioned summaries | YAML file (set report.file) |

Use report.file to write html, json, or snapshot output to a path (e.g. report.html, report.json, snapshot.yaml). With the console format, or when report.file is not set, output goes to stdout.
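
The destination rule above can be summarized in a few lines of Python. This is only an illustrative sketch of the documented behavior, not Arxo's implementation; the function name resolve_destination and the FILE_FORMATS set are hypothetical.

```python
from typing import Optional

# Formats that the table above lists as writable to report.file.
FILE_FORMATS = {"json", "html", "snapshot"}


def resolve_destination(fmt: str = "console", file: Optional[str] = None) -> str:
    """Return "stdout" or the target path for a report (hypothetical helper)."""
    if fmt != "console" and fmt not in FILE_FORMATS:
        raise ValueError(f"unknown report format: {fmt}")
    # Console output, or any format without report.file, goes to stdout.
    if fmt == "console" or file is None:
        return "stdout"
    return file


print(resolve_destination("html", "report.html"))
print(resolve_destination("json"))
```
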

When using the Runtime preset or metrics like centrality, traffic_hotspot, critical_path, runtime_drift, sensitive_data_flow, or test_coverage, add a data.telemetry block to supply trace data:

| Field | Required | Description |
| --- | --- | --- |
| source_path | Yes | Path to trace file or directory of .json files (relative to project root) |
| format | No | Trace format. Default otel_json. |
| service_name | No | Filter traces by service name |
| time_window | No | Filter by time range (start and end, RFC3339) |

| Format | format value | Source |
| --- | --- | --- |
| OTLP JSON | otel_json | OpenTelemetry exporters |
| Zipkin JSON v2 | zipkin_json | Zipkin, OpenTelemetry→Zipkin exporter |
| Jaeger JSON | jaeger_json | Jaeger Query API export |

For span-to-code mapping, ensure traces include code.filepath (or code.file_path) in span attributes/tags. See the Telemetry guide for details.
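
Before pointing source_path at a trace file, it can help to check how many spans actually carry one of these attributes. The sketch below is a standalone pre-flight check, not part of Arxo. It assumes the standard OTLP JSON layout (resourceSpans → scopeSpans → spans, with attributes as key/value objects); the helper names and the sample payload are made up for illustration.

```python
import json
from typing import Any, Dict, Iterator, Tuple

# Attribute keys named in the text above.
CODE_PATH_KEYS = ("code.filepath", "code.file_path")


def iter_spans(otlp: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield every span in an OTLP JSON document (assumed layout)."""
    for resource_spans in otlp.get("resourceSpans", []):
        for scope_spans in resource_spans.get("scopeSpans", []):
            yield from scope_spans.get("spans", [])


def has_code_path(span: Dict[str, Any]) -> bool:
    """True if the span has a code.filepath or code.file_path attribute."""
    return any(a.get("key") in CODE_PATH_KEYS for a in span.get("attributes", []))


def coverage(otlp: Dict[str, Any]) -> Tuple[int, int]:
    """Return (spans with a code path attribute, total spans)."""
    spans = list(iter_spans(otlp))
    return sum(1 for s in spans if has_code_path(s)), len(spans)


# Tiny made-up payload: one mappable span, one without a code path.
sample = json.loads("""
{"resourceSpans": [{"scopeSpans": [{"spans": [
  {"name": "GET /users", "attributes": [
    {"key": "code.filepath", "value": {"stringValue": "src/users/handler.ts"}}]},
  {"name": "db.query", "attributes": [
    {"key": "db.system", "value": {"stringValue": "postgresql"}}]}
]}]}]}
""")

mapped, total = coverage(sample)
print(f"{mapped}/{total} spans can be mapped to code")
```
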

data:
  import_graph:
    group_by: folder
    group_depth: 3
metrics:
  - id: scc
    enabled: true
  - id: propagation_cost
    enabled: true
  - id: centrality
    enabled: true
policy:
  invariants:
    - metric: scc.max_cycle_size
      op: "<="
      value: 5
    - metric: scc.cycle_count
      op: "=="
      value: 0
    - metric: propagation_cost.system.ratio
      op: "<="
      value: 0.12
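
To make the semantics of an invariant concrete, here is a minimal Python sketch of how a list of invariants could be checked against a flat map of metric values. It is an assumption about the general idea, not Arxo's evaluator: the check_invariants function, the treatment of missing metrics, the operators beyond <=, >=, and ==, and the sample results are all hypothetical.

```python
import operator
from typing import Dict, List, Mapping, Union

Number = Union[int, float]

# "<=", ">=", "==" appear in the docs; the remaining operators are assumed.
OPS = {
    "<=": operator.le,
    ">=": operator.ge,
    "==": operator.eq,
    "<": operator.lt,
    ">": operator.gt,
    "!=": operator.ne,
}


def check_invariants(
    invariants: List[Mapping[str, object]], results: Dict[str, Number]
) -> List[str]:
    """Return one message per violated invariant (empty list means pass)."""
    violations = []
    for inv in invariants:
        key, op, limit = inv["metric"], inv["op"], inv["value"]
        if key not in results:
            # Assumption: a missing metric counts as a violation.
            violations.append(f"{key}: no value reported")
        elif not OPS[op](results[key], limit):
            violations.append(f"{key}: {results[key]} violates {op} {limit}")
    return violations


invariants = [
    {"metric": "scc.max_cycle_size", "op": "<=", "value": 5},
    {"metric": "scc.cycle_count", "op": "==", "value": 0},
    {"metric": "propagation_cost.system.ratio", "op": "<=", "value": 0.12},
]
# Made-up metric values for illustration only.
results = {
    "scc.max_cycle_size": 3,
    "scc.cycle_count": 1,
    "propagation_cost.system.ratio": 0.10,
}
for message in check_invariants(invariants, results):
    print(message)
```

With these sample values only the scc.cycle_count invariant fails, because one cycle is present while the policy requires zero.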

Some metrics accept optional config to cap cost on large graphs:

  • propagation_cost: max_nodes (number). If the call graph has more nodes than this, function-level betweenness signals are skipped (default 2000). Increase or omit to compute on larger call graphs.
  • centrality:
    • max_nodes (number). If the import (or call) graph has more nodes than this, betweenness is skipped. Omit for no cap.
    • use_edge_weights (bool). If false, use unweighted BFS for betweenness (faster, ~2–3×). Default true.
    • betweenness_sample_ratio (number, 0–1). If set (e.g. 0.2), approximate betweenness by sampling that fraction of sources for ~5× speedup.
  • core_periphery (HTML report only): reduces the size of the embedded graph in HTML reports; the metric still runs on the full graph.
    • graph_max_nodes (number). If set (e.g. 500), only the top N nodes by reachability are included in the graph visualization. Omit to include all nodes.
    • graph_edge_sample_rate (number, 0–1). If set (e.g. 0.2), only that fraction of edges between kept nodes are included (deterministic sample). Omit to include all edges.
  • package_metrics:
    • stable_threshold (number, 0–1). Maximum instability considered stable. Default 0.30.
    • unstable_threshold (number, 0–1). Minimum instability considered unstable. Default 0.70.
    • cohesion_low_threshold (number, 0–1). Packages below this are counted as low cohesion. Default 0.20.
    • zone_pain_abstractness_max (number, 0–1). Zone-of-pain abstractness limit. Default 0.30.
    • zone_useless_abstractness_min (number, 0–1). Zone-of-uselessness abstractness minimum. Default 0.70.
    • layer_order (string array). Optional ordered layer names for layer-level package summaries.
  • visibility:
    • top_k (number). Max rows/items for visibility tables/top lists. Default 10.
    • channels.temporal_mode (off | auto | force). Git-history channel behavior. Default auto.
    • channels.runtime_mode (off | auto | force). Telemetry channel behavior. Default auto.
  • smells:
    • top_k (number). Limit smell tables/lists and findings output size. Default 10.
    • emit_findings (bool). Emit structured smell findings with evidence. Default true.
    • channels.temporal_mode (off | auto | force). Temporal baseline channel behavior. Default auto.
    • channels.cochange_mode (off | auto | force). Co-change channel behavior. Default auto.
    • channels.sat_mode (off | auto | force). SAT overlap channel behavior. Default auto.
    • temporal.baseline_report_path (string). Baseline report path for temporal comparison. Default .arxo/baselines/smells_prev_report.json.
    • thresholds.* and risk_weights.*. Family detection thresholds and channel weighting; see Smells for the full contract.
  • modularity: controls community-detection behavior and optional function-level overlap analysis.
    • Configure under config.modularity in the metric entry.
    • algorithm: leiden (default) or louvain
    • gamma_values: positive resolution values (default [0.5, 1.0, 1.5, 2.0])
    • directed, weighted: objective options (both default true)
    • include_call_graph: emit modularity.function.* and cross-graph overlap keys when call graph is available
    • stability_runs: run extra seeded passes and emit stability keys (modularity.module.stability.*)
    • emit_findings, findings_top_k: findings controls
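
As an illustration of how the package_metrics thresholds relate to each other, the sketch below classifies a package from its instability and abstractness using the defaults listed above. It assumes the usual definition of instability (efferent / (afferent + efferent)), inclusive comparisons, and that the zones combine the stability and abstractness thresholds in this way. None of this is confirmed by the document, and the names PackageThresholds, instability, and classify are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PackageThresholds:
    # Defaults copied from the package_metrics options above.
    stable_threshold: float = 0.30
    unstable_threshold: float = 0.70
    zone_pain_abstractness_max: float = 0.30
    zone_useless_abstractness_min: float = 0.70


def instability(afferent: int, efferent: int) -> float:
    """Instability I = Ce / (Ca + Ce); 0.0 for an isolated package (assumed)."""
    total = afferent + efferent
    return efferent / total if total else 0.0


def classify(inst: float, abstractness: float,
             t: PackageThresholds = PackageThresholds()) -> str:
    """Assumed classification: zones first, then plain stability bands."""
    if inst <= t.stable_threshold and abstractness <= t.zone_pain_abstractness_max:
        return "zone_of_pain"
    if inst >= t.unstable_threshold and abstractness >= t.zone_useless_abstractness_min:
        return "zone_of_uselessness"
    if inst <= t.stable_threshold:
        return "stable"
    if inst >= t.unstable_threshold:
        return "unstable"
    return "balanced"


# A concrete, heavily depended-on package: 8 incoming, 2 outgoing dependencies.
print(classify(instability(afferent=8, efferent=2), abstractness=0.1))
```
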

Example:

metrics:
  - id: propagation_cost
    enabled: true
    config:
      max_nodes: 5000
  - id: centrality
    enabled: true
    config:
      use_edge_weights: false
      max_nodes: 5000
      betweenness_sample_ratio: 0.2 # optional: ~5x faster, approximate
  - id: core_periphery
    enabled: true
    config:
      graph_max_nodes: 500 # optional: smaller HTML report
      graph_edge_sample_rate: 0.2 # optional: 20% of edges in graph
  - id: package_metrics
    enabled: true
    config:
      stable_threshold: 0.30
      unstable_threshold: 0.70
      cohesion_low_threshold: 0.20
      zone_pain_abstractness_max: 0.30
      zone_useless_abstractness_min: 0.70
      layer_order: ["app", "domain", "infra"] # optional
  - id: visibility
    enabled: true
    config:
      top_k: 10
      channels:
        temporal_mode: auto
        runtime_mode: auto
  - id: modularity
    enabled: true
    config:
      modularity:
        gamma_values: [0.5, 1.0, 1.5, 2.0]
        include_call_graph: true
        stability_runs: 5

arxo analyze --path /path/to/project --config config.yaml

For detailed metric descriptions and policy recommendations, see the Metrics section. For more on each output format, see Reports and Output Formats.