Configuration
Arxo is configured via a YAML file. Pass it with `--config path/to/config.yaml`.
Configuration Structure

```yaml
data:
  language: auto              # "typescript", "rust", "python", "java", or "auto"
  import_graph:
    group_by: folder          # how to group nodes
    group_depth: 2            # depth for folder grouping
    exclude:                  # paths to exclude from analysis
      - target
      - node_modules
  # Optional: limit git history for faster runs (affects all git-based metrics)
  git_history:
    max_commits: 500          # default 10000; lower = faster
    since: "2024-01-01"       # ISO8601 or YYYY-MM-DD; omit for all history
    until: "2025-01-01"       # end of range; omit for "now"
  # Optional: runtime trace data for centrality (traffic-weighted), traffic_hotspot,
  # critical_path, runtime_drift, sensitive_data_flow, test_coverage
  telemetry:
    source_path: ./telemetry/traces.json   # file or directory of .json files
    format: otel_json                      # otel_json | zipkin_json | jaeger_json
    service_name: my-service               # optional filter
    time_window:                           # optional time filter
      start: "2024-01-01T00:00:00Z"
      end: "2024-01-07T23:59:59Z"

metrics:
  - id: scc
    enabled: true
  - id: propagation_cost
    enabled: true
  # ... more metrics

policy:
  invariants:
    - metric: scc.max_cycle_size
      op: "<="
      value: 5
    - metric: scc.cycle_count
      op: "=="
      value: 0

report:
  format: console     # console | json | html | snapshot
  file: report.html   # optional; for html, json, or snapshot output
```

Key Sections
| Section | Purpose |
|---|---|
| `data` | Language, import graph options, exclusions, optional git history and telemetry |
| `data.git_history` | Limit git history: `max_commits` (default 10000), `since` / `until` (ISO8601 or YYYY-MM-DD). Speeds up all git-based metrics. |
| `data.telemetry` | Runtime trace data for runtime metrics. See Telemetry (Runtime Metrics). |
| `metrics` | Which metric plugins to run (`id`, `enabled`) |
| `policy` | Invariants: metric ID, operator (`<=`, `>=`, `==`, etc.), and value |
| `report` | Output format and optional output file path |
Report Formats

| Format | Use case | Output |
|---|---|---|
| `console` | Default; terminal CI | stdout |
| `json` | CI, tooling, APIs | stdout or `report.file` |
| `html` | Human review, graphs | file (set `report.file`) |
| `snapshot` | Versioned summaries | YAML file (set `report.file`) |
Use `report.file` to write `html`, `json`, or `snapshot` output to a path (e.g. `report.html`, `report.json`, `snapshot.yaml`). For `console`, or when `report.file` is not set, output goes to stdout.
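
As a sketch of the `report.file` behavior described above, a CI job that wants machine-readable output on disk might use a `report` block like the one below. The file name `arxo-report.json` is only an illustrative choice, not a required name.

```yaml
# Write the JSON report to a file instead of stdout.
report:
  format: json             # console | json | html | snapshot
  file: arxo-report.json   # hypothetical path, chosen for illustration
```

Per the table above, dropping the `file` key from this block would send the JSON to stdout instead.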
Telemetry (Runtime Metrics)

When using the Runtime preset or metrics like `centrality`, `traffic_hotspot`, `critical_path`, `runtime_drift`, `sensitive_data_flow`, or `test_coverage`, add a `data.telemetry` block to supply trace data:

| Field | Required | Description |
|---|---|---|
| `source_path` | Yes | Path to trace file or directory of `.json` files (relative to project root) |
| `format` | No | Trace format. Default `otel_json`. |
| `service_name` | No | Filter traces by service name |
| `time_window` | No | Filter by time range (`start` and `end`, RFC3339) |
Supported Trace Formats

| Format | `format` value | Source |
|---|---|---|
| OTLP JSON | `otel_json` | OpenTelemetry exporters |
| Zipkin JSON v2 | `zipkin_json` | Zipkin, OpenTelemetry→Zipkin exporter |
| Jaeger JSON | `jaeger_json` | Jaeger Query API export |
For span-to-code mapping, ensure traces include `code.filepath` (or `code.file_path`) in span attributes/tags. See the Telemetry guide for details.
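
Putting the fields and formats above together, a `data.telemetry` block that reads a directory of Zipkin exports might look like the sketch below. The directory path and service name are placeholders chosen for illustration.

```yaml
data:
  telemetry:
    source_path: ./telemetry/zipkin/   # hypothetical directory of .json files
    format: zipkin_json                # otel_json is the default when omitted
    service_name: checkout-api         # hypothetical name; optional filter
    time_window:                       # optional; RFC3339 timestamps
      start: "2024-01-01T00:00:00Z"
      end: "2024-01-07T23:59:59Z"
```

Only `source_path` is marked as required in the field table, so the other keys can be dropped when no filtering is needed.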
Example: Strict Cycle and Coupling Policy

```yaml
data:
  import_graph:
    group_by: folder
    group_depth: 3

metrics:
  - id: scc
    enabled: true
  - id: propagation_cost
    enabled: true
  - id: centrality
    enabled: true

policy:
  invariants:
    - metric: scc.max_cycle_size
      op: "<="
      value: 5
    - metric: scc.cycle_count
      op: "=="
      value: 0
    - metric: propagation_cost.system.ratio
      op: "<="
      value: 0.12
```

Metric-Specific Options
Some metrics accept optional config to cap cost on large graphs:

- `propagation_cost`:
  - `max_nodes` (number). If the call graph has more nodes than this, function-level betweenness signals are skipped (default `2000`). Increase or omit to compute on larger call graphs.
- `centrality`:
  - `max_nodes` (number). If the import (or call) graph has more nodes than this, betweenness is skipped. Omit for no cap.
  - `use_edge_weights` (bool). If `false`, use unweighted BFS for betweenness (faster, ~2–3×). Default `true`.
  - `betweenness_sample_ratio` (number, 0–1). If set (e.g. `0.2`), approximate betweenness by sampling that fraction of sources for ~5× speedup.
- `core_periphery` (HTML report only): reduces the size of the embedded graph in HTML reports; the metric still runs on the full graph.
  - `graph_max_nodes` (number). If set (e.g. `500`), only the top N nodes by reachability are included in the graph visualization. Omit to include all nodes.
  - `graph_edge_sample_rate` (number, 0–1). If set (e.g. `0.2`), only that fraction of edges between kept nodes are included (deterministic sample). Omit to include all edges.
- `package_metrics`:
  - `stable_threshold` (number, 0–1). Maximum instability considered stable. Default `0.30`.
  - `unstable_threshold` (number, 0–1). Minimum instability considered unstable. Default `0.70`.
  - `cohesion_low_threshold` (number, 0–1). Packages below this are counted as low cohesion. Default `0.20`.
  - `zone_pain_abstractness_max` (number, 0–1). Zone-of-pain abstractness limit. Default `0.30`.
  - `zone_useless_abstractness_min` (number, 0–1). Zone-of-uselessness abstractness minimum. Default `0.70`.
  - `layer_order` (string array). Optional ordered layer names for layer-level package summaries.
- `visibility`:
  - `top_k` (number). Max rows/items for visibility tables/top lists. Default `10`.
  - `channels.temporal_mode` (`off|auto|force`). Git-history channel behavior. Default `auto`.
  - `channels.runtime_mode` (`off|auto|force`). Telemetry channel behavior. Default `auto`.
- `smells`:
  - `top_k` (number). Limit smell tables/lists and findings output size. Default `10`.
  - `emit_findings` (bool). Emit structured smell findings with evidence. Default `true`.
  - `channels.temporal_mode` (`off|auto|force`). Temporal baseline channel behavior. Default `auto`.
  - `channels.cochange_mode` (`off|auto|force`). Co-change channel behavior. Default `auto`.
  - `channels.sat_mode` (`off|auto|force`). SAT overlap channel behavior. Default `auto`.
  - `temporal.baseline_report_path` (string). Baseline report path for temporal comparison. Default `.arxo/baselines/smells_prev_report.json`.
  - `thresholds.*` and `risk_weights.*`. Family detection thresholds and channel weighting; see Smells for the full contract.
- `modularity`: controls community-detection behavior and optional function-level overlap analysis.
  - Configure under `config.modularity` in the metric entry.
  - `algorithm`: `leiden` (default) or `louvain`
  - `gamma_values`: positive resolution values (default `[0.5, 1.0, 1.5, 2.0]`)
  - `directed`, `weighted`: objective options (both default `true`)
  - `include_call_graph`: emit `modularity.function.*` and cross-graph overlap keys when call graph is available
  - `stability_runs`: run extra seeded passes and emit stability keys (`modularity.module.stability.*`)
  - `emit_findings`, `findings_top_k`: findings controls
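
The `smells` options listed above could be set as in the following sketch. It assumes the dotted option names (`channels.*`, `temporal.*`) map to nested YAML keys directly under `config`, in the same way the `visibility` options are nested elsewhere on this page. That nesting is an assumption, so check the Smells page before relying on it.

```yaml
metrics:
  - id: smells
    enabled: true
    config:
      top_k: 10
      emit_findings: true
      channels:                 # assumed nesting for the channels.* options
        temporal_mode: auto
        cochange_mode: "off"    # quoted: YAML 1.1 parsers read a bare off as boolean false
        sat_mode: auto
      temporal:                 # assumed nesting for temporal.baseline_report_path
        baseline_report_path: .arxo/baselines/smells_prev_report.json
```

Quoting `"off"` is a defensive choice; whether Arxo's parser needs it is not stated in this page.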
Example:

```yaml
metrics:
  - id: propagation_cost
    enabled: true
    config:
      max_nodes: 5000
  - id: centrality
    enabled: true
    config:
      use_edge_weights: false
      max_nodes: 5000
      betweenness_sample_ratio: 0.2   # optional: ~5x faster, approximate
  - id: core_periphery
    enabled: true
    config:
      graph_max_nodes: 500            # optional: smaller HTML report
      graph_edge_sample_rate: 0.2     # optional: 20% of edges in graph
  - id: package_metrics
    enabled: true
    config:
      stable_threshold: 0.30
      unstable_threshold: 0.70
      cohesion_low_threshold: 0.20
      zone_pain_abstractness_max: 0.30
      zone_useless_abstractness_min: 0.70
      layer_order: ["app", "domain", "infra"]   # optional
  - id: visibility
    enabled: true
    config:
      top_k: 10
      channels:
        temporal_mode: auto
        runtime_mode: auto
  - id: modularity
    enabled: true
    config:
      modularity:
        gamma_values: [0.5, 1.0, 1.5, 2.0]
        include_call_graph: true
        stability_runs: 5
```

Running with a Config File
```bash
arxo analyze --path /path/to/project --config config.yaml
```

For detailed metric descriptions and policy recommendations, see the Metrics section. For more on each output format, see Reports and Output Formats.