
DataStore — Programmatic Data Access

:::info For engine extenders
This page describes internal APIs used when building or extending the engine (e.g. custom metrics, orchestration). When using the closed-source engine via arxo-loader or the FFI, you do not have access to DataStore; you receive analysis results as a JSON string only. See Rust API and FFI API for the public library API.
:::

The DataStore trait (in arxo-types) is the main interface for reading analysis data inside the engine. Metric plugins and custom integrations use it to get graphs and derived indices. The engine builds graphs lazily and can serve them from cache.

When you run analysis, the engine constructs a DataStoreImpl (or equivalent) that implements DataStore. You do not construct the store yourself when using the engine; you receive it indirectly:

  • Metric plugins: Your plugin’s compute receives a MetricContext whose data field is &dyn DataStore. Use it to call the accessors below.
  • Orchestration: After Orchestrator::run(), results (metrics, violations, report) are produced using the same store internally. For direct graph access in Rust you would use the engine’s APIs that expose the store or its results (e.g. via OrchestrationResult or loader output).

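Most DataStore accessors are async and return Result<T>. The exceptions are git_history_if_loaded, a synchronous accessor that returns an Option, and set_computed_metrics, which is synchronous but still returns Result<()>. The first call to a given accessor may trigger building that graph or index; subsequent calls on the same store return the cached value.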
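Conceptually, the build-once behavior works like the synchronous sketch that follows. It is an illustration under stated assumptions, not the engine's implementation: `LazyStore`, the stand-in `ImportGraph`, and the `build_count` counter are hypothetical, the real accessors are async and return `Result`, and `DataStoreImpl` may not use `std::sync::OnceLock` at all.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, OnceLock};

/// Hypothetical stand-in for a graph type; not the arxo-types `ImportGraph`.
#[derive(Debug)]
pub struct ImportGraph {
    pub nodes: Vec<String>,
}

/// Hypothetical store that builds its graph on first access and caches it.
pub struct LazyStore {
    import_graph: OnceLock<Arc<ImportGraph>>,
    /// Counts how many times the (expensive) build actually ran.
    pub build_count: AtomicUsize,
}

impl LazyStore {
    pub fn new() -> Self {
        LazyStore { import_graph: OnceLock::new(), build_count: AtomicUsize::new(0) }
    }

    /// First call builds the graph; later calls clone the cached `Arc` (a pointer copy).
    pub fn import_graph(&self) -> Arc<ImportGraph> {
        self.import_graph
            .get_or_init(|| {
                self.build_count.fetch_add(1, Ordering::SeqCst);
                // Placeholder for parsing files and building the real graph.
                Arc::new(ImportGraph { nodes: vec!["a.ts".into(), "b.ts".into()] })
            })
            .clone()
    }
}

fn main() {
    let store = LazyStore::new();
    let first = store.import_graph();
    let second = store.import_graph();
    // Both handles point at the same allocation; the build ran once.
    assert!(Arc::ptr_eq(&first, &second));
    println!("builds: {}", store.build_count.load(Ordering::SeqCst));
}
```

Handing out `Arc` clones means callers can keep a graph alive for as long as they need without copying nodes or edges.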

async fn import_graph(&self) -> Result<Arc<ImportGraph>>

Module/file dependency graph: nodes are files (or groups), edges are imports with EdgeType (Import, Reexport, DynamicImport, etc.). Used for cycle detection, layering, and structural metrics.

See Graph types: Import graph for node/edge structure.
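To make the node/edge description concrete, here is a hypothetical adjacency layout with typed edges and a query that skips dynamic imports, as a strict layering check might. Only the three `EdgeType` variant names come from this page; the struct layout, the `(from, to, kind)` edge tuples, and `static_imports_of` are assumptions for illustration.

```rust
/// Edge kinds named on this page; the real enum has more variants ("etc.").
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EdgeType {
    Import,
    Reexport,
    DynamicImport,
}

/// Hypothetical import graph: files as nodes, typed import edges.
pub struct ImportGraph {
    pub files: Vec<String>,
    /// (importing file index, imported file index, edge kind)
    pub edges: Vec<(usize, usize, EdgeType)>,
}

impl ImportGraph {
    pub fn node_count(&self) -> usize {
        self.files.len()
    }

    pub fn edge_count(&self) -> usize {
        self.edges.len()
    }

    /// Files imported by `from`, ignoring dynamic imports.
    pub fn static_imports_of(&self, from: usize) -> Vec<&str> {
        self.edges
            .iter()
            .filter(|(src, _, kind)| *src == from && *kind != EdgeType::DynamicImport)
            .map(|(_, dst, _)| self.files[*dst].as_str())
            .collect()
    }
}

fn main() {
    let g = ImportGraph {
        files: vec!["app.ts".into(), "util.ts".into(), "lazy.ts".into()],
        edges: vec![(0, 1, EdgeType::Import), (0, 2, EdgeType::DynamicImport)],
    };
    println!("{} nodes, {} edges", g.node_count(), g.edge_count());
    println!("{:?}", g.static_imports_of(0));
}
```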

async fn call_graph(&self) -> Result<Arc<CallGraph>>

File-to-file call graph: which file’s code calls into which other file. Edges carry CallEdgeData (edge type, confidence, resolution method). Used for call-based metrics and dependency analysis.

See Graph types: Call graph for details.
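To show how edge metadata might be used, here is a sketch that computes per-file fan-out while ignoring low-confidence call edges. The field names (`confidence` as an `f32` assumed to lie in 0.0..=1.0, `resolution`) and the `Resolution` variants are assumptions for illustration; check `CallEdgeData` in arxo-types for the actual shape.

```rust
use std::collections::HashMap;

/// Hypothetical resolution methods; the real enum may differ.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Resolution {
    Direct,
    TypeInferred,
    Heuristic,
}

/// Hypothetical edge payload with two of the documented fields.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy)]
pub struct CallEdgeData {
    /// Assumed to lie in 0.0..=1.0.
    pub confidence: f32,
    /// Carried along but not used by this filter.
    pub resolution: Resolution,
}

/// Counts outgoing call edges per caller file, keeping only edges with
/// `confidence >= min_confidence`.
pub fn fan_out(edges: &[(&str, &str, CallEdgeData)], min_confidence: f32) -> HashMap<String, usize> {
    let mut out = HashMap::new();
    for (caller, _callee, data) in edges {
        if data.confidence >= min_confidence {
            *out.entry(caller.to_string()).or_insert(0) += 1;
        }
    }
    out
}

fn main() {
    let edges = [
        ("api.rs", "db.rs", CallEdgeData { confidence: 0.95, resolution: Resolution::Direct }),
        ("api.rs", "log.rs", CallEdgeData { confidence: 0.40, resolution: Resolution::Heuristic }),
        ("cli.rs", "api.rs", CallEdgeData { confidence: 0.80, resolution: Resolution::TypeInferred }),
    ];
    // With a 0.5 threshold the low-confidence api.rs -> log.rs edge is dropped.
    println!("{:?}", fan_out(&edges, 0.5));
}
```

Raising the threshold trades recall for precision: fewer spurious dependencies, at the risk of missing real ones that were resolved heuristically.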

async fn entity_graph(&self) -> Result<Arc<EntityGraph>>

Function/class/variable-level call graph. Nodes are entities (EntityId = file_id::symbol), edges are calls. Used for fine-grained call analysis.

See Graph types: Entity graph for details.
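Assuming an entity id is rendered as a string of the form `file_id::symbol`, a consumer could split it as in this sketch. Both the string representation and the `EntityId` struct here are assumptions; the real `EntityId` type may store its parts separately. Splitting at the first `::` keeps nested symbols such as `Parser::parse` intact, provided the file id itself never contains `::`.

```rust
/// Hypothetical parsed form of an entity id written as `file_id::symbol`.
#[derive(Debug, PartialEq, Eq)]
pub struct EntityId<'a> {
    pub file_id: &'a str,
    pub symbol: &'a str,
}

/// Splits at the first `::`; returns `None` if either part is missing or empty.
/// Assumes the file id never contains `::`.
pub fn parse_entity_id(raw: &str) -> Option<EntityId<'_>> {
    let (file_id, symbol) = raw.split_once("::")?;
    if file_id.is_empty() || symbol.is_empty() {
        return None;
    }
    Some(EntityId { file_id, symbol })
}

fn main() {
    println!("{:?}", parse_entity_id("src/parser.rs::Parser::parse"));
    println!("{:?}", parse_entity_id("no-separator"));
}
```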

async fn type_graph(&self) -> Result<Arc<TypeGraph>>

Type relationship graph: extends, implements, trait impl, type alias, generic bounds. Nodes are types (TypeId), edges carry TypeEdgeType. Used for inheritance and type-coupling metrics.

See Graph types: Type graph for details.

The following derived indices are computed from the core graphs. Most accessors return Arc<T> (a few return T directly, as their signatures show), so you can hold and share them without cloning the underlying data.

async fn scc_dag(&self) -> Result<Arc<SccDag>>

Strongly connected components of the import graph: cycles and their condensation into a DAG. Used for cycle metrics and cycle-aware reporting.
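This page does not say which SCC algorithm the engine uses. As an illustration only, here is Tarjan's algorithm over a plain adjacency list (node indices standing in for files), followed by the condensation step that collapses each component into one DAG node. The `SccDag` struct in this sketch is hypothetical and unrelated to the arxo-types type of the same name; the recursion is kept for brevity and could overflow the stack on very deep graphs.

```rust
use std::collections::BTreeSet;

/// Hypothetical condensation result (not the arxo-types `SccDag`).
#[derive(Debug)]
pub struct SccDag {
    /// Component index for each node.
    pub comp_of: Vec<usize>,
    /// Members of each component, in the order Tarjan emits them (sinks first).
    pub components: Vec<Vec<usize>>,
    /// Edges between distinct components; acyclic by construction.
    pub dag_edges: BTreeSet<(usize, usize)>,
}

struct Tarjan {
    index: Vec<Option<usize>>,
    low: Vec<usize>,
    on_stack: Vec<bool>,
    stack: Vec<usize>,
    next: usize,
    components: Vec<Vec<usize>>,
}

fn visit(adj: &[Vec<usize>], t: &mut Tarjan, v: usize) {
    t.index[v] = Some(t.next);
    t.low[v] = t.next;
    t.next += 1;
    t.stack.push(v);
    t.on_stack[v] = true;
    for &w in &adj[v] {
        match t.index[w] {
            None => {
                visit(adj, t, w);
                t.low[v] = t.low[v].min(t.low[w]);
            }
            Some(iw) if t.on_stack[w] => t.low[v] = t.low[v].min(iw),
            Some(_) => {}
        }
    }
    // `v` is the root of a component: pop everything above it off the stack.
    if t.index[v] == Some(t.low[v]) {
        let mut comp = Vec::new();
        while let Some(w) = t.stack.pop() {
            t.on_stack[w] = false;
            comp.push(w);
            if w == v {
                break;
            }
        }
        t.components.push(comp);
    }
}

/// Finds SCCs of `adj` (node -> imported nodes) and condenses them into a DAG.
pub fn scc_dag(adj: &[Vec<usize>]) -> SccDag {
    let n = adj.len();
    let mut t = Tarjan {
        index: vec![None; n],
        low: vec![0; n],
        on_stack: vec![false; n],
        stack: Vec::new(),
        next: 0,
        components: Vec::new(),
    };
    for v in 0..n {
        if t.index[v].is_none() {
            visit(adj, &mut t, v);
        }
    }
    let mut comp_of = vec![0; n];
    for (c, members) in t.components.iter().enumerate() {
        for &v in members {
            comp_of[v] = c;
        }
    }
    let mut dag_edges = BTreeSet::new();
    for (v, targets) in adj.iter().enumerate() {
        for &w in targets {
            if comp_of[v] != comp_of[w] {
                dag_edges.insert((comp_of[v], comp_of[w]));
            }
        }
    }
    SccDag { comp_of, components: t.components, dag_edges }
}

fn main() {
    // 0 -> 1 -> 2 -> 0 forms an import cycle; 2 -> 3 leaves it.
    let adj = vec![vec![1], vec![2], vec![0, 3], vec![]];
    let dag = scc_dag(&adj);
    // Multi-node components are cycles (self-imports would need a separate check).
    let cycles = dag.components.iter().filter(|c| c.len() > 1).count();
    println!("{:?}, cycles: {}, dag edges: {:?}", dag.components, cycles, dag.dag_edges);
}
```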

async fn call_scc_dag(&self) -> Result<CallSccDag>

Strongly connected components of the call graph. Used for call-level cycle detection.

async fn reachability(&self) -> Result<Arc<ReachabilityIndex>>

Transitive dependency index over the import graph: which nodes can reach which nodes. Used for impact and blast-radius analysis.
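How `ReachabilityIndex` stores its data is not documented on this page. As a rough model of the queries it answers, this sketch computes reachability with a breadth-first search per query, whereas an index would typically precompute or cache the result. Node indices stand in for files, and an edge `a -> b` means "a imports b", so the blast radius of a file is every file that can reach it, i.e. a search over the reversed graph. All function names are hypothetical.

```rust
use std::collections::{BTreeSet, VecDeque};

/// Nodes reachable from `start` by following edges (`start` is included only if on a cycle).
pub fn reachable_from(adj: &[Vec<usize>], start: usize) -> BTreeSet<usize> {
    let mut seen = BTreeSet::new();
    let mut queue = VecDeque::from([start]);
    while let Some(v) = queue.pop_front() {
        for &w in &adj[v] {
            // `insert` returns true only the first time, so each node is queued once.
            if seen.insert(w) {
                queue.push_back(w);
            }
        }
    }
    seen
}

/// Reverses every edge, turning "what do I import" into "who imports me".
pub fn reverse(adj: &[Vec<usize>]) -> Vec<Vec<usize>> {
    let mut rev = vec![Vec::new(); adj.len()];
    for (v, targets) in adj.iter().enumerate() {
        for &w in targets {
            rev[w].push(v);
        }
    }
    rev
}

/// Blast radius of changing `file`: every file that transitively imports it.
pub fn blast_radius(adj: &[Vec<usize>], file: usize) -> BTreeSet<usize> {
    reachable_from(&reverse(adj), file)
}

fn main() {
    // 0 (app) -> 1 (service) -> 2 (util); 3 (cli) -> 2 (util).
    let adj = vec![vec![1], vec![2], vec![], vec![2]];
    println!("app depends on {:?}", reachable_from(&adj, 0));
    println!("changing util affects {:?}", blast_radius(&adj, 2));
}
```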

async fn call_reachability(&self) -> Result<Arc<CallReachabilityIndex>>

Transitive call reachability: which call sites can (transitively) reach which targets. Used for call-based impact analysis.

async fn call_dependencies(&self) -> Result<Arc<CallDependencyIndex>>

Call dependency index: who depends on whom at the call level. Used for dependency and refactoring metrics.

async fn effect_index(&self) -> Result<Arc<EffectIndex>>

Side-effect index: which files/functions have which effects (IO, network, storage, log, time, random, mutation, LLM). Used for effect and purity metrics.
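One compact way to represent the effect kinds listed above is a bit set, which makes purity (no effects) and containment checks cheap. This is a hypothetical representation for illustration: `Effects`, its constants, and `purity_ratio` are not part of arxo-types, and the engine's `EffectIndex` may store effects differently.

```rust
/// Hypothetical bit set over the eight effect kinds listed on this page.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub struct Effects(u8);

#[allow(dead_code)]
impl Effects {
    pub const IO: Effects = Effects(1 << 0);
    pub const NETWORK: Effects = Effects(1 << 1);
    pub const STORAGE: Effects = Effects(1 << 2);
    pub const LOG: Effects = Effects(1 << 3);
    pub const TIME: Effects = Effects(1 << 4);
    pub const RANDOM: Effects = Effects(1 << 5);
    pub const MUTATION: Effects = Effects(1 << 6);
    pub const LLM: Effects = Effects(1 << 7);

    pub fn union(self, other: Effects) -> Effects {
        Effects(self.0 | other.0)
    }

    /// True if every effect in `other` is also present in `self`.
    pub fn contains(self, other: Effects) -> bool {
        (self.0 & other.0) == other.0
    }

    /// An entity with no recorded effects counts as pure here.
    pub fn is_pure(self) -> bool {
        self.0 == 0
    }
}

/// Share of entities with no recorded effects; an empty input counts as fully pure.
pub fn purity_ratio(effects: &[Effects]) -> f64 {
    if effects.is_empty() {
        return 1.0;
    }
    let pure = effects.iter().filter(|e| e.is_pure()).count();
    pure as f64 / effects.len() as f64
}

fn main() {
    let fetch = Effects::NETWORK.union(Effects::LOG);
    let all = [fetch, Effects::default(), Effects::default(), Effects::TIME];
    println!("fetch touches network: {}", fetch.contains(Effects::NETWORK));
    println!("purity ratio: {}", purity_ratio(&all));
}
```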

The following accessors return optional or conditional data: it is available only when the corresponding input is configured or has already been built.

async fn git_history(&self) -> Result<Arc<GitHistory>>

Git history data: file churn, co-change, authors. Built only when data.git_history is configured. Used for evolution and ownership metrics.

git_history_if_loaded: Synchronous accessor that returns Option<Arc<GitHistory>> if already loaded, without triggering a build. Useful for metrics that can run without git.
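A metric that can run without git might branch on that `Option`, as in this sketch. In the real trait the value would come from `git_history_if_loaded`; here it is passed in directly, and the `GitHistory` struct with a `churn` map of commit counts per path is a hypothetical stand-in.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical git data: commit counts per file path.
pub struct GitHistory {
    pub churn: HashMap<String, u32>,
}

/// Returns the file with the most commits, or `None` when git history was never loaded.
/// Nothing here triggers a build; the caller decides what to report without git data.
pub fn hottest_file(history: Option<Arc<GitHistory>>) -> Option<(String, u32)> {
    let history = history?;
    history
        .churn
        .iter()
        .max_by_key(|(_, commits)| **commits)
        .map(|(path, commits)| (path.clone(), *commits))
}

fn main() {
    let loaded = Arc::new(GitHistory {
        churn: HashMap::from([("src/api.rs".to_string(), 42), ("src/util.rs".to_string(), 7)]),
    });
    println!("{:?}", hottest_file(Some(loaded)));
    // The metric still runs, just without churn data.
    println!("{:?}", hottest_file(None));
}
```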

async fn telemetry(&self) -> Result<Telemetry>

Runtime telemetry (e.g. OpenTelemetry traces). Built only when data.telemetry is configured.

async fn telemetry_mapping(&self) -> Result<TelemetryMappingIndex>

Maps runtime spans to code (e.g. file/line). Used for runtime–code correlation metrics.

async fn workspace_config(&self) -> Result<WorkspaceConfig>

Detected workspace/monorepo configuration: packages, roots, build tool. Used for workspace-aware metrics.

async fn build_graph(&self) -> Result<Option<BuildGraph>>

Build dependency graph when workspace/build info is available. None when not applicable.

Plugins can read and write the computed metrics cache so later plugins can reuse earlier results.

async fn computed_metrics(&self) -> Result<ComputedMetricsCache>

Read the current cache of metric results, keyed by plugin id and by metric key. In most cases you should not modify this cache yourself; the engine fills it as metrics run.

fn set_computed_metrics(&self, cache: ComputedMetricsCache) -> Result<()>

Write a new computed metrics cache (e.g. after your plugin has run). Used by the engine to pass results between metric phases.
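Because `set_computed_metrics` takes a whole `ComputedMetricsCache`, one plausible pattern is read, extend, write back. The sketch models the cache as a nested `HashMap` (plugin id to metric key to `f64`), which is an assumption, and uses a hypothetical synchronous `Store` in place of the async trait. The plugin ids `basic_counts` and `density` are made up for the example.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical cache shape: plugin id -> (metric key -> value).
pub type ComputedMetricsCache = HashMap<String, HashMap<String, f64>>;

/// Hypothetical synchronous store; the real `computed_metrics` is async and returns `Result`.
#[derive(Default)]
pub struct Store {
    cache: Mutex<ComputedMetricsCache>,
}

impl Store {
    /// Snapshot of the current cache.
    pub fn computed_metrics(&self) -> ComputedMetricsCache {
        self.cache.lock().unwrap().clone()
    }

    /// Replaces the whole cache, mirroring `set_computed_metrics`.
    pub fn set_computed_metrics(&self, cache: ComputedMetricsCache) {
        *self.cache.lock().unwrap() = cache;
    }
}

/// A later plugin reads an earlier plugin's counts, derives a value, and writes it back.
/// The read-modify-write is not atomic; this assumes no other plugin writes in between.
pub fn run_density_plugin(store: &Store) -> Option<f64> {
    let mut cache = store.computed_metrics();
    let basic = cache.get("basic_counts")?;
    let nodes = *basic.get("node_count")?;
    let edges = *basic.get("edge_count")?;
    // Directed-graph density: edges / (nodes * (nodes - 1)).
    let density = if nodes > 1.0 { edges / (nodes * (nodes - 1.0)) } else { 0.0 };
    cache
        .entry("density".to_string())
        .or_default()
        .insert("density".to_string(), density);
    store.set_computed_metrics(cache);
    Some(density)
}

fn main() {
    let store = Store::default();
    store.set_computed_metrics(HashMap::from([(
        "basic_counts".to_string(),
        HashMap::from([("node_count".to_string(), 4.0), ("edge_count".to_string(), 6.0)]),
    )]));
    println!("{:?}", run_density_plugin(&store));
}
```

If the earlier plugin has not run, the `?` operators make the dependent plugin return `None` instead of failing.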

async fn project_path(&self) -> Result<PathBuf>

Absolute project path that was analyzed.

The trait also includes:

  • exports_index — Exports index (signature may be engine-specific).
  • dataflow_graph — Dataflow graph when implemented (signature may be placeholder).

These are used internally or by specific metrics; see arxo-types for up-to-date signatures.

Your plugin receives a MetricContext with data: &dyn DataStore. Use it inside compute:

```rust
use std::collections::HashMap;

use arxo_types::core::types::{DataStore, MetricContext, MetricPlugin, MetricResult};

// Method inside your `impl MetricPlugin for ...` block.
async fn compute(&self, ctx: &MetricContext) -> anyhow::Result<MetricResult> {
    // The first call may build the import graph; later calls reuse the cached Arc.
    let import_graph = ctx.data.import_graph().await?;
    let node_count = import_graph.node_count();
    let edge_count = import_graph.edge_count();

    // Optional: use the SCC DAG for a cycle count.
    let scc_dag = ctx.data.scc_dag().await?;
    // ... use scc_dag ...

    let values = HashMap::from([
        ("node_count".to_string(), node_count as f64),
        ("edge_count".to_string(), edge_count as f64),
    ]);
    Ok(MetricResult::new(
        self.id().to_string(),
        self.version().to_string(),
        values,
        None,
    ))
}
```
  • Lazy: The first time you call e.g. import_graph() or scc_dag(), the engine may parse files, build the graph, and compute the index. Later calls for the same store return the same Arc without recomputing.
  • Cache: When caching is enabled, the engine may load graphs and derived indices from disk (bincode). In that case, accessors deserialize from cache instead of building from source.
  • Order: You can call accessors in any order. The implementation resolves dependencies (e.g. SCC DAG depends on import graph) internally.

DataStore is Send + Sync. You can share a store across threads (e.g. multiple metric plugins); the implementation uses internal locking or immutable cached data as appropriate.
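As a sketch of what `Send + Sync` permits, several scoped threads (standing in for metric plugins) can borrow one store at once. `SharedStore` and its `scc_count` accessor are hypothetical. The point is that a build-once cell such as `std::sync::OnceLock` makes concurrent first callers wait on a single initializer, which is one way an implementation could satisfy this guarantee; the engine's actual locking strategy is not specified here.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, OnceLock};
use std::thread;

/// Hypothetical shared store; `OnceLock` and `Arc` make it `Send + Sync`.
pub struct SharedStore {
    scc_count: OnceLock<Arc<usize>>,
    pub builds: AtomicUsize,
}

impl SharedStore {
    pub fn new() -> Self {
        SharedStore { scc_count: OnceLock::new(), builds: AtomicUsize::new(0) }
    }

    /// Concurrent first callers block until one initializer finishes; it runs exactly once.
    pub fn scc_count(&self) -> Arc<usize> {
        self.scc_count
            .get_or_init(|| {
                self.builds.fetch_add(1, Ordering::SeqCst);
                Arc::new(3) // placeholder for an expensive SCC computation
            })
            .clone()
    }
}

/// Compile-time check: fails to build if `T` is not `Send + Sync`.
fn assert_send_sync<T: Send + Sync>() {}

/// Runs `n` "plugins" on scoped threads that all borrow the same store.
pub fn run_plugins(store: &SharedStore, n: usize) -> Vec<usize> {
    assert_send_sync::<SharedStore>();
    thread::scope(|s| {
        let handles: Vec<_> = (0..n).map(|_| s.spawn(move || *store.scc_count())).collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}

fn main() {
    let store = SharedStore::new();
    let results = run_plugins(&store, 4);
    println!("{:?}, builds = {}", results, store.builds.load(Ordering::SeqCst));
}
```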