DataStore — Programmatic Data Access

:::info For engine extenders
This page describes internal APIs used when building or extending the engine (e.g. custom metrics, orchestration). When using the closed-source engine via arxo-loader or the FFI, you do not have access to DataStore; you receive analysis results as a JSON string only. See Rust API and FFI API for the public library API.
:::

The DataStore trait (in arxo-types) is the main interface for reading analysis data inside the engine. Metric plugins and custom integrations use it to get graphs and derived indices. The engine builds graphs lazily and can serve them from cache.

Overview

When you run analysis, the engine constructs a DataStoreImpl (or equivalent) that implements DataStore. You do not construct the store yourself when using the engine; you receive it indirectly:

Metric plugins: Your plugin’s compute receives a MetricContext whose data field is &dyn DataStore. Use it to call the accessors below.
Orchestration: Orchestrator::run() produces its results (metrics, violations, report) from the same store internally. For direct graph access in Rust, use the engine APIs that expose the store or its results (e.g. via OrchestrationResult or loader output).

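Most DataStore accessors are async and return Result<T>; the exceptions on this page are set_computed_metrics and git_history_if_loaded, which are synchronous. The first call to a given accessor may trigger building that graph or index; subsequent calls return the same cached value.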
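The build-once, reuse-afterwards behavior can be pictured with a minimal synchronous sketch. It assumes a hypothetical `LazyStore` holding a stand-in `ImportGraph`; std's `OnceLock` plays the role of whatever cell the real implementation uses, and the async and `Result` layers are left out for brevity.

```rust
use std::sync::{Arc, OnceLock};

/// Hypothetical stand-in for the engine's import graph.
#[derive(Debug)]
pub struct ImportGraph {
    pub files: Vec<String>,
}

/// Hypothetical store: each accessor owns a cell that is filled on first use.
#[derive(Default)]
pub struct LazyStore {
    import_graph: OnceLock<Arc<ImportGraph>>,
}

impl LazyStore {
    /// First call runs the (expensive) build; later calls clone the cached Arc.
    pub fn import_graph(&self) -> Arc<ImportGraph> {
        self.import_graph
            .get_or_init(|| {
                // Placeholder for parsing files and building the graph.
                Arc::new(ImportGraph {
                    files: vec!["src/a.ts".into(), "src/b.ts".into()],
                })
            })
            .clone()
    }
}
```

Two calls to `import_graph()` return handles to the same allocation, which `Arc::ptr_eq` can confirm; cloning the `Arc` only bumps a reference count.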

Core Graphs

import_graph

async fn import_graph(&self) -> Result<Arc<ImportGraph>>

Module/file dependency graph: nodes are files (or groups), edges are imports with EdgeType (Import, Reexport, DynamicImport, etc.). Used for cycle detection, layering, and structural metrics.

See Graph types: Import graph for node/edge structure.
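To make the node/edge vocabulary concrete, the sketch below models an import graph as a plain edge list and tallies edges per kind. The three `EdgeType` variants come from the description above; the `ImportEdge` struct, the string node ids, and `count_by_type` are illustrative assumptions, not the arxo-types definitions.

```rust
use std::collections::HashMap;

/// Subset of the edge kinds named in the docs; the real enum may have more variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum EdgeType {
    Import,
    Reexport,
    DynamicImport,
}

/// Hypothetical edge record: file `from` imports file `to`.
pub struct ImportEdge {
    pub from: &'static str,
    pub to: &'static str,
    pub kind: EdgeType,
}

/// Count how many edges of each kind the graph contains.
pub fn count_by_type(edges: &[ImportEdge]) -> HashMap<EdgeType, usize> {
    let mut counts = HashMap::new();
    for edge in edges {
        *counts.entry(edge.kind).or_insert(0) += 1;
    }
    counts
}
```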

call_graph

async fn call_graph(&self) -> Result<Arc<CallGraph>>

File-to-file call graph: which file’s code calls into which other file. Edges carry CallEdgeData (edge type, confidence, resolution method). Used for call-based metrics and dependency analysis.

See Graph types: Call graph for details.
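Because call edges carry a confidence, a metric may want to ignore weakly resolved calls. The sketch below assumes a simplified mirror of CallEdgeData with a confidence in [0.0, 1.0] and a hypothetical `Resolution` enum; the edge-type field is left out, and the real field names in arxo-types may differ.

```rust
/// Hypothetical resolution methods; the real enum may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Resolution {
    Static,
    Heuristic,
}

/// Illustrative mirror of CallEdgeData (confidence assumed to be in [0.0, 1.0]).
#[derive(Debug, Clone)]
pub struct CallEdgeData {
    pub confidence: f64,
    pub resolution: Resolution,
}

/// A file-to-file call edge: code in `caller` calls into `callee`.
pub struct CallEdge {
    pub caller: String,
    pub callee: String,
    pub data: CallEdgeData,
}

/// Keep only edges at or above `min_confidence`, e.g. to drop guessed calls.
pub fn confident_edges(edges: &[CallEdge], min_confidence: f64) -> Vec<&CallEdge> {
    edges
        .iter()
        .filter(|e| e.data.confidence >= min_confidence)
        .collect()
}
```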

entity_graph

async fn entity_graph(&self) -> Result<Arc<EntityGraph>>

Function/class/variable-level call graph. Nodes are entities (EntityId = file_id::symbol), edges are calls. Used for fine-grained call analysis.

See Graph types: Entity graph for details.
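If you handle entity ids in their textual `file_id::symbol` form, splitting on the first `::` keeps nested symbols such as `Class::method` intact. This is a sketch under two assumptions: that the id is available as a string, and that `file_id` itself never contains `::`. The `EntityId` struct here is hypothetical, not the arxo-types type.

```rust
/// Hypothetical parsed form of an entity id written as `file_id::symbol`.
#[derive(Debug, PartialEq, Eq)]
pub struct EntityId<'a> {
    pub file_id: &'a str,
    pub symbol: &'a str,
}

/// Split on the first `::`; everything after it is the (possibly nested) symbol.
pub fn parse_entity_id(raw: &str) -> Option<EntityId<'_>> {
    let (file_id, symbol) = raw.split_once("::")?;
    if file_id.is_empty() || symbol.is_empty() {
        return None;
    }
    Some(EntityId { file_id, symbol })
}
```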

type_graph

async fn type_graph(&self) -> Result<Arc<TypeGraph>>

Type relationship graph: extends, implements, trait impl, type alias, generic bounds. Nodes are types (TypeId), edges carry TypeEdgeType. Used for inheritance and type-coupling metrics.

See Graph types: Type graph for details.
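Inheritance metrics typically follow only some relationship kinds. The sketch below collects every supertype reachable through extends/implements/trait-impl edges while ignoring aliases and generic bounds. The variant names for TypeEdgeType and the adjacency-map representation are assumptions made for illustration.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical variant names for the relationships listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TypeEdgeType {
    Extends,
    Implements,
    TraitImpl,
    TypeAlias,
    GenericBound,
}

/// Adjacency map: type name -> outgoing (target, relationship) pairs.
pub type TypeGraph = HashMap<&'static str, Vec<(&'static str, TypeEdgeType)>>;

/// Depth-first walk that follows inheritance-style edges only.
pub fn supertypes(graph: &TypeGraph, start: &str) -> HashSet<&'static str> {
    let mut seen = HashSet::new();
    let mut stack = vec![start];
    while let Some(ty) = stack.pop() {
        for &(target, kind) in graph.get(ty).into_iter().flatten() {
            let inherits = matches!(
                kind,
                TypeEdgeType::Extends | TypeEdgeType::Implements | TypeEdgeType::TraitImpl
            );
            // `insert` returns false for already-visited types, which stops cycles.
            if inherits && seen.insert(target) {
                stack.push(target);
            }
        }
    }
    seen
}
```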

Derived Indices

These are computed from the core graphs. Most accessors return Arc<T>, so you can hold and share the result without cloning the underlying data; call_scc_dag is the exception and returns CallSccDag directly.

scc_dag

async fn scc_dag(&self) -> Result<Arc<SccDag>>

Strongly connected components of the import graph: cycles and their condensation into a DAG. Used for cycle metrics and cycle-aware reporting.
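For intuition about what scc_dag holds, here is textbook Tarjan's algorithm over integer node ids. It is a self-contained sketch: the adjacency-list input and the function names are assumptions, and the engine may compute SCCs differently. A component counts as a cycle when it has more than one node or a node imports itself.

```rust
/// Mutable bookkeeping for Tarjan's algorithm.
struct Tarjan {
    index: Vec<Option<usize>>,
    low: Vec<usize>,
    on_stack: Vec<bool>,
    stack: Vec<usize>,
    next: usize,
    sccs: Vec<Vec<usize>>,
}

impl Tarjan {
    fn visit(&mut self, adj: &[Vec<usize>], v: usize) {
        self.index[v] = Some(self.next);
        self.low[v] = self.next;
        self.next += 1;
        self.stack.push(v);
        self.on_stack[v] = true;
        for &w in &adj[v] {
            let w_index = self.index[w];
            match w_index {
                // Tree edge: recurse, then inherit the child's low-link.
                None => {
                    self.visit(adj, w);
                    self.low[v] = self.low[v].min(self.low[w]);
                }
                // Edge back into the current DFS stack.
                Some(iw) if self.on_stack[w] => self.low[v] = self.low[v].min(iw),
                // Edge into an already finished component: ignore.
                Some(_) => {}
            }
        }
        // v is the root of a component: pop everything down to v.
        if Some(self.low[v]) == self.index[v] {
            let mut component = Vec::new();
            loop {
                let w = self.stack.pop().expect("root is still on the stack");
                self.on_stack[w] = false;
                component.push(w);
                if w == v {
                    break;
                }
            }
            self.sccs.push(component);
        }
    }
}

/// Strongly connected components of a graph given as adjacency lists over 0..n.
pub fn strongly_connected_components(adj: &[Vec<usize>]) -> Vec<Vec<usize>> {
    let n = adj.len();
    let mut t = Tarjan {
        index: vec![None; n],
        low: vec![0; n],
        on_stack: vec![false; n],
        stack: Vec::new(),
        next: 0,
        sccs: Vec::new(),
    };
    for v in 0..n {
        if t.index[v].is_none() {
            t.visit(adj, v);
        }
    }
    t.sccs
}

/// Components that represent import cycles (several nodes, or a self-import).
pub fn cycles(adj: &[Vec<usize>]) -> Vec<Vec<usize>> {
    strongly_connected_components(adj)
        .into_iter()
        .filter(|c| c.len() > 1 || adj[c[0]].contains(&c[0]))
        .collect()
}
```

Collapsing each returned component into a single node gives the condensation, which is always a DAG; the same routine applies unchanged to a call graph like the one behind call_scc_dag.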

call_scc_dag

async fn call_scc_dag(&self) -> Result<CallSccDag>

Strongly connected components of the call graph. Used for call-level cycle detection.

reachability

async fn reachability(&self) -> Result<Arc<ReachabilityIndex>>

Transitive dependency index over the import graph: which nodes can reach which nodes. Used for impact and blast-radius analysis.
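A blast-radius query asks the reverse question: which files transitively depend on a changed file? The sketch answers it on demand with a breadth-first search over reversed import edges; a real ReachabilityIndex may precompute this instead. The integer node ids and the `blast_radius` name are assumptions.

```rust
use std::collections::{HashSet, VecDeque};

/// Files that transitively import `changed`. `adj[a]` lists the files `a` imports,
/// so the search walks edges in reverse (importee -> importers).
pub fn blast_radius(adj: &[Vec<usize>], changed: usize) -> HashSet<usize> {
    let mut importers = vec![Vec::new(); adj.len()];
    for (from, targets) in adj.iter().enumerate() {
        for &to in targets {
            importers[to].push(from);
        }
    }
    let mut affected = HashSet::new();
    let mut queue = VecDeque::from([changed]);
    while let Some(node) = queue.pop_front() {
        for &importer in &importers[node] {
            // The changed file itself is excluded even if it sits in a cycle.
            if importer != changed && affected.insert(importer) {
                queue.push_back(importer);
            }
        }
    }
    affected
}
```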

call_reachability

async fn call_reachability(&self) -> Result<Arc<CallReachabilityIndex>>

Transitive call reachability: which call sites can reach which targets through any chain of calls. Used for call-based impact analysis.

call_dependencies

async fn call_dependencies(&self) -> Result<Arc<CallDependencyIndex>>

Call dependency index: who depends on whom at the call level. Used for dependency and refactoring metrics.
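"Who depends on whom" is often summarized as fan-out (distinct files a file calls into) and fan-in (distinct files that call into it). The sketch derives both from (caller, callee) pairs; the `CallDeps` type and its methods are hypothetical and only illustrate the idea behind CallDependencyIndex.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical per-file call dependencies built from (caller, callee) pairs.
#[derive(Debug, Default)]
pub struct CallDeps {
    /// caller -> distinct callees it depends on
    pub depends_on: BTreeMap<String, BTreeSet<String>>,
    /// callee -> distinct callers that depend on it
    pub depended_by: BTreeMap<String, BTreeSet<String>>,
}

impl CallDeps {
    pub fn from_calls<'a>(calls: impl IntoIterator<Item = (&'a str, &'a str)>) -> Self {
        let mut deps = CallDeps::default();
        for (caller, callee) in calls {
            deps.depends_on
                .entry(caller.to_string())
                .or_default()
                .insert(callee.to_string());
            deps.depended_by
                .entry(callee.to_string())
                .or_default()
                .insert(caller.to_string());
        }
        deps
    }

    /// Fan-out: number of distinct files `file` calls into.
    pub fn fan_out(&self, file: &str) -> usize {
        self.depends_on.get(file).map_or(0, |s| s.len())
    }

    /// Fan-in: number of distinct files that call into `file`.
    pub fn fan_in(&self, file: &str) -> usize {
        self.depended_by.get(file).map_or(0, |s| s.len())
    }
}
```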

effect_index

async fn effect_index(&self) -> Result<Arc<EffectIndex>>

Side-effect index: which files/functions have which effects (IO, network, storage, log, time, random, mutation, LLM). Used for effect and purity metrics.
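A purity metric can be phrased as "what share of entities have no recorded effects". The sketch below assumes a hypothetical `EffectIndex` backed by a map from entity name to a set of `Effect` values (variant names are guesses at the categories listed above). Treating an entity with no entry as pure is a simplifying choice of this sketch.

```rust
use std::collections::{HashMap, HashSet};

/// Effect categories from the docs; the variant names are assumptions.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Effect {
    Io,
    Network,
    Storage,
    Log,
    Time,
    Random,
    Mutation,
    Llm,
}

/// Hypothetical effect index: entity (file or function) -> observed effects.
#[derive(Default)]
pub struct EffectIndex {
    effects: HashMap<String, HashSet<Effect>>,
}

impl EffectIndex {
    pub fn record(&mut self, entity: &str, effect: Effect) {
        self.effects.entry(entity.to_string()).or_default().insert(effect);
    }

    /// An entity with no recorded effects is treated as pure.
    pub fn is_pure(&self, entity: &str) -> bool {
        self.effects.get(entity).map_or(true, |set| set.is_empty())
    }

    /// Share of the given entities that are pure, in [0.0, 1.0].
    pub fn purity_ratio(&self, entities: &[&str]) -> f64 {
        if entities.is_empty() {
            return 1.0;
        }
        let pure = entities.iter().filter(|e| self.is_pure(e)).count();
        pure as f64 / entities.len() as f64
    }
}
```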

Optional Data

Some accessors return optional or conditional data (e.g. only when configured or already built).

git_history

async fn git_history(&self) -> Result<Arc<GitHistory>>

Git history data: file churn, co-change, authors. Built only when data.git_history is configured. Used for evolution and ownership metrics.

git_history_if_loaded: Synchronous accessor that returns Option<Arc<GitHistory>>: Some if the history is already loaded, None otherwise. It never triggers a build, which makes it useful for metrics that can still run without git.
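The split between git_history and git_history_if_loaded maps neatly onto `OnceLock::get_or_init` versus `OnceLock::get`. The sketch below is a hypothetical `GitSlot` (not the engine's type) with a placeholder `GitHistory`; the real accessors are async and fallible, which is omitted here. `churn_score` shows a metric that simply reports nothing when git data is absent.

```rust
use std::sync::{Arc, OnceLock};

/// Hypothetical stand-in for git history data.
pub struct GitHistory {
    pub commits: usize,
}

/// Hypothetical slot mirroring git_history / git_history_if_loaded.
#[derive(Default)]
pub struct GitSlot {
    cell: OnceLock<Arc<GitHistory>>,
}

impl GitSlot {
    /// Like git_history(): builds on first call (placeholder data here).
    pub fn git_history(&self) -> Arc<GitHistory> {
        self.cell
            .get_or_init(|| Arc::new(GitHistory { commits: 120 }))
            .clone()
    }

    /// Like git_history_if_loaded(): never triggers a build.
    pub fn git_history_if_loaded(&self) -> Option<Arc<GitHistory>> {
        self.cell.get().cloned()
    }
}

/// A metric that degrades gracefully: None when git data was never loaded.
pub fn churn_score(slot: &GitSlot) -> Option<f64> {
    slot.git_history_if_loaded().map(|h| h.commits as f64)
}
```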

telemetry

async fn telemetry(&self) -> Result<Telemetry>

Runtime telemetry (e.g. OpenTelemetry traces). Built only when data.telemetry is configured.

telemetry_mapping

async fn telemetry_mapping(&self) -> Result<TelemetryMappingIndex>

Maps runtime spans to code (e.g. file/line). Used for runtime–code correlation metrics.
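One way to use a span-to-code mapping is to aggregate runtime cost per source file. The sketch assumes hypothetical `Span` and `CodeLocation` types and a mapping keyed by span name; the real TelemetryMappingIndex may key spans differently. Spans the mapping cannot resolve are skipped.

```rust
use std::collections::HashMap;

/// Hypothetical runtime span (e.g. from an OpenTelemetry trace).
pub struct Span {
    pub name: String,
    pub duration_ms: u64,
}

/// Hypothetical code location a span was mapped to.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct CodeLocation {
    pub file: String,
    pub line: u32,
}

/// Sum span durations per file, skipping spans the mapping could not resolve.
pub fn time_per_file(
    spans: &[Span],
    mapping: &HashMap<String, CodeLocation>,
) -> HashMap<String, u64> {
    let mut totals = HashMap::new();
    for span in spans {
        if let Some(loc) = mapping.get(&span.name) {
            *totals.entry(loc.file.clone()).or_insert(0) += span.duration_ms;
        }
    }
    totals
}
```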

workspace_config

async fn workspace_config(&self) -> Result<WorkspaceConfig>

Detected workspace/monorepo configuration: packages, roots, build tool. Used for workspace-aware metrics.

build_graph

async fn build_graph(&self) -> Result<Option<BuildGraph>>

Build dependency graph, available when workspace/build information exists. Returns None when it does not apply.

Computed Metrics Cache

Plugins can read and write the computed metrics cache so later plugins can reuse earlier results.

computed_metrics

async fn computed_metrics(&self) -> Result<ComputedMetricsCache>

Read the current cache of metric results (by plugin id and by key). Do not modify this cache directly in most cases; the engine fills it as metrics run.

set_computed_metrics

fn set_computed_metrics(&self, cache: ComputedMetricsCache) -> Result<()>

Write a new computed metrics cache (e.g. after your plugin has run). Used by the engine to pass results between metric phases.
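Since set_computed_metrics takes `&self`, the store must use interior mutability. The sketch shows one plausible shape: a hypothetical `MetricsStore` guarding an assumed `plugin id -> (metric key -> value)` map with an `RwLock`, plus a read, extend, write-back helper. Errors are reduced to panics on lock poisoning. The read-modify-write is not atomic across plugins, so it relies on phases being sequenced, as the engine's use between metric phases suggests.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Assumed cache shape: plugin id -> (metric key -> value).
pub type ComputedMetricsCache = HashMap<String, HashMap<String, f64>>;

/// Hypothetical store holding the cache behind a lock so `&self` can write it.
#[derive(Default)]
pub struct MetricsStore {
    cache: RwLock<ComputedMetricsCache>,
}

impl MetricsStore {
    /// Return a snapshot (clone) of the current cache.
    pub fn computed_metrics(&self) -> ComputedMetricsCache {
        self.cache.read().expect("lock poisoned").clone()
    }

    /// Replace the whole cache.
    pub fn set_computed_metrics(&self, cache: ComputedMetricsCache) {
        *self.cache.write().expect("lock poisoned") = cache;
    }
}

/// Read earlier results, add this plugin's values, and write the cache back.
pub fn publish(store: &MetricsStore, plugin_id: &str, values: HashMap<String, f64>) {
    let mut cache = store.computed_metrics();
    cache.insert(plugin_id.to_string(), values);
    store.set_computed_metrics(cache);
}
```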

Project and Config

project_path

async fn project_path(&self) -> Result<PathBuf>

Absolute path of the project that was analyzed.

Other Accessors

The trait also includes:

exports_index — Exports index (signature may be engine-specific).
dataflow_graph — Dataflow graph when implemented (signature may be placeholder).

These are used internally or by specific metrics; see arxo-types for up-to-date signatures.

Using DataStore in a Metric Plugin

Your plugin receives a MetricContext with data: &dyn DataStore. Use it inside compute:

```rust
use std::collections::HashMap;

use arxo_types::core::types::{DataStore, MetricContext, MetricPlugin, MetricResult};

// This method lives inside your `impl MetricPlugin for YourPlugin` block,
// which is what provides `self.id()` and `self.version()`.
async fn compute(&self, ctx: &MetricContext) -> anyhow::Result<MetricResult> {
    // The first access may build the import graph; later calls hit the cache.
    let import_graph = ctx.data.import_graph().await?;
    let node_count = import_graph.node_count();
    let edge_count = import_graph.edge_count();

    // Optional: fetch the SCC DAG (e.g. for a cycle count). The `_` prefix
    // avoids an unused-variable warning because this example stops here.
    let _scc_dag = ctx.data.scc_dag().await?;

    let values = HashMap::from([
        ("node_count".to_string(), node_count as f64),
        ("edge_count".to_string(), edge_count as f64),
    ]);

    Ok(MetricResult::new(
        self.id().to_string(),
        self.version().to_string(),
        values,
        None,
    ))
}
```

Lazy Building and Caching

Lazy: The first time you call e.g. import_graph() or scc_dag(), the engine may parse files, build the graph, and compute the index. Later calls for the same store return the same Arc without recomputing.
Cache: When caching is enabled, the engine may load graphs and derived indices from disk (bincode). In that case, accessors deserialize from cache instead of building from source.
Order: You can call accessors in any order. The implementation resolves dependencies (e.g. SCC DAG depends on import graph) internally.
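Order independence follows from each derived accessor pulling in its own dependencies through the same cached accessors. In the sketch below, a hypothetical `Store` exposes a derived `edge_total` (standing in for an index such as the SCC DAG) that calls `import_graph()` itself; a counter shows the graph is built exactly once no matter which accessor is called first. The placeholder graph and all names are assumptions.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, OnceLock};

/// Hypothetical store: one derived value (`edge_total`) depends on a core graph.
#[derive(Default)]
pub struct Store {
    import_graph: OnceLock<Arc<Vec<Vec<usize>>>>,
    edge_total: OnceLock<usize>,
    /// How many times the import graph was actually built.
    pub graph_builds: AtomicUsize,
}

impl Store {
    /// Core graph as adjacency lists; built at most once.
    pub fn import_graph(&self) -> Arc<Vec<Vec<usize>>> {
        self.import_graph
            .get_or_init(|| {
                self.graph_builds.fetch_add(1, Ordering::Relaxed);
                Arc::new(vec![vec![1], vec![0], vec![]]) // placeholder graph
            })
            .clone()
    }

    /// Derived value: resolves its own dependency, so call order does not matter.
    pub fn edge_total(&self) -> usize {
        *self.edge_total.get_or_init(|| {
            let graph = self.import_graph();
            graph.iter().map(Vec::len).sum()
        })
    }
}
```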

Thread Safety

DataStore is Send + Sync. You can share a store across threads (e.g. multiple metric plugins); the implementation uses internal locking or immutable cached data as appropriate.
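The sketch below illustrates sharing one store across threads. A hypothetical `GraphSource` trait with `Send + Sync` supertraits stands in for DataStore, and `std::thread::scope` lets several "plugins" borrow the same `&dyn GraphSource` at once. `OnceLock` guarantees only one initializer runs even when threads race, which the build counter confirms. All names and the placeholder value are assumptions.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;
use std::thread;

/// Hypothetical read-only accessor surface, analogous to `&dyn DataStore`.
pub trait GraphSource: Send + Sync {
    fn node_count(&self) -> usize;
}

/// Hypothetical implementation: the value is built once, even under contention.
#[derive(Default)]
pub struct SharedStore {
    nodes: OnceLock<usize>,
    pub builds: AtomicUsize,
}

impl GraphSource for SharedStore {
    fn node_count(&self) -> usize {
        *self.nodes.get_or_init(|| {
            self.builds.fetch_add(1, Ordering::SeqCst);
            42 // placeholder for an expensive build
        })
    }
}

/// Run several "plugins" in parallel against one shared store.
pub fn run_plugins(store: &dyn GraphSource, plugins: usize) -> Vec<usize> {
    thread::scope(|s| {
        let handles: Vec<_> = (0..plugins)
            .map(|_| s.spawn(move || store.node_count()))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("plugin thread panicked"))
            .collect()
    })
}
```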

Next steps

Graph types — Structure of each graph
Plugin system — Implementing metrics that use the DataStore
Caching and incremental — How the store is filled from cache