DataStore — Programmatic Data Access
:::info For engine extenders
This page describes internal APIs used when building or extending the engine (e.g. custom metrics, orchestration). When using the closed-source engine via arxo-loader or the FFI, you do not have access to DataStore; you receive analysis results as a JSON string only. See Rust API and FFI API for the public library API.
:::
The DataStore trait (in arxo-types) is the main interface for reading analysis data inside the engine. Metric plugins and custom integrations use it to get graphs and derived indices. The engine builds graphs lazily and can serve them from cache.
Overview
When you run analysis, the engine constructs a DataStoreImpl (or equivalent) that implements DataStore. You do not construct the store yourself when using the engine; you receive it indirectly:
- Metric plugins: Your plugin’s compute receives a MetricContext whose data field is &dyn DataStore. Use it to call the accessors below.
- Orchestration: After Orchestrator::run(), results (metrics, violations, report) are produced using the same store internally. For direct graph access in Rust, use the engine APIs that expose the store or its results (e.g. via OrchestrationResult or loader output).
All DataStore methods are async and return Result<T>. The first call to a given accessor may trigger building that graph or index; subsequent calls return the same cached value.
Core Graphs
import_graph

    async fn import_graph(&self) -> Result<Arc<ImportGraph>>

Module/file dependency graph: nodes are files (or groups), edges are imports with EdgeType (Import, Reexport, DynamicImport, etc.). Used for cycle detection, layering, and structural metrics.
See Graph types: Import graph for node/edge structure.
call_graph

    async fn call_graph(&self) -> Result<Arc<CallGraph>>

File-to-file call graph: which file’s code calls into which other file. Edges carry CallEdgeData (edge type, confidence, resolution method). Used for call-based metrics and dependency analysis.
See Graph types: Call graph for details.
entity_graph

    async fn entity_graph(&self) -> Result<Arc<EntityGraph>>

Function/class/variable-level call graph. Nodes are entities (EntityId = file_id::symbol), edges are calls. Used for fine-grained call analysis.
See Graph types: Entity graph for details.
type_graph

    async fn type_graph(&self) -> Result<Arc<TypeGraph>>

Type relationship graph: extends, implements, trait impl, type alias, generic bounds. Nodes are types (TypeId), edges carry TypeEdgeType. Used for inheritance and type-coupling metrics.
See Graph types: Type graph for details.
Derived Indices
These are computed from the core graphs. Accessors return Arc<T> (or T where noted) so you can hold and use them without cloning the underlying data.
scc_dag

    async fn scc_dag(&self) -> Result<Arc<SccDag>>

Strongly connected components of the import graph: cycles and their condensation into a DAG. Used for cycle metrics and cycle-aware reporting.
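To make "strongly connected components" concrete, here is a minimal sketch of Tarjan's algorithm over a plain adjacency list. The Adjacency alias, the Tarjan struct, and both function names are illustrative stand-ins, not the engine's SccDag API, and the engine may well use a different algorithm internally.

```rust
// Stand-in for an import graph: node `i` imports every node in `adj[i]`.
// (Hypothetical representation; the real ImportGraph/SccDag types are not shown here.)
type Adjacency = Vec<Vec<usize>>;

struct Tarjan<'a> {
    adj: &'a Adjacency,
    index: Vec<Option<usize>>, // discovery order, None while unvisited
    lowlink: Vec<usize>,       // smallest discovery index reachable from the node
    on_stack: Vec<bool>,
    stack: Vec<usize>,
    next_index: usize,
    components: Vec<Vec<usize>>,
}

impl<'a> Tarjan<'a> {
    fn visit(&mut self, v: usize) {
        self.index[v] = Some(self.next_index);
        self.lowlink[v] = self.next_index;
        self.next_index += 1;
        self.stack.push(v);
        self.on_stack[v] = true;

        let adj = self.adj; // copy the shared reference so `self` stays free to mutate
        for &w in &adj[v] {
            match self.index[w] {
                None => {
                    self.visit(w);
                    self.lowlink[v] = self.lowlink[v].min(self.lowlink[w]);
                }
                Some(w_index) if self.on_stack[w] => {
                    self.lowlink[v] = self.lowlink[v].min(w_index);
                }
                Some(_) => {} // w already belongs to a finished component
            }
        }

        // `v` is the root of a component: pop the stack down to `v`.
        if Some(self.lowlink[v]) == self.index[v] {
            let mut component = Vec::new();
            loop {
                let w = self.stack.pop().expect("stack contains v");
                self.on_stack[w] = false;
                component.push(w);
                if w == v {
                    break;
                }
            }
            component.sort_unstable();
            self.components.push(component);
        }
    }
}

/// Returns the components in reverse topological order of the condensation DAG.
fn strongly_connected_components(adj: &Adjacency) -> Vec<Vec<usize>> {
    let n = adj.len();
    let mut tarjan = Tarjan {
        adj,
        index: vec![None; n],
        lowlink: vec![0; n],
        on_stack: vec![false; n],
        stack: Vec::new(),
        next_index: 0,
        components: Vec::new(),
    };
    for v in 0..n {
        if tarjan.index[v].is_none() {
            tarjan.visit(v);
        }
    }
    tarjan.components
}

/// Counts components that contain a cycle: more than one node, or a self-import.
fn cycle_count(adj: &Adjacency) -> usize {
    strongly_connected_components(adj)
        .iter()
        .filter(|c| c.len() > 1 || adj[c[0]].contains(&c[0]))
        .count()
}
```

Collapsing each component into a single node yields the condensation DAG. The recursive form is fine for a sketch, but very deep import chains would call for an iterative variant.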
call_scc_dag

    async fn call_scc_dag(&self) -> Result<CallSccDag>

Strongly connected components of the call graph. Used for call-level cycle detection.
reachability

    async fn reachability(&self) -> Result<Arc<ReachabilityIndex>>

Transitive dependency index over the import graph: which nodes can reach which nodes. Used for impact and blast-radius analysis.
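As an illustration of what a transitive index answers, the sketch below computes forward reachability and a "blast radius" (everything that transitively imports a target) with a breadth-first search. The adjacency-list representation and the function names are assumptions made for this example. The real ReachabilityIndex is presumably precomputed rather than searched per query.

```rust
use std::collections::{BTreeSet, VecDeque};

/// Nodes reachable from `start` by following edges.
/// `start` itself is included only if it lies on a cycle.
fn reachable_from(adj: &[Vec<usize>], start: usize) -> BTreeSet<usize> {
    let mut seen = BTreeSet::new();
    let mut queue = VecDeque::from([start]);
    while let Some(v) = queue.pop_front() {
        for &w in &adj[v] {
            // `insert` returns true only the first time we see `w`.
            if seen.insert(w) {
                queue.push_back(w);
            }
        }
    }
    seen
}

/// Blast radius of `target`: every node that transitively imports it.
/// Computed by reversing all edges and searching forward from `target`.
fn blast_radius(adj: &[Vec<usize>], target: usize) -> BTreeSet<usize> {
    let mut reversed = vec![Vec::new(); adj.len()];
    for (from, targets) in adj.iter().enumerate() {
        for &to in targets {
            reversed[to].push(from);
        }
    }
    reachable_from(&reversed, target)
}
```

With edges 0 → 1, 1 → 2, and 3 → 2, changing node 2 affects nodes 0, 1, and 3, while node 0 can reach only nodes 1 and 2.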
call_reachability

    async fn call_reachability(&self) -> Result<Arc<CallReachabilityIndex>>

Transitive call reachability: which call sites can (transitively) reach which targets. Used for call-based impact analysis.
call_dependencies

    async fn call_dependencies(&self) -> Result<Arc<CallDependencyIndex>>

Call dependency index: who depends on whom at the call level. Used for dependency and refactoring metrics.
effect_index

    async fn effect_index(&self) -> Result<Arc<EffectIndex>>

Side-effect index: which files/functions have which effects (IO, network, storage, log, time, random, mutation, LLM). Used for effect and purity metrics.
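A purity metric built on such an index might look roughly like the sketch below. The Effect enum mirrors the effect kinds listed above, but EffectTable and its methods are hypothetical stand-ins. The actual EffectIndex API is defined in arxo-types and may differ.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Effect kinds taken from the list above; the enum itself is a stand-in.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Effect {
    Io,
    Network,
    Storage,
    Log,
    Time,
    Random,
    Mutation,
    Llm,
}

// Hypothetical per-file effect table, not the engine's EffectIndex.
#[derive(Default)]
struct EffectTable {
    by_file: BTreeMap<String, BTreeSet<Effect>>,
}

impl EffectTable {
    fn record(&mut self, file: &str, effect: Effect) {
        self.by_file.entry(file.to_string()).or_default().insert(effect);
    }

    /// A file with no recorded effects is treated as pure.
    fn is_pure(&self, file: &str) -> bool {
        self.by_file.get(file).map_or(true, |effects| effects.is_empty())
    }

    /// Fraction of `files` with no recorded effects (1.0 for an empty list).
    fn purity_ratio(&self, files: &[&str]) -> f64 {
        if files.is_empty() {
            return 1.0;
        }
        let pure = files.iter().filter(|f| self.is_pure(**f)).count();
        pure as f64 / files.len() as f64
    }
}
```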
Optional Data
Some accessors return optional or conditional data (e.g. only when configured or already built).
git_history

    async fn git_history(&self) -> Result<Arc<GitHistory>>

Git history data: file churn, co-change, authors. Built only when data.git_history is configured. Used for evolution and ownership metrics.
git_history_if_loaded: Synchronous accessor that returns Option<Arc<GitHistory>> if already loaded, without triggering a build. Useful for metrics that can run without git.
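The pattern for a metric that can run without git could look like the following sketch. Only the method name git_history_if_loaded and its Option<Arc<GitHistory>> return type come from the text above. The HistorySource trait, the churn field, and the hotspot_score function are assumptions made for illustration.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Stand-in for the engine's GitHistory; the `churn` field is hypothetical.
struct GitHistory {
    churn: HashMap<String, u32>,
}

// Stand-in trait exposing only the synchronous accessor described above.
trait HistorySource {
    fn git_history_if_loaded(&self) -> Option<Arc<GitHistory>>;
}

struct Store {
    history: Option<Arc<GitHistory>>,
}

impl HistorySource for Store {
    fn git_history_if_loaded(&self) -> Option<Arc<GitHistory>> {
        // Cloning the Option clones the Arc handle, not the history data.
        self.history.clone()
    }
}

/// Weights complexity by churn when history is available,
/// and falls back to plain complexity when it is not.
fn hotspot_score(store: &dyn HistorySource, file: &str, complexity: f64) -> f64 {
    match store.git_history_if_loaded() {
        Some(history) => {
            let churn = history.churn.get(file).copied().unwrap_or(0);
            complexity * f64::from(churn)
        }
        None => complexity,
    }
}
```

Because the accessor never triggers a build, a metric written this way adds no git cost to runs where history was not requested.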
telemetry

    async fn telemetry(&self) -> Result<Telemetry>

Runtime telemetry (e.g. OpenTelemetry traces). Built only when data.telemetry is configured.
telemetry_mapping

    async fn telemetry_mapping(&self) -> Result<TelemetryMappingIndex>

Maps runtime spans to code (e.g. file/line). Used for runtime–code correlation metrics.
workspace_config

    async fn workspace_config(&self) -> Result<WorkspaceConfig>

Detected workspace/monorepo configuration: packages, roots, build tool. Used for workspace-aware metrics.
build_graph

    async fn build_graph(&self) -> Result<Option<BuildGraph>>

Build dependency graph when workspace/build info is available. Returns None when not applicable.
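A Result<Option<T>> return separates two cases: Err means the lookup failed, while Ok(None) means there is simply no build graph for this project. The sketch below shows one way to handle both. BuildGraph here is a stand-in with a hypothetical packages field, and the error type is simplified to String.

```rust
// Stand-in for the engine's BuildGraph; the `packages` field is hypothetical.
struct BuildGraph {
    packages: Vec<String>,
}

/// Propagates real failures with `?`, but treats "no build graph" as zero packages.
fn package_count(lookup: Result<Option<BuildGraph>, String>) -> Result<usize, String> {
    match lookup? {
        Some(graph) => Ok(graph.packages.len()),
        None => Ok(0),
    }
}
```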
Computed Metrics Cache
Plugins can read and write the computed metrics cache so later plugins can reuse earlier results.
computed_metrics

    async fn computed_metrics(&self) -> Result<ComputedMetricsCache>

Read the current cache of metric results (by plugin id and by key). Do not modify this cache directly in most cases; the engine fills it as metrics run.
set_computed_metrics

    fn set_computed_metrics(&self, cache: ComputedMetricsCache) -> Result<()>

Write a new computed metrics cache (e.g. after your plugin has run). Used by the engine to pass results between metric phases.
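Since the read accessor returns the cache by value and the setter takes a whole cache, updating it is a read, modify, write cycle. The sketch below models that with synchronous stand-ins. MetricsCache as a nested HashMap, the Mutex-backed Store, and the publish and lookup helpers are all assumptions. The real ComputedMetricsCache layout is defined in arxo-types, and the real read accessor is async.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

// Hypothetical cache layout: plugin id -> (metric key -> value).
type MetricsCache = HashMap<String, HashMap<String, f64>>;

// Synchronous stand-in for the two cache accessors on the store.
#[derive(Default)]
struct Store {
    metrics: Mutex<MetricsCache>,
}

impl Store {
    fn computed_metrics(&self) -> MetricsCache {
        // Returns a snapshot, so callers never hold the lock.
        self.metrics.lock().unwrap().clone()
    }

    fn set_computed_metrics(&self, cache: MetricsCache) {
        *self.metrics.lock().unwrap() = cache;
    }
}

/// Read the snapshot, add this plugin's values, and write the whole cache back.
fn publish(store: &Store, plugin_id: &str, values: HashMap<String, f64>) {
    let mut cache = store.computed_metrics();
    cache.insert(plugin_id.to_string(), values);
    store.set_computed_metrics(cache);
}

/// Look up a single value that an earlier plugin published.
fn lookup(store: &Store, plugin_id: &str, key: &str) -> Option<f64> {
    store.computed_metrics().get(plugin_id)?.get(key).copied()
}
```

In this model the read and the write are separate steps, so two concurrent writers could overwrite each other's entries. That is one reason to leave cache writes to the engine between metric phases.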
Project and Config
project_path

    async fn project_path(&self) -> Result<PathBuf>

Absolute project path that was analyzed.
Other Accessors
The trait also includes:
- exports_index: Exports index (signature may be engine-specific).
- dataflow_graph: Dataflow graph when implemented (signature may be placeholder).
These are used internally or by specific metrics; see arxo-types for up-to-date signatures.
Using DataStore in a Metric Plugin
Your plugin receives a MetricContext with data: &dyn DataStore. Use it inside compute:

    use arxo_types::core::types::{DataStore, MetricContext, MetricPlugin, MetricResult};
    use std::collections::HashMap;

    // Fragment: compute is a method of your plugin type, which is why it uses self.
    async fn compute(&self, ctx: &MetricContext) -> anyhow::Result<MetricResult> {
        // The first call may build the graph; later calls reuse the cached value.
        let import_graph = ctx.data.import_graph().await?;
        let node_count = import_graph.node_count();
        let edge_count = import_graph.edge_count();

        // Optional: use SCC DAG for cycle count
        let scc_dag = ctx.data.scc_dag().await?;
        // ... use scc_dag ...

        let values = HashMap::from([
            ("node_count".to_string(), node_count as f64),
            ("edge_count".to_string(), edge_count as f64),
        ]);

        Ok(MetricResult::new(
            self.id().to_string(),
            self.version().to_string(),
            values,
            None,
        ))
    }

Lazy Building and Caching
- Lazy: The first time you call e.g. import_graph() or scc_dag(), the engine may parse files, build the graph, and compute the index. Later calls on the same store return the same Arc without recomputing.
- Cache: When caching is enabled, the engine may load graphs and derived indices from disk (bincode). In that case, accessors deserialize from cache instead of building from source.
- Order: You can call accessors in any order. The implementation resolves dependencies (e.g. SCC DAG depends on import graph) internally.
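The build-once, share-afterwards behavior described in this list can be modeled with std::sync::OnceLock. This is a sketch of the general technique, not the engine's implementation. LazyStore, Graph, the builds counter, and derived_node_count are hypothetical, and the real accessors are async and fallible.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, OnceLock};

// Stand-in graph type with a single illustrative field.
struct Graph {
    node_count: usize,
}

#[derive(Default)]
struct LazyStore {
    import_graph: OnceLock<Arc<Graph>>,
    builds: AtomicUsize, // counts how often the expensive build actually ran
}

impl LazyStore {
    /// Builds the graph on first use; every later call returns the same Arc.
    fn import_graph(&self) -> Arc<Graph> {
        Arc::clone(self.import_graph.get_or_init(|| {
            self.builds.fetch_add(1, Ordering::SeqCst);
            Arc::new(Graph { node_count: 3 }) // placeholder for parsing and graph building
        }))
    }

    /// A derived accessor requests its dependency itself,
    /// so callers may invoke accessors in any order.
    fn derived_node_count(&self) -> usize {
        self.import_graph().node_count
    }
}
```

Calling derived_node_count before import_graph still triggers exactly one build, and both calls observe the same Arc.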
Thread Safety
DataStore is Send + Sync. You can share a store across threads (e.g. multiple metric plugins); the implementation uses internal locking or immutable cached data as appropriate.
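The Send + Sync bound is what lets a single store handle be cloned into several threads. The sketch below demonstrates this with a stand-in trait. The Store trait, FixedStore, and run_plugins_in_parallel are illustrative names, and the engine's own scheduling of plugins may work differently.

```rust
use std::sync::Arc;
use std::thread;

// Stand-in trait with the same Send + Sync requirement as DataStore.
trait Store: Send + Sync {
    fn node_count(&self) -> usize;
}

struct FixedStore {
    nodes: usize,
}

impl Store for FixedStore {
    fn node_count(&self) -> usize {
        self.nodes
    }
}

/// Runs one thread per hypothetical plugin, all reading the same store.
fn run_plugins_in_parallel(store: Arc<dyn Store>, plugin_count: usize) -> Vec<usize> {
    let handles: Vec<_> = (0..plugin_count)
        .map(|plugin| {
            // Each thread gets its own Arc handle to the shared store.
            let store = Arc::clone(&store);
            thread::spawn(move || store.node_count() * (plugin + 1))
        })
        .collect();

    // Join in spawn order so results line up with plugin indices.
    handles
        .into_iter()
        .map(|handle| handle.join().expect("plugin thread panicked"))
        .collect()
}
```

Without the Send + Sync supertraits, Arc<dyn Store> could not be moved into thread::spawn, and the compiler would reject the closure.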
Next steps
- Graph types: Structure of each graph
- Plugin system: Implementing metrics that use the DataStore
- Caching and incremental: How the store is filled from cache