Caching and Incremental Analysis

:::info Implementation detail
This page describes how the engine caches results and performs incremental analysis internally. As a library user you control this behavior only through config (run_options.cache, run_options.incremental); you do not interact with cache structures or keys directly. For the public API, see Rust API and Configuration.
:::

The engine can cache full analysis results to disk and use incremental parsing to skip unchanged files on the next run. This reduces latency for repeated analyses (e.g. in CI or local iterations).

Overview

Cache: Graphs and derived indices are serialized with bincode and stored under a project- and config-specific key. The next run with the same key can load from cache instead of rebuilding.
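Cache key: Derived from the git HEAD commit hash (if the project is in a git repo) or otherwise from a content hash of the source files, combined in both cases with config (grouping, exclude patterns, language). Any change to these inputs yields a new key, so the previous cache entry no longer matches.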
Incremental: When enabled with cache, the engine keeps an incremental parse state (file path → content hash, content hash → parse result). Only files whose content hash changed are re-parsed; results are merged with cached parse data for unchanged files, then graphs and derived indices are recomputed.

You control caching and incremental behavior via configuration (run_options.cache, run_options.incremental) and, when using the loader/FFI, the equivalent JSON options.

Cache Location

Default: ~/.cache/arxo/analysis/ (or the platform equivalent, e.g. dirs::cache_dir()).
Override: Set ARCH0_CACHE_DIR to a directory path. The engine creates {ARCH0_CACHE_DIR}/{project_hash}/{cache_key}/ for each project and key.

Project hash: SHA-256 of the canonical project path. Cache key: SHA-256 of the commit or content hash combined with the grouping config, language, and exclude patterns. Each distinct project and each distinct config therefore gets its own cache entry.
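The directory layout described above can be sketched as two small path helpers. This is an illustration only: `cache_root` and `cache_entry_dir` are hypothetical names, the hashes are passed in as precomputed strings, and appending `{project_hash}/{cache_key}` under the default location in the same way as under the override is an assumption.

```rust
use std::path::{Path, PathBuf};

/// Pick the cache root: an override (the value of `ARCH0_CACHE_DIR`, if set)
/// wins; otherwise fall back to `<platform cache dir>/arxo/analysis`.
fn cache_root(override_dir: Option<&str>, platform_cache_dir: &Path) -> PathBuf {
    match override_dir {
        Some(dir) => PathBuf::from(dir),
        None => platform_cache_dir.join("arxo").join("analysis"),
    }
}

/// Build `{root}/{project_hash}/{cache_key}` for one project and one key.
fn cache_entry_dir(root: &Path, project_hash: &str, cache_key: &str) -> PathBuf {
    root.join(project_hash).join(cache_key)
}
```

In a real caller the override would typically come from `std::env::var("ARCH0_CACHE_DIR").ok()` and the platform directory from something like `dirs::cache_dir()`.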

What Gets Cached

The CachedAnalysis payload includes:

Version: Engine/arxo version string; used to reject caches from incompatible versions.
Cache key and timestamp.
Graphs: Import graph, call graph, entity graph, type graph (serialized form).
Derived indices: SCC DAG, call SCC DAG, reachability, call reachability, call dependencies, effect index.
Git history (optional): If git history was built, it can be stored so the next run skips git parsing.

When cache is enabled and a valid entry exists for the current key, the engine loads this payload and uses it to satisfy DataStore accessors instead of building from source.
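What "a valid entry" means can be sketched as a check on the payload's version and key. The types below are hypothetical, trimmed-down stand-ins (the graph and index fields are omitted), and the order of the checks is an assumption rather than the engine's documented behavior.

```rust
/// Hypothetical stand-in for the header part of the cached payload.
#[derive(Debug, Clone, PartialEq)]
struct CachedHeader {
    version: String,
    cache_key: String,
    timestamp_secs: u64,
}

/// Outcome of inspecting a cache entry before deciding how to run.
#[derive(Debug, PartialEq)]
enum CacheDecision {
    UseCache,
    RebuildNoEntry,
    RebuildVersionMismatch,
    RebuildKeyMismatch,
}

/// Use the cache only if an entry exists and both version and key match.
fn decide(entry: Option<&CachedHeader>, engine_version: &str, current_key: &str) -> CacheDecision {
    match entry {
        None => CacheDecision::RebuildNoEntry,
        Some(h) if h.version != engine_version => CacheDecision::RebuildVersionMismatch,
        Some(h) if h.cache_key != current_key => CacheDecision::RebuildKeyMismatch,
        Some(_) => CacheDecision::UseCache,
    }
}
```

Every outcome other than `UseCache` corresponds to building from source and writing a fresh entry.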

Incremental Parse State

IncrementalParseState is stored alongside CachedAnalysis (in the same cache entry directory):

Version: Same compatibility check as above.
file_index: Map from NodeId (file path) to content hash (e.g. XXH64). Tracks which file had which content when we last parsed.
parse_cache: Map from content hash to FullParseResult (imports, exports, calls). Reusing the same hash avoids re-parsing the same content.

On the next run with incremental enabled:

For each file that might be parsed, compute its current content hash.
If the hash is in parse_cache, reuse the stored FullParseResult (no parser call).
If the hash is new or missing, parse the file and store the result in parse_cache and update file_index.
Merge all parse results (reused + new) and build graphs and derived indices as usual.
Persist updated CachedAnalysis and IncrementalParseState for the next run.

Incremental mode therefore skips only the parsing of unchanged files; graph and index construction still runs, using a mix of cached and fresh parse results. A full cache load (without incremental) skips both parsing and graph building when the key matches.
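The steps above can be sketched with two maps and a content hash. This is a minimal model, not the engine's code: `ParseResult`, `IncrementalState`, and the toy `parse` function are hypothetical stand-ins, std's `DefaultHasher` replaces XXH64, and persistence and graph building are left out. The `parser_calls` counter exists only to make the savings visible.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Stand-in for `FullParseResult`; here it only records import lines.
#[derive(Debug, Clone, PartialEq)]
struct ParseResult {
    imports: Vec<String>,
}

/// Stand-in for `IncrementalParseState`.
#[derive(Default)]
struct IncrementalState {
    file_index: HashMap<String, u64>,       // file path -> content hash
    parse_cache: HashMap<u64, ParseResult>, // content hash -> parse result
    parser_calls: usize,                    // instrumentation for this example only
}

/// Stand-in content hash (the page mentions XXH64; std's hasher is used here).
fn content_hash(content: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    content.hash(&mut hasher);
    hasher.finish()
}

/// Toy parser: collects the lines that start with "import ".
fn parse(content: &str) -> ParseResult {
    ParseResult {
        imports: content
            .lines()
            .filter(|line| line.starts_with("import "))
            .map(|line| line.to_string())
            .collect(),
    }
}

impl IncrementalState {
    /// Returns one parse result per file, parsing only content not seen before.
    fn run(&mut self, files: &[(&str, &str)]) -> HashMap<String, ParseResult> {
        let mut merged = HashMap::new();
        for &(path, content) in files {
            let hash = content_hash(content);
            if !self.parse_cache.contains_key(&hash) {
                // New or changed content: call the parser and remember the result.
                self.parser_calls += 1;
                self.parse_cache.insert(hash, parse(content));
            }
            self.file_index.insert(path.to_string(), hash);
            merged.insert(path.to_string(), self.parse_cache[&hash].clone());
        }
        merged
    }
}
```

Running `run` twice with one file edited in between triggers the parser once for the edited file on the second pass; the unchanged file is served from `parse_cache`.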

Cache Key Construction

CacheKeyBuilder (engine internal) builds the key from:

Project path
Grouping config: group_by, group_depth (see configuration)
Exclude patterns: Order-sensitive list
Language: e.g. Language::TypeScript, Language::Rust

Inputs to the hash:

If the project is a git repo: git HEAD commit + config (grouping, language, excludes). Any commit or config change produces a new key.
If not git: content hash of all relevant source files (by extension for the chosen language, respecting excludes) + same config. Any file or config change produces a new key.

So:

Same commit + same config → same key → cache hit (full load or incremental).
Different branch/commit or different excludes/grouping/language → different key → cache miss; full rebuild (and new cache write).
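The key behavior listed above (stable for identical inputs, sensitive to commit, config, and exclude order) can be illustrated as follows. `KeyInputs` and `cache_key` are hypothetical names, the `"folder"` value used for `group_by` in the usage is a placeholder, and std's `DefaultHasher` stands in for the SHA-256 the engine is described as using, so the output is a short hex string rather than a real digest.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical key inputs; field names mirror the options named on this page.
#[derive(Hash)]
struct KeyInputs<'a> {
    /// Git HEAD commit if in a repo, otherwise a content hash of the sources.
    revision: &'a str,
    group_by: &'a str,
    group_depth: u32,
    language: &'a str,
    /// Order-sensitive: the list is hashed in the order given.
    excludes: &'a [&'a str],
}

/// Stand-in key: 16 hex characters from std's hasher (not SHA-256).
fn cache_key(inputs: &KeyInputs) -> String {
    let mut hasher = DefaultHasher::new();
    inputs.hash(&mut hasher);
    format!("{:016x}", hasher.finish())
}
```

Because the exclude list is hashed in order, `["node_modules", "dist"]` and `["dist", "node_modules"]` produce different keys even though they exclude the same files.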

Enabling Cache and Incremental

Configuration (YAML or equivalent JSON for FFI):

run_options:
  cache: true          # Enable reading/writing analysis cache
  incremental: true     # Reuse parse results for unchanged files (relies on the cache)

cache: true: Engine will try to load from cache when the key matches, and save after a successful run.
incremental: true: Engine will use incremental parse state when available (and typically enables cache if not already). Best for local or CI runs where only a subset of files change.
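One way to read the interaction between the two flags is that `incremental` turns the cache on when it is not already enabled. The sketch below encodes that reading; `RunOptions` here is a hypothetical mirror of the `run_options` block, not the engine's type, and the engine's actual resolution logic may differ.

```rust
/// Hypothetical mirror of the `run_options` config block.
#[derive(Debug, Clone, Copy, PartialEq)]
struct RunOptions {
    cache: bool,
    incremental: bool,
}

impl RunOptions {
    /// Incremental parse state lives in the cache entry, so this sketch treats
    /// `incremental: true` as enabling the cache as well.
    fn effective(self) -> RunOptions {
        RunOptions {
            cache: self.cache || self.incremental,
            incremental: self.incremental,
        }
    }
}
```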

Invalidation: Changing data.import_graph (group_by, group_depth, exclude), data.language, or the source tree (or git commit) changes the key and invalidates the cache for that project/config.

Changed-Files Hint

Some integrations can pass a changed-files hint (e.g. from git diff). In incremental mode the engine can use this hint to prioritize hashing and parsing of those paths, which can speed up the “dirty” phase. This is engine-specific (e.g. DataStoreImpl::new_with_cache(..., changed_files_hint)); see the engine API for how to pass it.
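As one possible interpretation of "prioritize", the hint can be used to reorder the candidate files so that likely-changed paths are processed first. The `prioritize` function below is hypothetical and is not the engine's API; paths in the hint that are not candidates are simply ignored.

```rust
use std::collections::HashSet;

/// Order candidate paths so that hinted (likely changed) files come first.
/// Paths keep their relative order inside each group.
fn prioritize<'a>(candidates: &[&'a str], changed_hint: &[&str]) -> Vec<&'a str> {
    let hinted: HashSet<&str> = changed_hint.iter().copied().collect();
    let (mut first, rest): (Vec<&'a str>, Vec<&'a str>) = candidates
        .iter()
        .copied()
        .partition(|path| hinted.contains(*path));
    first.extend(rest);
    first
}
```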

Version Compatibility

Both CachedAnalysis and IncrementalParseState store an arxo/engine version string. If the running engine version differs (e.g. after an upgrade), the engine ignores the old cache and falls back to a full run. This avoids subtle bugs from schema or format changes.

Best Practices

CI: Enable cache and use a stable cache directory (e.g. ARCH0_CACHE_DIR pointing at a volume or restored artifact). Use the same exclude/grouping/language as your main config so the key is stable.
Local: Enable cache and incremental so repeated runs on the same repo are fast after the first run.
Clean key: Avoid changing exclude patterns or group_depth unnecessarily; each change creates a new key and a new cache entry.
Disk space: Cache entries are per project and per key. Old keys (e.g. after many commits) are not auto-deleted; clear ARCH0_CACHE_DIR or the project hash subdirectory if you need to reclaim space.

Summary

| Feature | Effect |
| --- | --- |
| Cache on | Load/save graphs and derived indices by key; avoid full rebuild when key matches. |
| Incremental on | Reuse parse results for unchanged files (by content hash); re-parse only changed files, then rebuild graphs. |
| Cache key | Git commit (or content hash) + grouping + language + excludes. |
| Location | `~/.cache/arxo/analysis/` or `ARCH0_CACHE_DIR`. |

Next steps

Configuration — run_options.cache and run_options.incremental
DataStore — How cached data is exposed to plugins and callers