Framework and Language Coverage

ml_architecture combines extension-based file discovery, ML pattern detection, Python AST enhancements, and fallback heuristics.

Extension and Notebook Coverage

Plugin file inclusion is extension-based:

.py
.r
.R
.jl

Additionally, code cells from *.ipynb notebooks are collected and analyzed as synthetic files (<notebook>#cell_<n>).
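
The discovery rule above can be sketched in Python as follows. This is an illustration only, not the plugin's implementation (which lives in the Rust sources listed below). The function names `collect_sources` and `notebook_cells` are hypothetical, and numbering cells by their zero-based position among all notebook cells is an assumption; the source only specifies the `<notebook>#cell_<n>` naming shape.

```python
import json
from pathlib import Path

# Extensions listed above. Matching is case-sensitive in this sketch,
# which is why both ".r" and ".R" appear.
ML_EXTENSIONS = {".py", ".r", ".R", ".jl"}


def notebook_cells(path):
    """Return one synthetic file per code cell, named <notebook>#cell_<n>."""
    notebook = json.loads(path.read_text(encoding="utf-8"))
    cells = {}
    # Assumption: n is the zero-based index among all cells in the notebook.
    for n, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # The notebook JSON stores source as a string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        cells[f"{path}#cell_{n}"] = source
    return cells


def collect_sources(root):
    """Return {name: source_text} for matching files and notebook code cells."""
    sources = {}
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        if path.suffix in ML_EXTENSIONS:
            sources[str(path)] = path.read_text(encoding="utf-8", errors="replace")
        elif path.suffix == ".ipynb":
            sources.update(notebook_cells(path))
    return sources
```

Treating each code cell as its own file keeps per-cell evidence addressable, at the cost of losing cross-cell context such as imports defined in an earlier cell.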

Sources:

crates/arxo-engine/src/metrics/ai_observability/ml_architecture/plugin.rs
crates/arxo-engine/src/metrics/ai_observability/ml_architecture/detectors/notebook.rs
crates/arxo-engine/src/metrics/ai_observability/ml_architecture/detectors/mod.rs

ML Pattern Detection Coverage

detect_ml_files tags training and serving files by framework/pipeline anchors, including patterns such as:

Training: model.fit, Trainer, torch.optim, tf.keras, sklearn, xgboost.train
Serving: FastAPI, Flask, route decorators, prediction/inference anchors
Data references: CSV/dataset load patterns
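
To make the tagging idea concrete, here is a minimal sketch of anchor-based tagging. The regular expressions and the `tag_file` helper are hypothetical and only loosely modeled on the example anchors above; the actual patterns used by detect_ml_files are not reproduced here.

```python
import re

# Hypothetical anchor patterns, loosely based on the examples listed above.
TRAINING = re.compile(
    r"\.fit\(|\bTrainer\(|\btorch\.optim\b|\btf\.keras\b|\bsklearn\b|\bxgboost\.train\b"
)
SERVING = re.compile(
    r"\bFastAPI\(|\bFlask\(|@\w+\.(?:route|get|post)\(|\.predict\("
)
DATA_REF = re.compile(r"\bread_csv\(|\bload_dataset\(")


def tag_file(source):
    """Return the set of roles whose anchors appear in the source text."""
    tags = set()
    if TRAINING.search(source):
        tags.add("training")
    if SERVING.search(source):
        tags.add("serving")
    if DATA_REF.search(source):
        tags.add("data")
    return tags
```

A file can carry more than one tag. For example, a script that both calls `.fit(` and reads a CSV would be tagged as training and data in this sketch.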

AST-Enhanced Python Paths

When Python parsing succeeds, detectors use AST helpers for higher precision:

Eval integrity: fit-before-split ordering checks.
Serving maturity: model-load-inside-handler detection.
Reproducibility/lineage: scoped RNG-seed checks and data/artifact load extraction.
Experiment isolation: path literal and global singleton detection.
Train/serve skew: AST feature extraction in Python.
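
As an illustration of the first item, a fit-before-split ordering check could look like the sketch below, written with Python's standard `ast` module. The helper names and the sets of call names are assumptions, and the check is deliberately simple: it compares line numbers of calls across the whole module and does not track scopes or data flow.

```python
import ast

# Assumed call names; a real detector would likely recognize more variants.
SPLIT_NAMES = {"train_test_split"}
FIT_NAMES = {"fit", "fit_transform"}


def call_name(node):
    """Return the rightmost name of a call target, e.g. 'fit' for scaler.fit(...)."""
    func = node.func
    if isinstance(func, ast.Attribute):
        return func.attr
    if isinstance(func, ast.Name):
        return func.id
    return None


def fit_before_split(source):
    """Return line numbers of fit calls that precede the first split call.

    Returns None when the source cannot be parsed, so that a caller can
    fall back to a text heuristic.
    """
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return None
    calls = [n for n in ast.walk(tree) if isinstance(n, ast.Call)]
    split_lines = [n.lineno for n in calls if call_name(n) in SPLIT_NAMES]
    if not split_lines:
        return []
    first_split = min(split_lines)
    return sorted(
        n.lineno
        for n in calls
        if call_name(n) in FIT_NAMES and n.lineno < first_split
    )
```

Because the check works on call nodes rather than raw text, a commented-out `scaler.fit(X)` or a string containing `train_test_split` does not trigger it, which is the kind of precision gain the AST path is meant to provide.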

Fallback Behavior

If AST parsing fails or the language is not Python:

Detectors fall back to line-order, regex, and content-pattern heuristics.
Evidence confidence may be lower on fallback paths.
Results are still emitted, but should be interpreted alongside their evidence context.
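
The following sketch shows one way an AST-first detector with a pattern fallback might be structured. The `find_model_loads` function, the set of loader modules, and the "ast"/"pattern" labels are all hypothetical; they only illustrate why evidence from the fallback path deserves more scrutiny.

```python
import ast
import re

# Assumed loader modules for illustration.
LOADERS = {"joblib", "pickle", "torch"}
LOAD_CALL = re.compile(r"\b(?:joblib|pickle|torch)\.load\(")


def find_model_loads(source, language="python"):
    """Return (line_numbers, evidence_path) for model-load calls."""
    if language == "python":
        try:
            tree = ast.parse(source)
        except SyntaxError:
            tree = None
        if tree is not None:
            lines = sorted(
                node.lineno
                for node in ast.walk(tree)
                if isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr == "load"
                and isinstance(node.func.value, ast.Name)
                and node.func.value.id in LOADERS
            )
            return lines, "ast"
    # Fallback: line-by-line regex. This also matches inside comments and
    # strings, which is one reason fallback evidence is less reliable.
    lines = [
        i
        for i, line in enumerate(source.splitlines(), start=1)
        if LOAD_CALL.search(line)
    ]
    return lines, "pattern"
```

In this sketch the same source yields different results depending on the path taken: a commented-out `joblib.load(...)` is ignored by the AST path but reported by the pattern path.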

Dependency on Call Graph and Effect Index

ml_architecture is designed to run with:

ImportGraph
ComputedMetrics
CallGraph
EffectIndex

When optional CallGraph/EffectIndex lookups are unavailable at runtime, scoring still proceeds, but:

centrality-aware/evidence-weighted context is reduced
GPU/database/env-config diagnostic counts may be 0
some findings may be less complete
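
The practical consequence of the second point is that a count of 0 can mean "no index was available" rather than "no such effects exist". A minimal sketch, assuming a hypothetical effect index shaped as a mapping from file name to a list of effect kinds (the real EffectIndex structure is not described here):

```python
# Hypothetical effect kinds mirroring the diagnostics named above.
EFFECT_KINDS = ("gpu", "database", "env_config")


def diagnostic_counts(files, effect_index=None):
    """Count effects per kind across files; all zeros when no index is given."""
    counts = {kind: 0 for kind in EFFECT_KINDS}
    if effect_index is None:
        # Scoring can still proceed, but these diagnostics carry no signal.
        return counts
    for name in files:
        for kind in effect_index.get(name, ()):
            if kind in counts:
                counts[kind] += 1
    return counts
```

When reading reports, it may help to check whether the optional inputs were present before treating zero counts as evidence of absence.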

Governance and Policy Surface Coverage

Detectors such as attestation enforcement, model registry governance, lineage schema fidelity, and post-market incident readiness also scan selected config/doc artifacts (*.yaml, *.json, *.toml, *.md, CI workflow files) to capture deployment/governance controls outside source code.

genai_telemetry_semconv is neutralized when no GenAI/LLM serving surface is detected, so non-GenAI ML repositories are not penalized.

Known FP/FN Caveats

Wrapper abstractions can hide true train/serve/eval boundaries.
Pattern fallback can overmatch generic helper names.
Extension filtering skips ML code stored in files whose extensions are not on the inclusion list.
Dynamic path construction can hide versioning/lineage semantics from static analysis.
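
The last caveat is easiest to see with an example. In the hypothetical helper below, the artifact path is assembled at runtime, so a static scan finds no literal path carrying a version and cannot tell which artifact is actually loaded.

```python
import os


def artifact_path(version=None):
    """Build a model artifact path from a runtime-provided version."""
    # The version comes from an argument or the environment, so no literal
    # such as "models/model_v3.pkl" ever appears in the source text.
    version = version or os.environ.get("MODEL_VERSION", "latest")
    return os.path.join("models", f"model_{version}.pkl")
```

Code written this way may be perfectly well versioned in practice, yet still look unversioned to pattern-based or AST-based lineage checks.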

Practical Rollout Guidance

Start with visibility mode and review evidence in top-centrality files.
Stabilize core controls first: skew, reproducibility, lineage, eval, serving.
Promote CI gates gradually once score movement is stable across releases.
Keep baseline no-regression checks to catch architecture drift.
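
A baseline no-regression check can be as simple as comparing the current scores against a stored baseline. The sketch below assumes scores are floats where higher is better and that both runs are available as plain dictionaries keyed by control name; the `regressions` helper and the example keys are hypothetical.

```python
def regressions(baseline, current, tolerance=0.0):
    """Return {name: (old, new)} for scores that dropped by more than tolerance."""
    # Assumption: higher scores are better, so a drop is a regression.
    return {
        name: (old, current[name])
        for name, old in baseline.items()
        if name in current and current[name] < old - tolerance
    }
```

A CI job could fail when this returns a non-empty mapping, with a small tolerance to absorb noise while score movement is still settling.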
