
ML Architecture

The ml_architecture metric evaluates the architecture quality of machine learning systems across training, evaluation, serving, and MLOps.

Last verified against engine metric version 1.2.0.

Production ML systems often fail due to architecture gaps rather than model quality:

  • Train/serve skew creates silent prediction drift.
  • Weak reproducibility and lineage make incidents hard to debug.
  • Missing eval and validation controls allow regressions to ship.
  • Serving and monitoring gaps increase outage and staleness risk.

ml_architecture surfaces these issues as detector-level scores plus evidence-backed findings. The base detector scores are:

  • ml_architecture.train_serve_skew_score
  • ml_architecture.skew_test_absence_score
  • ml_architecture.pipeline_complexity_score
  • ml_architecture.reproducibility_score
  • ml_architecture.train_inference_boundary_score
  • ml_architecture.data_lineage_integrity_score
  • ml_architecture.experiment_isolation_score
  • ml_architecture.eval_integrity_score
  • ml_architecture.serving_maturity_score
  • ml_architecture.drift_monitoring_score
  • ml_architecture.data_validation_score
  • ml_architecture.ci_integration_score
  • ml_architecture.fairness_audit_score
  • ml_architecture.ab_testing_score
  • ml_architecture.shadow_canary_score
  • ml_architecture.monitoring_alerting_score
  • ml_architecture.model_staleness_score
  • ml_architecture.serving_ops_score

Advanced Governance and Resilience Controls

  • ml_architecture.model_validation_gates_score
  • ml_architecture.calibration_uncertainty_score
  • ml_architecture.feature_store_consistency_score
  • ml_architecture.progressive_delivery_analysis_score
  • ml_architecture.provenance_attestation_score
  • ml_architecture.responsible_ai_governance_score
  • ml_architecture.attestation_enforcement_score
  • ml_architecture.model_registry_governance_score
  • ml_architecture.lineage_schema_fidelity_score
  • ml_architecture.adversarial_resilience_score
  • ml_architecture.post_market_incident_readiness_score
  • ml_architecture.genai_telemetry_semconv_score
| Metric Key | Range / Type | Direction |
| --- | --- | --- |
| ml_architecture.overall_score | 0..1 | Higher is better |
| ml_architecture.overall_score_extended | 0..1 | Higher is better |
| ml_architecture.gpu_file_count | Number | Informational |
| ml_architecture.database_file_count | Number | Informational |
| ml_architecture.env_config_file_count | Number | Informational |
| ml_architecture.graph.* | Graph entries | Informational |

If no training or serving ML files are detected:

  • detector scores are emitted as neutral (0.5)
  • ml_architecture.overall_score and ml_architecture.overall_score_extended are both 0.5
  • diagnostic counts (ml_architecture.gpu_file_count, ml_architecture.database_file_count, ml_architecture.env_config_file_count) are emitted as 0
  • no findings are emitted in this neutral mode
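The neutral fallback described above can be detected programmatically. A minimal sketch, assuming the engine's output is available as a flat mapping of metric keys to values (this payload shape is an assumption for illustration, not a documented API):

```python
# Hypothetical payload shape: {metric_key: value} -- assumed for illustration.
NEUTRAL = 0.5

def is_neutral_mode(metrics: dict) -> bool:
    """True if the ml_architecture output matches the neutral fallback:
    overall scores at 0.5 and diagnostic counts at 0."""
    score_keys = (
        "ml_architecture.overall_score",
        "ml_architecture.overall_score_extended",
    )
    count_keys = (
        "ml_architecture.gpu_file_count",
        "ml_architecture.database_file_count",
        "ml_architecture.env_config_file_count",
    )
    return all(metrics.get(k) == NEUTRAL for k in score_keys) and all(
        metrics.get(k) == 0 for k in count_keys
    )

# A repository with no training or serving ML files detected:
payload = {
    "ml_architecture.overall_score": 0.5,
    "ml_architecture.overall_score_extended": 0.5,
    "ml_architecture.gpu_file_count": 0,
    "ml_architecture.database_file_count": 0,
    "ml_architecture.env_config_file_count": 0,
}
print(is_neutral_mode(payload))  # True
```

A check like this can keep CI dashboards from misreading the neutral 0.5 as a genuine mid-range architecture score.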
To enable the metric:

```yaml
metrics:
  - id: ml_architecture
    enabled: true
```

To gate on it, declare policy invariants:

```yaml
metrics:
  - id: ml_architecture
policy:
  invariants:
    - metric: ml_architecture.overall_score
      op: ">="
      value: 0.70
      message: "ML architecture health baseline not met"
    - metric: ml_architecture.overall_score_extended
      op: ">="
      value: 0.70
      message: "Extended ML architecture health baseline not met"
    - metric: ml_architecture.train_serve_skew_score
      op: ">="
      value: 0.75
      message: "Train/serve skew controls are insufficient"
    - metric: ml_architecture.reproducibility_score
      op: ">="
      value: 0.70
      message: "Reproducibility baseline not met"
```
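Each invariant is a simple threshold comparison against one metric key. A minimal sketch of how such a gate could be evaluated, assuming metrics arrive as a flat key-to-value mapping (the helper below is illustrative, not part of the engine):

```python
import operator

# Comparison operators an invariant's "op" field may name (illustrative subset).
OPS = {">=": operator.ge, ">": operator.gt, "<=": operator.le, "<": operator.lt}

def check_invariants(metrics: dict, invariants: list) -> list:
    """Return the messages of all invariants the metrics fail to satisfy."""
    violations = []
    for inv in invariants:
        value = metrics.get(inv["metric"])
        compare = OPS[inv["op"]]
        # A missing metric counts as a violation rather than a silent pass.
        if value is None or not compare(value, inv["value"]):
            violations.append(inv["message"])
    return violations

invariants = [
    {"metric": "ml_architecture.overall_score", "op": ">=", "value": 0.70,
     "message": "ML architecture health baseline not met"},
    {"metric": "ml_architecture.train_serve_skew_score", "op": ">=", "value": 0.75,
     "message": "Train/serve skew controls are insufficient"},
]
metrics = {
    "ml_architecture.overall_score": 0.82,
    "ml_architecture.train_serve_skew_score": 0.60,
}
print(check_invariants(metrics, invariants))
# ['Train/serve skew controls are insufficient']
```

Treating a missing metric as a failure is a deliberate fail-closed choice: a gate that silently passes when the metric was never emitted defeats its purpose.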

For production-ready profiles, see Policy and CI Gates.

  • Documentation route: /metrics/ml-architecture
  • Stable metric ID: ml_architecture