
ML Architecture

The ml_architecture metric evaluates the architecture quality of machine learning systems across training, evaluation, serving, and MLOps.

Last verified against engine metric version 1.2.0.

Production ML systems often fail due to architecture gaps rather than model quality:

  • Train/serve skew creates silent prediction drift.
  • Weak reproducibility and lineage make incidents hard to debug.
  • Missing eval and validation controls allow regressions to ship.
  • Serving and monitoring gaps increase outage and staleness risk.

ml_architecture surfaces these issues as detector-level scores plus evidence-backed findings. The base detector scores are:

  • ml_architecture.train_serve_skew_score
  • ml_architecture.skew_test_absence_score
  • ml_architecture.pipeline_complexity_score
  • ml_architecture.reproducibility_score
  • ml_architecture.train_inference_boundary_score
  • ml_architecture.data_lineage_integrity_score
  • ml_architecture.experiment_isolation_score
  • ml_architecture.eval_integrity_score
  • ml_architecture.serving_maturity_score
  • ml_architecture.drift_monitoring_score
  • ml_architecture.data_validation_score
  • ml_architecture.ci_integration_score
  • ml_architecture.fairness_audit_score
  • ml_architecture.ab_testing_score
  • ml_architecture.shadow_canary_score
  • ml_architecture.monitoring_alerting_score
  • ml_architecture.model_staleness_score
  • ml_architecture.serving_ops_score

Advanced Governance and Resilience Controls

  • ml_architecture.model_validation_gates_score
  • ml_architecture.calibration_uncertainty_score
  • ml_architecture.feature_store_consistency_score
  • ml_architecture.progressive_delivery_analysis_score
  • ml_architecture.provenance_attestation_score
  • ml_architecture.responsible_ai_governance_score
  • ml_architecture.attestation_enforcement_score
  • ml_architecture.model_registry_governance_score
  • ml_architecture.lineage_schema_fidelity_score
  • ml_architecture.adversarial_resilience_score
  • ml_architecture.post_market_incident_readiness_score
  • ml_architecture.genai_telemetry_semconv_score
| Metric Key | Range / Type | Direction |
| --- | --- | --- |
| ml_architecture.overall_score | 0..1 | Higher is better |
| ml_architecture.overall_score_extended | 0..1 | Higher is better |
| ml_architecture.gpu_file_count | Number | Informational |
| ml_architecture.database_file_count | Number | Informational |
| ml_architecture.env_config_file_count | Number | Informational |
| ml_architecture.graph.* | Graph entries | Informational |

If no training or serving ML files are detected:

  • detector scores are emitted as neutral (0.5)
  • ml_architecture.overall_score and ml_architecture.overall_score_extended are both 0.5
  • diagnostic counts (ml_architecture.gpu_file_count, ml_architecture.database_file_count, ml_architecture.env_config_file_count) are emitted as 0
  • no findings are emitted in this neutral mode
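The neutral fallback described above can be detected programmatically. A minimal sketch, assuming the engine's output is available as a flat mapping of metric keys to values (this payload shape is an assumption for illustration, not a documented API):

```python
# Hypothetical payload shape: {metric_key: value} -- assumed for illustration.
NEUTRAL = 0.5

def is_neutral_mode(metrics: dict) -> bool:
    """True if the ml_architecture output matches the neutral fallback:
    overall scores at 0.5 and diagnostic counts at 0."""
    score_keys = (
        "ml_architecture.overall_score",
        "ml_architecture.overall_score_extended",
    )
    count_keys = (
        "ml_architecture.gpu_file_count",
        "ml_architecture.database_file_count",
        "ml_architecture.env_config_file_count",
    )
    return all(metrics.get(k) == NEUTRAL for k in score_keys) and all(
        metrics.get(k) == 0 for k in count_keys
    )

# A repository with no training or serving ML files detected:
payload = {
    "ml_architecture.overall_score": 0.5,
    "ml_architecture.overall_score_extended": 0.5,
    "ml_architecture.gpu_file_count": 0,
    "ml_architecture.database_file_count": 0,
    "ml_architecture.env_config_file_count": 0,
}
print(is_neutral_mode(payload))  # True
```

A check like this can keep CI dashboards from misreading the neutral 0.5 as a genuine mid-range architecture score.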
To enable the metric:

```yaml
metrics:
  - id: ml_architecture
    enabled: true
```

To gate on it, declare policy invariants:

```yaml
metrics:
  - id: ml_architecture
policy:
  invariants:
    - metric: ml_architecture.overall_score
      op: ">="
      value: 0.70
      message: "ML architecture health baseline not met"
    - metric: ml_architecture.overall_score_extended
      op: ">="
      value: 0.70
      message: "Extended ML architecture health baseline not met"
    - metric: ml_architecture.train_serve_skew_score
      op: ">="
      value: 0.75
      message: "Train/serve skew controls are insufficient"
    - metric: ml_architecture.reproducibility_score
      op: ">="
      value: 0.70
      message: "Reproducibility baseline not met"
```
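Each invariant is a simple threshold comparison against one metric key. A minimal sketch of how such a gate could be evaluated, assuming metrics arrive as a flat key-to-value mapping (the helper below is illustrative, not part of the engine):

```python
import operator

# Comparison operators an invariant's "op" field may name (illustrative subset).
OPS = {">=": operator.ge, ">": operator.gt, "<=": operator.le, "<": operator.lt}

def check_invariants(metrics: dict, invariants: list) -> list:
    """Return the messages of all invariants the metrics fail to satisfy."""
    violations = []
    for inv in invariants:
        value = metrics.get(inv["metric"])
        compare = OPS[inv["op"]]
        # A missing metric counts as a violation rather than a silent pass.
        if value is None or not compare(value, inv["value"]):
            violations.append(inv["message"])
    return violations

invariants = [
    {"metric": "ml_architecture.overall_score", "op": ">=", "value": 0.70,
     "message": "ML architecture health baseline not met"},
    {"metric": "ml_architecture.train_serve_skew_score", "op": ">=", "value": 0.75,
     "message": "Train/serve skew controls are insufficient"},
]
metrics = {
    "ml_architecture.overall_score": 0.82,
    "ml_architecture.train_serve_skew_score": 0.60,
}
print(check_invariants(metrics, invariants))
# ['Train/serve skew controls are insufficient']
```

Treating a missing metric as a failure is a deliberate fail-closed choice: a gate that silently passes when the metric was never emitted defeats its purpose.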

For production-ready profiles, see Policy and CI Gates.

  • Documentation route: /metrics/ml-architecture
  • Stable metric ID: ml_architecture