
Scoring and Keys

This page documents the scoring behavior and emitted metric keys for ml_architecture, aligned to:

  • crates/arxo-engine/src/metrics/ai_observability/ml_architecture/plugin.rs
  • crates/arxo-engine/src/metrics/ai_observability/ml_architecture/ui/health_summary.rs

ml_architecture.overall_score is confidence-weighted and clamped to 0..1.

For each detector score s_i with detector confidence c_i:

  • effective_conf_i = max(c_i, 0.35)

For each detector group G:

  • group_score(G) = sum(s_i * effective_conf_i) / sum(effective_conf_i)
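
The confidence floor and the group average above can be sketched in Python as follows. This is only an illustration of the formulas on this page, not the code in plugin.rs: the function names and the (score, confidence) pair shape are assumptions, and the neutral fallback for an empty group is a choice made for this sketch.

```python
from typing import Sequence, Tuple

CONFIDENCE_FLOOR = 0.35  # from effective_conf_i = max(c_i, 0.35)


def effective_confidence(confidence: float) -> float:
    """Raise low confidences to the floor so weak detectors still carry weight."""
    return max(confidence, CONFIDENCE_FLOOR)


def group_score(detectors: Sequence[Tuple[float, float]]) -> float:
    """Confidence-weighted mean over (score, confidence) pairs of one group."""
    if not detectors:
        # Assumption for this sketch only: an empty group is treated as neutral.
        return 0.5
    weights = [effective_confidence(conf) for _, conf in detectors]
    weighted_sum = sum(score * w for (score, _), w in zip(detectors, weights))
    return weighted_sum / sum(weights)
```

For example, group_score([(1.0, 0.9), (0.0, 0.1)]) evaluates to about 0.72: the second detector's confidence is raised from 0.1 to 0.35, so the weights are 0.9 and 0.35.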

Overall:

overall_score = clamp(
    0.20 * group1 +
    0.35 * group2 +
    0.45 * group3,
    0.0,
    1.0
)
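
The weighted sum and clamp can be written out directly. The sketch below assumes the three group scores have already been computed; the function names are illustrative and are not taken from the engine.

```python
def clamp(value: float, low: float, high: float) -> float:
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


def overall_score(group1: float, group2: float, group3: float) -> float:
    """Combine the three base group scores with the weights from this page."""
    raw = 0.20 * group1 + 0.35 * group2 + 0.45 * group3
    return clamp(raw, 0.0, 1.0)
```

The three weights sum to 1.00, so the clamp only changes the result when a group score itself falls outside 0..1.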

Group membership:

  • group1 (0.20): train_serve_skew_risk, skew_test_absence, train_inference_boundary
  • group2 (0.35): pipeline_complexity, reproducibility, data_lineage_integrity, experiment_isolation, eval_integrity
  • group3 (0.45): serving_maturity, drift_monitoring, data_validation, ci_integration, monitoring_alerting, model_staleness, serving_ops, shadow_canary, ab_testing, fairness_audit

ml_architecture.overall_score_extended keeps the same base groups and adds two more groups on top of them:

  • group4 (0.20): model_validation_gates, calibration_uncertainty, feature_store_consistency, progressive_delivery_analysis, provenance_attestation, responsible_ai_governance
  • group5 (0.20): attestation_enforcement, model_registry_governance, lineage_schema_fidelity, adversarial_resilience, post_market_incident_readiness, genai_telemetry_semconv
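
The group membership above can be captured as a small lookup table, which makes it easy to check that every detector belongs to exactly one group. The names GROUPS and detector_to_group are hypothetical and do not come from the engine. The sketch deliberately does not compute overall_score_extended, because this page does not state how the two extra weights are combined with the base weights.

```python
# Group name -> (weight, detector names), copied from the lists on this page.
GROUPS = {
    "group1": (0.20, ["train_serve_skew_risk", "skew_test_absence",
                      "train_inference_boundary"]),
    "group2": (0.35, ["pipeline_complexity", "reproducibility",
                      "data_lineage_integrity", "experiment_isolation",
                      "eval_integrity"]),
    "group3": (0.45, ["serving_maturity", "drift_monitoring", "data_validation",
                      "ci_integration", "monitoring_alerting", "model_staleness",
                      "serving_ops", "shadow_canary", "ab_testing",
                      "fairness_audit"]),
    "group4": (0.20, ["model_validation_gates", "calibration_uncertainty",
                      "feature_store_consistency",
                      "progressive_delivery_analysis",
                      "provenance_attestation", "responsible_ai_governance"]),
    "group5": (0.20, ["attestation_enforcement", "model_registry_governance",
                      "lineage_schema_fidelity", "adversarial_resilience",
                      "post_market_incident_readiness",
                      "genai_telemetry_semconv"]),
}


def detector_to_group() -> dict:
    """Invert GROUPS and fail loudly if a detector appears in two groups."""
    mapping = {}
    for group, (_weight, detectors) in GROUPS.items():
        for detector in detectors:
            if detector in mapping:
                raise ValueError(f"{detector} is listed in more than one group")
            mapping[detector] = group
    return mapping
```

The table covers 30 detectors, which matches the number of detector score keys listed on this page.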

If no training/serving ML files are detected:

  • detector scores are neutral (0.5)
  • ml_architecture.overall_score = 0.5
  • ml_architecture.overall_score_extended = 0.5
  • diagnostic file counts are emitted as 0
  • findings are not emitted in this neutral mode
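
The neutral mode can be pictured as a fixed result that bypasses the scoring formulas. The following is a sketch under assumptions: the result shape (a dict with "metrics" and "findings") is hypothetical, and it assumes the diagnostic file counts are the three *_file_count keys listed on this page.

```python
NEUTRAL_SCORE = 0.5

# Assumption: these are the diagnostic file counts referred to above.
FILE_COUNT_KEYS = [
    "ml_architecture.gpu_file_count",
    "ml_architecture.database_file_count",
    "ml_architecture.env_config_file_count",
]


def neutral_result(detector_score_keys, file_count_keys=FILE_COUNT_KEYS):
    """Build the metrics emitted when no training/serving ML files are found."""
    metrics = {key: NEUTRAL_SCORE for key in detector_score_keys}
    metrics["ml_architecture.overall_score"] = NEUTRAL_SCORE
    metrics["ml_architecture.overall_score_extended"] = NEUTRAL_SCORE
    for key in file_count_keys:
        metrics[key] = 0
    # No findings are emitted in neutral mode.
    return {"metrics": metrics, "findings": []}
```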

Detector score keys (0..1, higher is better)

Metric Key
ml_architecture.train_serve_skew_score
ml_architecture.skew_test_absence_score
ml_architecture.pipeline_complexity_score
ml_architecture.reproducibility_score
ml_architecture.train_inference_boundary_score
ml_architecture.data_lineage_integrity_score
ml_architecture.experiment_isolation_score
ml_architecture.eval_integrity_score
ml_architecture.serving_maturity_score
ml_architecture.drift_monitoring_score
ml_architecture.data_validation_score
ml_architecture.ci_integration_score
ml_architecture.fairness_audit_score
ml_architecture.ab_testing_score
ml_architecture.shadow_canary_score
ml_architecture.monitoring_alerting_score
ml_architecture.model_staleness_score
ml_architecture.serving_ops_score
ml_architecture.model_validation_gates_score
ml_architecture.calibration_uncertainty_score
ml_architecture.feature_store_consistency_score
ml_architecture.progressive_delivery_analysis_score
ml_architecture.provenance_attestation_score
ml_architecture.responsible_ai_governance_score
ml_architecture.attestation_enforcement_score
ml_architecture.model_registry_governance_score
ml_architecture.lineage_schema_fidelity_score
ml_architecture.adversarial_resilience_score
ml_architecture.post_market_incident_readiness_score
ml_architecture.genai_telemetry_semconv_score
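
Because higher is better for every detector key, a consumer can rank the keys to find the weakest areas. The sketch below assumes the metrics arrive as a flat mapping from key to number; that shape and the function name weakest_detectors are illustrative, not part of the documented contract.

```python
PREFIX = "ml_architecture."
AGGREGATE_KEYS = {PREFIX + "overall_score", PREFIX + "overall_score_extended"}


def weakest_detectors(metrics, limit=3):
    """Return up to `limit` (key, score) pairs with the lowest detector scores."""
    detector_scores = {
        key: value
        for key, value in metrics.items()
        if key.startswith(PREFIX)
        and key.endswith("_score")
        and key not in AGGREGATE_KEYS  # skip the aggregate scores
    }
    return sorted(detector_scores.items(), key=lambda item: item[1])[:limit]
```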

Metric Key                               Range / Type    Direction
ml_architecture.overall_score            0..1            Higher is better
ml_architecture.overall_score_extended   0..1            Higher is better
ml_architecture.gpu_file_count           Number          Informational
ml_architecture.database_file_count      Number          Informational
ml_architecture.env_config_file_count    Number          Informational
ml_architecture.graph.*                  Graph entries   Informational
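
A consumer could check emitted values against the ranges in these tables. This is a minimal sketch, assuming a flat key-to-value mapping. The function name contract_violations is hypothetical, and treating file counts as non-negative is an assumption, since the table only says "Number". The graph entries are skipped because their shape is not described here.

```python
def _is_number(value) -> bool:
    """True for int/float values, excluding booleans."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def contract_violations(metrics):
    """Return the ml_architecture keys whose values break the documented ranges."""
    problems = []
    for key, value in metrics.items():
        if not key.startswith("ml_architecture."):
            continue
        if key.startswith("ml_architecture.graph."):
            continue  # graph entries: shape not specified on this page
        if key.endswith("_file_count"):
            # Assumption: counts are non-negative numbers.
            if not _is_number(value) or value < 0:
                problems.append(key)
        elif key.endswith("_score") or key.endswith("overall_score_extended"):
            # All score keys are documented as 0..1.
            if not _is_number(value) or not 0.0 <= value <= 1.0:
                problems.append(key)
    return problems
```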

This contract is documented against metric version 2.0.0.