Arxo

Scoring and Keys

This page defines the public llm.* key contract and how to interpret values.

Metric Key
llm.observability_gap
llm.context_budget_absence
llm.pii_leakage_risk
llm.cost_tracking_gap
llm.eval_harness_absence
llm.fallback_absence
llm.model_version_unpinned
llm.tool_policy_absence
llm.cache_idempotency_gap
llm.streaming_risk
llm.rate_limit_absence
llm.embedding_drift_risk
llm.template_governance_gap
llm.agent_loop_risk
llm.instruction_boundary_violation
llm.prompt_injection_surface
llm.insecure_output_handling
llm.sensitive_info_in_telemetry
llm.unbounded_consumption
llm.supply_chain_risk
llm.data_model_poisoning_exposure
llm.system_prompt_leakage
llm.vector_embedding_weakness
llm.misinformation_overreliance
llm.mcp_authz_gap
llm.mcp_tool_contract_gap
llm.genai_otel_semconv_gap
llm.structured_output_enforcement_gap
llm.model_rollout_guardrail_gap
llm.mcp_oauth21_gap
llm.mcp_pkce_gap
llm.mcp_resource_binding_gap
llm.mcp_audience_validation_gap
llm.mcp_tool_annotations_gap
llm.mcp_tool_output_schema_gap
Metric Key
llm.prompt_hardcoding_score
llm.model_coupling_score
llm.overall_integration_health
llm.eval_presence_score
llm.eval_breadth_score
llm.eval_gating_score
llm.eval_dataset_versioning_score
llm.eval_quality_score
llm.<primary_key>.confidence (all primary metrics)
Metric Key | Semantics
llm.pii_taint_used | 1 = taint-based PII analysis used; 0 = fallback path used
llm.pii_fallback_reason | 0 = taint used; 1 = no call graph or no call sites; 2 = taint propagation failed; 3 = source detection failed
llm.blast_radius_available | 1 = call graph available; 0 = unavailable
llm.call_sites_total | Total canonical call sites considered
llm.call_sites_discovered_count | Call sites discovered from EffectIndex
llm.call_sites_enriched_count | Discovered call sites improved by bounded enrichment
llm.call_sites_unresolved_count | Discovered call sites still unresolved after enrichment
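
The diagnostic keys above are numeric codes, so a small decoder can make reports easier to read. The following is a minimal sketch, assuming the metrics are available as a flat mapping from key to numeric value; the helper name `describe_pii_analysis` and the mapping shape are hypothetical and not part of the documented contract.

```python
# Code meanings are copied from the diagnostic table above.
# The function name and the flat-dict input shape are assumptions.
PII_FALLBACK_REASONS = {
    0: "taint used",
    1: "no call graph or no call sites",
    2: "taint propagation failed",
    3: "source detection failed",
}


def describe_pii_analysis(metrics: dict) -> str:
    """Summarize which analysis path produced the PII result."""
    if "llm.pii_taint_used" not in metrics:
        return "no PII diagnostics reported"
    if int(metrics["llm.pii_taint_used"]) == 1:
        return "taint-based analysis"
    # Fallback path: translate the reason code, tolerating unknown codes.
    reason = int(metrics.get("llm.pii_fallback_reason", -1))
    label = PII_FALLBACK_REASONS.get(reason, f"unknown reason code {reason}")
    return f"fallback path ({label})"
```

For example, a result with `llm.pii_taint_used` = 0 and `llm.pii_fallback_reason` = 2 would be described as a fallback caused by failed taint propagation.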

Overall Health (llm.overall_integration_health)

  • Range: 0..1 (higher is better)
  • Computed from weighted primary metrics
  • health_weights overrides are allowed
  • Effective weights are normalized to sum to 1.0
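
To illustrate the normalization step, here is a minimal sketch. It assumes that `health_weights` overrides are merged over the defaults key by key and that each per-metric value fed into the sum is already oriented so that higher is better. Neither detail is stated on this page, and the weights shown are placeholders rather than the real defaults.

```python
def effective_weights(defaults: dict, overrides: dict) -> dict:
    """Merge overrides over defaults, then rescale so weights sum to 1.0."""
    merged = {**defaults, **overrides}  # assumption: overrides replace per key
    if any(w < 0 for w in merged.values()):
        raise ValueError("weights must be non-negative")
    total = sum(merged.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return {key: w / total for key, w in merged.items()}


def weighted_health(contributions: dict, weights: dict) -> float:
    """Weighted sum of per-metric health contributions, each in 0..1.

    Assumption: contributions are already 'higher is better'.
    """
    return sum(weights[key] * contributions[key] for key in weights)


# Placeholder weights for illustration only, not the documented defaults.
weights = effective_weights(
    {"llm.observability_gap": 2.0, "llm.fallback_absence": 1.0},
    {"llm.fallback_absence": 2.0},
)
```

Because the weights are rescaled after merging, raising one override lowers the relative influence of every other metric, even those whose weights were left untouched.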

Default weights and config details are documented on the main metric page.

  • Confidence keys are emitted for all primary metrics.
  • Lower confidence usually means fallback heuristics or reduced context.
  • Before tightening policy thresholds, check:
    • llm.blast_radius_available
    • llm.pii_taint_used
    • llm.pii_fallback_reason
    • llm.call_sites_unresolved_count
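
The checklist above can be sketched as a pre-flight function that runs before a policy threshold is tightened. This is an illustrative sketch only: the function name, the flat-dict input, and the 0.2 unresolved-ratio cutoff are assumptions, not documented behavior.

```python
def threshold_readiness(metrics: dict, max_unresolved_ratio: float = 0.2) -> list:
    """Return warnings that suggest confidence may be reduced.

    An empty list means none of the checked diagnostics raised a concern.
    The max_unresolved_ratio default is an arbitrary, hypothetical cutoff.
    """
    warnings = []
    if metrics.get("llm.blast_radius_available") != 1:
        warnings.append("call graph unavailable")
    if metrics.get("llm.pii_taint_used") != 1:
        reason = metrics.get("llm.pii_fallback_reason")
        warnings.append(f"PII analysis used the fallback path (reason code {reason})")
    # Compare unresolved call sites against those discovered, if any.
    discovered = metrics.get("llm.call_sites_discovered_count", 0)
    unresolved = metrics.get("llm.call_sites_unresolved_count", 0)
    if discovered > 0 and unresolved / discovered > max_unresolved_ratio:
        warnings.append(f"{unresolved} of {discovered} discovered call sites unresolved")
    return warnings
```

A caller might hold off on stricter thresholds while this list is non-empty, since low values may then reflect missing context rather than a healthy integration.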

Version note: this page is aligned with metric version 1.2.0.