Scoring and Keys
This page defines the public llm.* key contract and how to interpret values.
Higher = Worse (Risk/Gap/Absence)
| Metric Key |
|---|
| llm.observability_gap |
| llm.context_budget_absence |
| llm.pii_leakage_risk |
| llm.cost_tracking_gap |
| llm.eval_harness_absence |
| llm.fallback_absence |
| llm.model_version_unpinned |
| llm.tool_policy_absence |
| llm.cache_idempotency_gap |
| llm.streaming_risk |
| llm.rate_limit_absence |
| llm.embedding_drift_risk |
| llm.template_governance_gap |
| llm.agent_loop_risk |
| llm.instruction_boundary_violation |
| llm.prompt_injection_surface |
| llm.insecure_output_handling |
| llm.sensitive_info_in_telemetry |
| llm.unbounded_consumption |
| llm.supply_chain_risk |
| llm.data_model_poisoning_exposure |
| llm.system_prompt_leakage |
| llm.vector_embedding_weakness |
| llm.misinformation_overreliance |
| llm.mcp_authz_gap |
| llm.mcp_tool_contract_gap |
| llm.genai_otel_semconv_gap |
| llm.structured_output_enforcement_gap |
| llm.model_rollout_guardrail_gap |
| llm.mcp_oauth21_gap |
| llm.mcp_pkce_gap |
| llm.mcp_resource_binding_gap |
| llm.mcp_audience_validation_gap |
| llm.mcp_tool_annotations_gap |
| llm.mcp_tool_output_schema_gap |
Higher = Better (Score/Health/Confidence)
| Metric Key |
|---|
| llm.prompt_hardcoding_score |
| llm.model_coupling_score |
| llm.overall_integration_health |
| llm.eval_presence_score |
| llm.eval_breadth_score |
| llm.eval_gating_score |
| llm.eval_dataset_versioning_score |
| llm.eval_quality_score |
| llm.<primary_key>.confidence (all primary metrics) |
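
Because the two tables point in opposite directions, a threshold check has to know which way a key runs before comparing. The following is a minimal sketch of one way to encode that, not part of the tool itself: the helper names `higher_is_better` and `breaches` are hypothetical, and the sketch assumes callers pass only keys from the two tables above.

```python
# Keys this page lists as "higher = better". Any other key from the two
# tables above is treated as "higher = worse".
HIGHER_IS_BETTER = {
    "llm.prompt_hardcoding_score",
    "llm.model_coupling_score",
    "llm.overall_integration_health",
    "llm.eval_presence_score",
    "llm.eval_breadth_score",
    "llm.eval_gating_score",
    "llm.eval_dataset_versioning_score",
    "llm.eval_quality_score",
}


def higher_is_better(key: str) -> bool:
    """Return True when a larger value of `key` is the healthier reading."""
    # Confidence keys (llm.<primary_key>.confidence) are also higher = better.
    return key in HIGHER_IS_BETTER or key.endswith(".confidence")


def breaches(key: str, value: float, threshold: float) -> bool:
    """Hypothetical policy check that compares in the direction the key implies."""
    if higher_is_better(key):
        return value < threshold  # a score fell below its floor
    return value > threshold      # a risk/gap/absence rose above its ceiling
```

With this shape, `breaches("llm.pii_leakage_risk", 0.8, 0.5)` and `breaches("llm.overall_integration_health", 0.4, 0.7)` both report a breach, even though one value is above its threshold and the other is below.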
Diagnostic Signals
| Metric Key | Semantics |
|---|---|
| llm.pii_taint_used | 1 = taint-based PII analysis used; 0 = fallback path used |
| llm.pii_fallback_reason | 0 = taint used; 1 = no call graph or no call sites; 2 = taint propagation failed; 3 = source detection failed |
| llm.blast_radius_available | 1 = call graph available; 0 = unavailable |
| llm.call_sites_total | Total canonical call sites considered |
| llm.call_sites_discovered_count | Call sites discovered from EffectIndex |
| llm.call_sites_enriched_count | Discovered call sites improved by bounded enrichment |
| llm.call_sites_unresolved_count | Discovered call sites still unresolved after enrichment |
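
The numeric codes in this table are easier to read in reports once they are decoded. The sketch below assumes the metrics arrive as a plain dict keyed by metric name. The enum, the function names, and the unresolved-ratio calculation are illustrative assumptions, not an API this page defines.

```python
from enum import IntEnum


class PiiFallbackReason(IntEnum):
    """Codes for llm.pii_fallback_reason, as listed in the table above."""
    TAINT_USED = 0
    NO_CALL_GRAPH_OR_CALL_SITES = 1
    TAINT_PROPAGATION_FAILED = 2
    SOURCE_DETECTION_FAILED = 3


def describe_pii_path(metrics: dict) -> str:
    """Summarize which PII analysis path produced the result."""
    if metrics["llm.pii_taint_used"] == 1:
        return "taint-based analysis"
    reason = PiiFallbackReason(int(metrics["llm.pii_fallback_reason"]))
    return f"fallback path ({reason.name.lower()})"


def unresolved_ratio(metrics: dict) -> float:
    """Share of discovered call sites still unresolved after enrichment.

    Assumption: the ratio is a convenience derived here; it is not an
    emitted key.
    """
    discovered = metrics["llm.call_sites_discovered_count"]
    if discovered == 0:
        return 0.0
    return metrics["llm.call_sites_unresolved_count"] / discovered
```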
Overall Health (llm.overall_integration_health)
- Range: 0..1 (higher is better)
- Computed from weighted primary metrics
- health_weights overrides are allowed
- Effective weights are normalized to sum to 1.0
Default weights and config details are documented on the main metric page.
Confidence and Fidelity Notes
- Confidence keys are emitted for all primary metrics.
- Lower confidence usually means fallback heuristics or reduced context.
- Before tightening policy thresholds, check:
  - llm.blast_radius_available
  - llm.pii_taint_used
  - llm.pii_fallback_reason
  - llm.call_sites_unresolved_count
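
The pre-tightening check can be automated along these lines. This is a sketch under stated assumptions: the function name, the dict input, and the `max_unresolved` tolerance are hypothetical, and only the four keys come from this page.

```python
def threshold_readiness(metrics: dict, max_unresolved: int = 0) -> list:
    """Return reasons to hold off on tightening thresholds.

    An empty list means none of the four diagnostic signals raised a concern.
    `max_unresolved` is an illustrative tolerance, not a documented default.
    """
    concerns = []
    if metrics.get("llm.blast_radius_available") != 1:
        concerns.append("call graph unavailable")
    if metrics.get("llm.pii_taint_used") != 1:
        reason = metrics.get("llm.pii_fallback_reason")
        concerns.append(f"PII analysis used fallback path (reason code {reason})")
    unresolved = metrics.get("llm.call_sites_unresolved_count", 0)
    if unresolved > max_unresolved:
        concerns.append(f"{unresolved} call sites unresolved after enrichment")
    return concerns
```

Returning the reasons rather than a bare boolean keeps the output usable in a CI log, where the reader needs to know which signal lowered fidelity.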
Version note: this page is aligned with metric version 1.2.0.