
Policy and CI Gates

Use these profiles to roll out llm_integration safely in CI.

Balanced profile: hard safety gates enforced as errors, operational coverage tracked as warnings.

```yaml
metrics:
  - id: llm_integration
    policy:
      invariants:
        - metric: llm.pii_leakage_risk
          op: "=="
          value: 0
          severity: error
          message: "PII must not reach prompts without controls"
        - metric: llm.prompt_injection_surface
          op: "<="
          value: 0.2
          severity: error
          message: "Prompt injection exposure is too high"
        - metric: llm.observability_gap
          op: "<="
          value: 0.25
          severity: warning
          message: "Improve observability around LLM calls"
        - metric: llm.fallback_absence
          op: "<="
          value: 0.25
          severity: warning
          message: "Fallback and timeout coverage should be improved"
        - metric: llm.rate_limit_absence
          op: "<="
          value: 0.25
          severity: warning
          message: "Rate limiting/backoff coverage should be improved"
        - metric: llm.structured_output_enforcement_gap
          op: "<="
          value: 0.3
          severity: warning
          message: "Schema enforcement for model outputs is too weak"
        - metric: llm.overall_integration_health
          op: ">="
          value: 0.7
          severity: warning
          message: "Overall LLM architecture health baseline not met"
```
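The invariant semantics used by these profiles can be sketched as a small evaluator: each invariant compares an observed metric value against a threshold with the given operator, and a violation becomes an error or a warning depending on `severity`. This is an illustrative sketch, not Arxo's actual evaluator; the function and variable names below are made up for the example.

```python
# Minimal sketch of profile invariant evaluation (illustrative only;
# not the real Arxo implementation).
import operator

OPS = {"==": operator.eq, "<=": operator.le, ">=": operator.ge}

def evaluate(invariants, observed):
    """Return (errors, warnings) messages for violated invariants."""
    errors, warnings = [], []
    for inv in invariants:
        value = observed.get(inv["metric"])
        if value is None:
            continue  # metric not produced in this run
        if not OPS[inv["op"]](value, inv["value"]):
            bucket = errors if inv["severity"] == "error" else warnings
            bucket.append(inv["message"])
    return errors, warnings

invariants = [
    {"metric": "llm.pii_leakage_risk", "op": "==", "value": 0,
     "severity": "error",
     "message": "PII must not reach prompts without controls"},
    {"metric": "llm.overall_integration_health", "op": ">=", "value": 0.7,
     "severity": "warning",
     "message": "Overall LLM architecture health baseline not met"},
]
observed = {"llm.pii_leakage_risk": 0,
            "llm.overall_integration_health": 0.6}
errors, warnings = evaluate(invariants, observed)
# errors is empty; warnings carries the health message, so the build
# passes with a warning under this profile.
```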
Strict profile: every gate enforced as an error, for production-critical services.

```yaml
metrics:
  - id: llm_integration
    policy:
      invariants:
        - metric: llm.pii_leakage_risk
          op: "=="
          value: 0
          severity: error
          message: "PII must be redacted before LLM prompts"
        - metric: llm.prompt_injection_surface
          op: "<="
          value: 0.1
          severity: error
          message: "Prompt injection surface must be minimized"
        - metric: llm.observability_gap
          op: "<="
          value: 0.15
          severity: error
          message: "LLM call observability coverage is insufficient"
        - metric: llm.fallback_absence
          op: "<="
          value: 0.2
          severity: error
          message: "Fallback strategy is required"
        - metric: llm.rate_limit_absence
          op: "<="
          value: 0.2
          severity: error
          message: "Rate limiting/backoff is required"
        - metric: llm.supply_chain_risk
          op: "<="
          value: 0.2
          severity: error
          message: "Model and tool dependencies must be pinned and verified"
        - metric: llm.data_model_poisoning_exposure
          op: "<="
          value: 0.2
          severity: error
          message: "Ingestion and index paths must resist poisoning"
        - metric: llm.system_prompt_leakage
          op: "<="
          value: 0.1
          severity: error
          message: "System prompts must not leak in logs or responses"
        - metric: llm.mcp_authz_gap
          op: "<="
          value: 0.2
          severity: error
          message: "MCP surfaces require strong authN/authZ"
        - metric: llm.mcp_tool_contract_gap
          op: "<="
          value: 0.2
          severity: error
          message: "MCP tools must enforce explicit input/output contracts"
        - metric: llm.genai_otel_semconv_gap
          op: "<="
          value: 0.3
          severity: error
          message: "GenAI OTel semantic conventions are required"
        - metric: llm.model_rollout_guardrail_gap
          op: "<="
          value: 0.2
          severity: error
          message: "Model rollout guardrails are required"
        - metric: llm.overall_integration_health
          op: ">="
          value: 0.8
          severity: error
          message: "Overall LLM architecture health baseline not met"
```
Exploratory profile: warnings only, for initial rollout and low-risk services.

```yaml
metrics:
  - id: llm_integration
    policy:
      invariants:
        - metric: llm.observability_gap
          op: "<="
          value: 0.35
          severity: warning
          message: "Improve observability coverage"
        - metric: llm.cost_tracking_gap
          op: "<="
          value: 0.35
          severity: warning
          message: "Increase token and cost tracking coverage"
        - metric: llm.eval_harness_absence
          op: "<="
          value: 0.5
          severity: warning
          message: "Add eval coverage for critical call paths"
        - metric: llm.overall_integration_health
          op: ">="
          value: 0.65
          severity: warning
          message: "Incrementally raise LLM architecture health"
```
Baseline no-regression profile: instead of fixed thresholds, compare each metric against its value on a baseline git ref (here, origin/main).

```yaml
metrics:
  - id: llm_integration
    policy:
      baseline:
        mode: git
        ref: origin/main
      invariants:
        - metric: llm.overall_integration_health
          op: ">="
          baseline: true
          severity: error
          message: "Overall LLM architecture health regressed"
        - metric: llm.pii_leakage_risk
          op: "<="
          baseline: true
          severity: error
          message: "PII leakage risk regressed"
        - metric: llm.prompt_injection_surface
          op: "<="
          baseline: true
          severity: error
          message: "Prompt injection risk regressed"
```
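A no-regression check substitutes the baseline ref's metric value for the fixed threshold: the current value must satisfy the operator against the baseline value. The sketch below illustrates that semantics; the function names and the two result dictionaries are hypothetical, not Arxo output.

```python
# Sketch of baseline ("no-regression") gating: the threshold is the
# metric's value on the baseline ref rather than a fixed number.
# Illustrative only; not the real Arxo implementation.
import operator

OPS = {"<=": operator.le, ">=": operator.ge}

def check_no_regression(invariants, current, baseline):
    """Return messages for metrics that regressed relative to baseline."""
    failures = []
    for inv in invariants:
        cur = current.get(inv["metric"])
        base = baseline.get(inv["metric"])
        if cur is None or base is None:
            continue  # cannot compare without both values
        if not OPS[inv["op"]](cur, base):
            failures.append(inv["message"])
    return failures

invariants = [
    {"metric": "llm.overall_integration_health", "op": ">=",
     "message": "Overall LLM architecture health regressed"},
    {"metric": "llm.pii_leakage_risk", "op": "<=",
     "message": "PII leakage risk regressed"},
]
# Hypothetical metric values for the working branch and origin/main:
current = {"llm.overall_integration_health": 0.72,
           "llm.pii_leakage_risk": 0.1}
baseline = {"llm.overall_integration_health": 0.78,
            "llm.pii_leakage_risk": 0.0}
failures = check_no_regression(invariants, current, baseline)
# Both metrics moved the wrong way, so both messages are reported.
```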
```sh
# Focused metric run
arxo analyze --path . --metric llm_integration --format json
```
```sh
# AI preset run (includes llm_integration)
arxo analyze --path . --preset ai --config arxo.yml --fail-fast
```
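With `--format json`, the report can be consumed by a CI step that fails the build only on error-severity findings. The field names below (`findings`, `severity`, `message`) are assumptions for illustration; check the actual Arxo JSON output schema before wiring this into a pipeline.

```python
# Hypothetical CI gate over a JSON report. The schema used here
# ("findings" list with "severity"/"message" keys) is an assumption,
# not Arxo's documented output format.
import json
import sys

def gate(report_json: str) -> int:
    """Return a process exit code: 1 if any error-severity finding exists."""
    report = json.loads(report_json)
    errors = [f for f in report.get("findings", [])
              if f.get("severity") == "error"]
    for finding in errors:
        print(f"ERROR: {finding.get('message', '')}", file=sys.stderr)
    return 1 if errors else 0

# Warnings alone should not fail the build:
sample = '{"findings": [{"severity": "warning", "message": "Improve observability"}]}'
exit_code = gate(sample)
```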
  1. Start with the Exploratory or Balanced profile for one to two release cycles.
  2. Fix recurring findings in PII, injection, fallback, rate limiting, and structured outputs.
  3. Promote key controls to the Strict profile for production-critical services.
  4. Keep baseline no-regression checks enabled permanently.

If llm.blast_radius_available = 0 or llm.pii_taint_used = 0, the analyzer is running with degraded diagnostics and analysis fidelity is reduced.

Recommended temporary posture:

  1. Keep baseline no-regression gates enabled.
  2. Keep hard safety gates (llm.pii_leakage_risk, llm.prompt_injection_surface) with conservative thresholds.
  3. Use warnings for newly introduced strict keys until diagnostics recover.
  4. Treat persistent degraded diagnostics as a platform issue and fix parser/call-graph availability.
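Point 3 of this posture can be sketched as a severity downgrade: while diagnostics are degraded, newly introduced strict keys drop from error to warning, but the hard safety gates stay at error. The two hard-gate metric names come from this page; the downgrade logic itself is illustrative, not something Arxo performs automatically.

```python
# Sketch of the temporary degraded-diagnostics posture (illustrative only).
# Hard safety gates keep error severity; other errors become warnings
# until diagnostics recover.
HARD_GATES = {"llm.pii_leakage_risk", "llm.prompt_injection_surface"}

def apply_degraded_posture(invariants, diagnostics):
    """Return invariants adjusted for degraded analysis fidelity."""
    degraded = (diagnostics.get("llm.blast_radius_available") == 0
                or diagnostics.get("llm.pii_taint_used") == 0)
    if not degraded:
        return invariants
    adjusted = []
    for inv in invariants:
        inv = dict(inv)  # copy so the original profile is untouched
        if inv["severity"] == "error" and inv["metric"] not in HARD_GATES:
            inv["severity"] = "warning"
        adjusted.append(inv)
    return adjusted

profile = [
    {"metric": "llm.genai_otel_semconv_gap", "severity": "error"},
    {"metric": "llm.pii_leakage_risk", "severity": "error"},
]
adjusted = apply_degraded_posture(profile, {"llm.blast_radius_available": 0})
# The OTel gate is downgraded to a warning; the PII gate stays an error.
```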

Version note: this page is aligned with metric version 1.2.0.