
Policy and CI Gates

Use these profiles to roll out llm_integration safely in CI.

Balanced profile: hard safety gates enforced as errors, operational coverage tracked as warnings.

```yaml
metrics:
  - id: llm_integration
    policy:
      invariants:
        - metric: llm.pii_leakage_risk
          op: "=="
          value: 0
          severity: error
          message: "PII must not reach prompts without controls"
        - metric: llm.prompt_injection_surface
          op: "<="
          value: 0.2
          severity: error
          message: "Prompt injection exposure is too high"
        - metric: llm.observability_gap
          op: "<="
          value: 0.25
          severity: warning
          message: "Improve observability around LLM calls"
        - metric: llm.fallback_absence
          op: "<="
          value: 0.25
          severity: warning
          message: "Fallback and timeout coverage should be improved"
        - metric: llm.rate_limit_absence
          op: "<="
          value: 0.25
          severity: warning
          message: "Rate limiting/backoff coverage should be improved"
        - metric: llm.structured_output_enforcement_gap
          op: "<="
          value: 0.3
          severity: warning
          message: "Schema enforcement for model outputs is too weak"
        - metric: llm.overall_integration_health
          op: ">="
          value: 0.7
          severity: warning
          message: "Overall LLM architecture health baseline not met"
```
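The invariant semantics used by these profiles can be sketched as a small evaluator: each invariant compares an observed metric value against a threshold with the given operator, and a violation becomes an error or a warning depending on `severity`. This is an illustrative sketch, not Arxo's actual evaluator; the function and variable names below are made up for the example.

```python
# Minimal sketch of profile invariant evaluation (illustrative only;
# not the real Arxo implementation).
import operator

OPS = {"==": operator.eq, "<=": operator.le, ">=": operator.ge}

def evaluate(invariants, observed):
    """Return (errors, warnings) messages for violated invariants."""
    errors, warnings = [], []
    for inv in invariants:
        value = observed.get(inv["metric"])
        if value is None:
            continue  # metric not produced in this run
        if not OPS[inv["op"]](value, inv["value"]):
            bucket = errors if inv["severity"] == "error" else warnings
            bucket.append(inv["message"])
    return errors, warnings

invariants = [
    {"metric": "llm.pii_leakage_risk", "op": "==", "value": 0,
     "severity": "error",
     "message": "PII must not reach prompts without controls"},
    {"metric": "llm.overall_integration_health", "op": ">=", "value": 0.7,
     "severity": "warning",
     "message": "Overall LLM architecture health baseline not met"},
]
observed = {"llm.pii_leakage_risk": 0,
            "llm.overall_integration_health": 0.6}
errors, warnings = evaluate(invariants, observed)
# errors is empty; warnings carries the health message, so the build
# passes with a warning under this profile.
```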
Strict profile: every gate enforced as an error, for production-critical services.

```yaml
metrics:
  - id: llm_integration
    policy:
      invariants:
        - metric: llm.pii_leakage_risk
          op: "=="
          value: 0
          severity: error
          message: "PII must be redacted before LLM prompts"
        - metric: llm.prompt_injection_surface
          op: "<="
          value: 0.1
          severity: error
          message: "Prompt injection surface must be minimized"
        - metric: llm.observability_gap
          op: "<="
          value: 0.15
          severity: error
          message: "LLM call observability coverage is insufficient"
        - metric: llm.fallback_absence
          op: "<="
          value: 0.2
          severity: error
          message: "Fallback strategy is required"
        - metric: llm.rate_limit_absence
          op: "<="
          value: 0.2
          severity: error
          message: "Rate limiting/backoff is required"
        - metric: llm.supply_chain_risk
          op: "<="
          value: 0.2
          severity: error
          message: "Model and tool dependencies must be pinned and verified"
        - metric: llm.data_model_poisoning_exposure
          op: "<="
          value: 0.2
          severity: error
          message: "Ingestion and index paths must resist poisoning"
        - metric: llm.system_prompt_leakage
          op: "<="
          value: 0.1
          severity: error
          message: "System prompts must not leak in logs or responses"
        - metric: llm.mcp_authz_gap
          op: "<="
          value: 0.2
          severity: error
          message: "MCP surfaces require strong authN/authZ"
        - metric: llm.mcp_tool_contract_gap
          op: "<="
          value: 0.2
          severity: error
          message: "MCP tools must enforce explicit input/output contracts"
        - metric: llm.genai_otel_semconv_gap
          op: "<="
          value: 0.3
          severity: error
          message: "GenAI OTel semantic conventions are required"
        - metric: llm.model_rollout_guardrail_gap
          op: "<="
          value: 0.2
          severity: error
          message: "Model rollout guardrails are required"
        - metric: llm.overall_integration_health
          op: ">="
          value: 0.8
          severity: error
          message: "Overall LLM architecture health baseline not met"
```
Exploratory profile: warnings only, for initial rollout and low-risk services.

```yaml
metrics:
  - id: llm_integration
    policy:
      invariants:
        - metric: llm.observability_gap
          op: "<="
          value: 0.35
          severity: warning
          message: "Improve observability coverage"
        - metric: llm.cost_tracking_gap
          op: "<="
          value: 0.35
          severity: warning
          message: "Increase token and cost tracking coverage"
        - metric: llm.eval_harness_absence
          op: "<="
          value: 0.5
          severity: warning
          message: "Add eval coverage for critical call paths"
        - metric: llm.overall_integration_health
          op: ">="
          value: 0.65
          severity: warning
          message: "Incrementally raise LLM architecture health"
```
Baseline no-regression profile: instead of fixed thresholds, compare each metric against its value on a baseline git ref (here, origin/main).

```yaml
metrics:
  - id: llm_integration
    policy:
      baseline:
        mode: git
        ref: origin/main
      invariants:
        - metric: llm.overall_integration_health
          op: ">="
          baseline: true
          severity: error
          message: "Overall LLM architecture health regressed"
        - metric: llm.pii_leakage_risk
          op: "<="
          baseline: true
          severity: error
          message: "PII leakage risk regressed"
        - metric: llm.prompt_injection_surface
          op: "<="
          baseline: true
          severity: error
          message: "Prompt injection risk regressed"
```
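A no-regression check substitutes the baseline ref's metric value for the fixed threshold: the current value must satisfy the operator against the baseline value. The sketch below illustrates that semantics; the function names and the two result dictionaries are hypothetical, not Arxo output.

```python
# Sketch of baseline ("no-regression") gating: the threshold is the
# metric's value on the baseline ref rather than a fixed number.
# Illustrative only; not the real Arxo implementation.
import operator

OPS = {"<=": operator.le, ">=": operator.ge}

def check_no_regression(invariants, current, baseline):
    """Return messages for metrics that regressed relative to baseline."""
    failures = []
    for inv in invariants:
        cur = current.get(inv["metric"])
        base = baseline.get(inv["metric"])
        if cur is None or base is None:
            continue  # cannot compare without both values
        if not OPS[inv["op"]](cur, base):
            failures.append(inv["message"])
    return failures

invariants = [
    {"metric": "llm.overall_integration_health", "op": ">=",
     "message": "Overall LLM architecture health regressed"},
    {"metric": "llm.pii_leakage_risk", "op": "<=",
     "message": "PII leakage risk regressed"},
]
# Hypothetical metric values for the working branch and origin/main:
current = {"llm.overall_integration_health": 0.72,
           "llm.pii_leakage_risk": 0.1}
baseline = {"llm.overall_integration_health": 0.78,
            "llm.pii_leakage_risk": 0.0}
failures = check_no_regression(invariants, current, baseline)
# Both metrics moved the wrong way, so both messages are reported.
```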
```sh
# Focused metric run
arxo analyze --path . --metric llm_integration --format json
```
```sh
# AI preset run (includes llm_integration)
arxo analyze --path . --preset ai --config arxo.yml --fail-fast
```
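With `--format json`, the report can be consumed by a CI step that fails the build only on error-severity findings. The field names below (`findings`, `severity`, `message`) are assumptions for illustration; check the actual Arxo JSON output schema before wiring this into a pipeline.

```python
# Hypothetical CI gate over a JSON report. The schema used here
# ("findings" list with "severity"/"message" keys) is an assumption,
# not Arxo's documented output format.
import json
import sys

def gate(report_json: str) -> int:
    """Return a process exit code: 1 if any error-severity finding exists."""
    report = json.loads(report_json)
    errors = [f for f in report.get("findings", [])
              if f.get("severity") == "error"]
    for finding in errors:
        print(f"ERROR: {finding.get('message', '')}", file=sys.stderr)
    return 1 if errors else 0

# Warnings alone should not fail the build:
sample = '{"findings": [{"severity": "warning", "message": "Improve observability"}]}'
exit_code = gate(sample)
```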
  1. Start with the Exploratory or Balanced profile for one to two release cycles.
  2. Fix recurring findings in PII, injection, fallback, rate limiting, and structured outputs.
  3. Promote key controls to the Strict profile for production-critical services.
  4. Keep baseline no-regression checks enabled permanently.

If llm.blast_radius_available = 0 or llm.pii_taint_used = 0, the analyzer is running with degraded diagnostics and analysis fidelity is reduced.

Recommended temporary posture:

  1. Keep baseline no-regression gates enabled.
  2. Keep hard safety gates (llm.pii_leakage_risk, llm.prompt_injection_surface) with conservative thresholds.
  3. Use warnings for newly introduced strict keys until diagnostics recover.
  4. Treat persistent degraded diagnostics as a platform issue and fix parser/call-graph availability.
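Point 3 of this posture can be sketched as a severity downgrade: while diagnostics are degraded, newly introduced strict keys drop from error to warning, but the hard safety gates stay at error. The two hard-gate metric names come from this page; the downgrade logic itself is illustrative, not something Arxo performs automatically.

```python
# Sketch of the temporary degraded-diagnostics posture (illustrative only).
# Hard safety gates keep error severity; other errors become warnings
# until diagnostics recover.
HARD_GATES = {"llm.pii_leakage_risk", "llm.prompt_injection_surface"}

def apply_degraded_posture(invariants, diagnostics):
    """Return invariants adjusted for degraded analysis fidelity."""
    degraded = (diagnostics.get("llm.blast_radius_available") == 0
                or diagnostics.get("llm.pii_taint_used") == 0)
    if not degraded:
        return invariants
    adjusted = []
    for inv in invariants:
        inv = dict(inv)  # copy so the original profile is untouched
        if inv["severity"] == "error" and inv["metric"] not in HARD_GATES:
            inv["severity"] = "warning"
        adjusted.append(inv)
    return adjusted

profile = [
    {"metric": "llm.genai_otel_semconv_gap", "severity": "error"},
    {"metric": "llm.pii_leakage_risk", "severity": "error"},
]
adjusted = apply_degraded_posture(profile, {"llm.blast_radius_available": 0})
# The OTel gate is downgraded to a warning; the PII gate stays an error.
```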

Version note: this page is aligned with metric version 1.2.0.