# check_llm_integration

Check the health of LLM integrations in your codebase. This tool detects observability gaps, PII leakage risks, cost tracking issues, prompt governance problems, and resilience anti-patterns in code that calls LLM providers (OpenAI, Anthropic, etc.).
## Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| `project_path` | string | Yes | Absolute or relative path to the project root directory |
## Response

Returns a JSON summary with LLM integration health scores and findings.
## Response Schema

```json
{
  "llm_integration": {
    "health_score": "number",        // 0-1 overall health (1 = perfect)
    "observability_gap": "number",   // Count of unobserved LLM calls
    "pii_leakage_risk": "number",    // Count of potential PII leaks to LLM
    "cost_tracking_gap": "number",   // Count of calls without token/cost tracking
    "prompt_hardcoding": "number",   // Count of hardcoded prompts (should be in registry)
    "model_coupling": "number",      // Count of direct model dependencies (should use adapter)
    "fallback_absence": "number"     // Count of calls without fallback/retry logic
  },
  "findings": [
    {
      "title": "string",
      "severity": "error|warning|info",
      "evidence_count": "number",
      "description": "string"
    }
  ],
  "violations_count": "number"
}
```

## Health Score Interpretation

| Score | Grade | Interpretation |
|---|---|---|
| 0.8 - 1.0 | ✅ Excellent | Production-ready LLM integration |
| 0.6 - 0.8 | ⚠️ Good | Minor issues, review findings |
| 0.4 - 0.6 | ⚠️ Fair | Address observability and cost tracking |
| 0.2 - 0.4 | 🚨 Poor | Major gaps — not production-ready |
| 0.0 - 0.2 | 🚨 Critical | Immediate action required |
## Examples

### Healthy integration

Request:

```json
{ "project_path": "." }
```

Response:

```json
{
  "llm_integration": {
    "health_score": 0.87,
    "observability_gap": 0,
    "pii_leakage_risk": 0,
    "cost_tracking_gap": 1,
    "prompt_hardcoding": 2,
    "model_coupling": 0,
    "fallback_absence": 0
  },
  "findings": [
    {
      "title": "Cost tracking gap",
      "severity": "warning",
      "evidence_count": 1,
      "description": "1 LLM call missing token usage logging"
    },
    {
      "title": "Prompt hardcoding",
      "severity": "info",
      "evidence_count": 2,
      "description": "2 inline prompts detected — consider moving to prompt registry"
    }
  ],
  "violations_count": 0
}
```

### Integration with issues
Request:

```json
{ "project_path": "/path/to/project" }
```

Response:

```json
{
  "llm_integration": {
    "health_score": 0.42,
    "observability_gap": 5,
    "pii_leakage_risk": 2,
    "cost_tracking_gap": 8,
    "prompt_hardcoding": 12,
    "model_coupling": 3,
    "fallback_absence": 6
  },
  "findings": [
    {
      "title": "Observability gap",
      "severity": "error",
      "evidence_count": 5,
      "description": "LLM calls without tracing or logging detected"
    },
    {
      "title": "PII leakage risk",
      "severity": "error",
      "evidence_count": 2,
      "description": "User data sent to LLM without redaction"
    },
    {
      "title": "Cost tracking gap",
      "severity": "warning",
      "evidence_count": 8,
      "description": "Missing token usage logging and budget alerts"
    },
    {
      "title": "Fallback absence",
      "severity": "warning",
      "evidence_count": 6,
      "description": "No timeout, retry, or fallback configuration"
    }
  ],
  "violations_count": 2
}
```

Interpretation:
- Health score 0.42 — needs improvement before production
- 5 LLM calls lack observability (add tracing/logging)
- 2 PII leakage risks (add redaction)
- 8 calls don’t track token usage (add cost monitoring)
- 6 calls have no fallback logic (add retries/timeouts)
## Findings and Remediation

### Observability Gap

Issue: LLM calls without tracing or structured logging
Fix:
- Add OpenTelemetry spans around LLM calls
- Log request/response metadata (model, tokens, latency)
- Use structured logging libraries (e.g., `winston`, `slog`)
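As a minimal sketch of the logging half of this fix, a decorator can capture model, latency, and token counts around every call. The `call_llm` function and its response shape below are hypothetical stand-ins for a real provider call; a production setup would typically open an OpenTelemetry span in the same wrapper.

```python
import json
import logging
import time
from functools import wraps

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("llm")

def observe_llm_call(fn):
    """Log structured metadata (model, latency, tokens) around an LLM call."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.monotonic()
        response = fn(*args, **kwargs)
        log.info(json.dumps({
            "event": "llm_call",
            "model": kwargs.get("model", "unknown"),
            "latency_ms": round((time.monotonic() - start) * 1000, 1),
            "total_tokens": response.get("usage", {}).get("total_tokens"),
        }))
        return response
    return wrapper

@observe_llm_call
def call_llm(prompt, model="gpt-4o-mini"):
    # Placeholder for a real provider call (openai, anthropic, ...).
    return {"text": "ok", "usage": {"total_tokens": 42}}
```

One log line per call in a structured format is usually enough for the checker to count the call site as observed; richer tracing can be layered on later.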
### PII Leakage Risk

Issue: User data sent to LLM without redaction
Fix:
- Implement PII detection and redaction before LLM calls
- Use allowlists for data fields sent to LLMs
- Add audit logging for all data sent to external LLM providers
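A minimal redaction pass might look like the sketch below. The regexes are illustrative, not an exhaustive PII detector; production systems typically use a dedicated detection library and apply redaction immediately before the prompt leaves the process.

```python
import re

# Illustrative patterns only; a real deployment needs a proper PII detector.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

def redact(text: str) -> str:
    """Replace detected PII with typed placeholders before sending to an LLM."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)
    return text
```

Typed placeholders (`<email>`, `<phone>`) preserve enough context for the model while keeping the raw values out of provider logs.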
### Cost Tracking Gap

Issue: Missing token usage logging and budget monitoring
Fix:
- Log `usage.total_tokens` from LLM responses
- Implement per-user or per-request cost tracking
- Set up budget alerts (e.g., daily spend limits)
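A sketch of a per-process tracker, assuming a response `usage` dict in the shape OpenAI-style APIs return; the price table is a placeholder, so check your provider's current price sheet before relying on the numbers.

```python
# Placeholder per-1K-token prices; check your provider's current pricing.
PRICES_PER_1K = {"gpt-4o-mini": 0.00015}

class CostTracker:
    """Accumulate token counts and estimated spend across LLM calls."""

    def __init__(self):
        self.total_tokens = 0
        self.total_cost = 0.0

    def record(self, model: str, usage: dict) -> None:
        tokens = usage.get("total_tokens", 0)
        self.total_tokens += tokens
        self.total_cost += tokens / 1000 * PRICES_PER_1K.get(model, 0.0)

tracker = CostTracker()
tracker.record("gpt-4o-mini", {"total_tokens": 2000})  # e.g. response["usage"]
```

Keying the tracker by user or request ID instead of a single global instance gives the per-user breakdown that budget alerts need.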
### Prompt Hardcoding

Issue: Inline prompts make versioning and A/B testing difficult
Fix:
- Move prompts to a prompt registry or template system
- Version prompts separately from application code
- Use prompt management tools (e.g., LangSmith, PromptLayer)
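Even without a dedicated tool, the first step can be as simple as moving templates into data keyed by name and version. The `summarize` template and versioning scheme below are hypothetical; registry products add storage, diffing, and evaluation on top of this idea.

```python
# Minimal registry: templates live in data keyed by (name, version),
# not inline at call sites.
PROMPTS = {
    ("summarize", "v2"): "Summarize the following text in {max_words} words:\n\n{text}",
}

def render_prompt(name: str, version: str, **params) -> str:
    """Look up a versioned template and fill in its parameters."""
    return PROMPTS[(name, version)].format(**params)
```

Because call sites reference `("summarize", "v2")` rather than the text itself, shipping a `v3` for an A/B test does not touch application code.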
### Model Coupling

Issue: Direct dependencies on specific LLM providers
Fix:
- Introduce an adapter/interface layer
- Use provider-agnostic SDKs (e.g., LiteLLM, LangChain)
- Make model selection configurable
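One way to sketch the adapter layer in Python is a `Protocol` that application code depends on, with one concrete adapter per provider. All names below are illustrative; a real `OpenAIClient` or `AnthropicClient` would wrap the corresponding SDK.

```python
from dataclasses import dataclass
from typing import Protocol

@dataclass
class Completion:
    text: str
    total_tokens: int

class LLMClient(Protocol):
    """Provider-agnostic interface; application code depends only on this."""
    def complete(self, prompt: str) -> Completion: ...

class FakeClient:
    """Test double; a real adapter would wrap the openai or anthropic SDK."""
    def complete(self, prompt: str) -> Completion:
        return Completion(text="stub", total_tokens=len(prompt.split()))

def summarize(client: LLMClient, text: str) -> str:
    # Business logic never imports a provider SDK directly.
    return client.complete(f"Summarize: {text}").text
```

Swapping providers, or routing by configuration, then only requires constructing a different `LLMClient` implementation at startup.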
### Fallback Absence

Issue: No timeout, retry, or fallback configuration
Fix:
- Add timeouts to LLM calls (e.g., 30s)
- Implement exponential backoff retry logic
- Add fallback models or cached responses
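A sketch of the retry-plus-fallback pattern is below; timeouts themselves are usually passed per request to the provider SDK rather than implemented by hand. The helper and its defaults are illustrative.

```python
import time

def call_with_retry(fn, attempts=3, base_delay=0.01, fallback=None):
    """Retry fn with exponential backoff; return fallback once attempts run out."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                if fallback is not None:
                    return fallback  # e.g. a cached response or cheaper model
                raise
            time.sleep(base_delay * 2 ** attempt)  # 0.01s, 0.02s, 0.04s, ...
```

In practice you would catch only the provider's transient error types (rate limits, timeouts) rather than bare `Exception`, and add jitter to the backoff.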
## Workflow

Use this tool as part of an LLM integration audit:

1. `check_llm_integration` → assess health score
2. If `health_score` < 0.6:
   a. Review findings and prioritize fixes
   b. Add observability (tracing, logging)
   c. Implement PII redaction
   d. Add cost tracking and budgets
3. Re-run `check_llm_integration` to verify improvements

See Workflows: LLM integration audit for a full example.
## Error Cases

| Error | Cause | Solution |
|---|---|---|
| `missing required parameter: project_path` | `project_path` not provided | Include `project_path` in the request |
| `llm_integration` metric not in results | No LLM calls detected in the codebase | Verify the project uses LLM providers (OpenAI, Anthropic, etc.) |
| `ai` preset may not include it | Engine version mismatch | Update to the latest arxo version |
## Performance

- Speed: 5-15 seconds (uses the `ai` preset with 5 AI-related metrics)
- Caching: Does not use cache (always fresh analysis)
- Scalability: Handles projects with 1k+ LLM call sites efficiently
## Related Tools

- `analyze_architecture` — Full analysis including `rag_architecture`, `agent_architecture`, etc.
- `evaluate_policy` — Enforce LLM integration policies
## Related Resources

- `llm://risks/<path>` — Read cached LLM integration findings
- LLM Architecture Metric — Metric semantics, scoring, and policy references
## Related Guides

- LLM Architecture — Practical rollout, triage, and CI enforcement workflow
- AI Observability — Tracing and logging patterns (if exists)
## CLI Equivalent

```shell
# Check LLM integration health
arxo analyze --preset ai --format json | jq '.results[] | select(.id=="llm_integration")'
```