# check_llm_integration

Check the health of LLM integrations in your codebase. This tool detects observability gaps, PII leakage risks, cost tracking issues, prompt governance problems, and resilience anti-patterns in code that calls LLM providers (OpenAI, Anthropic, etc.).
## Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| `project_path` | string | Yes | Absolute or relative path to the project root directory |
## Response

Returns a JSON summary with LLM integration health scores and findings.
## Response Schema

```json
{
  "llm_integration": {
    "health_score": "number",        // 0-1 overall health (1 = perfect)
    "observability_gap": "number",   // Count of unobserved LLM calls
    "pii_leakage_risk": "number",    // Count of potential PII leaks to LLM
    "cost_tracking_gap": "number",   // Count of calls without token/cost tracking
    "prompt_hardcoding": "number",   // Count of hardcoded prompts (should be in registry)
    "model_coupling": "number",      // Count of direct model dependencies (should use adapter)
    "fallback_absence": "number"     // Count of calls without fallback/retry logic
  },
  "findings": [
    {
      "title": "string",
      "severity": "error|warning|info",
      "evidence_count": "number",
      "description": "string"
    }
  ],
  "violations_count": "number"
}
```

## Health Score Interpretation

| Score | Grade | Interpretation |
|---|---|---|
| 0.8 - 1.0 | ✅ Excellent | Production-ready LLM integration |
| 0.6 - 0.8 | ⚠️ Good | Minor issues, review findings |
| 0.4 - 0.6 | ⚠️ Fair | Address observability and cost tracking |
| 0.2 - 0.4 | 🚨 Poor | Major gaps — not production-ready |
| 0.0 - 0.2 | 🚨 Critical | Immediate action required |
## Examples

### Healthy integration

Request:

```json
{ "project_path": "." }
```

Response:

```json
{
  "llm_integration": {
    "health_score": 0.87,
    "observability_gap": 0,
    "pii_leakage_risk": 0,
    "cost_tracking_gap": 1,
    "prompt_hardcoding": 2,
    "model_coupling": 0,
    "fallback_absence": 0
  },
  "findings": [
    {
      "title": "Cost tracking gap",
      "severity": "warning",
      "evidence_count": 1,
      "description": "1 LLM call missing token usage logging"
    },
    {
      "title": "Prompt hardcoding",
      "severity": "info",
      "evidence_count": 2,
      "description": "2 inline prompts detected — consider moving to prompt registry"
    }
  ],
  "violations_count": 0
}
```

### Integration with issues
Request:

```json
{ "project_path": "/path/to/project" }
```

Response:

```json
{
  "llm_integration": {
    "health_score": 0.42,
    "observability_gap": 5,
    "pii_leakage_risk": 2,
    "cost_tracking_gap": 8,
    "prompt_hardcoding": 12,
    "model_coupling": 3,
    "fallback_absence": 6
  },
  "findings": [
    {
      "title": "Observability gap",
      "severity": "error",
      "evidence_count": 5,
      "description": "LLM calls without tracing or logging detected"
    },
    {
      "title": "PII leakage risk",
      "severity": "error",
      "evidence_count": 2,
      "description": "User data sent to LLM without redaction"
    },
    {
      "title": "Cost tracking gap",
      "severity": "warning",
      "evidence_count": 8,
      "description": "Missing token usage logging and budget alerts"
    },
    {
      "title": "Fallback absence",
      "severity": "warning",
      "evidence_count": 6,
      "description": "No timeout, retry, or fallback configuration"
    }
  ],
  "violations_count": 2
}
```

Interpretation:
- Health score 0.42 — needs improvement before production
- 5 LLM calls lack observability (add tracing/logging)
- 2 PII leakage risks (add redaction)
- 8 calls don’t track token usage (add cost monitoring)
- 6 calls have no fallback logic (add retries/timeouts)
## Findings and Remediation

### Observability Gap

Issue: LLM calls without tracing or structured logging
Fix:
- Add OpenTelemetry spans around LLM calls
- Log request/response metadata (model, tokens, latency)
- Use structured logging libraries (e.g., `winston`, `slog`)
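As a minimal sketch of the logging half of this fix, a decorator can capture model, latency, and token counts around every call. The `call_llm` function and its response shape below are hypothetical stand-ins for a real provider call; a production setup would typically open an OpenTelemetry span in the same wrapper.

```python
import json
import logging
import time
from functools import wraps

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("llm")

def observe_llm_call(fn):
    """Log structured metadata (model, latency, tokens) around an LLM call."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.monotonic()
        response = fn(*args, **kwargs)
        log.info(json.dumps({
            "event": "llm_call",
            "model": kwargs.get("model", "unknown"),
            "latency_ms": round((time.monotonic() - start) * 1000, 1),
            "total_tokens": response.get("usage", {}).get("total_tokens"),
        }))
        return response
    return wrapper

@observe_llm_call
def call_llm(prompt, model="gpt-4o-mini"):
    # Placeholder for a real provider call (openai, anthropic, ...).
    return {"text": "ok", "usage": {"total_tokens": 42}}
```

One log line per call in a structured format is usually enough for the checker to count the call site as observed; richer tracing can be layered on later.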
### PII Leakage Risk

Issue: User data sent to LLM without redaction
Fix:
- Implement PII detection and redaction before LLM calls
- Use allowlists for data fields sent to LLMs
- Add audit logging for all data sent to external LLM providers
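A minimal redaction pass might look like the sketch below. The regexes are illustrative, not an exhaustive PII detector; production systems typically use a dedicated detection library and apply redaction immediately before the prompt leaves the process.

```python
import re

# Illustrative patterns only; a real deployment needs a proper PII detector.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

def redact(text: str) -> str:
    """Replace detected PII with typed placeholders before sending to an LLM."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)
    return text
```

Typed placeholders (`<email>`, `<phone>`) preserve enough context for the model while keeping the raw values out of provider logs.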
### Cost Tracking Gap

Issue: Missing token usage logging and budget monitoring
Fix:
- Log `usage.total_tokens` from LLM responses
- Implement per-user or per-request cost tracking
- Set up budget alerts (e.g., daily spend limits)
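A sketch of a per-process tracker, assuming a response `usage` dict in the shape OpenAI-style APIs return; the price table is a placeholder, so check your provider's current price sheet before relying on the numbers.

```python
# Placeholder per-1K-token prices; check your provider's current pricing.
PRICES_PER_1K = {"gpt-4o-mini": 0.00015}

class CostTracker:
    """Accumulate token counts and estimated spend across LLM calls."""

    def __init__(self):
        self.total_tokens = 0
        self.total_cost = 0.0

    def record(self, model: str, usage: dict) -> None:
        tokens = usage.get("total_tokens", 0)
        self.total_tokens += tokens
        self.total_cost += tokens / 1000 * PRICES_PER_1K.get(model, 0.0)

tracker = CostTracker()
tracker.record("gpt-4o-mini", {"total_tokens": 2000})  # e.g. response["usage"]
```

Keying the tracker by user or request ID instead of a single global instance gives the per-user breakdown that budget alerts need.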
### Prompt Hardcoding

Issue: Inline prompts make versioning and A/B testing difficult
Fix:
- Move prompts to a prompt registry or template system
- Version prompts separately from application code
- Use prompt management tools (e.g., LangSmith, PromptLayer)
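Even without a dedicated tool, the first step can be as simple as moving templates into data keyed by name and version. The `summarize` template and versioning scheme below are hypothetical; registry products add storage, diffing, and evaluation on top of this idea.

```python
# Minimal registry: templates live in data keyed by (name, version),
# not inline at call sites.
PROMPTS = {
    ("summarize", "v2"): "Summarize the following text in {max_words} words:\n\n{text}",
}

def render_prompt(name: str, version: str, **params) -> str:
    """Look up a versioned template and fill in its parameters."""
    return PROMPTS[(name, version)].format(**params)
```

Because call sites reference `("summarize", "v2")` rather than the text itself, shipping a `v3` for an A/B test does not touch application code.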
### Model Coupling

Issue: Direct dependencies on specific LLM providers
Fix:
- Introduce an adapter/interface layer
- Use provider-agnostic SDKs (e.g., LiteLLM, LangChain)
- Make model selection configurable
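One way to sketch the adapter layer in Python is a `Protocol` that application code depends on, with one concrete adapter per provider. All names below are illustrative; a real `OpenAIClient` or `AnthropicClient` would wrap the corresponding SDK.

```python
from dataclasses import dataclass
from typing import Protocol

@dataclass
class Completion:
    text: str
    total_tokens: int

class LLMClient(Protocol):
    """Provider-agnostic interface; application code depends only on this."""
    def complete(self, prompt: str) -> Completion: ...

class FakeClient:
    """Test double; a real adapter would wrap the openai or anthropic SDK."""
    def complete(self, prompt: str) -> Completion:
        return Completion(text="stub", total_tokens=len(prompt.split()))

def summarize(client: LLMClient, text: str) -> str:
    # Business logic never imports a provider SDK directly.
    return client.complete(f"Summarize: {text}").text
```

Swapping providers, or routing by configuration, then only requires constructing a different `LLMClient` implementation at startup.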
### Fallback Absence

Issue: No timeout, retry, or fallback configuration
Fix:
- Add timeouts to LLM calls (e.g., 30s)
- Implement exponential backoff retry logic
- Add fallback models or cached responses
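A sketch of the retry-plus-fallback pattern is below; timeouts themselves are usually passed per request to the provider SDK rather than implemented by hand. The helper and its defaults are illustrative.

```python
import time

def call_with_retry(fn, attempts=3, base_delay=0.01, fallback=None):
    """Retry fn with exponential backoff; return fallback once attempts run out."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                if fallback is not None:
                    return fallback  # e.g. a cached response or cheaper model
                raise
            time.sleep(base_delay * 2 ** attempt)  # 0.01s, 0.02s, 0.04s, ...
```

In practice you would catch only the provider's transient error types (rate limits, timeouts) rather than bare `Exception`, and add jitter to the backoff.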
## Workflow

Use this tool as part of an LLM integration audit:

1. `check_llm_integration` → assess health score
2. If `health_score` < 0.6:
   a. Review findings and prioritize fixes
   b. Add observability (tracing, logging)
   c. Implement PII redaction
   d. Add cost tracking and budgets
3. Re-run `check_llm_integration` to verify improvements

See Workflows: LLM integration audit for a full example.
## Error Cases

| Error | Cause | Solution |
|---|---|---|
| `missing required parameter: project_path` | `project_path` not provided | Include `project_path` in the request |
| `llm_integration` metric not in results | No LLM calls detected in the codebase | Verify the project uses LLM providers (OpenAI, Anthropic, etc.) |
| `ai` preset may not include it | Engine version mismatch | Update to the latest arxo version |
## Performance

- Speed: 5-15 seconds (uses the `ai` preset with 5 AI-related metrics)
- Caching: Does not use cache (always fresh analysis)
- Scalability: Handles projects with 1k+ LLM call sites efficiently
## Related Tools

- `analyze_architecture` — Full analysis including `rag_architecture`, `agent_architecture`, etc.
- `evaluate_policy` — Enforce LLM integration policies
## Related Resources

- `llm://risks/<path>` — Read cached LLM integration findings
- LLM Architecture Metric — Metric semantics, scoring, and policy references
## Related Guides

- LLM Architecture — Practical rollout, triage, and CI enforcement workflow
- AI Observability — Tracing and logging patterns (if exists)
## CLI Equivalent

```shell
# Check LLM integration health
arxo analyze --preset ai --format json | jq '.results[] | select(.id=="llm_integration")'
```