Examples and Report Walkthrough
This page walks through real finetuning_architecture sample artifacts and shows how to interpret the results.
Sample Artifacts
Bundled local sample outputs:
- crates/arxo-engine/src/metrics/ai_observability/finetuning_architecture/samples/toy-finetune-workflow-report.json
- crates/arxo-engine/src/metrics/ai_observability/finetuning_architecture/samples/toy-finetune-workflow-report.md
Sample project and config:
- crates/arxo-engine/src/metrics/ai_observability/finetuning_architecture/samples/toy-finetune-workflow/
- crates/arxo-engine/src/metrics/ai_observability/finetuning_architecture/samples/finetuning-architecture-config.yaml
Regenerate the JSON Sample
From your project directory, run Arxo with the path to your fine-tuning project and config:

    arxo analyze \
      --path /path/to/your/finetune-project \
      --config finetuning-architecture-config.yaml \
      --format json \
      --output report.json

How to Read the Report
1. Start with composite health
- finetuning_architecture.overall_finetuning_health
- finetuning_architecture.reproducibility_score
- finetuning_architecture.data_integrity_score
- finetuning_architecture.safety_governance_score
These summarize whether the pipeline is broadly healthy before detector-level triage.
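As a starting point, the four composite keys can be pulled out of the JSON report with a small script. This is a minimal sketch, not Arxo's own tooling: the report schema is not documented on this page, so the code assumes only that each metric appears somewhere in the JSON as an object key with a numeric value. The helper names `find_metrics` and `load_composite_scores` are hypothetical.

```python
import json
from typing import Any, Dict, Iterable

# Composite keys listed in this section.
COMPOSITE_KEYS = (
    "finetuning_architecture.overall_finetuning_health",
    "finetuning_architecture.reproducibility_score",
    "finetuning_architecture.data_integrity_score",
    "finetuning_architecture.safety_governance_score",
)


def find_metrics(node: Any, wanted: Iterable[str]) -> Dict[str, float]:
    """Walk parsed JSON and collect numeric values stored under the wanted keys.

    Assumption: metrics are plain {key: number} pairs at some nesting depth.
    The first occurrence of each key wins.
    """
    wanted_set = set(wanted)
    found: Dict[str, float] = {}

    def walk(value: Any) -> None:
        if isinstance(value, dict):
            for key, child in value.items():
                is_number = isinstance(child, (int, float)) and not isinstance(child, bool)
                if key in wanted_set and is_number:
                    found.setdefault(key, float(child))
                else:
                    walk(child)
        elif isinstance(value, list):
            for child in value:
                walk(child)

    walk(node)
    return found


def load_composite_scores(report_path: str) -> Dict[str, float]:
    """Read a report file and return whichever composite scores it contains."""
    with open(report_path, encoding="utf-8") as handle:
        return find_metrics(json.load(handle), COMPOSITE_KEYS)
```

Calling `load_composite_scores("report.json")` on the regenerated report returns a dictionary with whichever of the four keys were found; a missing key is itself worth investigating before moving on to detector-level triage.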
2. Triage detector families
- Reproducibility: base model pinning, run lineage, determinism envelope, checkpoint-eval linkage.
- Data/eval: eval harness maturity, contamination risk, prompt/template-loss consistency, distillation integrity.
- Safety/governance: artifact access, trust surface, privacy recordkeeping, provenance.
- Operations: OOM controls, cost tracking, checkpoint hygiene, resume safety.
3. Use findings evidence
Review each finding's rule_id and CodeSpan evidence to prioritize fixes at concrete files and lines.
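One way to turn findings into a fix list is to group them by file, so the files with the most evidence surface first. The sketch below is illustrative only: the field names `evidence`, `file`, and `start_line` are hypothetical stand-ins for however the report serializes CodeSpan evidence, and only `rule_id` is taken from this page. Adjust the names to match your actual report.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple


def group_findings_by_file(
    findings: List[Dict[str, Any]],
) -> Dict[str, List[Tuple[int, str]]]:
    """Group (line, rule_id) pairs by file, busiest file first.

    Assumed (hypothetical) finding shape:
        {"rule_id": "...", "evidence": [{"file": "...", "start_line": 12}]}
    """
    grouped: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for finding in findings:
        rule_id = finding.get("rule_id", "<unknown>")
        for span in finding.get("evidence", []):
            path = span.get("file", "<unknown>")
            grouped[path].append((span.get("start_line", 0), rule_id))

    # Order files by number of hits (descending), then by path for stability.
    ordered = sorted(grouped.items(), key=lambda item: (-len(item[1]), item[0]))
    return {path: sorted(hits) for path, hits in ordered}
```

Iterating over the result gives a per-file worklist in line order, which maps directly onto the "concrete files and lines" prioritization described above.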
4. Use pipeline diagnostics
- pipeline_dag_depth
- pipeline_cycle_count
- pipeline_completeness_score
- effect counts for GPU/database/storage in training files
These help explain operational topology and missing stages.
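The diagnostics above can be turned into short reviewer notes. This is a rough sketch under stated assumptions: it treats `pipeline_completeness_score` as a 0 to 1 value where higher means more expected stages were detected, and the `min_completeness` default of 0.8 is an arbitrary illustration, not an Arxo default. The function name `explain_pipeline` is hypothetical.

```python
from typing import Dict, List


def explain_pipeline(
    diagnostics: Dict[str, float],
    min_completeness: float = 0.8,  # assumed threshold, tune for your project
) -> List[str]:
    """Return human-readable notes for the pipeline diagnostics keys."""
    notes: List[str] = []

    cycles = diagnostics.get("pipeline_cycle_count", 0)
    if cycles > 0:
        notes.append(f"{int(cycles)} cycle(s) detected: stage ordering is ambiguous")

    completeness = diagnostics.get("pipeline_completeness_score")
    if completeness is not None and completeness < min_completeness:
        notes.append(
            f"completeness {completeness:.2f} is below {min_completeness:.2f}: "
            "check for missing stages"
        )

    return notes
```

An empty list means neither check fired; it does not by itself prove the pipeline topology is sound.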
Example Triage Pattern: Risky Pipeline
Signals:
- eval_absence_score: low
- dataset_contamination_score: low
- checkpoint_eval_lineage_score: low
- artifact_trust_surface_score: low
Action order:
- Add eval split and quality/safety metrics.
- Enforce split contamination controls and dedup checks.
- Link checkpoints to eval outcomes and rollback criteria.
- Harden artifact trust surface (safe_serialization, avoid unsafe trust paths).
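The action order for the risky pattern can be encoded as a small lookup so that a script emits only the steps whose signal is actually low. This is a sketch, assuming scores are on a 0 to 1 scale where lower is worse; the `low_threshold` of 0.5 and the name `plan_actions` are hypothetical choices for illustration.

```python
from typing import Dict, List, Tuple

# Signal key paired with its remediation step, in the action order given above.
ACTION_ORDER: List[Tuple[str, str]] = [
    ("eval_absence_score", "Add eval split and quality/safety metrics."),
    ("dataset_contamination_score", "Enforce split contamination controls and dedup checks."),
    ("checkpoint_eval_lineage_score", "Link checkpoints to eval outcomes and rollback criteria."),
    ("artifact_trust_surface_score",
     "Harden artifact trust surface (safe_serialization, avoid unsafe trust paths)."),
]


def plan_actions(scores: Dict[str, float], low_threshold: float = 0.5) -> List[str]:
    """Return remediation steps, in priority order, for signals below the threshold.

    Keys missing from `scores` are skipped rather than treated as low.
    """
    return [
        action
        for key, action in ACTION_ORDER
        if key in scores and scores[key] < low_threshold
    ]
```

Keeping the order in one list means the priority stays explicit even when only some of the four signals are low.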
Example Triage Pattern: Stable Pipeline
Signals:
- reproducibility_score: high
- data_integrity_score: high
- safety_governance_score: high
- overall_finetuning_health: high
Action order:
- Keep baseline no-regression policies enabled in CI.
- Raise thresholds gradually for high-impact detector keys.
- Focus remediation on new findings only, not already green categories.
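To make the no-regression idea concrete, the check below compares current scores against a stored baseline and reports any key that dropped. It illustrates what such a policy verifies and is not Arxo's built-in implementation; the function name `find_regressions` and the `tolerance` parameter are hypothetical, and scores are assumed to be numeric with higher meaning healthier.

```python
from typing import Dict, Optional, Tuple


def find_regressions(
    baseline: Dict[str, float],
    current: Dict[str, float],
    tolerance: float = 0.0,
) -> Dict[str, Tuple[float, Optional[float]]]:
    """Return {key: (baseline_value, current_value)} for every regressed key.

    A key regresses when it is missing from `current` or has dropped by more
    than `tolerance` relative to the baseline.
    """
    regressions: Dict[str, Tuple[float, Optional[float]]] = {}
    for key, old in baseline.items():
        new = current.get(key)
        if new is None or new < old - tolerance:
            regressions[key] = (old, new)
    return regressions
```

In CI, a non-empty result would fail the job. Raising thresholds gradually then amounts to updating the baseline only after a run that meets the new bar.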
Practical Notes
- Low score without findings usually means weak evidence density; inspect central training/config files first.
- Findings are best used as fix-entry points, while scores are better for release gates and trend tracking.