Skip to content
Arxo Arxo

Fine-tuning Architecture

This guide shows an end-to-end workflow for improving finetuning_architecture health in production fine-tuning systems.

  1. Run a fine-tuning architecture audit.
  2. Triage findings by risk family and blast radius.
  3. Apply targeted remediation.
  4. Enforce policy gates in CI.
  5. Prevent regressions with baseline checks.
Terminal window
# Focused fine-tuning metric
arxo analyze --path . --metric finetuning_architecture --format json
Terminal window
# AI preset (includes finetuning_architecture)
arxo analyze --path . --preset ai --format json

Prioritize in this order:

  1. method_integrity_score, checkpoint_eval_lineage_score, artifact_trust_surface_score
  2. base_model_versioning_score, run_lineage_score, determinism_envelope_score
  3. eval_absence_score, dataset_contamination_score, distillation_integrity_score
  4. privacy_recordkeeping_score, model_artifact_access_score, adapter_isolation_score
  5. oom_risk_score, cost_tracking_score, artifact_metadata_score, prompt_format_inconsistency_score

Then inspect composite movement:

  • finetuning_architecture.overall_finetuning_health
  • finetuning_architecture.reproducibility_score
  • finetuning_architecture.data_integrity_score
  • finetuning_architecture.safety_governance_score
  • Method track: enforce profile-specific invariants (dpo, ppo, rft, grpo, rloo, distill).
  • Repro track: base/tokenizer pinning, run passport, determinism envelope, checkpoint-eval linkage.
  • Data/eval track: explicit split strategy, contamination checks, eval maturity, distillation provenance/parity.
  • Safety/governance track: trust surface hardening, access controls, privacy budget logging, technical recordkeeping.
  • Ops/cost track: OOM controls, budget and throughput tracking.

Use the Remediation Playbook for detector-by-detector fixes.

Use profiles from Policy and CI Gates.

Terminal window
arxo analyze --path . --preset ai --config arxo.yml --fail-fast

Recommended rollout:

  1. Start with warning-level thresholds.
  2. Fix recurring low-score categories in central modules.
  3. Promote critical gates to error after score trend stabilizes.
  • Enable baseline no-regression checks against origin/main.
  • Start in one critical training service/workspace.
  • Expand to additional workspaces after stable score trends.