Skip to content
Arxo Arxo

Remediation Playbook

Use this playbook to translate finetuning_architecture findings into concrete engineering fixes.

  • Symptom metric: finetuning_architecture.base_model_versioning_score
  • Minimal fix: pin base model/tokenizer revisions and store base fingerprints in adapter metadata.
  • Symptom metric: finetuning_architecture.run_lineage_score
  • Minimal fix: require run tracking (wandb/mlflow), commit hash capture, deterministic seeds, dataset snapshot fingerprint.
  • Symptom metric: finetuning_architecture.eval_absence_score
  • Minimal fix: enforce eval split, task metrics, and profile-appropriate maturity (dpo/ppo/rft/grpo/rloo).
  • Symptom metric: finetuning_architecture.dataset_contamination_score
  • Minimal fix: explicit split policy, overlap checks, duplicate/near-duplicate removal.
  • Symptom metric: finetuning_architecture.checkpoint_management_score
  • Minimal fix: use run-scoped output dirs, retention controls, and best-checkpoint policy.
  • Symptom metric: finetuning_architecture.checkpoint_eval_lineage_score
  • Minimal fix: link checkpoints with eval outcomes, enable best-checkpoint selection, and rollback quality criteria.
  • Symptom metric: finetuning_architecture.method_integrity_score
  • Minimal fix: enforce method-specific invariants (DPO pair schema+ref model, PPO/RFT/GRPO/RLOO reward+safety eval).
  • Symptom metric: finetuning_architecture.distillation_integrity_score
  • Minimal fix: pin teacher model revisions, track synthetic provenance, and verify teacher-student eval parity and split separation.
  • Symptom metric: finetuning_architecture.artifact_trust_surface_score
  • Minimal fix: avoid trust_remote_code=true; enforce safetensors/safe_serialization=true on save/load paths.
  • Symptom metric: finetuning_architecture.privacy_recordkeeping_score
  • Minimal fix: add DP controls where required, log epsilon/delta budget, and publish technical governance metadata.
  • Symptom metric: finetuning_architecture.adapter_isolation_score
  • Minimal fix: persist base compatibility metadata and explicit adapter target modules.
  • Symptom metric: finetuning_architecture.model_artifact_access_score
  • Minimal fix: enforce access checks around checkpoint/model read/write paths.
  • Symptom metric: finetuning_architecture.prompt_format_inconsistency_score
  • Minimal fix: choose one canonical training format per pipeline stage and enforce through data contracts.

After category fixes, confirm these together:

  • finetuning_architecture.overall_finetuning_health
  • finetuning_architecture.reproducibility_score
  • finetuning_architecture.data_integrity_score
  • finetuning_architecture.safety_governance_score

If overall score stalls, inspect persistent low detector categories from Scoring and Keys.