Remediation Playbook
Use this playbook to translate finetuning_architecture findings into concrete engineering fixes.
Base Model Pinning
- Symptom metric: finetuning_architecture.base_model_versioning_score
- Minimal fix: pin base model/tokenizer revisions and store base fingerprints in adapter metadata.
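The pinning fix can be sketched as a small metadata builder. This is a minimal illustration under stated assumptions, not the detector's own logic: the function names, the metadata keys, and the list of branch names treated as unpinned are all hypothetical.

```python
import hashlib

# Branch-like names that can move over time; treated as "not pinned" (assumed list).
MOVING_REFS = {"main", "master", "latest", "head"}


def fingerprint_files(files):
    """Return a SHA-256 hex digest over file names and contents, in sorted-name order."""
    digest = hashlib.sha256()
    for name in sorted(files):
        content_hash = hashlib.sha256(files[name]).hexdigest()
        digest.update(f"{name}:{content_hash}\n".encode("utf-8"))
    return digest.hexdigest()


def build_adapter_metadata(base_model_id, base_revision, tokenizer_revision, base_files):
    """Build hypothetical adapter metadata that records exactly which base was used."""
    for label, revision in (("base model", base_revision), ("tokenizer", tokenizer_revision)):
        if not revision or revision.lower() in MOVING_REFS:
            raise ValueError(f"{label} revision must be pinned to a commit hash or tag")
    return {
        "base_model_id": base_model_id,
        "base_model_revision": base_revision,
        "tokenizer_revision": tokenizer_revision,
        "base_fingerprint": fingerprint_files(base_files),
    }
```

Here base_files maps file names to raw bytes (for example the base config and tokenizer files), so the fingerprint changes whenever any of them changes.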
Run Lineage
- Symptom metric: finetuning_architecture.run_lineage_score
- Minimal fix: require run tracking (wandb/mlflow), commit hash capture, deterministic seeds, and a dataset snapshot fingerprint.
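A run record that captures these lineage fields might look like the sketch below. It uses only the standard library; start_run, dataset_fingerprint, and the record keys are hypothetical names, and the resulting dictionary is meant to be logged to whichever tracker the project uses.

```python
import hashlib
import random
import subprocess
from typing import List, Optional


def current_commit() -> Optional[str]:
    """Return the current git commit hash, or None when it cannot be determined."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip() or None


def dataset_fingerprint(records: List[str]) -> str:
    """Order-sensitive SHA-256 over the records of a dataset snapshot."""
    digest = hashlib.sha256()
    for record in records:
        digest.update(hashlib.sha256(record.encode("utf-8")).digest())
    return digest.hexdigest()


def start_run(seed: int, records: List[str], commit: Optional[str] = None) -> dict:
    """Seed the RNG and return the lineage fields for this run."""
    commit = commit or current_commit()
    if commit is None:
        raise RuntimeError("refusing to start a run without a commit hash")
    random.seed(seed)  # seeds only Python's RNG; numpy/torch need their own seeding
    return {"commit": commit, "seed": seed, "dataset_sha256": dataset_fingerprint(records)}
```

Failing fast when no commit hash is available is a design choice for the sketch: a run that cannot be traced back to code is treated as not worth starting.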
Eval Harness and Maturity
- Symptom metric: finetuning_architecture.eval_absence_score
- Minimal fix: enforce eval split, task metrics, and profile-appropriate maturity (dpo/ppo/rft/grpo/rloo).
Dataset Contamination
- Symptom metric: finetuning_architecture.dataset_contamination_score
- Minimal fix: explicit split policy, overlap checks, duplicate/near-duplicate removal.
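An overlap check between train and eval splits can be sketched as follows. It combines exact matching on normalized text with a word-trigram Jaccard similarity for near-duplicates. The normalization rules, the trigram size, and the 0.8 threshold are assumptions chosen for illustration, and the pairwise comparison is only practical for small splits.

```python
import hashlib
import re


def normalize(text):
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return re.sub(r"\s+", " ", text.strip().lower())


def exact_key(text):
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()


def shingles(text, n=3):
    """Return the set of word n-grams of the normalized text."""
    tokens = normalize(text).split()
    if len(tokens) < n:
        return {tuple(tokens)} if tokens else set()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a, b):
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def find_contaminated(train, eval_set, threshold=0.8):
    """Return indices of train examples that duplicate or nearly duplicate an eval example."""
    eval_keys = {exact_key(text) for text in eval_set}
    eval_shingles = [shingles(text) for text in eval_set]
    flagged = []
    for index, text in enumerate(train):
        if exact_key(text) in eval_keys:
            flagged.append(index)
            continue
        candidate = shingles(text)
        if any(jaccard(candidate, other) >= threshold for other in eval_shingles):
            flagged.append(index)
    return flagged
```

Flagged indices can then be dropped from the training split before the dataset snapshot is fingerprinted.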
Checkpoint Management
- Symptom metric: finetuning_architecture.checkpoint_management_score
- Minimal fix: use run-scoped output dirs, retention controls, and best-checkpoint policy.
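One way to express run-scoped directories and a retention rule is sketched below. The directory layout and the "keep the last N plus the best" policy are assumptions for illustration, not a layout the playbook prescribes.

```python
from pathlib import Path
from typing import List


def run_output_dir(root: Path, run_id: str) -> Path:
    """Scope checkpoints to a single run so runs never overwrite each other."""
    return root / "runs" / run_id / "checkpoints"


def checkpoints_to_prune(steps: List[int], best_step: int, keep_last: int = 2) -> List[int]:
    """Return the checkpoint steps that may be deleted under the retention policy."""
    ordered = sorted(steps)
    # Guard keep_last == 0: ordered[-0:] would select every step, not none.
    recent = ordered[-keep_last:] if keep_last > 0 else []
    keep = set(recent) | {best_step}
    return [step for step in ordered if step not in keep]
```

For example, with checkpoints at steps 100 through 500 in increments of 100, best_step=200 and keep_last=2, the function returns [100, 300].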
Checkpoint-Eval Lineage
- Symptom metric: finetuning_architecture.checkpoint_eval_lineage_score
- Minimal fix: link checkpoints with eval outcomes, enable best-checkpoint selection, and define rollback quality criteria.
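Linking checkpoints to eval outcomes can be as simple as one record per (checkpoint, eval snapshot) pair. The sketch below assumes a single scalar score where higher is better; the record fields and the regression tolerance are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CheckpointEval:
    checkpoint: str            # path or identifier of the checkpoint
    eval_dataset_sha256: str   # fingerprint of the eval snapshot that was scored
    score: float               # assumed: higher is better


def select_best(records: List[CheckpointEval]) -> CheckpointEval:
    """Pick the best-scoring checkpoint among evaluated ones."""
    if not records:
        raise ValueError("no evaluated checkpoints to choose from")
    return max(records, key=lambda record: record.score)


def should_rollback(candidate: CheckpointEval, deployed: CheckpointEval,
                    max_regression: float = 0.01) -> bool:
    """Roll back when the candidate regresses by more than the allowed tolerance."""
    if candidate.eval_dataset_sha256 != deployed.eval_dataset_sha256:
        raise ValueError("scores are only comparable on the same eval snapshot")
    return candidate.score < deployed.score - max_regression
```

Refusing to compare scores across different eval snapshots keeps the rollback decision tied to the lineage data instead of to whichever numbers happen to be available.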
Method Integrity
- Symptom metric: finetuning_architecture.method_integrity_score
- Minimal fix: enforce method-specific invariants (DPO: pair schema and reference model; PPO/RFT/GRPO/RLOO: reward and safety eval).
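A pre-flight check for these invariants might look like the following sketch. The config keys (method, ref_model, reward_source, safety_eval) and the prompt/chosen/rejected field names are assumptions for illustration; substitute whatever the pipeline's own config and dataset schema use.

```python
from typing import List

RL_METHODS = {"ppo", "rft", "grpo", "rloo"}
DPO_FIELDS = ("prompt", "chosen", "rejected")  # assumed preference-pair schema


def check_method_invariants(config: dict, sample_rows: List[dict]) -> List[str]:
    """Return a list of human-readable problems; an empty list means the checks passed."""
    problems = []
    method = config.get("method")
    if method == "dpo":
        if not config.get("ref_model"):
            problems.append("dpo: missing ref_model")
        for index, row in enumerate(sample_rows):
            missing = [name for name in DPO_FIELDS if not row.get(name)]
            if missing:
                problems.append(f"dpo: row {index} missing {missing}")
            elif row["chosen"] == row["rejected"]:
                problems.append(f"dpo: row {index} has identical chosen and rejected")
    elif method in RL_METHODS:
        if not config.get("reward_source"):
            problems.append(f"{method}: missing reward_source")
        if not config.get("safety_eval"):
            problems.append(f"{method}: missing safety_eval")
    # Other methods are left unchecked in this sketch.
    return problems
```

Returning a list of problems instead of raising on the first one lets a single run report every violated invariant at once.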
Distillation Integrity
- Symptom metric: finetuning_architecture.distillation_integrity_score
- Minimal fix: pin teacher model revisions, track synthetic provenance, and verify teacher-student eval parity and split separation.
Artifact Trust Surface
- Symptom metric: finetuning_architecture.artifact_trust_surface_score
- Minimal fix: avoid trust_remote_code=true; enforce safetensors/safe_serialization=true on save/load paths.
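A lightweight audit for both conditions can be sketched with the standard library. The regular expression and the list of suffixes treated as pickle-based are assumptions for illustration; a real audit would follow the project's own load and save code paths.

```python
import re
from pathlib import PurePosixPath
from typing import List

# Suffixes commonly used for pickle-based weight files (assumed list).
UNSAFE_SUFFIXES = {".bin", ".pt", ".pth", ".pkl", ".pickle", ".ckpt"}
TRUST_PATTERN = re.compile(r"trust_remote_code\s*[=:]\s*true\b", re.IGNORECASE)


def audit_source(text: str) -> List[int]:
    """Return 1-based line numbers where trust_remote_code is enabled."""
    return [
        lineno
        for lineno, line in enumerate(text.splitlines(), start=1)
        if TRUST_PATTERN.search(line)
    ]


def audit_artifacts(paths: List[str]) -> List[str]:
    """Return artifact paths whose suffix suggests a non-safetensors format."""
    return [path for path in paths if PurePosixPath(path).suffix.lower() in UNSAFE_SUFFIXES]
```

The pattern matches both Python-style keyword arguments and YAML-style config entries, so the same check can run over source files and config files.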
Privacy and Recordkeeping
- Symptom metric: finetuning_architecture.privacy_recordkeeping_score
- Minimal fix: add DP controls where required, log epsilon/delta budget, and publish technical governance metadata.
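Logging the epsilon/delta budget can be sketched as a small ledger. This example assumes basic sequential composition, where the epsilons and deltas of successive runs on the same data simply add. That bound is valid but loose, and a dedicated privacy accountant would normally give tighter numbers. PrivacyLedger and its fields are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class PrivacyLedger:
    epsilon_budget: float
    delta_budget: float
    entries: List[Tuple[str, float, float]] = field(default_factory=list)

    def spent(self) -> Tuple[float, float]:
        """Total (epsilon, delta) spent so far under basic composition."""
        total_epsilon = sum(epsilon for _, epsilon, _ in self.entries)
        total_delta = sum(delta for _, _, delta in self.entries)
        return total_epsilon, total_delta

    def record(self, run_id: str, epsilon: float, delta: float) -> None:
        """Log a run's privacy cost, refusing runs that would exceed the budget."""
        spent_epsilon, spent_delta = self.spent()
        if (spent_epsilon + epsilon > self.epsilon_budget
                or spent_delta + delta > self.delta_budget):
            raise ValueError(f"run {run_id!r} would exceed the privacy budget")
        self.entries.append((run_id, epsilon, delta))
```

The entries list doubles as the record to publish alongside other governance metadata, since it shows which runs consumed which share of the budget.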
Adapter Isolation
- Symptom metric: finetuning_architecture.adapter_isolation_score
- Minimal fix: persist base compatibility metadata and explicit adapter target modules.
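At load time, the persisted metadata can be checked before an adapter is attached to a base model. The sketch below assumes hypothetical metadata keys base_fingerprint and target_modules; adapt the names to whatever the adapter format actually stores.

```python
from typing import List


def check_adapter_compatibility(adapter_meta: dict, base_fingerprint: str) -> List[str]:
    """Return problems that should block attaching this adapter to the given base."""
    problems = []
    recorded = adapter_meta.get("base_fingerprint")
    if not recorded:
        problems.append("adapter metadata has no base_fingerprint")
    elif recorded != base_fingerprint:
        problems.append("adapter was trained against a different base")
    targets = adapter_meta.get("target_modules")
    if not isinstance(targets, list) or not targets:
        problems.append("target_modules must be an explicit, non-empty list")
    return problems
```

An empty result means the adapter declares both which base it belongs to and which modules it modifies, which is the minimum needed to keep adapters from being applied to the wrong model.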
Model Artifact Access
- Symptom metric: finetuning_architecture.model_artifact_access_score
- Minimal fix: enforce access checks around checkpoint/model read/write paths.
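An access check around artifact paths usually has two parts: who may perform the action, and whether the path stays inside the artifact root. The sketch below illustrates both. The role table and function names are hypothetical, and a production system would delegate the role decision to its existing authorization layer.

```python
from pathlib import Path

# Hypothetical role-to-action table.
PERMISSIONS = {"trainer": {"read", "write"}, "evaluator": {"read"}}


def resolve_artifact_path(root: Path, relative: str) -> Path:
    """Resolve a path and reject anything that escapes the artifact root."""
    resolved_root = root.resolve()
    candidate = (resolved_root / relative).resolve()
    if candidate != resolved_root and resolved_root not in candidate.parents:
        raise PermissionError(f"path escapes artifact root: {relative!r}")
    return candidate


def check_access(role: str, action: str, root: Path, relative: str) -> Path:
    """Return the resolved path only if the role may perform the action on it."""
    if action not in PERMISSIONS.get(role, set()):
        raise PermissionError(f"role {role!r} may not {action} model artifacts")
    return resolve_artifact_path(root, relative)
```

Routing every checkpoint read and write through a single function like check_access keeps the policy in one place instead of scattered across training and serving code.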
Prompt Format Inconsistency
- Symptom metric: finetuning_architecture.prompt_format_inconsistency_score
- Minimal fix: choose one canonical training format per pipeline stage and enforce through data contracts.
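A data contract can be enforced with a validator that every record must pass before entering a pipeline stage. The sketch below assumes a chat-style canonical format (a messages list with role and content, ending on an assistant turn). That format and the allowed roles are illustrative assumptions, not a format the playbook mandates.

```python
from typing import List

ALLOWED_ROLES = ("system", "user", "assistant")  # assumed canonical roles


def validate_record(record: dict) -> List[str]:
    """Return contract violations for one training record; empty means valid."""
    messages = record.get("messages")
    if not isinstance(messages, list) or not messages:
        return ["record must contain a non-empty 'messages' list"]
    problems = []
    for index, message in enumerate(messages):
        if not isinstance(message, dict):
            problems.append(f"message {index} is not an object")
            continue
        if message.get("role") not in ALLOWED_ROLES:
            problems.append(f"message {index} has unknown role {message.get('role')!r}")
        content = message.get("content")
        if not isinstance(content, str) or not content.strip():
            problems.append(f"message {index} has empty content")
    last = messages[-1]
    if isinstance(last, dict) and last.get("role") != "assistant":
        problems.append("last message must be the assistant target")
    return problems
```

Records in any other shape, such as a flat prompt/completion pair, fail the first check and can be converted or rejected before they mix with canonical data.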
Composite Cross-Check
After category fixes, confirm these together:
- finetuning_architecture.overall_finetuning_health
- finetuning_architecture.reproducibility_score
- finetuning_architecture.data_integrity_score
- finetuning_architecture.safety_governance_score
If the overall score stalls, inspect the detector categories that remain persistently low, as listed in Scoring and Keys.