Fine-tuning Architecture
The finetuning_architecture metric evaluates architecture quality for LLM fine-tuning systems. It focuses on reproducibility, data/eval integrity, safety/governance, and operational reliability.
Last verified against engine metric version 2.0.0.
Why It Matters
Fine-tuning failures are often caused by pipeline architecture gaps, not by model design alone:
- Unpinned base models and weak lineage break reproducibility.
- Missing eval harnesses and weak split hygiene allow regressions to ship.
- Weak checkpoint/resume controls create unstable training recovery.
- Adapter/metadata/access-control gaps increase governance and deployment risk.
- Missing budget and OOM controls increase cost and operational failures.
finetuning_architecture surfaces these risks as detector scores plus evidence-backed findings.
What It Catches
- Reproducibility gaps: base-model pinning, lineage capture, determinism envelope, checkpoint-eval linkage.
- Data and evaluation integrity gaps: missing eval harness, contamination risk, prompt/template-loss mismatch, weak preference/distillation controls.
- Safety and governance gaps: artifact access checks, unsafe serialization/trust surfaces, privacy recordkeeping, metadata and provenance gaps.
- Operational risk gaps: OOM controls, checkpoint management, resume safety, and cost tracking.
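To make the contamination-risk item concrete, the sketch below shows one simple split-hygiene check: hash whitespace- and case-normalized examples and report eval examples that also occur in the training split. It is an illustration only, not the metric's detector; the helper names `fingerprint` and `find_split_overlap` and the sample data are hypothetical.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Hash a whitespace- and case-normalized example (hypothetical helper)."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_split_overlap(train: list[str], eval_set: list[str]) -> list[str]:
    """Return eval examples whose normalized form also occurs in train."""
    train_prints = {fingerprint(t) for t in train}
    return [e for e in eval_set if fingerprint(e) in train_prints]


# Made-up examples: the first eval row differs from a train row only in
# casing and spacing, so it counts as leaked.
train_examples = ["Translate: hello world", "Summarize the report."]
eval_examples = ["translate:  Hello   World", "Classify this ticket."]
leaked = find_split_overlap(train_examples, eval_examples)
print(leaked)
```

Exact-match hashing only catches verbatim or trivially reformatted duplicates; near-duplicate detection would need a fuzzier comparison.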
What It Does Not Catch
- Runtime model quality or benchmark accuracy.
- End-to-end privacy compliance proof.
- Security guarantees for infrastructure outside scanned repositories.
- Dynamic behavior hidden behind external services without repository evidence.
Use this metric as an architectural risk signal, then validate critical paths with runtime tests and governance review.
How Detection Works
- Discovers likely fine-tuning files using file-extension and framework-pattern anchors.
- Uses semantic indexing and targeted parsing where applicable to reduce noisy matches.
- Emits normalized metric scores and actionable findings with file/line evidence when available.
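The discovery step can be pictured with a minimal sketch: filter files by extension, then look for framework-pattern anchors and record the file and line as evidence. The anchor list below (a few trainer and config class names from common fine-tuning libraries) and the function name `discover_candidates` are assumptions for illustration; the engine's actual anchors and its semantic indexing are not shown here.

```python
import re
import tempfile
from pathlib import Path

# Assumed for illustration: a subset of scanned extensions and a few anchors.
CODE_EXTENSIONS = {".py", ".ipynb", ".yml", ".yaml", ".json", ".toml"}
FRAMEWORK_ANCHORS = re.compile(
    r"\b(SFTTrainer|DPOTrainer|LoraConfig|TrainingArguments)\b"
)


def discover_candidates(root: Path) -> list[tuple[Path, int]]:
    """Return (path, line) pairs where a framework anchor first appears."""
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in CODE_EXTENSIONS:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for line_no, line in enumerate(text.splitlines(), start=1):
            if FRAMEWORK_ANCHORS.search(line):
                hits.append((path, line_no))
                break  # one piece of evidence per file is enough here
    return hits


# Build a throwaway repository to exercise the sketch.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "train.py").write_text("import os\ntrainer = SFTTrainer(model)\n")
    (root / "README.md").write_text("SFTTrainer is mentioned here\n")
    (root / "utils.py").write_text("def add(a, b):\n    return a + b\n")
    found = [(p.name, line) for p, line in discover_candidates(root)]
print(found)
```

Plain regex matching like this is noisy, which is presumably why the metric layers semantic indexing and targeted parsing on top of it.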
Required Inputs and Scan Scope
- Requires analysis data that includes call-graph and effect-index enrichment.
- Scans code/config candidates from common training extensions (.py, .ipynb, .ts, .js, .rs, .yml, .yaml, .json, .toml).
- Scans dataset-like files (.jsonl, .json, .parquet, .csv, .arrow) with bounded sampling for content-based checks.
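Bounded sampling means a content check reads only a capped number of records instead of the whole dataset. A minimal sketch for a JSONL file follows; the function name `sample_jsonl` and the default cap of 100 are assumptions, since the metric's real sampling limits are not stated here.

```python
import io
import itertools
import json
from typing import IO, Iterator


def sample_jsonl(stream: IO[str], max_records: int = 100) -> Iterator[dict]:
    """Read at most max_records non-blank lines; yield those that are JSON objects."""
    non_blank = (line for line in stream if line.strip())
    for line in itertools.islice(non_blank, max_records):
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate malformed rows instead of failing the scan
        if isinstance(record, dict):
            yield record


# In-memory stand-in for a dataset file: one blank line and one bad row.
data = io.StringIO(
    '{"prompt": "a"}\n\nnot json\n{"prompt": "b"}\n{"prompt": "c"}\n'
)
rows = list(sample_jsonl(data, max_records=3))
print(rows)
```

Because the cap applies to lines read rather than rows yielded, the cost of a scan stays bounded even when a file is mostly malformed.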
Key Outputs
- finetuning_architecture.base_model_versioning_score
- finetuning_architecture.run_lineage_score
- finetuning_architecture.eval_absence_score
- finetuning_architecture.eval_maturity_level
- finetuning_architecture.dataset_contamination_score
- finetuning_architecture.chat_template_score
- finetuning_architecture.checkpoint_management_score
- finetuning_architecture.resume_safety_score
- finetuning_architecture.adapter_isolation_score
- finetuning_architecture.model_artifact_access_score
- finetuning_architecture.artifact_trust_surface_score
- finetuning_architecture.privacy_recordkeeping_score
- finetuning_architecture.method_integrity_score
- finetuning_architecture.distillation_integrity_score
- finetuning_architecture.checkpoint_eval_lineage_score
- finetuning_architecture.prompt_format_inconsistency_score
- finetuning_architecture.oom_risk_score
- finetuning_architecture.cost_tracking_score
- finetuning_architecture.artifact_metadata_score
Composite and pipeline outputs:
| Metric Key | Range / Type | Direction |
|---|---|---|
| finetuning_architecture.reproducibility_score | 0..1 | Higher is better |
| finetuning_architecture.data_integrity_score | 0..1 | Higher is better |
| finetuning_architecture.safety_governance_score | 0..1 | Higher is better |
| finetuning_architecture.overall_finetuning_health | 0..1 | Higher is better |
| finetuning_architecture.pipeline_dag_depth | Number | Informational |
| finetuning_architecture.pipeline_cycle_count | Number | Informational |
| finetuning_architecture.pipeline_completeness_score | 0..1 | Higher is better |
For the full key contract and formulas, see Scoring and Keys.
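A consumer of these outputs may want to sanity-check values against the ranges in the table before acting on them. The sketch below does that for the composite and pipeline keys; the keys and ranges are copied from the table, while the function name `check_composite_outputs` and the sample report are hypothetical.

```python
# Keys and ranges come from the table above; the checker itself is hypothetical.
SCORE_KEYS = {
    "finetuning_architecture.reproducibility_score",
    "finetuning_architecture.data_integrity_score",
    "finetuning_architecture.safety_governance_score",
    "finetuning_architecture.overall_finetuning_health",
    "finetuning_architecture.pipeline_completeness_score",
}
INFORMATIONAL_KEYS = {
    "finetuning_architecture.pipeline_dag_depth",
    "finetuning_architecture.pipeline_cycle_count",
}


def check_composite_outputs(values: dict[str, float]) -> list[str]:
    """Return readable problems; an empty list means the values fit the table."""
    problems = []
    for key, value in sorted(values.items()):
        if key in SCORE_KEYS and not 0.0 <= value <= 1.0:
            problems.append(f"{key}: {value} is outside 0..1")
        elif key not in SCORE_KEYS and key not in INFORMATIONAL_KEYS:
            problems.append(f"{key}: not a composite or pipeline key")
    return problems


# Made-up report with one out-of-range score.
report = {
    "finetuning_architecture.overall_finetuning_health": 0.82,
    "finetuning_architecture.reproducibility_score": 1.4,
    "finetuning_architecture.pipeline_dag_depth": 5,
}
problems = check_composite_outputs(report)
print(problems)
```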
Finding Anatomy (What You Triage)
Findings are emitted when evidence is available, including:
- rule_id for automation and policy correlation
- severity for triage priority
- evidence with code span (path, line) where possible
- recommendation, impact, and effort to guide remediation
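For tooling that consumes findings, the fields above map naturally onto a small record type. The sketch below models a finding and sorts a batch for triage. The severity vocabulary, the rule IDs, and the field values are all assumed for illustration; the source names the fields but not their allowed values.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed severity vocabulary; the source lists the field but not its values.
SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3}


@dataclass
class Evidence:
    path: str
    line: Optional[int] = None  # a line number is present only where possible


@dataclass
class Finding:
    rule_id: str
    severity: str
    evidence: Evidence
    recommendation: str
    impact: str
    effort: str


def triage(findings: list[Finding]) -> list[Finding]:
    """Order findings by severity; unknown severities sort last."""
    last = len(SEVERITY_ORDER)
    return sorted(findings, key=lambda f: SEVERITY_ORDER.get(f.severity, last))


# Hypothetical findings; these rule IDs are placeholders, not real rules.
findings = [
    Finding("example.rule_b", "low", Evidence("train.py", 12),
            "Track cost", "Budget drift", "small"),
    Finding("example.rule_a", "high", Evidence("configs/sft.yaml"),
            "Pin base model", "Irreproducible runs", "small"),
]
ordered = [f.rule_id for f in triage(findings)]
print(ordered)
```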
Config Quick Reference

    metrics:
      - id: finetuning_architecture
        enabled: true
        config:
          profile: "sft" # "sft" | "dpo" | "ppo" | "rft" | "grpo" | "rloo" | "distill"
          require_eval_harness: true
          require_base_pinning: true
          require_full_determinism: false
          require_preference_eval: false
          require_checkpoint_eval_lineage: true
          require_safe_serialization: true
          privacy_profile: "strict" # "none" | "dp" | "recordkeeping" | "strict"
          large_sequence_threshold: 2048

Policy Quick Start

    metrics:
      - id: finetuning_architecture
    policy:
      invariants:
        - metric: finetuning_architecture.overall_finetuning_health
          op: ">="
          value: 0.70
          message: "Overall fine-tuning architecture health baseline not met"
        - metric: finetuning_architecture.base_model_versioning_score
          op: ">="
          value: 0.80
          message: "Base model and tokenizer pinning baseline not met"
        - metric: finetuning_architecture.eval_absence_score
          op: ">="
          value: 0.75
          message: "Eval harness and maturity baseline not met"
        - metric: finetuning_architecture.dataset_contamination_score
          op: ">="
          value: 0.75
          message: "Dataset contamination controls are insufficient"
        - metric: finetuning_architecture.resume_safety_score
          op: ">="
          value: 0.80
          message: "Checkpoint resume safety baseline not met"
        - metric: finetuning_architecture.checkpoint_eval_lineage_score
          op: ">="
          value: 0.80
          message: "Checkpoint-eval lineage baseline not met"
        - metric: finetuning_architecture.artifact_trust_surface_score
          op: ">="
          value: 0.85
          message: "Artifact trust/safe serialization baseline not met"

For staged rollout profiles, see Policy and CI Gates.
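To show how invariants of this shape behave, here is a minimal evaluator sketch: each invariant compares one metric value against a threshold and contributes its message on failure. This is not the engine's policy implementation. Only the ">=" operator appears in the policy above; the other operators, the choice to treat a missing metric as a failure, and the sample scores are assumptions.

```python
import operator

# Only ">=" appears in the policy above; the other operators are assumptions.
OPS = {
    ">=": operator.ge,
    ">": operator.gt,
    "<=": operator.le,
    "<": operator.lt,
    "==": operator.eq,
}


def evaluate_invariants(invariants: list[dict], metrics: dict[str, float]) -> list[str]:
    """Return the message of every invariant that fails or lacks a metric value."""
    failures = []
    for inv in invariants:
        actual = metrics.get(inv["metric"])
        # Assumption: a metric with no reported value counts as a failure.
        if actual is None or not OPS[inv["op"]](actual, inv["value"]):
            failures.append(inv["message"])
    return failures


invariants = [
    {"metric": "finetuning_architecture.overall_finetuning_health",
     "op": ">=", "value": 0.70,
     "message": "Overall fine-tuning architecture health baseline not met"},
    {"metric": "finetuning_architecture.resume_safety_score",
     "op": ">=", "value": 0.80,
     "message": "Checkpoint resume safety baseline not met"},
]
# Made-up scores: health passes, resume safety falls below its threshold.
scores = {
    "finetuning_architecture.overall_finetuning_health": 0.74,
    "finetuning_architecture.resume_safety_score": 0.65,
}
failures = evaluate_invariants(invariants, scores)
print(failures)
```

A CI gate built on this idea would fail the build whenever the returned list is non-empty.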
Runtime and ID Compatibility
- Documentation route: /metrics/finetuning-architecture
- Stable metric ID: finetuning_architecture