Framework and Language Coverage
finetuning_architecture uses a hybrid approach: file discovery + framework anchors + targeted parsing + evidence scoring.
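The hybrid pipeline above can be sketched in a few lines. Everything here is illustrative, not the tool's actual API: the anchor patterns, weights, and function names are assumptions.

```python
import re
from pathlib import Path

# Hypothetical anchor patterns mapped to evidence weights
# (illustrative values, not the tool's real table).
ANCHORS = {
    r"\bTrainingArguments\b": 2,
    r"\bSFTTrainer\b": 3,
    r"fine_tuning\.jobs\.create": 3,
}

CANDIDATE_EXTS = {".py", ".ipynb", ".yml", ".yaml"}

def discover(root: Path) -> list[Path]:
    """File discovery: collect candidate files by extension."""
    return [p for p in root.rglob("*") if p.suffix in CANDIDATE_EXTS]

def score_file(path: Path) -> int:
    """Evidence scoring: sum the weights of anchors found in the file."""
    text = path.read_text(errors="ignore")
    return sum(w for pat, w in ANCHORS.items() if re.search(pat, text))
```

A real implementation layers targeted parsing on top of this kind of scan; the sketch only shows how discovery and scoring compose.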
Supported File Types
Code and config candidates:
.py, .ipynb, .ts, .js, .rs, .yml, .yaml, .json, .toml
Dataset-like files:
.jsonl, .json, .parquet, .csv, .arrow
For dataset-content checks, scanning is intentionally bounded to improve speed on large repositories.
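A bounded dataset-content check might look like the sketch below. The byte limit, marker strings, and helper name are assumptions chosen to illustrate the bounding, not the tool's real values.

```python
from pathlib import Path

DATASET_EXTS = {".jsonl", ".json", ".parquet", ".csv", ".arrow"}
MAX_BYTES = 64 * 1024  # illustrative cap: read only the head of large files

def looks_like_chat_data(path: Path) -> bool:
    """Peek at the first MAX_BYTES of a dataset-like file for chat-format markers."""
    if path.suffix not in DATASET_EXTS:
        return False
    with path.open("rb") as fh:
        head = fh.read(MAX_BYTES)
    return b'"messages"' in head or b'"role"' in head
```

Bounding the read keeps the check O(1) per file regardless of dataset size, which is what makes scanning large repositories fast.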
Framework Detection Coverage
Framework and stack anchors include:
- HuggingFace Trainer/Transformers
Trainer, TrainingArguments, Seq2SeqTrainer, from_pretrained
- TRL training paths
SFTTrainer, PPOTrainer, DPOTrainer, ORPOTrainer, plus modern method hints (rft, grpo, rloo)
- OpenAI fine-tuning jobs
openai.FineTuningJob, fine_tuning.jobs.create
- Axolotl-style YAML config keys
base_model, adapter, datasets, chat_template, and related training controls
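The Axolotl-style key detection could be sketched as a naive top-level key scan. The helper names, key set, and two-key threshold are assumptions; a real implementation would use a proper YAML parser.

```python
AXOLOTL_KEYS = {"base_model", "adapter", "datasets", "chat_template"}

def axolotl_keys_present(yaml_text: str) -> set:
    """Naive scan for top-level Axolotl-style keys (no nesting, no YAML parser)."""
    found = set()
    for line in yaml_text.splitlines():
        if ":" not in line or line.startswith((" ", "\t")):
            continue  # skip non-mapping and indented (nested) lines
        key = line.split(":", 1)[0].strip()
        if key in AXOLOTL_KEYS:
            found.add(key)
    return found

def is_axolotl_config(yaml_text: str, min_hits: int = 2) -> bool:
    """Require multiple co-occurring keys to avoid matching unrelated YAML."""
    return len(axolotl_keys_present(yaml_text)) >= min_hits
```

Requiring several keys to co-occur is what separates a training config from a YAML file that happens to mention `datasets`.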
Detection Approach
- Pattern and semantic checks provide broad, fast coverage.
- Targeted parsing is used on supported paths for stronger precision.
- Findings are emitted only when concrete evidence is found.
This is architecture analysis, not runtime validation.
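The evidence-gated emission described above can be sketched as follows. The dataclass fields, the double weight for parsed confirmations, and the threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Evidence:
    pattern_hits: int = 0          # broad pattern/semantic matches
    parsed_confirmations: int = 0  # stronger signals from targeted parsing

def emit_finding(ev: Evidence, min_score: int = 3) -> Optional[dict]:
    """Emit a finding only when concrete evidence crosses a threshold.

    Parsed confirmations count double (illustrative weighting), reflecting
    that targeted parsing is more precise than pattern matching.
    """
    score = ev.pattern_hits + 2 * ev.parsed_confirmations
    if score >= min_score:
        return {"score": score, "confirmed_by_parse": ev.parsed_confirmations > 0}
    return None  # no concrete evidence: stay silent rather than guess
```

Returning `None` below the threshold is the "emit only on concrete evidence" rule: silence, not a low-confidence finding.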
Pipeline and Effect Diagnostics
finetuning_architecture requires:
CallGraphEffectIndex
The effect index is used for pipeline-effect diagnostics:
training_files_with_gpu_count, training_files_with_database_count, training_files_with_storage_count
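Given an effect index mapping training files to detected effects, the three counters could be derived as below. The index shape (`{file_path: set_of_effect_categories}`) is an assumption about how CallGraphEffectIndex exposes its data.

```python
from collections import Counter

def pipeline_effect_counts(effect_index: dict) -> dict:
    """Count training files exhibiting each effect category.

    effect_index: assumed shape {file_path: {"gpu", "database", "storage", ...}}
    """
    counts = Counter()
    for effects in effect_index.values():
        for cat in ("gpu", "database", "storage"):
            if cat in effects:
                counts[f"training_files_with_{cat}_count"] += 1
    return dict(counts)
```

Each file is counted once per category it exhibits, so a file that both uses the GPU and writes to storage increments two counters.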
Known Blind Spots and Caveats
- Wrapper abstractions can hide true trainer/eval/checkpoint boundaries.
- Local heuristics can overmatch helper names in utility files.
- Dynamic metadata construction can hide lineage or provenance semantics.
- Non-standard training stacks without known anchors may be under-detected.
- Access-control and trust-surface checks are local-context based, so centralized wrappers may be missed.
How to Improve Coverage in Your Repo
- Keep training/config conventions explicit and centralized.
- Prefer explicit eval/checkpoint/lineage declarations over implicit defaults.
- Standardize metadata fields (base model, dataset version, artifact checksums).
- Review findings in core training modules before broad CI enforcement.
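One way to keep the recommended metadata explicit and centralized is a small record built in one place. The field names and helpers below are suggestions, not a schema the tool requires.

```python
import hashlib
from pathlib import Path

def artifact_checksum(path: Path) -> str:
    """SHA-256 of an artifact file, for lineage records."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def training_metadata(base_model: str, dataset_path: Path, dataset_version: str) -> dict:
    """Explicit, centralized metadata that anchor-based scanners can find.

    Field names (base_model, dataset_version, dataset_checksum) are
    illustrative conventions, not required keys.
    """
    return {
        "base_model": base_model,
        "dataset_version": dataset_version,
        "dataset_checksum": artifact_checksum(dataset_path),
    }
```

Because the fields are literal keys in one module, both humans and heuristic scanners can locate provenance without tracing dynamic construction.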