
Framework and Language Coverage

finetuning_architecture uses a hybrid approach: file discovery + framework anchors + targeted parsing + evidence scoring.
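As a rough illustration of the hybrid approach, each stage can be thought of as contributing weighted evidence per file, with a finding emitted only when the combined score clears a threshold. This is a minimal sketch; the class names, weights, and threshold are hypothetical, not the tool's actual API:

```python
from dataclasses import dataclass, field

@dataclass
class Evidence:
    source: str   # which stage produced it, e.g. "anchor", "parse", "pattern"
    weight: float

@dataclass
class FileReport:
    path: str
    evidence: list = field(default_factory=list)

    def score(self) -> float:
        return sum(e.weight for e in self.evidence)

    def is_finding(self, threshold: float = 1.0) -> bool:
        # A finding is emitted only when concrete evidence clears the bar.
        return self.score() >= threshold

report = FileReport("train.py")
report.evidence.append(Evidence("anchor", 0.6))  # e.g. a framework anchor matched
report.evidence.append(Evidence("parse", 0.5))   # targeted parsing confirmed it
print(report.is_finding())  # True (score 1.1 >= 1.0)
```

The key property this models is the last one listed below: no single weak signal produces a finding on its own.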

Code and config candidates:

  • .py, .ipynb, .ts, .js, .rs, .yml, .yaml, .json, .toml

Dataset-like files:

  • .jsonl, .json, .parquet, .csv, .arrow
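The extension lists above imply a simple classification step during file discovery. A minimal sketch (the `classify` helper is hypothetical; note that `.json` legitimately lands in both buckets):

```python
from pathlib import Path

CODE_CONFIG_EXTS = {".py", ".ipynb", ".ts", ".js", ".rs",
                    ".yml", ".yaml", ".json", ".toml"}
DATASET_EXTS = {".jsonl", ".json", ".parquet", ".csv", ".arrow"}

def classify(path: str) -> set[str]:
    """Return the candidate buckets a file falls into by extension."""
    ext = Path(path).suffix.lower()
    buckets = set()
    if ext in CODE_CONFIG_EXTS:
        buckets.add("code_config")
    if ext in DATASET_EXTS:
        buckets.add("dataset")
    return buckets

classify("configs/axolotl.yml")  # {'code_config'}
classify("data/train.jsonl")     # {'dataset'}
classify("params.json")          # {'code_config', 'dataset'}
```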

For dataset-content checks, scanning is intentionally bounded to improve speed on large repositories.
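Bounded scanning of this kind typically caps both bytes and lines read. A sketch under assumed limits (the function and its defaults are illustrative, not the tool's actual bounds):

```python
def scan_dataset_head(path, max_bytes=1_000_000, max_lines=200):
    """Read only a bounded prefix of a dataset-like file, so content
    checks stay fast even on multi-gigabyte files."""
    lines, read = [], 0
    with open(path, "rb") as f:
        for raw in f:
            read += len(raw)
            lines.append(raw.decode("utf-8", errors="replace").rstrip("\r\n"))
            if read >= max_bytes or len(lines) >= max_lines:
                break
    return lines
```

Reading in binary mode with `errors="replace"` keeps the scan robust against files that are not valid UTF-8.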

Framework and stack anchors include:

  • HuggingFace Trainer/Transformers
    • Trainer, TrainingArguments, Seq2SeqTrainer, from_pretrained
  • TRL training paths
    • SFTTrainer, PPOTrainer, DPOTrainer, ORPOTrainer, plus modern method hints (rft, grpo, rloo)
  • OpenAI fine-tuning jobs
    • openai.FineTuningJob, fine_tuning.jobs.create
  • Axolotl-style YAML config keys
    • base_model, adapter, datasets, chat_template, and related training controls
Detection strategy:

  • Pattern and semantic checks provide broad, fast coverage.
  • Targeted parsing is used on supported paths for stronger precision.
  • Findings are emitted only when concrete evidence is found.
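The framework anchors above can be approximated with simple pattern checks over source text. A minimal sketch (these patterns are illustrative; the tool's actual rules are richer):

```python
import re

# Illustrative anchor patterns keyed by framework/stack.
ANCHORS = {
    "hf_transformers": re.compile(
        r"\b(Trainer|TrainingArguments|Seq2SeqTrainer|from_pretrained)\b"),
    "trl": re.compile(
        r"\b(SFTTrainer|PPOTrainer|DPOTrainer|ORPOTrainer)\b"),
    "openai_ft": re.compile(
        r"openai\.FineTuningJob|fine_tuning\.jobs\.create"),
}

def match_anchors(source: str) -> set[str]:
    """Return which framework/stack anchors fire on a source string."""
    return {name for name, pattern in ANCHORS.items() if pattern.search(source)}

match_anchors("trainer = SFTTrainer(model=model, args=args)")  # {'trl'}
```

Word boundaries (`\b`) keep `SFTTrainer` from also firing the bare `Trainer` anchor, which is one reason pattern checks alone are fast but need targeted parsing for precision.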

This is architecture analysis, not runtime validation.

finetuning_architecture requires:

  • CallGraph
  • EffectIndex

The effect index is used for pipeline-effect diagnostics:

  • training_files_with_gpu_count
  • training_files_with_database_count
  • training_files_with_storage_count
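Diagnostics of this shape count distinct training files exhibiting each effect kind. A sketch assuming a hypothetical row format for effect-index entries (the real EffectIndex structure may differ):

```python
# Hypothetical shape for effect-index rows: (file_path, effect_kind).
effects = [
    ("train.py", "gpu"),
    ("train.py", "storage"),
    ("finetune.py", "gpu"),
    ("etl/load.py", "database"),  # not a training file, so not counted
]

def pipeline_effect_counts(effects, training_files):
    """Count distinct training files exhibiting each effect kind."""
    files_by_kind = {}
    for path, kind in effects:
        if path in training_files:
            files_by_kind.setdefault(kind, set()).add(path)
    return {f"training_files_with_{kind}_count": len(paths)
            for kind, paths in files_by_kind.items()}

pipeline_effect_counts(effects, {"train.py", "finetune.py"})
# {'training_files_with_gpu_count': 2, 'training_files_with_storage_count': 1}
```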
Known limitations:

  • Wrapper abstractions can hide true trainer/eval/checkpoint boundaries.
  • Local heuristics can overmatch helper names in utility files.
  • Dynamic metadata construction can hide lineage or provenance semantics.
  • Non-standard training stacks without known anchors may be under-detected.
  • Access-control and trust-surface checks are local-context based, so centralized wrappers may be missed.
Recommendations:

  1. Keep training/config conventions explicit and centralized.
  2. Prefer explicit eval/checkpoint/lineage declarations over implicit defaults.
  3. Standardize metadata fields (base model, dataset version, artifact checksums).
  4. Review findings in core training modules before broad CI enforcement.
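Standardized metadata (recommendation 3) can be as simple as one explicit record per training run. A sketch with hypothetical field names, assuming SHA-256 checksums for artifacts:

```python
import hashlib
import json

def artifact_checksum(path: str) -> str:
    """SHA-256 over file contents, chunked so large artifacts stream."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()

def run_metadata(base_model: str, dataset_path: str, dataset_version: str) -> str:
    """Emit one explicit, centralized metadata record per training run."""
    record = {
        "base_model": base_model,
        "dataset_version": dataset_version,
        "dataset_checksum": artifact_checksum(dataset_path),
    }
    return json.dumps(record, sort_keys=True)
```

Records like this make lineage explicit rather than implicit, which is exactly the kind of evidence the analysis can pick up.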