MSR (Mining Software Repositories) Metrics

The MSR plugin analyzes git history to extract evolution-based architectural insights that complement static analysis. These metrics reveal real-world change patterns that may not be visible in the code structure alone.

MSR (Mining Software Repositories) is a research field that analyzes version control history to understand software evolution. This plugin implements key MSR metrics:

  • Churn: Volume of code changes per file/module
  • Co-change coupling: Files that change together (logical coupling)
  • Hotspots: Files with high churn AND high centrality (technical debt indicators)
  • Wrong boundaries: Files that co-change frequently but aren’t linked in the import graph

Churn measures the volume of code changes over time:

  • msr.churn_total - Total lines changed (added + deleted) across all files
  • msr.churn_avg - Average churn per file
  • msr.churn_max - Maximum churn for a single file
  • msr.high_churn_file_count - Number of files with high churn (>1000 lines or >50 commits)
  • msr.commit_count - Total number of commits analyzed

Interpretation:

  • High churn files are often indicators of:
    • Technical debt
    • Frequently changing requirements
    • Unstable modules
    • Areas needing refactoring
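To make the churn metrics concrete, here is a minimal sketch of how they could be derived from per-commit diff stats. The input shape (a list of commits, each a list of `(path, added, deleted)` tuples) and the function name `churn_metrics` are assumptions made for illustration; this is not the plugin's actual implementation.

```python
from collections import defaultdict

# Thresholds taken from the msr.high_churn_file_count description above.
HIGH_CHURN_LINES = 1000
HIGH_CHURN_COMMITS = 50


def churn_metrics(commits):
    """Compute churn metrics from commits.

    Each commit is a list of (path, lines_added, lines_deleted) tuples,
    a hypothetical input shape chosen for this sketch.
    """
    lines = defaultdict(int)    # path -> total lines added + deleted
    touches = defaultdict(int)  # path -> number of commits touching the file
    commit_count = 0
    for commit in commits:
        commit_count += 1
        for path, added, deleted in commit:
            lines[path] += added + deleted
            touches[path] += 1

    total = sum(lines.values())
    return {
        "msr.churn_total": total,
        "msr.churn_avg": total / len(lines) if lines else 0.0,
        "msr.churn_max": max(lines.values(), default=0),
        "msr.high_churn_file_count": sum(
            1 for path in lines
            if lines[path] > HIGH_CHURN_LINES or touches[path] > HIGH_CHURN_COMMITS
        ),
        "msr.commit_count": commit_count,
    }
```

A file counts as high churn here if it crosses either threshold, matching the "or" in the metric description.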

Co-change coupling measures how often files change together:

  • msr.cochange_pairs - Number of file pairs that co-changed
  • msr.cochange_avg - Average co-change count per pair
  • msr.cochange_max - Maximum co-change count between any two files

Interpretation:

  • High co-change indicates logical coupling
  • Files that co-change should ideally be architecturally linked
  • Co-change without import dependency suggests wrong boundaries
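The co-change counts can be illustrated with a short sketch that counts, for every unordered pair of files, how many commits touched both. The input shape (one collection of paths per commit) and the name `cochange_metrics` are assumptions for this example, not the plugin's API.

```python
from collections import Counter
from itertools import combinations


def cochange_metrics(commits):
    """commits: iterable of collections of file paths changed in one commit."""
    pair_counts = Counter()
    for files in commits:
        # Deduplicate and sort so (a, b) and (b, a) map to the same key.
        for pair in combinations(sorted(set(files)), 2):
            pair_counts[pair] += 1

    total = sum(pair_counts.values())
    metrics = {
        "msr.cochange_pairs": len(pair_counts),
        "msr.cochange_avg": total / len(pair_counts) if pair_counts else 0.0,
        "msr.cochange_max": max(pair_counts.values(), default=0),
    }
    return pair_counts, metrics
```

Because the number of pairs grows quadratically with the number of files in a commit, a real analysis would probably need to treat very large commits (for example bulk renames) specially; that handling is left out of this sketch.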

Hotspots are files with both high churn AND high centrality:

  • msr.hotspot_count - Total number of hotspots detected
  • msr.hotspot_severe_count - Severe hotspots (churn >2000, centrality >20)
  • msr.hotspot_moderate_count - Moderate hotspots (churn >1000, centrality >15)

Hotspot Criteria:

  • Churn > 500 lines AND centrality > 10
  • Severity levels:
    • Severe: churn > 2000 AND centrality > 20
    • Moderate: churn > 1000 AND centrality > 15
    • Mild: churn > 500 AND centrality > 10
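The severity tiers above can be written as a small classifier. This sketch assumes the tiers are mutually exclusive and that the highest matching tier wins; the function name `hotspot_severity` is hypothetical.

```python
def hotspot_severity(churn, centrality):
    """Return "severe", "moderate", "mild", or None if the file is not a hotspot.

    Thresholds are the ones listed in the hotspot criteria; checking the
    strictest tier first is an assumption of this sketch.
    """
    if churn > 2000 and centrality > 20:
        return "severe"
    if churn > 1000 and centrality > 15:
        return "moderate"
    if churn > 500 and centrality > 10:
        return "mild"
    return None
```

Note that both conditions must hold: a file with very high churn but low centrality is not classified as a hotspot.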

Interpretation:

  • Hotspots are critical technical debt indicators
  • These files are both frequently changed AND central to the architecture
  • Prioritize refactoring hotspots to reduce maintenance burden

Wrong boundaries are files that co-change but aren’t architecturally linked:

  • msr.wrong_boundary_count - Total number of wrong boundaries
  • msr.wrong_boundary_severe_count - Severe cases (co-change >10 times)

Wrong Boundary Criteria:

  • Files co-changed ≥3 times
  • No direct import dependency between them
  • Both files exist in the import graph

Severity Levels:

  • Severe: co-change > 10 times
  • Moderate: co-change > 5 times
  • Mild: co-change ≥ 3 times
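The criteria and severity levels can be combined into one pass over the co-change counts. In this sketch the import graph is represented as a set of nodes plus a set of directed `(importer, imported)` edges; that representation and the name `wrong_boundaries` are assumptions for illustration. The output field names mirror the details format shown later on this page.

```python
def wrong_boundaries(pair_counts, import_edges, nodes):
    """Find co-changing file pairs with no direct import link.

    pair_counts:  dict mapping (file1, file2) -> co-change count
    import_edges: set of (importer, imported) pairs (hypothetical shape)
    nodes:        set of files present in the import graph
    """
    results = []
    for (a, b), count in pair_counts.items():
        if count < 3:
            continue  # below the minimum co-change threshold
        if a not in nodes or b not in nodes:
            continue  # both files must exist in the import graph
        if (a, b) in import_edges or (b, a) in import_edges:
            continue  # a direct dependency in either direction is fine
        if count > 10:
            severity = "severe"
        elif count > 5:
            severity = "moderate"
        else:
            severity = "mild"
        results.append(
            {"file1": a, "file2": b, "cochange_count": count, "severity": severity}
        )
    return results
```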

Interpretation:

  • Wrong boundaries indicate architectural misalignment
  • Files that change together should be in the same module/domain
  • Consider restructuring to align with change patterns

The MSR plugin requires a git repository. It automatically detects the repository root by walking up from the source path.
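Walking up from the source path can be sketched as follows. This is an illustration of the general technique, assuming the root is identified by the presence of a `.git` entry; the function name `find_repo_root` is hypothetical and the plugin's actual detection may differ.

```python
from pathlib import Path


def find_repo_root(source_path):
    """Return the nearest ancestor directory containing .git, or None."""
    start = Path(source_path).resolve()
    if start.is_file():
        start = start.parent
    for candidate in (start, *start.parents):
        # .git is a directory in a normal clone and a file in worktrees
        # and submodules, so test for existence rather than is_dir().
        if (candidate / ".git").exists():
            return candidate
    return None
```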

Example configuration:

metrics:
  - id: msr
    enabled: true
    config:
      # Optional: limit commits analyzed (default: 10000)
      max_commits: 10000
      # Optional: time range
      since: "2024-01-01T00:00:00Z"
      until: "2024-12-31T23:59:59Z"

Limit the number of high-churn files:

policy:
  invariants:
    - metric: msr.high_churn_file_count
      op: "<="
      value: 5

Limit hotspots and forbid severe ones:

policy:
  invariants:
    - metric: msr.hotspot_count
      op: "<="
      value: 3
    - metric: msr.hotspot_severe_count
      op: "=="
      value: 0

Limit wrong boundaries and forbid severe ones:

policy:
  invariants:
    - metric: msr.wrong_boundary_count
      op: "<="
      value: 10
    - metric: msr.wrong_boundary_severe_count
      op: "=="
      value: 0
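To show what a policy invariant expresses, the sketch below checks a list of invariants against a dictionary of computed metric values. It only handles the two operators that appear in the examples above, and it is not Arxo's policy evaluator; the names `OPS` and `check_invariants` are hypothetical.

```python
import operator

# Only the operators used in the policy examples; any others are unknown here.
OPS = {"<=": operator.le, "==": operator.eq}


def check_invariants(metrics, invariants):
    """Return a list of human-readable violations (empty if all hold)."""
    violations = []
    for inv in invariants:
        actual = metrics[inv["metric"]]
        holds = OPS[inv["op"]](actual, inv["value"])
        if not holds:
            violations.append(
                f'{inv["metric"]} = {actual}, expected {inv["op"]} {inv["value"]}'
            )
    return violations
```

For example, with `msr.hotspot_count` at 4 and an invariant of `<= 3`, the function reports one violation, while `msr.hotspot_severe_count == 0` passes when no severe hotspots were found.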

Use hotspots to identify files that need refactoring:

metrics:
  - id: msr
policy:
  invariants:
    - metric: msr.hotspot_severe_count
      op: "=="
      value: 0

Use wrong boundaries to validate that architectural boundaries align with change patterns:

metrics:
  - id: msr
policy:
  invariants:
    - metric: msr.wrong_boundary_severe_count
      op: "=="
      value: 0

Use churn metrics to understand which modules are most volatile:

metrics:
  - id: msr
report:
  format: console
  # Will show top churn files in details

The MSR plugin provides detailed information in the details field:

{
  "hotspots": [
    {
      "node_id": "src/core/auth.ts",
      "churn": 2500,
      "centrality": 25,
      "severity": "severe"
    }
  ],
  "wrong_boundaries": [
    {
      "file1": "src/auth/login.ts",
      "file2": "src/auth/session.ts",
      "cochange_count": 15,
      "severity": "severe"
    }
  ],
  "date_range": {
    "first_commit": "2024-01-01T00:00:00Z",
    "last_commit": "2024-12-31T23:59:59Z"
  }
}
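A consumer of the details field might pull out the severe findings for triage. The sketch below assumes the details are available as a JSON string with the shape shown above; how that string is obtained from a report is not specified here, and `severe_findings` is a hypothetical helper.

```python
import json


def severe_findings(details_json):
    """Return (severe hotspot ids, severe wrong-boundary file pairs)."""
    details = json.loads(details_json)
    hotspots = [
        h["node_id"]
        for h in details.get("hotspots", [])
        if h.get("severity") == "severe"
    ]
    boundaries = [
        (w["file1"], w["file2"])
        for w in details.get("wrong_boundaries", [])
        if w.get("severity") == "severe"
    ]
    return hotspots, boundaries
```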

Limitations:

  1. Git Repository Required: The plugin requires a git repository. If no repository is found, it returns empty metrics with a message.

  2. Performance: Analyzing large repositories can be slow. By default the plugin analyzes at most 10,000 commits (see max_commits).

  3. Line Count Approximation: For performance, line counts are approximated by distributing total diff stats across changed files. Per-file churn is therefore an estimate; exact per-file counts would require a more detailed analysis.

  4. Time Range: Time-based filtering is done after fetching commits, which may be inefficient for very large repositories.
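Filtering after fetching, as described in the last item, amounts to parsing each commit timestamp and comparing it with the configured bounds. The sketch below assumes commits arrive as `(timestamp, payload)` pairs with ISO 8601 timestamps like those in the configuration example, and that both bounds are inclusive; the names `parse_ts` and `filter_by_range` and the inclusive-bounds choice are assumptions.

```python
from datetime import datetime


def parse_ts(value):
    # datetime.fromisoformat accepts a trailing "Z" only from Python 3.11 on,
    # so normalize it to an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def filter_by_range(commits, since=None, until=None):
    """Keep commits whose timestamp lies within [since, until].

    commits: iterable of (iso_timestamp, payload) pairs (hypothetical shape).
    """
    lower = parse_ts(since) if since else None
    upper = parse_ts(until) if until else None
    kept = []
    for ts, payload in commits:
        when = parse_ts(ts)
        if lower is not None and when < lower:
            continue
        if upper is not None and when > upper:
            continue
        kept.append((ts, payload))
    return kept
```

Every commit is still fetched and parsed before being discarded, which is where the inefficiency on very large histories comes from.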

Best Practices:

  1. Combine with Static Metrics: Use MSR metrics together with static metrics (SCC, PC, Modularity) for a complete picture.

  2. Focus on Trends: Track MSR metrics over time to identify deteriorating areas.

  3. Prioritize Hotspots: Address severe hotspots first, since they have the highest maintenance cost.

  4. Validate Boundaries: Use wrong boundaries to validate that your architectural boundaries match real change patterns.

  5. Set Realistic Thresholds: Start with lenient thresholds and tighten them as you refactor.
