MSR (Mining Software Repositories) Metrics
The MSR plugin analyzes git history to extract evolution-based architectural insights that complement static analysis. These metrics reveal real-world change patterns that may not be visible in the code structure alone.
Overview
MSR (Mining Software Repositories) is a research field that analyzes version control history to understand software evolution. This plugin implements key MSR metrics:
- Churn: Volume of code changes per file/module
- Co-change coupling: Files that change together (logical coupling)
- Hotspots: Files with high churn AND high centrality (technical debt indicators)
- Wrong boundaries: Files that co-change frequently but aren’t linked in the import graph
Metrics
Churn Metrics
Churn measures the volume of code changes over time:
- msr.churn_total: Total lines changed (added + deleted) across all files
- msr.churn_avg: Average churn per file
- msr.churn_max: Maximum churn for a single file
- msr.high_churn_file_count: Number of files with high churn (>1000 lines or >50 commits)
- msr.commit_count: Total number of commits analyzed
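The churn metrics above can be sketched as a small aggregation over `git log --numstat --format=%H` output. This is an illustrative sketch, not the plugin's implementation: the function name `churn_metrics`, the choice of numstat text as input, and the decision to skip binary files are all assumptions. Only the thresholds (more than 1000 lines or more than 50 commits) come from the definition above.

```python
from collections import defaultdict

# Thresholds taken from the high-churn definition above.
HIGH_CHURN_LINES = 1000
HIGH_CHURN_COMMITS = 50


def churn_metrics(numstat_log: str) -> dict:
    """Aggregate churn from `git log --numstat --format=%H` style output."""
    lines_by_file = defaultdict(int)
    commits_by_file = defaultdict(int)
    commit_count = 0
    for line in numstat_log.splitlines():
        parts = line.split("\t")
        if len(parts) == 3:
            added, deleted, path = parts
            if added == "-" or deleted == "-":
                # Binary files report "-" instead of line counts; skipped here (assumption).
                continue
            lines_by_file[path] += int(added) + int(deleted)
            commits_by_file[path] += 1
        elif line.strip():
            # Any other non-empty line is treated as a commit hash.
            commit_count += 1
    churn = list(lines_by_file.values())
    high = sum(
        1
        for path, lines in lines_by_file.items()
        if lines > HIGH_CHURN_LINES or commits_by_file[path] > HIGH_CHURN_COMMITS
    )
    return {
        "msr.churn_total": sum(churn),
        "msr.churn_avg": sum(churn) / len(churn) if churn else 0.0,
        "msr.churn_max": max(churn, default=0),
        "msr.high_churn_file_count": high,
        "msr.commit_count": commit_count,
    }
```

In this sketch a file counts as high churn if it crosses either threshold, matching the "or" in the definition.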
Interpretation:
- High churn files are often indicators of:
  - Technical debt
  - Frequently changing requirements
  - Unstable modules
  - Areas needing refactoring
Co-change Metrics
Co-change coupling measures how often files change together:
- msr.cochange_pairs: Number of file pairs that co-changed
- msr.cochange_avg: Average co-change count per pair
- msr.cochange_max: Maximum co-change count between any two files
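As a rough illustration of how these counts could be derived, the sketch below treats each commit as a set of touched files and counts every unordered pair once per commit. The function name `cochange_metrics` and the input shape (a list of file lists) are assumptions made for the example; the plugin may count pairs differently.

```python
from collections import Counter
from itertools import combinations


def cochange_metrics(commits):
    """commits: iterable of collections of file paths touched by each commit."""
    pair_counts = Counter()
    for files in commits:
        # Sort so that (a, b) and (b, a) map to the same key.
        for a, b in combinations(sorted(set(files)), 2):
            pair_counts[(a, b)] += 1
    n = len(pair_counts)
    metrics = {
        "msr.cochange_pairs": n,
        "msr.cochange_avg": sum(pair_counts.values()) / n if n else 0.0,
        "msr.cochange_max": max(pair_counts.values(), default=0),
    }
    return pair_counts, metrics
```

Note that a commit touching n files contributes n(n-1)/2 pairs, so very large commits (bulk renames, formatting passes) can dominate the counts in a naive implementation like this one.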
Interpretation:
- High co-change indicates logical coupling
- Files that co-change should ideally be architecturally linked
- Co-change without import dependency suggests wrong boundaries
Hotspot Detection
Hotspots are files with both high churn AND high centrality:
- msr.hotspot_count: Total number of hotspots detected
- msr.hotspot_severe_count: Severe hotspots (churn >2000, centrality >20)
- msr.hotspot_moderate_count: Moderate hotspots (churn >1000, centrality >15)
Hotspot Criteria:
- Churn > 500 lines AND centrality > 10
- Severity levels:
  - Severe: churn > 2000 AND centrality > 20
  - Moderate: churn > 1000 AND centrality > 15
  - Mild: churn > 500 AND centrality > 10
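The criteria above amount to checking the tiers from strictest to mildest and returning the first one where both thresholds are exceeded. A minimal sketch, assuming centrality is already available as a number per file (the source does not say which centrality measure is used) and using the hypothetical names `hotspot_severity` and `hotspot_counts`:

```python
from typing import Optional

# (severity, churn threshold, centrality threshold), strictest tier first.
HOTSPOT_TIERS = [
    ("severe", 2000, 20),
    ("moderate", 1000, 15),
    ("mild", 500, 10),
]


def hotspot_severity(churn: float, centrality: float) -> Optional[str]:
    """Return the first tier whose thresholds are both exceeded, else None."""
    for name, min_churn, min_centrality in HOTSPOT_TIERS:
        if churn > min_churn and centrality > min_centrality:
            return name
    return None


def hotspot_counts(files: dict) -> dict:
    """files: mapping of path -> (churn, centrality)."""
    severities = [hotspot_severity(c, k) for c, k in files.values()]
    return {
        "msr.hotspot_count": sum(s is not None for s in severities),
        "msr.hotspot_severe_count": severities.count("severe"),
        "msr.hotspot_moderate_count": severities.count("moderate"),
    }
```

Because both conditions must hold, a file with churn 2500 but centrality 12 is classified as mild rather than severe under this reading.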
Interpretation:
- Hotspots are critical technical debt indicators
- These files are both frequently changed AND central to the architecture
- Prioritize refactoring hotspots to reduce maintenance burden
Wrong Boundary Detection
Wrong boundaries are pairs of files that co-change but aren’t architecturally linked:
- msr.wrong_boundary_count: Total number of wrong boundaries
- msr.wrong_boundary_severe_count: Severe cases (co-change >10 times)
Wrong Boundary Criteria:
- Files co-changed ≥3 times
- No direct import dependency between them
- Both files exist in the import graph
Severity Levels:
- Severe: co-change > 10 times
- Moderate: co-change > 5 times
- Mild: co-change ≥ 3 times
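Put together, the criteria and severity levels above can be sketched as a filter over co-change counts. The function name `wrong_boundaries` and its inputs (a pair-count mapping, a set of directed import edges, and the set of graph nodes) are assumptions for illustration; so is the reading that an import in either direction counts as a direct dependency. The output field names follow the plugin's details output.

```python
def wrong_boundaries(pair_counts, import_edges, graph_nodes):
    """Flag file pairs that co-change without a direct import link.

    pair_counts:  mapping of (file_a, file_b) -> co-change count
    import_edges: set of (importer, imported) tuples
    graph_nodes:  set of files present in the import graph
    """
    results = []
    for (a, b), count in pair_counts.items():
        if count < 3:
            continue  # below the minimum co-change threshold
        if a not in graph_nodes or b not in graph_nodes:
            continue  # both files must exist in the import graph
        if (a, b) in import_edges or (b, a) in import_edges:
            continue  # a direct import in either direction (assumption)
        if count > 10:
            severity = "severe"
        elif count > 5:
            severity = "moderate"
        else:
            severity = "mild"
        results.append(
            {"file1": a, "file2": b, "cochange_count": count, "severity": severity}
        )
    return results
```

Files outside the import graph, such as documentation or assets, are dropped by the second check, so they never show up as wrong boundaries no matter how often they co-change.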
Interpretation:
- Wrong boundaries indicate architectural misalignment
- Files that change together should be in the same module/domain
- Consider restructuring to align with change patterns
Configuration
The MSR plugin requires a git repository. It automatically detects the repository root by walking up from the source path.
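Walking up to the repository root can be pictured roughly as follows. This is a sketch under the assumption that the root is the nearest ancestor containing a `.git` entry; the helper name `find_repo_root` is hypothetical and the plugin's actual detection logic may differ.

```python
from pathlib import Path
from typing import Optional


def find_repo_root(source_path) -> Optional[Path]:
    """Return the nearest ancestor directory containing `.git`, or None."""
    start = Path(source_path).resolve()
    if start.is_file():
        start = start.parent
    for candidate in (start, *start.parents):
        # `.git` is a directory in normal clones and a file in worktrees/submodules.
        if (candidate / ".git").exists():
            return candidate
    return None
```

A `None` result corresponds to the case where no repository is found and no history can be analyzed.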
Example configuration:
```yaml
metrics:
  - id: msr
    enabled: true
    config:
      # Optional: limit commits analyzed (default: 10000)
      max_commits: 10000
      # Optional: time range
      since: "2024-01-01T00:00:00Z"
      until: "2024-12-31T23:59:59Z"
```
Policy Examples
Detect High Churn
```yaml
policy:
  invariants:
    - metric: msr.high_churn_file_count
      op: "<="
      value: 5
```
Prevent Hotspots
```yaml
policy:
  invariants:
    - metric: msr.hotspot_count
      op: "<="
      value: 3
    - metric: msr.hotspot_severe_count
      op: "=="
      value: 0
```
Find Wrong Boundaries
```yaml
policy:
  invariants:
    - metric: msr.wrong_boundary_count
      op: "<="
      value: 10
    - metric: msr.wrong_boundary_severe_count
      op: "=="
      value: 0
```
Use Cases
1. Technical Debt Identification
Use hotspots to identify files that need refactoring:
```yaml
metrics:
  - id: msr
policy:
  invariants:
    - metric: msr.hotspot_severe_count
      op: "=="
      value: 0
```
2. Domain Boundary Validation
Use wrong boundaries to validate that architectural boundaries align with change patterns:
```yaml
metrics:
  - id: msr
policy:
  invariants:
    - metric: msr.wrong_boundary_severe_count
      op: "=="
      value: 0
```
3. Change Impact Analysis
Use churn metrics to understand which modules are most volatile:
```yaml
metrics:
  - id: msr
report:
  format: console  # Will show top churn files in details
```
Details Output
The MSR plugin provides detailed information in the details field:
```json
{
  "hotspots": [
    {
      "node_id": "src/core/auth.ts",
      "churn": 2500,
      "centrality": 25,
      "severity": "severe"
    }
  ],
  "wrong_boundaries": [
    {
      "file1": "src/auth/login.ts",
      "file2": "src/auth/session.ts",
      "cochange_count": 15,
      "severity": "severe"
    }
  ],
  "date_range": {
    "first_commit": "2024-01-01T00:00:00Z",
    "last_commit": "2024-12-31T23:59:59Z"
  }
}
```
Limitations
- Git Repository Required: The plugin requires a git repository. If no repository is found, it returns empty metrics with a message.
- Performance: Analyzing large repositories can be slow. The plugin analyzes at most 10,000 commits by default.
- Line Count Approximation: For performance, line counts are approximated by distributing each commit's total diff stats across the files it changed. Exact per-file counts would require a more detailed analysis.
- Time Range: Time-based filtering is applied after commits are fetched, which may be inefficient for very large repositories.
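To make the line count approximation concrete, the sketch below splits each commit's total changed lines evenly across the files it touched. The even split is an assumption (the source only says the totals are distributed across changed files), and `approximate_file_churn` is a hypothetical name.

```python
from collections import defaultdict


def approximate_file_churn(commits):
    """commits: iterable of (total_lines_changed, [paths touched]).

    Assumes an even split of each commit's total across its files.
    """
    churn = defaultdict(float)
    for total, paths in commits:
        if not paths:
            continue
        share = total / len(paths)
        for path in paths:
            churn[path] += share
    return dict(churn)
```

Under this assumption a commit that changes 28 lines in one file and 1 line in each of two others is recorded as 10 lines per file, so per-file churn is only reliable in aggregate while the repository-wide total stays exact.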
Best Practices
- Combine with Static Metrics: Use MSR metrics together with static metrics (SCC, PC, Modularity) for a complete picture.
- Focus on Trends: Track MSR metrics over time to identify deteriorating areas.
- Prioritize Hotspots: Address severe hotspots first, since they have the highest maintenance cost.
- Validate Boundaries: Use wrong boundaries to validate that your architectural boundaries match real change patterns.
- Set Realistic Thresholds: Start with lenient thresholds and tighten them as you refactor.