
Ricci Curvature Metrics

Ricci curvature metrics provide a geometric perspective on software architecture by measuring the “curvature” of the dependency graph. Negative curvature indicates architectural problems such as bridges between modules, cycles, and hubs.

This implementation draws on Perelman’s work on Ricci flow with surgery and applies the same ideas, by analogy, to software architecture analysis.

Forman-Ricci curvature is a fast approximation of Ricci curvature that can be computed in O(E) time:

F(e) = 4 - deg(u) - deg(v) + 3 * triangles

Where:

  • deg(u), deg(v) are the undirected degrees of the endpoints of edge e = (u, v)
  • triangles is the number of common neighbors (triangles through edge e)

Interpretation:

  • Negative FRC: Edge is a “bridge” between clusters (architectural problem)
  • Positive FRC: Edge is within a well-connected module (healthy)
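
As a minimal sketch of the formula above (not the plugin's actual code), the following computes F(e) for every edge of an undirected graph. The function name `forman_curvature` and the adjacency-dict input are assumptions made for illustration.

```python
def forman_curvature(adj):
    """adj: dict mapping node -> set of neighbours (undirected graph).

    Returns {(u, v): F(e)} with each edge listed once (u < v).
    """
    curvature = {}
    for u in adj:
        for v in adj[u]:
            if u < v:
                # Common neighbours of u and v are the triangles through (u, v).
                triangles = len(adj[u] & adj[v])
                curvature[(u, v)] = 4 - len(adj[u]) - len(adj[v]) + 3 * triangles
    return curvature


# Hypothetical example: two triangles joined by the single edge c-d.
adj = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
curv = forman_curvature(adj)
```

Here the edge a-b inside a triangle gets 4 - 2 - 2 + 3 = 3, while the bridge c-d gets 4 - 3 - 3 + 0 = -2, matching the interpretation of negative FRC as a bridge between clusters.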

Ollivier-Ricci curvature uses optimal transport (Wasserstein-1 distance) for more accurate measurement:

κ(u,v) = 1 - W₁(m_u, m_v) / d(u,v)

Where:

  • W₁ is Wasserstein-1 distance (Earth Mover’s Distance)
  • m_u, m_v are probability distributions on neighbors
  • d(u,v) is the graph distance

Interpretation:

  • Negative ORC: Strong indication of architectural bridge
  • More accurate but slower than FRC (O(E·N²) worst case)
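
A small worked example may help with the formula above. It assumes an unweighted graph, d(u,v) = 1, and no idleness (α = 0), so m_u is uniform over the neighbors of u; with a non-zero idleness parameter the numbers change. Take a bridge edge (u, v) where u has two further neighbors a_1, a_2, v has two further neighbors b_1, b_2, and the two sides share no neighbors:

```latex
m_u = \tfrac{1}{3}\left(\delta_{a_1} + \delta_{a_2} + \delta_{v}\right), \qquad
m_v = \tfrac{1}{3}\left(\delta_{b_1} + \delta_{b_2} + \delta_{u}\right)

% One optimal transport plan:
%   a_1 -> u   (distance 1)
%   v   -> b_1 (distance 1)
%   a_2 -> b_2 (distance 3, via u and v)
W_1(m_u, m_v) = \tfrac{1}{3}\,(1 + 1 + 3) = \tfrac{5}{3}

\kappa(u,v) = 1 - \frac{5/3}{1} = -\tfrac{2}{3}
```

Under the same assumptions, every edge of a triangle has W₁ = 1/2 and therefore κ = 1/2, which illustrates why edges inside a tightly connected module come out positive and bridges come out negative.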

Metrics:

  • ricci.frc_neg_share: Share of edges with negative Forman curvature (0-1)

    • < 10%: Healthy boundaries
    • 10-30%: Mixed structure
    • > 30%: High probability of architectural problems

  • ricci.frc_bridge_mass: Sum of |κ| for negative FRC edges, normalized

    • < 0.03: Usually good
    • 0.03-0.08: Worth investigating top edges
    • > 0.08: Architecture held together by bridges

  • ricci.frc_hotspot_conc: Concentration of negative curvature in top 5% nodes

    • > 0.50: More than half of the “bridge pain” sits in a small group of nodes → refactoring candidates

    • < 0.30: Problem distributed (more systemic)
  • ricci.orc_neg_share: Share of edges with negative Ollivier curvature

  • ricci.orc_bridge_mass: Sum of |κ| for negative ORC edges
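
The sketch below shows one way the three Forman summary metrics could be derived from per-edge curvature. The normalization choices are assumptions, since the exact definitions are not given here: bridge mass is divided by the total absolute curvature, and each negative edge's |κ| is credited to both of its endpoints before taking the top 5% of nodes. `frc_summary` is a hypothetical name.

```python
import math


def frc_summary(curv, top_fraction=0.05):
    """curv: {(u, v): curvature}. Returns a dict of summary metrics."""
    if not curv:
        return {"frc_neg_share": 0.0, "frc_bridge_mass": 0.0, "frc_hotspot_conc": 0.0}

    negative = {edge: -k for edge, k in curv.items() if k < 0}
    neg_share = len(negative) / len(curv)

    # Assumption: normalise by the total absolute curvature of all edges.
    total_abs = sum(abs(k) for k in curv.values())
    bridge_mass = sum(negative.values()) / total_abs if total_abs else 0.0

    # Assumption: each negative edge contributes |k| to both endpoints.
    node_mass = {}
    for (u, v), mass in negative.items():
        node_mass[u] = node_mass.get(u, 0.0) + mass
        node_mass[v] = node_mass.get(v, 0.0) + mass
    nodes = {n for edge in curv for n in edge}
    k = max(1, math.ceil(top_fraction * len(nodes)))
    top = sorted(node_mass.values(), reverse=True)[:k]
    total_mass = sum(node_mass.values())
    hotspot_conc = sum(top) / total_mass if total_mass else 0.0

    return {
        "frc_neg_share": neg_share,
        "frc_bridge_mass": bridge_mass,
        "frc_hotspot_conc": hotspot_conc,
    }


# Hypothetical curvature values: two triangles joined by one bridge c-d.
example = {
    ("a", "b"): 3, ("a", "c"): 2, ("b", "c"): 2, ("c", "d"): -2,
    ("d", "e"): 2, ("d", "f"): 2, ("e", "f"): 3,
}
summary = frc_summary(example)
```

Because the normalization here is assumed, the resulting values are not directly comparable with the thresholds listed above; the sketch only illustrates how the three quantities relate to each other.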

Pattern Detection (Canonical Neighborhoods)


Based on Perelman’s theory, we detect 5 canonical patterns:

  • Neck
    • Metric: ricci.pattern_neck_count
    • Criteria: Negative curvature (κ < -0.3) AND high edge betweenness (> p95)
    • Meaning: Thin “neck” connecting two clusters
    • Surgery: Interface extraction, Dependency Inversion, Event bus
  • Knot
    • Metric: ricci.pattern_knot_count
    • Criteria: Part of an SCC (strongly connected component) of size 2-5
    • Meaning: Small cycles (mutual dependencies)
    • Surgery: Extract types.ts, remove barrel exports
  • Cap
    • Metric: ricci.pattern_cap_violation_rate
    • Criteria: Low degree AND imports from the wrong layers
    • Meaning: Leaf nodes that violate layer boundaries
    • Surgery: Add adapter layer
  • Horn
    • Metric: ricci.pattern_horn_count
    • Criteria: Node on a chain of length > 5 with low clustering
    • Meaning: Long thin chains (helper -> helper -> helper)
    • Surgery: Collapse into single module or facade
  • Hub
    • Metric: ricci.pattern_hub_count
    • Criteria: High degree (> p95) AND high incident negative curvature
    • Meaning: Architecture collapsing onto a single node
    • Surgery: Split by domain (shared/date, shared/http, etc.)
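
To illustrate one of the criteria above, the knot pattern (membership in an SCC of size 2-5) can be sketched with Tarjan's algorithm on the directed import graph. This is an illustrative sketch, not the plugin's detector; `strongly_connected_components` and `find_knots` are hypothetical names, and the recursive form would need to be made iterative for very deep graphs.

```python
def strongly_connected_components(graph):
    """Tarjan's algorithm. graph: dict node -> iterable of successor nodes."""
    index, low, on_stack, stack, sccs = {}, {}, set(), [], []

    def visit(node):
        index[node] = low[node] = len(index)
        stack.append(node)
        on_stack.add(node)
        for nxt in graph.get(node, ()):
            if nxt not in index:
                visit(nxt)
                low[node] = min(low[node], low[nxt])
            elif nxt in on_stack:
                low[node] = min(low[node], index[nxt])
        if low[node] == index[node]:
            # node is the root of a component: pop it off the stack.
            component = set()
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.add(member)
                if member == node:
                    break
            sccs.append(component)

    for node in list(graph):
        if node not in index:
            visit(node)
    return sccs


def find_knots(graph, min_size=2, max_size=5):
    """Return the SCCs whose size falls in the knot range."""
    return [c for c in strongly_connected_components(graph)
            if min_size <= len(c) <= max_size]


# Hypothetical import graph: a.ts and b.ts import each other.
imports = {"a.ts": ["b.ts"], "b.ts": ["a.ts", "c.ts"], "c.ts": ["d.ts"], "d.ts": []}
knots = find_knots(imports)
```

In this example the only knot is the mutual dependency between a.ts and b.ts; c.ts and d.ts form single-node components and are ignored.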
  • ricci.algebraic_connectivity: λ₂ (Fiedler eigenvalue)

    • Small values: Graph can be easily split
    • Large values: Graph is tightly connected
  • ricci.conductance_min: Minimum conductance across communities

    • ≤ 0.05: Clear module boundaries
    • 0.05-0.15: Moderate boundaries
    • > 0.15: Blurry boundaries

  • ricci.conductance_median: Median conductance

  • ricci.flow_energy_drop: Energy convergence after flow iterations

    • > 40%: Structure converges, boundaries stable

    • 15-40%: Moderate
    • < 15%: Weak modularity or noisy graph
  • ricci.flow_time_to_separation: Iterations until boundaries stabilize

    • ≤ 10: Clear boundaries
    • 10-30: Moderate
    • > 30: Blurry boundaries

  • ricci.cut_stability: Bootstrap stability score (0-1)

    • ≥ 0.80: Stable boundaries
    • 0.60-0.80: Tolerable
    • < 0.60: Unstable boundaries
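
For the conductance metrics above, a minimal sketch using the standard definition (cut edges divided by the smaller of the two volumes) might look like this. The community assignment is assumed to come from a separate detection step, and `conductance` is a hypothetical helper name.

```python
def conductance(adj, community):
    """adj: dict node -> set of neighbours (undirected); community: set of nodes."""
    cut = sum(1 for u in community for v in adj[u] if v not in community)
    vol_in = sum(len(adj[u]) for u in community)
    vol_out = sum(len(adj[u]) for u in adj if u not in community)
    denom = min(vol_in, vol_out)
    return cut / denom if denom else 0.0


# Hypothetical example: two triangles joined by the single edge c-d.
adj = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
communities = [{"a", "b", "c"}, {"d", "e", "f"}]
values = [conductance(adj, c) for c in communities]
conductance_min = min(values)
```

Each community here has one cut edge and volume 7, so its conductance is 1/7 ≈ 0.14, which would fall in the moderate band of the thresholds above.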

RCD extends traditional structural dependency weights by incorporating change-coupling from Git history, providing a “cost of change” perspective on the architecture.

Mathematical Foundation:

cost(e) = α_runtime · w_runtime + α_coupling · w_coupling + α_symbols · w_symbols

Where:

  • w_runtime: Structural edge weight (import strength)
  • w_coupling: Normalized Git co-change frequency [0, 1]
  • w_symbols: Number of symbols used in the import
  • α_*: Tunable weights (default: 1.0, 1.0, 0.5)
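
The cost formula can be sketched directly. The `EdgeWeights` container and the `edge_cost` function are hypothetical names; the default α values are the ones listed above.

```python
from dataclasses import dataclass


@dataclass
class EdgeWeights:
    runtime: float   # structural edge weight (import strength)
    coupling: float  # normalized Git co-change frequency in [0, 1]
    symbols: float   # number of symbols used in the import


def edge_cost(w, alpha_runtime=1.0, alpha_coupling=1.0, alpha_symbols=0.5):
    """Weighted sum of the three edge weights."""
    return (alpha_runtime * w.runtime
            + alpha_coupling * w.coupling
            + alpha_symbols * w.symbols)


# 1.0 * 1.0 + 1.0 * 0.4 + 0.5 * 3 = 2.9
cost = edge_cost(EdgeWeights(runtime=1.0, coupling=0.4, symbols=3))
```

Note that w_symbols is a raw count while w_coupling is bounded by 1, so with the default weights an import that uses many symbols can dominate the cost.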

Core Metrics:

  • ricci.rcd_within: Average edge cost within detected communities

    • Lower values: Well-encapsulated modules with low change coupling
    • Higher values: Modules change together frequently (potential for consolidation)
  • ricci.rcd_cross: Average edge cost across community boundaries

    • Lower values: Clean boundaries with minimal cross-module coupling
    • Higher values: Leaky boundaries, modules change together despite separation
  • ricci.rcd_anisotropy: Ratio of cross-boundary to within-community costs

    • High ratio (cross >> within): Clear boundaries, good separation
    • Low ratio (cross ≈ within): Leaky boundaries, potential misalignment with evolution
    • Ideal: Anisotropy > 1.5 indicates well-defined module boundaries

Interpretation:

RCD metrics reveal the alignment between structural architecture and evolutionary patterns:

  • High RCD_CROSS + Low RCD_ANISOTROPY: Files separated structurally but coupled evolutionarily → Consider merging or reducing coupling
  • High RCD_WITHIN: High internal coupling → Module is cohesive or too large
  • Low RCD_CROSS: Minimal cross-boundary coupling → Good architectural boundaries
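
Given per-edge costs and a community assignment, the three RCD metrics reduce to two averages and a ratio. The sketch below assumes plain arithmetic means and a `community_of` lookup produced by some earlier community-detection step; `rcd_summary` is a hypothetical name.

```python
def rcd_summary(edge_costs, community_of):
    """edge_costs: {(u, v): cost}; community_of: {node: community id}."""
    within, cross = [], []
    for (u, v), cost in edge_costs.items():
        bucket = within if community_of[u] == community_of[v] else cross
        bucket.append(cost)

    def mean(values):
        return sum(values) / len(values) if values else 0.0

    rcd_within, rcd_cross = mean(within), mean(cross)
    # Ratio of cross-boundary to within-community cost.
    anisotropy = rcd_cross / rcd_within if rcd_within else float("inf")
    return {"rcd_within": rcd_within, "rcd_cross": rcd_cross,
            "rcd_anisotropy": anisotropy}


# Hypothetical costs: a, b, c share a community; d sits in another.
costs = {("a", "b"): 1.0, ("b", "c"): 1.0, ("c", "d"): 3.0}
membership = {"a": 0, "b": 0, "c": 0, "d": 1}
rcd = rcd_summary(costs, membership)
```

In this example rcd_within is 1.0, rcd_cross is 3.0, and the anisotropy is 3.0.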

The Singularity Score unifies three dimensions (curvature, betweenness, change coupling) into a single prioritization metric for architectural issues.

Mathematical Foundation:

S(e) = z_curvature + z_betweenness + z_coupling

Where each component is a z-score (standardized to mean=0, std=1):

  • z_curvature: How negative the edge curvature is (from FRC/ORC)
  • z_betweenness: How central the edge is in shortest paths
  • z_coupling: How frequently the endpoints change together (from Git)

Components:

  1. Curvature Z-Score: Structural “pain” (negative curvature = bridge)
  2. Betweenness Z-Score: Information flow criticality (high betweenness = chokepoint)
  3. Coupling Z-Score: Evolutionary coupling (frequent co-change = hidden dependency)

Interpretation:

  • High composite score (> 2.0): Critical architectural issue requiring immediate attention

    • Combines structural, information flow, and evolutionary problems
    • Highest ROI for refactoring efforts
  • Moderate score (1.0-2.0): Significant issue worth investigating

    • May have one or two dimensions of concern
  • Low score (< 1.0): Minor or no issue
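
A minimal sketch of the composite score follows. It assumes population standard deviation for the z-scores and negates curvature so that more negative edges score higher; neither detail is specified above, and the function names are hypothetical.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize to mean 0 and std 1 (all zeros if there is no spread)."""
    mu, sigma = mean(values), pstdev(values)
    return [(x - mu) / sigma if sigma else 0.0 for x in values]


def singularity_scores(edges, curvature, betweenness, coupling):
    """Return [(edge, S(e))] sorted from highest to lowest composite score."""
    z_curv = z_scores([-k for k in curvature])  # more negative -> higher score
    z_betw = z_scores(betweenness)
    z_coup = z_scores(coupling)
    scored = [(e, a + b + c) for e, a, b, c in zip(edges, z_curv, z_betw, z_coup)]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical inputs: c-d is a bridge with high betweenness and co-change.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
ranked = singularity_scores(
    edges,
    curvature=[3, 2, -2, 2],
    betweenness=[1, 2, 9, 2],
    coupling=[0.1, 0.1, 0.8, 0.2],
)
```

The edge c-d ranks first because it is extreme on all three dimensions at once. Since each component is standardized, the composite scores across all edges sum to zero, so the fixed thresholds above are best read relative to the rest of the graph.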

Use Cases:

  • Priority ranking: Sort refactoring candidates by composite score
  • Hotspot identification: Edges with high scores across all dimensions
  • Pattern validation: Cross-validate canonical neighborhood detection with quantitative scores

Output:

Singularity scores are included in the details section of the metric result, with top edges ranked by composite score. Each edge includes:

  • Individual z-scores for curvature, betweenness, coupling
  • Composite score
  • Source and target nodes

metrics:
  - id: ricci_curvature
    enabled: true
    config:
      alpha: 0.5                # ORC idleness parameter
      flow_iterations: 10       # Ricci flow steps
      flow_step_size: 0.1       # η parameter
      pattern_detection: true   # Enable CN-1 to CN-5
      surgery_suggestions: true
      top_edges: 20             # Number of worst edges to report
      # RCD (Reduced Change Distance) weights
      rcd_alpha_runtime: 1.0    # Weight for structural dependencies
      rcd_alpha_coupling: 1.0   # Weight for Git co-change coupling
      rcd_alpha_symbols: 0.5    # Weight for symbol usage

The plugin provides detailed output in the details field:

  • top_negative_curvature_edges: Top N edges with worst curvature, including:

    • from, to: Node IDs
    • frc_curvature, orc_curvature: Curvature values
    • surgery_type: Recommended refactoring
  • top_hub_nodes: Top 10 hub nodes with severity scores

  • surgery_suggestions: List of refactoring recommendations, prioritized using singularity scores

  • singularity_scores: Edges ranked by composite score, including:

    • from, to: Node indices
    • curvature_z, betweenness_z, coupling_z: Individual z-scores
    • composite: Combined singularity score
  • rcd_metrics: Summary of Reduced Change Distance analysis:

    • rcd_within: Average edge cost within communities
    • rcd_cross: Average edge cost across boundaries
    • rcd_anisotropy: Boundary clarity metric (cross/within ratio)

Computational complexity:

  • Forman-Ricci: O(E) - Fast, suitable for large graphs
  • Ollivier-Ricci: O(E·N²) worst case - Accurate but slower
  • Spectral metrics: O(N³) for eigenvalue computation
  • Pattern detection: O(E + V) - Fast
  • RCD computation: O(E + H) where H is Git history size - Fast, scales with edge count
  • Singularity scores: O(E) - Fast, linear in edge count (z-score normalization)

For large graphs (>1000 nodes), consider:

  • Using FRC as primary metric
  • Approximating ORC with Sinkhorn algorithm
  • Using power iteration for λ₂ only
  • RCD and Singularity scores add minimal overhead (~5-10% on top of base curvature computation)

References:

  • Ollivier, Y. (2009). “Ricci curvature of Markov chains on metric spaces”
  • Forman, R. (2003). “Bochner’s method for cell complexes and combinatorial Ricci curvature”
  • Perelman, G. (2002-2003). “Ricci flow with surgery”
  • D’Ambros, M., et al. (2012). “On the interplay between structural and logical coupling in software”
  • Nagappan, N., et al. (2008). “The influence of organizational structure on software quality”

Negative curvature on an edge indicates:

  1. Bridge: Edge connects two clusters with few other connections
  2. Layer violation: Edge crosses architectural boundaries incorrectly
  3. Hub connection: Edge connects to/from a hub node
  • High NEG_SHARE (>30%): Systematic architectural problems
  • High BRIDGE_MASS (>0.08): Architecture held together by bridges
  • High HOTSPOT_CONC (>0.50): Concentrated problems, easier to fix
  • Pattern detection: Specific refactoring opportunities

The plugin provides specific surgery suggestions based on detected patterns:

  • InterfaceExtraction: Create ports.ts to decouple modules
  • DependencyInversion: Apply DIP
  • BreakCycle: Extract types.ts without back-imports
  • RemoveBarrel: Inline barrel exports causing cycles
  • LayerAdapters: Add adapter layer for boundary violations
  • CollapseChain: Merge helper chains into single module
  • SplitHub: Split hub by domain
// The plugin is automatically registered and computed
// Results are available in MetricResult with:
// - values: HashMap of metric keys to values
// - details: JSON with top edges, hubs, and surgery suggestions
  1. Start with FRC: Use Forman-Ricci for initial analysis (fast)
  2. Deep dive with ORC: Use Ollivier-Ricci for critical edges (accurate)
  3. Monitor trends: Track NEG_SHARE and BRIDGE_MASS over time
  4. Prioritize with Singularity Scores: Use composite scores to rank refactoring candidates
    • Focus on edges with composite scores > 2.0 first
    • Singularity scores combine structural, information flow, and evolutionary signals
  5. Analyze RCD metrics: Check alignment between structure and evolution
    • High RCD_ANISOTROPY (> 1.5): Good module boundaries
    • Low RCD_ANISOTROPY (< 1.0): Potential architectural drift
    • Use RCD to validate that structural boundaries match change patterns
  6. Cross-validate patterns: Use multiple signals together
    • Negative curvature + high betweenness + high coupling = critical issue
    • Canonical neighborhoods + singularity scores = high-confidence refactoring targets
  7. Tune RCD weights: Adjust rcd_alpha_* parameters based on your context
    • Emphasize rcd_alpha_coupling if evolutionary coupling is primary concern
    • Increase rcd_alpha_symbols for fine-grained API usage analysis

Priority 2 Features: Entropy & Feature Analysis


Entropy Metrics (Anti-Gaming Health Indicators)


These metrics are difficult to manipulate locally and provide stable long-term health signals.

Shannon entropy of degree distribution (normalized)

  • High (> 0.7): Balanced architecture with diverse node roles
  • Medium (0.4-0.7): Typical for layered systems
  • Low (< 0.4): High concentration - few nodes dominate

Use: Track whether architecture becomes more concentrated over time

Shannon entropy of betweenness distribution

  • High: Traffic distributed - resilient
  • Low: Traffic through bottlenecks - fragile

Use: Detect architecture collapsing onto hub nodes

Shannon entropy of edge curvature distribution

  • High: Mixed edge types - complex boundaries
  • Low: Uniform edges (all healthy or all problematic)

ricci.degree_gini: Gini coefficient of degree distribution (0 = equal, 1 = concentrated)

  • < 0.3: Well-balanced
  • 0.3-0.6: Moderate inequality (expected)
  • > 0.6: Dominated by few nodes

Thresholds:

  • ✅ Good: < 0.4
  • ⚠️ Warning: 0.4-0.6
  • ❌ Critical: > 0.6

ricci.concentration_top10: Share of total degree held by the top 10% of nodes

  • < 0.3: Distributed
  • 0.3-0.5: Moderate
  • > 0.5: High concentration
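
The degree-based indicators above can be sketched from a plain list of node degrees. The entropy normalization is an assumption (entropy of the degree histogram divided by the log of the number of distinct degree values); the Gini coefficient and top-share calculations follow their standard definitions. All function names are hypothetical.

```python
import math
from collections import Counter


def normalized_degree_entropy(degrees):
    """Shannon entropy of the degree histogram, scaled to [0, 1]."""
    counts = Counter(degrees)
    if len(counts) < 2:
        return 0.0
    n = len(degrees)
    h = -sum((c / n) * math.log(c / n) for c in counts.values())
    return h / math.log(len(counts))


def gini(values):
    """Gini coefficient: 0 = perfectly equal, approaching 1 = concentrated."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def top_share(values, fraction=0.10):
    """Share of the total held by the top `fraction` of items."""
    xs = sorted(values, reverse=True)
    k = max(1, math.ceil(fraction * len(xs)))
    total = sum(xs)
    return sum(xs[:k]) / total if total else 0.0


# Hypothetical star graph: one hub of degree 9 and nine leaves of degree 1.
star_degrees = [9] + [1] * 9
star_gini = gini(star_degrees)
star_top10 = top_share(star_degrees)
```

For the star graph the Gini coefficient is 0.4 and the top 10% of nodes (the single hub) hold half of the total degree, so both indicators sit at the edge of their warning bands.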

Track whether metrics improve or degrade over releases. Helps identify:

  • Trend violations: When metrics regress
  • Architecture health direction: Overall improvement or degradation
  • Early warning signals: Detect problems before they become critical

Implementation: Store snapshots of bridge_mass, neg_share, and hub_concentration over time. Compute violations as the number of times metrics increased (got worse) between releases.

Interpretation:

  • trend_score = 1.0: All metrics improving
  • trend_score = 0.67: 33% of metrics regressing
  • trend_score < 0.5: More regressions than improvements - urgent action needed
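
One reading of the trend score that is consistent with the values above is the fraction of tracked metrics that did not get worse between two snapshots. The sketch below assumes exactly that; the formula and the `trend_score` name are assumptions, and the metric keys are the ones named in the implementation note.

```python
def trend_score(previous, current,
                metrics=("bridge_mass", "neg_share", "hub_concentration")):
    """Return 1 - violations / comparisons for two metric snapshots."""
    # A violation is a metric that increased (got worse) since the last release.
    violations = sum(1 for m in metrics if current[m] > previous[m])
    return 1.0 - violations / len(metrics)


# Hypothetical snapshots from two consecutive releases.
release_1 = {"bridge_mass": 0.05, "neg_share": 0.20, "hub_concentration": 0.40}
release_2 = {"bridge_mass": 0.04, "neg_share": 0.22, "hub_concentration": 0.40}
score = trend_score(release_1, release_2)
```

Only neg_share regressed here, so one of three metrics is a violation and the score is about 0.67.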

When a specific feature/module starts having problems, this workflow identifies:

  1. What changed architecturally (delta metrics)
  2. Where the problem is (root cause edges/nodes)
  3. Why it happened (boundary drifts, new necks)

Key Concepts:

  • delta_ccr: Change in Cross-Change Rate (files changing with other modules)

    • > 0.1: Boundaries eroding
    • > 0.2: Significant violation
  • delta_aniso: Change in anisotropy (cross/within cost ratio)

    • < 0: Boundaries weakening
    • Healthy target: anisotropy > 2.0
  • new_necks: Edges that became thin bottlenecks

  • delta_hotspot_shift: Change in hub concentration

  • new_cycles: New circular dependencies

Edges and nodes ranked by composite score considering:

  • Curvature (structural)
  • Betweenness (information flow)
  • Co-change coupling (evolution)

Identifies pairs of modules that started co-changing, indicating:

  • Wrong boundaries
  • Missing abstractions
  • Need for interface extraction

See: Full documentation for detailed usage examples and API reference.

metrics:
  - id: ricci_curvature
    enabled: true
    config:
      # ... existing config ...
      # Priority 2: Enable entropy metrics
      compute_entropy: true
      # Priority 2: Feature analysis (optional)
      # feature_analysis:
      #   enabled: true
      #   features:
      #     - name: "checkout"
      #       path_prefix: "src/features/checkout"
      #       problem_date: "2024-03-15T00:00:00Z"
policy:
  invariants:
    # Anti-gaming entropy thresholds
    - metric: ricci.degree_gini
      op: "<="
      value: 0.6
    - metric: ricci.concentration_top10
      op: "<="
      value: 0.5
    - metric: ricci.entropy_structural
      op: ">="
      value: 0.4

  1. Hard to fake locally: Improving entropy requires system-wide changes
  2. Scale-invariant: Normalized metrics work for any system size
  3. Multi-dimensional: Can’t optimize one without considering others
  4. Temporal signal: Monotonicity tracking catches regressions

Use feature-level root cause analysis when:

  • A specific feature/screen suddenly becomes problematic
  • Bug frequency increases in a module
  • Development velocity drops for a feature
  • Multiple developers complain about a subsystem

The analysis will pinpoint:

  • Which dependencies became problematic
  • When the degradation started
  • What type of refactoring will help most