Resilience Simulation & Top-N Singularities (Priority 2.0a & 2.0b)

Overview

This document describes two high-impact features that enhance architectural analysis:

Top-N Singularities in UI (2.0a): Expose the most critical architectural singularities directly in the UI for immediate action.
Cut Resilience Simulation (2.0b): Simulate “what-if” scenarios to identify single points of failure and measure architectural brittleness.

Both features build on the concepts described in RICCI_PERELMAN_1.md and RICCI_PERELMAN_3.md, which apply Ricci curvature to architectural analysis.

1. Top-N Singularities in UI (Priority 2.0a) ⚡

Concept

From RICCI_PERELMAN_1.md: “Singularities are not ‘bad happened’, they are ‘brewing’ and can be diagnosed.”

Edge singularity scores combine three z-scored metrics (node scores use the analogous terms given under Formula):

Geometric anomaly: Negative curvature (bridging edges)
Structural bottleneck: High betweenness centrality
Change coupling: Co-change frequency from Git history

Formula

For edges:

S(e) = z(-κ(e)) + z(edge_betweenness(e)) + z(Δcoupling(e))

For nodes (hub singularities):

S(n) = z(betweenness(n)) + z(incident_negative_curvature(n)) + z(degree(n))
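The edge formula can be sketched in a few lines of Rust. This is an illustration only, not the code in singularity.rs: the function names are hypothetical, and the population standard deviation and the all-zeros result for a constant metric are assumptions rather than documented behavior.

```rust
/// Population z-scores: (x - mean) / std_dev.
/// Assumption: a metric with no spread yields all zeros, so it adds nothing
/// to the composite score.
fn z_scores(values: &[f64]) -> Vec<f64> {
    if values.is_empty() {
        return Vec::new();
    }
    let n = values.len() as f64;
    let mean = values.iter().sum::<f64>() / n;
    let variance = values.iter().map(|&v| (v - mean).powi(2)).sum::<f64>() / n;
    let std_dev = variance.sqrt();
    if std_dev < 1e-12 {
        return vec![0.0; values.len()];
    }
    values.iter().map(|&v| (v - mean) / std_dev).collect()
}

/// S(e) = z(-κ(e)) + z(edge_betweenness(e)) + z(Δcoupling(e)), one score per edge.
/// All three slices are indexed by edge and must have the same length.
fn edge_singularity_scores(curvature: &[f64], betweenness: &[f64], coupling: &[f64]) -> Vec<f64> {
    assert!(curvature.len() == betweenness.len() && curvature.len() == coupling.len());
    // Negate curvature first so that strongly negative edges score high.
    let negated: Vec<f64> = curvature.iter().map(|&k| -k).collect();
    let zk = z_scores(&negated);
    let zb = z_scores(betweenness);
    let zc = z_scores(coupling);
    (0..zk.len()).map(|i| zk[i] + zb[i] + zc[i]).collect()
}
```

The node score S(n) follows the same pattern with betweenness, incident negative curvature, and degree as the three inputs.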

Implementation

The singularity analysis is computed in src/metrics/ricci_curvature/singularity.rs and exposed through the UI in src/metrics/ricci_curvature/ui.rs.

Configuration

ricci_curvature:
  enabled: true
  top_singularities: 10  # Number of singularities to show in UI (default: 10)

UI Output

The UI now includes a new issue category called “singularities” with:

Edge Singularities:

Rank (1-N)
From/To nodes
Composite score
Component z-scores (curvature, betweenness, coupling)

Node Singularities:

Rank (1-N)
Node ID
Composite score
Component z-scores (betweenness, incident curvature, degree)
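The rank column is simply the position after sorting by composite score and truncating to top_singularities entries. A minimal sketch, assuming a hypothetical `EdgeSingularity` record (the actual types in ui.rs may differ):

```rust
/// Hypothetical record for one scored edge; only the fields needed for ranking.
#[derive(Debug, Clone, PartialEq)]
struct EdgeSingularity {
    from: String,
    to: String,
    composite_score: f64,
}

/// Sorts by composite score (highest first), keeps the first `top_n` entries,
/// and pairs each with its 1-based rank.
fn top_n_singularities(
    mut items: Vec<EdgeSingularity>,
    top_n: usize,
) -> Vec<(usize, EdgeSingularity)> {
    // total_cmp gives a total order on f64, so NaN cannot cause a panic here.
    items.sort_by(|a, b| b.composite_score.total_cmp(&a.composite_score));
    items
        .into_iter()
        .take(top_n)
        .enumerate()
        .map(|(i, s)| (i + 1, s))
        .collect()
}
```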

Interpretation

High Composite Score (> 3.0): Critical architectural issue requiring immediate attention

Combines multiple risk factors
High probability of causing problems during changes
Should be prioritized for refactoring

Medium Score (1.5 - 3.0): Worth investigating

May cause issues under certain conditions
Consider refactoring if in critical path

Low Score (< 1.5): Monitor but not urgent
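These bands translate directly into a small classifier. The sketch below uses a hypothetical `Severity` enum; treating the boundary values 1.5 and 3.0 as Medium follows from the "> 3.0" and "< 1.5" wording above.

```rust
/// Hypothetical severity bands for a composite singularity score.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Severity {
    High,   // > 3.0
    Medium, // 1.5 to 3.0 inclusive
    Low,    // < 1.5
}

/// Maps a composite score onto the bands listed above.
fn classify_singularity(composite_score: f64) -> Severity {
    if composite_score > 3.0 {
        Severity::High
    } else if composite_score >= 1.5 {
        Severity::Medium
    } else {
        Severity::Low
    }
}
```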

Example Output

{
  "singularities": {
    "count": 25,
    "description": "Critical architectural points combining curvature, betweenness, and change coupling",
    "examples": [
      {
        "rank": 1,
        "from": "src/core/database.ts",
        "to": "src/api/handlers.ts",
        "composite_score": "4.23",
        "curvature_z": "2.1",
        "betweenness_z": "1.8",
        "coupling_z": "0.33"
      },
      {
        "rank": 2,
        "node_id": "src/utils/helpers.ts",
        "composite_score": "3.87",
        "betweenness_z": "2.3",
        "incident_curvature_z": "1.2",
        "degree_z": "0.37"
      }
    ]
  }
}

Metrics Exposed

ricci.singularity_edge_count: Total number of edge singularities detected
ricci.singularity_node_count: Total number of node singularities detected
ricci.singularity_avg_edge_score: Average composite score for edge singularities

2. Cut Resilience Simulation (Priority 2.0b) ⚡

Concept

From RICCI_PERELMAN_3.md: “Simulating edge/node removal to measure architectural brittleness and critical point detection.”

This feature performs “what-if” analysis by:

Identifying high-betweenness edges and nodes
Simulating their removal from the graph
Measuring the impact on connectivity and structure
Identifying Single Points of Failure (SPOFs)

Implementation

The resilience simulation is implemented in src/metrics/ricci_curvature/resilience.rs.

Configuration

ricci_curvature:
  enabled: true
  enable_resilience: true  # Enable resilience simulation (default: true)
  resilience_top_edges: 5  # Number of edges to simulate (default: 5)
  resilience_top_nodes: 3  # Number of nodes to simulate (default: 3)
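For readers wiring these options into code, the documented defaults can be mirrored in a plain struct. This is a sketch only: `ResilienceConfig` is a hypothetical name, and only the option names and default values come from the configuration above.

```rust
/// Hypothetical mirror of the resilience options shown above.
#[derive(Debug, Clone, PartialEq)]
struct ResilienceConfig {
    enable_resilience: bool,
    resilience_top_edges: usize,
    resilience_top_nodes: usize,
}

impl Default for ResilienceConfig {
    /// Uses the defaults stated in the configuration comments.
    fn default() -> Self {
        ResilienceConfig {
            enable_resilience: true,
            resilience_top_edges: 5,
            resilience_top_nodes: 3,
        }
    }
}
```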

Simulation Process

For each high-betweenness edge/node:

Clone the graph and remove the element
Analyze connectivity:
- Count connected components
- Measure largest component size
- Detect disconnection
Recompute betweenness on modified graph
Calculate impact score (0-1, higher = worse):
- Disconnection penalty: 0.5
- Component fragmentation: 0.05 per component
- Isolation penalty: 0.3 × (1 - largest_component_fraction)
- Betweenness redistribution: up to 0.2
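The steps above can be sketched for the edge case as follows. This is not the code in resilience.rs. It assumes nodes are numbered 0..node_count, that connectivity is measured on an undirected view of the graph, that the fragmentation term counts each component beyond the first, that the betweenness shift arrives already normalized to 0-1, and that the sum is capped at 1. All names are hypothetical.

```rust
use std::collections::VecDeque;

/// Connectivity summary of the graph after a removal.
struct Connectivity {
    component_count: usize,
    largest_component_fraction: f64,
}

/// Counts connected components with BFS, treating every edge as undirected.
fn analyze_connectivity(node_count: usize, edges: &[(usize, usize)]) -> Connectivity {
    let mut adjacency = vec![Vec::new(); node_count];
    for &(a, b) in edges {
        adjacency[a].push(b);
        adjacency[b].push(a);
    }
    let mut visited = vec![false; node_count];
    let mut component_count = 0;
    let mut largest = 0usize;
    for start in 0..node_count {
        if visited[start] {
            continue;
        }
        component_count += 1;
        let mut size = 0usize;
        let mut queue = VecDeque::from([start]);
        visited[start] = true;
        while let Some(node) = queue.pop_front() {
            size += 1;
            for &next in &adjacency[node] {
                if !visited[next] {
                    visited[next] = true;
                    queue.push_back(next);
                }
            }
        }
        largest = largest.max(size);
    }
    let largest_component_fraction = if node_count == 0 {
        1.0
    } else {
        largest as f64 / node_count as f64
    };
    Connectivity { component_count, largest_component_fraction }
}

/// Copies the edge list, drops edge `index`, and re-analyzes connectivity.
fn simulate_edge_cut(node_count: usize, edges: &[(usize, usize)], index: usize) -> Connectivity {
    let mut remaining = edges.to_vec();
    remaining.remove(index);
    analyze_connectivity(node_count, &remaining)
}

/// Sums the four weighted terms listed above and caps the result at 1.0 (assumption).
/// `betweenness_shift` is an assumed 0-1 measure of how much betweenness moved.
fn impact_score(conn: &Connectivity, betweenness_shift: f64) -> f64 {
    // Assumes the original graph was connected, so >1 component means disconnection.
    let disconnection = if conn.component_count > 1 { 0.5 } else { 0.0 };
    let fragmentation = 0.05 * conn.component_count.saturating_sub(1) as f64;
    let isolation = 0.3 * (1.0 - conn.largest_component_fraction);
    let redistribution = 0.2 * betweenness_shift.clamp(0.0, 1.0);
    (disconnection + fragmentation + isolation + redistribution).clamp(0.0, 1.0)
}
```

On a four-node path graph, cutting the middle edge splits the graph into two halves; on a triangle, any single edge can be cut without disconnecting anything, which is the redundancy the simulation is meant to reward.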

Metrics Computed

For each simulation:

removed: ID of removed edge/node
type: “Edge” or “Node”
disconnected: Boolean - did removal disconnect the graph?
component_count: Number of components after removal
largest_component_fraction: Size of largest component (0-1)
impact_score: Overall impact (0-1, higher = worse)
nodes_affected: Number of nodes isolated or in small components

Overall metrics:

ricci.resilience_overall: Overall resilience score (0-1, higher = more resilient)
ricci.resilience_spof_count: Number of single points of failure
ricci.resilience_avg_impact: Average impact score across simulations
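Two of the overall metrics can be derived from the per-simulation results in a straightforward way. The sketch below assumes that a SPOF is any simulated removal with disconnected set to true, and uses a hypothetical `SimulationResult` struct. The formula behind ricci.resilience_overall is not given in this document, so it is not reproduced here.

```rust
/// Hypothetical per-simulation record with the fields needed for aggregation.
struct SimulationResult {
    removed: String,
    disconnected: bool,
    impact_score: f64,
}

/// Number of simulated removals that disconnect the graph (assumed SPOF definition).
fn spof_count(results: &[SimulationResult]) -> usize {
    results.iter().filter(|r| r.disconnected).count()
}

/// Mean impact score across all simulations; 0.0 when nothing was simulated.
fn average_impact(results: &[SimulationResult]) -> f64 {
    if results.is_empty() {
        return 0.0;
    }
    results.iter().map(|r| r.impact_score).sum::<f64>() / results.len() as f64
}
```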

Interpretation

Overall Resilience Score:

> 0.8: Highly resilient - no critical single points of failure
0.6 - 0.8: Moderately resilient - some weak points but manageable
< 0.6: Fragile architecture - multiple SPOFs detected

SPOF Count:

0: Excellent - none of the simulated removals disconnects the architecture
1-2: Good - only a few critical dependencies
3+: Warning - architecture is brittle and risky

Impact Score (per simulation):

> 0.7: Critical SPOF - removal would severely damage architecture
0.4 - 0.7: Significant impact - removal would cause problems
< 0.4: Tolerable - removal would have limited impact

Example Output

{
  "resilience": {
    "count": 2,
    "description": "Single points of failure that would disconnect the architecture",
    "examples": [
      {
        "removed": "src/core/router.ts -> src/api/handlers.ts",
        "type": "Edge",
        "disconnected": true,
        "component_count": 3,
        "largest_component_fraction": "0.65",
        "impact_score": "0.82",
        "nodes_affected": 15
      },
      {
        "removed": "src/utils/helpers.ts",
        "type": "Node",
        "disconnected": true,
        "component_count": 2,
        "largest_component_fraction": "0.78",
        "impact_score": "0.71",
        "nodes_affected": 9
      }
    ]
  }
}

Use Cases

Risk Assessment: Identify which modules/dependencies are critical to system integrity
Refactoring Priority: Focus on reducing SPOFs before other improvements
Architectural Planning: Design redundancy for high-impact components
Team Coordination: Ensure critical modules have multiple maintainers
Migration Planning: Understand dependencies before major refactors

Integration with Surgery Recommendations

Both features integrate with the surgery recommendation system:

Singularities are used to prioritize surgery suggestions (see enhance_surgery_with_singularity)
Resilience SPOFs should be addressed through:
- Interface Extraction: Create alternative paths
- Dependency Inversion: Reduce coupling to critical nodes
- Split Hub: Distribute load from critical hub nodes

Testing

Unit Tests

Tests are located in:

src/metrics/ricci_curvature/singularity.rs (existing tests)
src/metrics/ricci_curvature/resilience.rs (new tests)

Example Test Cases

Singularity Tests:

Z-score computation
Edge singularity ranking
Node singularity detection
Composite score calculation

Resilience Tests:

Connectivity analysis
Edge cut simulation
Node cut simulation
Impact score calculation
SPOF detection

Performance Considerations

Singularity Analysis:

Complexity: O(E + N) for z-score computation, given precomputed curvature, betweenness, and coupling values
Memory: O(E + N) for storing scores
Impact: Minimal - runs once per analysis

Resilience Simulation:

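Complexity: O(k × (N + E)) for graph cloning and connectivity analysis, where k = number of simulations; recomputing exact betweenness with Brandes' algorithm adds O(N × E) per simulation on an unweighted graph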
Memory: O(N + E) per simulation (graph cloning)
Impact: Moderate - can be disabled if performance is critical

Recommendations:

Keep resilience_top_edges and resilience_top_nodes small (≤ 10)
Disable resilience simulation for very large graphs (> 10,000 nodes)
Run resilience analysis periodically, not on every commit

Configuration Examples

Minimal (Fast)

ricci_curvature:
  enabled: true
  top_singularities: 5
  enable_resilience: false

Balanced (Recommended)

ricci_curvature:
  enabled: true
  top_singularities: 10
  enable_resilience: true
  resilience_top_edges: 5
  resilience_top_nodes: 3

Comprehensive (Thorough)

ricci_curvature:
  enabled: true
  top_singularities: 20
  enable_resilience: true
  resilience_top_edges: 10
  resilience_top_nodes: 5

References

RICCI_PERELMAN_1.md: Singularity score formulation
RICCI_PERELMAN_3.md: Cut resilience simulation concept
RICCI_CURVATURE.md: Core Ricci curvature metrics
CROSS_METRIC_INSIGHTS.md: Integration with other metrics

Future Enhancements

Cascade Analysis: Simulate multiple simultaneous failures
Recovery Paths: Suggest alternative paths when SPOFs are removed
Temporal Resilience: Track resilience changes over time
Team Impact: Map SPOFs to team ownership for risk management
Automated Refactoring: Generate specific code changes to reduce SPOFs