Entropy Metrics and Feature Analysis (Priority 2)
Overview
This document describes Priority 2 features based on Perelman Papers 1, BUGFIX, and MSR_2:
- Entropy Metrics: Anti-gaming health indicators and long-term architecture health tracking
- Feature-Level Root Cause Analysis: Temporal debugging workflow for problematic features
These features provide temporal analysis and anti-gaming metrics that complement the core Ricci curvature analysis.
Entropy Metrics (Paper 1)
Purpose
Entropy metrics serve as “anti-gaming” indicators that are difficult to manipulate with local changes. They measure the distribution and concentration of architectural properties, providing stable long-term health signals.
Core Metrics
ricci.entropy_structural
Shannon entropy of the degree distribution (normalized to [0,1])

    H = -(Σ p_i * log₂(p_i)) / log₂(n)

where p_i is the probability of degree i
Interpretation:
- High (> 0.7): Nodes have diverse roles - balanced architecture
- Medium (0.4-0.7): Some specialization - typical for layered systems
- Low (< 0.4): High concentration - few nodes dominate
Use case: Track whether architecture is becoming more concentrated over time
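As a rough illustration of the formula above, the following Rust sketch computes the normalized entropy from a list of node degrees. It is not the project's implementation: the function name `structural_entropy` is hypothetical, and it assumes that n is the number of distinct degree values, which the text does not specify.

```rust
use std::collections::HashMap;

/// Normalized Shannon entropy of a degree sequence, in [0, 1].
/// Assumption: `n` in the formula is the number of distinct degree values.
pub fn structural_entropy(degrees: &[usize]) -> f64 {
    if degrees.is_empty() {
        return 0.0;
    }
    // Count how many nodes have each degree.
    let mut counts: HashMap<usize, usize> = HashMap::new();
    for &d in degrees {
        *counts.entry(d).or_insert(0) += 1;
    }
    let n = counts.len();
    if n < 2 {
        // A single degree value carries no diversity, and log2(1) = 0.
        return 0.0;
    }
    let total = degrees.len() as f64;
    // H = -sum(p_i * log2(p_i)) over the observed degree values.
    let h: f64 = counts
        .values()
        .map(|&c| {
            let p = c as f64 / total;
            -p * p.log2()
        })
        .sum();
    h / (n as f64).log2()
}
```

Under this assumption, a graph whose nodes split evenly across the observed degree values scores 1.0, and a graph where every node has the same degree scores 0.0.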
ricci.entropy_flow
Shannon entropy of betweenness centrality distribution
Measures how evenly “flow” (paths) is distributed across nodes.
Interpretation:
- High: Traffic distributed across many nodes - resilient
- Low: Traffic concentrated through few bottlenecks - fragile
Use case: Detect architecture collapsing onto hub nodes
ricci.entropy_curvature
Shannon entropy of edge curvature distribution
Measures diversity of edge types (positive vs negative curvature).
Interpretation:
- High: Mixed edge types - complex boundaries
- Low: Uniform edges - either all healthy or all problematic
Use case: Distinguish between “uniformly bad” and “mixed quality” architectures
ricci.degree_gini
Gini coefficient of degree distribution (0=equal, 1=concentrated)
Classic inequality measure applied to node degrees.
Interpretation:
- < 0.3: Fairly equal - well-balanced modules
- 0.3-0.6: Moderate inequality - some hubs expected
- > 0.6: High inequality - architecture dominated by few nodes
Thresholds:
- ✅ Good: < 0.4
- ⚠️ Warning: 0.4-0.6
- ❌ Critical: > 0.6
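For reference, a minimal sketch of how a Gini coefficient over node degrees can be computed and mapped onto the thresholds above. It uses the standard rank-weighted formula on sorted values. The names `degree_gini` and `gini_status` are hypothetical and not taken from the project's API.

```rust
/// Gini coefficient of a degree sequence: 0.0 means perfectly equal.
/// For a finite sample of n nodes the maximum is (n - 1) / n, not exactly 1.
pub fn degree_gini(degrees: &[usize]) -> f64 {
    let n = degrees.len();
    let total: usize = degrees.iter().sum();
    if n == 0 || total == 0 {
        return 0.0;
    }
    let mut sorted = degrees.to_vec();
    sorted.sort_unstable();
    // Rank-weighted sum with 1-based ranks over ascending degrees.
    let weighted: f64 = sorted
        .iter()
        .enumerate()
        .map(|(i, &d)| (i + 1) as f64 * d as f64)
        .sum();
    let n = n as f64;
    2.0 * weighted / (n * total as f64) - (n + 1.0) / n
}

/// Maps a Gini value onto the Good / Warning / Critical bands listed above.
pub fn gini_status(gini: f64) -> &'static str {
    if gini < 0.4 {
        "good"
    } else if gini <= 0.6 {
        "warning"
    } else {
        "critical"
    }
}
```

For example, four nodes with degrees 0, 0, 0, 4 give a Gini of 0.75, which falls in the critical band.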
ricci.concentration_top10
Share of total degree held by top 10% of nodes
Direct measure of power concentration.
Interpretation:
- < 0.3: Power distributed
- 0.3-0.5: Moderate concentration
- > 0.5: High concentration - few nodes control architecture
Use case: Quick indicator of hub dominance
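The metric can be sketched in a few lines of Rust. The function name mirrors the metric id but is otherwise hypothetical, and rounding the 10% cutoff up (so that at least one node is always counted) is an assumption, since the text does not say how fractional node counts are handled.

```rust
/// Share of total degree held by the top 10% highest-degree nodes.
/// Assumption: the cutoff is ceil(0.1 * n), so at least one node is counted.
pub fn concentration_top10(degrees: &[usize]) -> f64 {
    let total: usize = degrees.iter().sum();
    if degrees.is_empty() || total == 0 {
        return 0.0;
    }
    let mut sorted = degrees.to_vec();
    // Sort descending so the largest degrees come first.
    sorted.sort_unstable_by(|a, b| b.cmp(a));
    let k = (degrees.len() + 9) / 10; // integer ceil(n / 10)
    let top: usize = sorted.iter().take(k).sum();
    top as f64 / total as f64
}
```

With ten nodes where one hub has degree 10 and the rest have degree 1, the single top node holds 10 of 19 degree units, which lands in the high-concentration band.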
Monotonicity Tracking
Track whether metrics improve or degrade over releases.

    pub struct MonotonicityReport {
        pub bridge_mass_increases: usize, // Times bridge_mass got worse
        pub neg_share_increases: usize,   // Times neg_share got worse
        pub hub_conc_increases: usize,    // Times concentration increased
        pub total_violations: usize,      // Total regressions
        pub trend_score: f64,             // 0.0 (all bad) to 1.0 (all good)
    }

Usage:

    let snapshots = vec![
        SnapshotMetrics {
            timestamp: Some(1704067200),
            bridge_mass: 0.05,
            neg_share: 0.15,
            hub_concentration: 0.25,
            structural_entropy: 0.80,
        },
        SnapshotMetrics {
            timestamp: Some(1706745600),
            bridge_mass: 0.08,       // ❌ Increased
            neg_share: 0.12,         // ✅ Decreased
            hub_concentration: 0.30, // ❌ Increased
            structural_entropy: 0.75,
        },
    ];

    let report = compute_monotonicity(&snapshots);
    // report.total_violations = 2
    // report.trend_score = 0.67 (2 violations out of 6 comparisons)

Why Entropy Metrics are “Anti-Gaming”
- Hard to fake locally: Improving entropy requires system-wide changes, not local hacks
- Scale-invariant: Normalized metrics work for small and large systems
- Multi-dimensional: Can’t optimize one without considering others
- Temporal signal: Monotonicity tracking catches regressions
Feature-Level Root Cause Analysis (Papers BUGFIX & MSR_2)
Purpose
When a specific feature/screen/module starts having problems, this workflow helps you understand:
- What changed architecturally (delta metrics)
- Where the problem is (root cause edges/nodes)
- Why it happened (boundary drifts, new necks)
Workflow
Step 1: Define Feature Slice

    let feature_slice = compute_feature_slice(
        graph,
        node_map,
        "src/features/payments", // Path prefix
    );
    // Returns: FeatureSlice with nodes and boundary_edges

Step 2: Define Time Windows

    // Assumption: timestamps are Unix seconds held in an i64.
    const THIRTY_DAYS: i64 = 30 * 24 * 60 * 60;

    let before_window = TimeRange {
        start_timestamp: problem_date - THIRTY_DAYS,
        end_timestamp: problem_date,
    };

    let after_window = TimeRange {
        start_timestamp: problem_date,
        end_timestamp: problem_date + THIRTY_DAYS,
    };

Step 3: Compute Delta Metrics

    pub struct DeltaMetrics {
        pub delta_ccr: f64,           // Cross-Change Rate change
        pub delta_aniso: f64,         // Anisotropy change
        pub new_necks: Vec<EdgeInfo>, // New bottlenecks
        pub delta_hotspot_shift: f64, // Hub concentration change
        pub delta_bridge_mass: f64,   // Bridge mass change
        pub new_cycles: usize,        // New circular dependencies
    }

Key Metrics Explained:
Cross-Change Rate (CCR)
% of changes in feature that also touched other modules
- delta_ccr > 0.1: Feature boundaries are eroding
- delta_ccr > 0.2: Significant boundary violation
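To make the definition concrete, here is a hedged sketch of how CCR and its delta could be derived from commit history. It assumes each commit is available as a list of changed file paths and that feature membership is decided by path prefix. Both function names are hypothetical, and the project's own computation may differ.

```rust
/// Fraction of feature-touching commits that also touch files outside the feature.
/// Assumption: a commit is a list of changed paths; membership is by path prefix.
pub fn cross_change_rate(commits: &[Vec<&str>], feature_prefix: &str) -> f64 {
    let mut touching = 0usize;
    let mut crossing = 0usize;
    for files in commits {
        // Skip commits that do not touch the feature at all.
        if !files.iter().any(|f| f.starts_with(feature_prefix)) {
            continue;
        }
        touching += 1;
        if files.iter().any(|f| !f.starts_with(feature_prefix)) {
            crossing += 1;
        }
    }
    if touching == 0 {
        0.0
    } else {
        crossing as f64 / touching as f64
    }
}

/// Applies the two delta_ccr thresholds listed above.
pub fn classify_delta_ccr(delta_ccr: f64) -> &'static str {
    if delta_ccr > 0.2 {
        "significant boundary violation"
    } else if delta_ccr > 0.1 {
        "boundaries eroding"
    } else {
        "stable"
    }
}
```

Computing `cross_change_rate` once per time window and subtracting the "before" value from the "after" value yields `delta_ccr`. Passing the prefix with a trailing slash avoids accidentally matching sibling directories that share the same name stem.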
Anisotropy
Ratio of cross-module to within-module change cost
- Healthy: anisotropy > 2.0 (crossing boundaries is 2x costlier)
- Problematic: anisotropy < 1.5 (boundaries provide no protection)
- delta_aniso < 0: Boundaries weakening
New Necks
Edges that became thin bottlenecks (κ < 0, high betweenness)
Each new neck is a candidate for refactoring.
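One way to picture the detection is a set difference between the necks of the two time windows. The sketch below is an illustration only: `EdgeSample` is a hypothetical stand-in for the project's `EdgeInfo`, and the `min_betweenness` cutoff is an assumed parameter, because the text does not define what counts as "high" betweenness.

```rust
use std::collections::HashSet;

/// Hypothetical per-edge measurement; not the project's `EdgeInfo` type.
#[derive(Debug, Clone, PartialEq)]
pub struct EdgeSample {
    pub from: String,
    pub to: String,
    pub curvature: f64,   // κ
    pub betweenness: f64, // edge betweenness
}

/// A neck has negative curvature and betweenness at or above the assumed cutoff.
fn is_neck(edge: &EdgeSample, min_betweenness: f64) -> bool {
    edge.curvature < 0.0 && edge.betweenness >= min_betweenness
}

/// Edges that are necks in `after` but were not necks in `before`.
pub fn new_necks<'a>(
    before: &[EdgeSample],
    after: &'a [EdgeSample],
    min_betweenness: f64,
) -> Vec<&'a EdgeSample> {
    // Collect the endpoints of every edge that was already a neck.
    let old: HashSet<(&str, &str)> = before
        .iter()
        .filter(|e| is_neck(e, min_betweenness))
        .map(|e| (e.from.as_str(), e.to.as_str()))
        .collect();
    after
        .iter()
        .filter(|e| {
            is_neck(e, min_betweenness) && !old.contains(&(e.from.as_str(), e.to.as_str()))
        })
        .collect()
}
```

Edges that were already necks before the problem date are excluded, so the result only contains candidates introduced during the "after" window.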
Hotspot Shift
Change in concentration of churn/centrality
- delta_hotspot_shift > 0.1: Architecture collapsing onto hub files
Step 4: Root Cause Ranking
Top Root Cause Edges
Edges ranked by composite score:

    rank_score = |κ| + (betweenness × 10) + (cochange_count × 0.1)

Example Output:

    {
      "from": "src/features/payments/PaymentForm.tsx",
      "to": "src/shared/api/httpClient.ts",
      "rank_score": 8.4,
      "reasons": [
        "High negative curvature (bridge)",
        "High edge betweenness (bottleneck)",
        "High co-change count (15)"
      ]
    }

Top Root Cause Nodes
Nodes ranked by:

    rank_score = curvature_mass_incident + delta_betweenness

Where curvature_mass_incident = sum of |κ| for incident edges with κ < 0
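The two ranking formulas in this step translate directly into code. The sketch below uses hypothetical free functions rather than the project's actual types, and takes plain numbers as input.

```rust
/// Edge score: |κ| + (betweenness × 10) + (cochange_count × 0.1).
pub fn edge_rank_score(curvature: f64, betweenness: f64, cochange_count: usize) -> f64 {
    curvature.abs() + betweenness * 10.0 + cochange_count as f64 * 0.1
}

/// Sum of |κ| over incident edges with κ < 0.
pub fn curvature_mass_incident(incident_curvatures: &[f64]) -> f64 {
    incident_curvatures
        .iter()
        .filter(|&&k| k < 0.0)
        .map(|k| k.abs())
        .sum()
}

/// Node score: curvature_mass_incident + delta_betweenness.
pub fn node_rank_score(incident_curvatures: &[f64], delta_betweenness: f64) -> f64 {
    curvature_mass_incident(incident_curvatures) + delta_betweenness
}
```

As an illustration with made-up inputs, an edge with κ = -0.9, betweenness 0.6, and 15 co-changes scores 0.9 + 6.0 + 1.5 = 8.4. The weights show that betweenness dominates the edge score unless the co-change count is large.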
Step 5: Boundary Drifts
Pairs of modules that started co-changing frequently

    pub struct BoundaryDrift {
        pub cluster1: String,
        pub cluster2: String,
        pub cochange_before: usize,
        pub cochange_after: usize,
        pub drift_score: f64,
    }

Interpretation:
- Indicates wrong boundaries or missing abstractions
- High drift suggests need for interface extraction
Step 6: Diagnosis
Auto-generated summary:

    "Significant increase in bridge dependencies;
    Feature started changing with external modules more frequently;
    3 new bottleneck edges appeared;
    Architecture collapsing onto hub modules"

Usage Example

    use crate::metrics::topology::ricci_curvature::feature_analysis::*;

    // 1. Define feature
    let feature_slice = compute_feature_slice(
        graph,
        &node_map,
        "src/features/checkout",
    );

    // 2. Compute metrics before/after
    let before_metrics = FeatureMetrics {
        cross_change_rate: 0.10,
        anisotropy: 2.5,
        hotspot_concentration: 0.20,
        bridge_mass: 0.04,
        neck_edges: vec![],
        cycle_count: 1,
    };

    let after_metrics = FeatureMetrics {
        cross_change_rate: 0.28,     // ❌ Jumped 18 points
        anisotropy: 1.8,             // ❌ Weakened
        hotspot_concentration: 0.42, // ❌ Doubled
        bridge_mass: 0.09,           // ❌ More than doubled
        neck_edges: vec![...],       // 4 new necks
        cycle_count: 3,              // 2 new cycles
    };

    // 3. Generate root cause report
    let report = generate_root_cause_map(
        &feature_slice,
        graph,
        &node_map,
        &curvatures,
        &edge_betweenness,
        Some(&cochange_data),
        &before_metrics,
        &after_metrics,
        problem_timestamp,
    );

    // 4. Act on top issues
    for edge in &report.top_root_cause_edges {
        println!(
            "Priority {}: {} -> {}",
            edge.rank_score,
            edge.edge_info.from,
            edge.edge_info.to
        );
        println!("Reasons: {:?}", edge.reasons);
    }

Real-World Scenarios
Scenario 1: Layer Violation
Signals:
- delta_ccr high
- New necks between UI and infra
- Low delta_aniso
Diagnosis: UI started directly importing infrastructure
Surgery: Introduce ports/adapters layer
Scenario 2: Shared Module Explosion
Signals:
- delta_hotspot_shift high
- Hub nodes in shared/ or utils/
- High concentration_top10
Diagnosis: Everything depends on growing shared module
Surgery: Split shared by domain
Scenario 3: Feature Coupling
Signals:
- High boundary drifts
- new_cycles > 0
- delta_bridge_mass high
Diagnosis: Two features became coupled through shared state
Surgery: Extract common contract/DTO module or merge features
Integration with Existing Metrics
Temporal Dashboard
Track over time:

    Timestamp | bridge_mass | entropy_struct | trend_score
    ----------|-------------|----------------|------------
    2024-01   | 0.05        | 0.75           | 1.00
    2024-02   | 0.07        | 0.72           | 0.67 ⚠️
    2024-03   | 0.06        | 0.74           | 1.00 ✅

Feature Health Report
For each feature:

    Feature: src/features/checkout
    Status: ⚠️ Degrading
    Delta CCR: +0.18 (crossed threshold)
    New Necks: 4
    Surgery Priority: HIGH

    Top Issues:
    1. CheckoutForm -> httpClient (score: 8.4)
       → Create ports.ts interface
    2. CheckoutFlow -> shared/state (score: 7.2)
       → Split shared state by domain

Configuration
Enable Priority 2 features in config:

    metrics:
      - id: ricci_curvature
        enabled: true
        config:
          # Existing Priority 1 config...

          # Priority 2: Entropy tracking
          compute_entropy: true

          # Priority 2: Feature analysis (optional)
          feature_analysis:
            enabled: true
            features:
              - name: "checkout"
                path_prefix: "src/features/checkout"
                problem_date: "2024-03-15"
              - name: "payments"
                path_prefix: "src/features/payments"
                problem_date: "2024-03-20"

References
- Paper 1 (RICCI_PERELMAN_1.md): Section 6 - Entropy and monotonicity
- Paper BUGFIX (RICCI_PERELMAN_BUGFIX.md): Complete bugfix workflow
- Paper MSR_2 (RICCI_PERELMAN_MSR_2.md): Delta metrics and temporal analysis
API Reference
Entropy Module

    // Compute all entropy metrics
    let entropy = EntropyMetrics::compute(graph, curvatures);

    // Track monotonicity over snapshots
    let report = compute_monotonicity(&snapshots);

Feature Analysis Module

    // Define feature slice
    let slice = compute_feature_slice(graph, node_map, "src/features/X");

    // Compute delta
    let delta = compute_delta_metrics(&slice, &before, &after);

    // Generate full root cause report
    let report = generate_root_cause_map(
        &slice,
        graph,
        node_map,
        curvatures,
        edge_betweenness,
        cochange_data,
        &before,
        &after,
        timestamp,
    );

Next Steps
After implementing Priority 2, consider:
- Automated temporal tracking: CI pipeline that stores snapshots and computes monotonicity
- Feature health dashboard: Visual timeline of feature metrics
- Predictive alerts: “Feature X shows early degradation signs”
- Surgery effectiveness tracking: Measure before/after surgery impact (Priority 3.6)