Entropy Metrics and Feature Analysis (Priority 2)

Overview

This document describes Priority 2 features based on Perelman Papers 1, BUGFIX, and MSR_2:

Entropy Metrics: Anti-gaming health indicators and long-term architecture health tracking
Feature-Level Root Cause Analysis: Temporal debugging workflow for problematic features

These features provide temporal analysis and anti-gaming metrics that complement the core Ricci curvature analysis.

Entropy Metrics (Paper 1)

Purpose

Entropy metrics serve as “anti-gaming” indicators that are difficult to manipulate with local changes. They measure the distribution and concentration of architectural properties, providing stable long-term health signals.

Core Metrics

`ricci.entropy_structural`

Shannon entropy of degree distribution (normalized to [0,1])

H = -(Σ_i p_i · log₂ p_i) / log₂(n)

where p_i is the fraction of nodes whose degree is i, and the log₂(n) denominator normalizes H to [0,1]

Interpretation:

High (> 0.7): Nodes have diverse roles - balanced architecture
Medium (0.4-0.7): Some specialization - typical for layered systems
Low (< 0.4): High concentration - few nodes dominate

Use case: Track whether architecture is becoming more concentrated over time
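A minimal sketch of the formula in Rust, assuming the node degrees are already collected into a slice. The function name `structural_entropy` is hypothetical, and so is the choice of n: here it is the number of distinct degree values, which makes a uniform spread over the observed degrees score exactly 1.0.

```rust
use std::collections::HashMap;

/// Normalized Shannon entropy of a degree sequence (hypothetical helper).
/// Assumption: n in the denominator is the number of distinct degree values.
pub fn structural_entropy(degrees: &[usize]) -> f64 {
    if degrees.is_empty() {
        return 0.0;
    }
    // Count how many nodes have each degree value.
    let mut counts: HashMap<usize, usize> = HashMap::new();
    for &d in degrees {
        *counts.entry(d).or_insert(0) += 1;
    }
    let distinct = counts.len();
    if distinct < 2 {
        // One degree value: no spread, and log2(1) = 0 would divide by zero.
        return 0.0;
    }
    let total = degrees.len() as f64;
    // H = -Σ p_i log2 p_i, with p_i the fraction of nodes of degree i.
    let h: f64 = counts
        .values()
        .map(|&c| {
            let p = c as f64 / total;
            -p * p.log2()
        })
        .sum();
    h / (distinct as f64).log2()
}
```

A single-degree input returns 0.0 explicitly rather than dividing by log₂(1) = 0.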

`ricci.entropy_flow`

Shannon entropy of betweenness centrality distribution

Measures how evenly “flow” (paths) is distributed across nodes.

Interpretation:

High: Traffic distributed across many nodes - resilient
Low: Traffic concentrated through few bottlenecks - fragile

Use case: Detect architecture collapsing onto hub nodes
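The source does not spell out how betweenness values become a distribution. One common approach, assumed in this sketch, treats each node's share of total betweenness as p_v and normalizes by log₂ of the node count; `flow_entropy` is a hypothetical name.

```rust
/// Normalized Shannon entropy of node betweenness (hypothetical helper).
/// Assumption: p_v = b_v / Σ b, normalized by log2(n) for n nodes.
pub fn flow_entropy(betweenness: &[f64]) -> f64 {
    let n = betweenness.len();
    let total: f64 = betweenness.iter().sum();
    if n < 2 || total <= 0.0 {
        return 0.0;
    }
    let h: f64 = betweenness
        .iter()
        .filter(|&&b| b > 0.0) // 0 * log2(0) is taken as 0
        .map(|&b| {
            let p = b / total;
            -p * p.log2()
        })
        .sum();
    h / (n as f64).log2()
}
```

Evenly spread betweenness scores 1.0; all traffic through a single node scores 0.0.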

`ricci.entropy_curvature`

Shannon entropy of edge curvature distribution

Measures diversity of edge types (positive vs negative curvature).

Interpretation:

High: Mixed edge types - complex boundaries
Low: Uniform edges - either all healthy or all problematic

Use case: Distinguish between “uniformly bad” and “mixed quality” architectures
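Curvature is continuous, so it has to be bucketed before Shannon entropy applies. This sketch assumes three sign classes (negative, flat within ±eps, positive) and normalizes by log₂(3); both the bucketing and the `curvature_entropy` name are assumptions, not the project's implementation.

```rust
/// Normalized entropy over curvature sign classes (hypothetical helper).
/// Assumption: buckets are κ < -eps, |κ| <= eps, and κ > eps.
pub fn curvature_entropy(curvatures: &[f64], eps: f64) -> f64 {
    if curvatures.is_empty() {
        return 0.0;
    }
    let mut counts = [0usize; 3]; // [negative, flat, positive]
    for &k in curvatures {
        let bucket = if k < -eps { 0 } else if k > eps { 2 } else { 1 };
        counts[bucket] += 1;
    }
    let total = curvatures.len() as f64;
    let h: f64 = counts
        .iter()
        .filter(|&&c| c > 0)
        .map(|&c| {
            let p = c as f64 / total;
            -p * p.log2()
        })
        .sum();
    h / 3f64.log2()
}
```

An all-negative and an all-positive edge set both score 0.0, so this metric says the mix is uniform but not which way; a companion signal such as neg_share is needed to tell the two apart.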

`ricci.degree_gini`

Gini coefficient of degree distribution (0=equal, 1=concentrated)

Classic inequality measure applied to node degrees.

Interpretation:

< 0.3: Fairly equal - well-balanced modules
0.3-0.6: Moderate inequality - some hubs expected
> 0.6: High inequality - architecture dominated by few nodes

Thresholds:

✅ Good: < 0.4
⚠️ Warning: 0.4-0.6
❌ Critical: > 0.6
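The Gini coefficient can be computed from the sorted degree sequence with the standard rank formula G = 2·Σ i·x_(i) / (n·Σ x_i) - (n + 1)/n (x sorted ascending, i counted from 1). A sketch with hypothetical function names and the thresholds above hard-coded:

```rust
/// Gini coefficient of a degree sequence: 0 = all equal, higher = concentrated.
pub fn degree_gini(degrees: &[usize]) -> f64 {
    let n = degrees.len();
    let total: usize = degrees.iter().sum();
    if n == 0 || total == 0 {
        return 0.0;
    }
    let mut sorted = degrees.to_vec();
    sorted.sort_unstable(); // ascending
    // Σ i * x_(i) with ranks starting at 1.
    let weighted: f64 = sorted
        .iter()
        .enumerate()
        .map(|(i, &x)| (i as f64 + 1.0) * x as f64)
        .sum();
    let n = n as f64;
    2.0 * weighted / (n * total as f64) - (n + 1.0) / n
}

/// Maps a Gini value onto the Good / Warning / Critical thresholds above.
pub fn gini_status(g: f64) -> &'static str {
    if g < 0.4 {
        "good"
    } else if g <= 0.6 {
        "warning"
    } else {
        "critical"
    }
}
```

For n nodes the largest attainable value is (n - 1)/n, so very small graphs cannot reach 1.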

`ricci.concentration_top10`

Share of total degree held by top 10% of nodes

Direct measure of power concentration.

Interpretation:

< 0.3: Power distributed
0.3-0.5: Moderate concentration
> 0.5: High concentration - few nodes control architecture

Use case: Quick indicator of hub dominance
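A sketch of the top-10% share, assuming the slice is ceil(n/10) nodes with a minimum of one; the source does not say how it rounds, and `concentration_top10` here is a hypothetical free function.

```rust
/// Share of total degree held by the top 10% of nodes (hypothetical helper).
pub fn concentration_top10(degrees: &[usize]) -> f64 {
    let total: usize = degrees.iter().sum();
    if total == 0 {
        return 0.0;
    }
    let mut sorted = degrees.to_vec();
    sorted.sort_unstable_by(|a, b| b.cmp(a)); // descending
    // ceil(n / 10) in integer arithmetic, at least one node.
    let k = ((degrees.len() + 9) / 10).max(1);
    let top: usize = sorted.iter().take(k).sum();
    top as f64 / total as f64
}
```

For fewer than 10 nodes the top slice is still one node, so the metric reads high on tiny graphs.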

Monotonicity Tracking

Track whether metrics improve or degrade over releases.

pub struct MonotonicityReport {
    pub bridge_mass_increases: usize,    // Times bridge_mass got worse
    pub neg_share_increases: usize,      // Times neg_share got worse
    pub hub_conc_increases: usize,       // Times concentration increased
    pub total_violations: usize,         // Total regressions
    pub trend_score: f64,                // 0.0 (all bad) to 1.0 (all good)
}

Usage:

let snapshots = vec![
    SnapshotMetrics {
        timestamp: Some(1704067200),
        bridge_mass: 0.05,
        neg_share: 0.15,
        hub_concentration: 0.25,
        structural_entropy: 0.80,
    },
    SnapshotMetrics {
        timestamp: Some(1706745600),
        bridge_mass: 0.08,  // ❌ Increased
        neg_share: 0.12,    // ✅ Decreased
        hub_concentration: 0.30, // ❌ Increased
        structural_entropy: 0.75,
    },
];

let report = compute_monotonicity(&snapshots);
// report.total_violations = 2 (bridge_mass and hub_concentration rose)
// report.trend_score ≈ 0.33 (2 violations out of 3 comparisons: 1 transition × 3 metrics)
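One way `compute_monotonicity` could work, sketched under stated assumptions: every consecutive pair of snapshots contributes three comparisons (bridge_mass, neg_share, hub_concentration), any strict increase counts as a violation, and trend_score = 1 - violations / comparisons. The `i64` timestamp type is also assumed. This is an illustration, not the crate's implementation.

```rust
/// Snapshot fields as in the usage example (i64 timestamp assumed).
#[derive(Debug, Clone)]
pub struct SnapshotMetrics {
    pub timestamp: Option<i64>,
    pub bridge_mass: f64,
    pub neg_share: f64,
    pub hub_concentration: f64,
    pub structural_entropy: f64,
}

#[derive(Debug, Default)]
pub struct MonotonicityReport {
    pub bridge_mass_increases: usize,
    pub neg_share_increases: usize,
    pub hub_conc_increases: usize,
    pub total_violations: usize,
    pub trend_score: f64,
}

/// Counts regressions between consecutive snapshots (sketch).
pub fn compute_monotonicity(snapshots: &[SnapshotMetrics]) -> MonotonicityReport {
    let mut report = MonotonicityReport::default();
    for pair in snapshots.windows(2) {
        let (prev, next) = (&pair[0], &pair[1]);
        if next.bridge_mass > prev.bridge_mass {
            report.bridge_mass_increases += 1;
        }
        if next.neg_share > prev.neg_share {
            report.neg_share_increases += 1;
        }
        if next.hub_concentration > prev.hub_concentration {
            report.hub_conc_increases += 1;
        }
    }
    report.total_violations =
        report.bridge_mass_increases + report.neg_share_increases + report.hub_conc_increases;
    // Three tracked metrics per transition between snapshots.
    let comparisons = 3 * snapshots.len().saturating_sub(1);
    report.trend_score = if comparisons == 0 {
        1.0 // nothing to compare, so nothing regressed
    } else {
        1.0 - report.total_violations as f64 / comparisons as f64
    };
    report
}
```

Fed the two snapshots above, this sketch reports total_violations = 2 and trend_score ≈ 0.33. structural_entropy is carried along but not scored, matching the fields of MonotonicityReport.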

Why Entropy Metrics are “Anti-Gaming”

Hard to fake locally: Improving entropy requires system-wide changes, not local hacks
Scale-invariant: Normalized metrics work for small and large systems
Multi-dimensional: Can’t optimize one without considering others
Temporal signal: Monotonicity tracking catches regressions

Feature-Level Root Cause Analysis (Papers BUGFIX & MSR_2)

Purpose

When a specific feature/screen/module starts having problems, this workflow helps you understand:

What changed architecturally (delta metrics)
Where the problem is (root cause edges/nodes)
Why it happened (boundary drifts, new necks)

Workflow

Step 1: Define Feature Slice

let feature_slice = compute_feature_slice(
    graph,
    &node_map,
    "src/features/payments"  // Path prefix
);
// Returns: FeatureSlice with nodes and boundary_edges
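The slicing rule can be sketched as below, assuming a plain directed edge list and a `paths` slice (node index to file path) in place of the real `graph` and `node_map` types. `feature_slice` and this `FeatureSlice` shape are hypothetical.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the FeatureSlice returned by compute_feature_slice.
#[derive(Debug)]
pub struct FeatureSlice {
    pub nodes: HashSet<usize>,
    pub boundary_edges: Vec<(usize, usize)>,
}

/// A node is in the feature if its path starts with `prefix`;
/// a boundary edge has exactly one endpoint inside the feature.
pub fn feature_slice(edges: &[(usize, usize)], paths: &[&str], prefix: &str) -> FeatureSlice {
    let nodes: HashSet<usize> = paths
        .iter()
        .enumerate()
        .filter(|(_, p)| p.starts_with(prefix))
        .map(|(i, _)| i)
        .collect();
    let boundary_edges = edges
        .iter()
        .copied()
        .filter(|(a, b)| nodes.contains(a) != nodes.contains(b))
        .collect();
    FeatureSlice { nodes, boundary_edges }
}
```

A plain string prefix also matches sibling directories such as `src/features/payments-legacy`; passing the prefix with a trailing `/` avoids that.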

Step 2: Define Time Windows

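// Timestamps are Unix seconds (as in the snapshot example); problem_date is assumed to be an i64
const THIRTY_DAYS: i64 = 30 * 24 * 60 * 60;

let before_window = TimeRange {
    start_timestamp: problem_date - THIRTY_DAYS,
    end_timestamp: problem_date,
};

let after_window = TimeRange {
    start_timestamp: problem_date,
    end_timestamp: problem_date + THIRTY_DAYS,
};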
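Because the two windows share problem_date as a boundary, containment should be half-open so a commit made exactly at problem_date is counted once. A sketch with a hypothetical `TimeRange::contains` method and `windows_around` helper, assuming i64 Unix-second timestamps:

```rust
/// Hypothetical TimeRange using the field names above (Unix seconds).
#[derive(Debug, Clone, Copy)]
pub struct TimeRange {
    pub start_timestamp: i64,
    pub end_timestamp: i64,
}

impl TimeRange {
    /// Half-open [start, end): a timestamp equal to end belongs to the next window.
    pub fn contains(&self, ts: i64) -> bool {
        self.start_timestamp <= ts && ts < self.end_timestamp
    }
}

pub const THIRTY_DAYS: i64 = 30 * 24 * 60 * 60;

/// Builds the before/after windows around the date the problem was noticed.
pub fn windows_around(problem_date: i64) -> (TimeRange, TimeRange) {
    let before = TimeRange {
        start_timestamp: problem_date - THIRTY_DAYS,
        end_timestamp: problem_date,
    };
    let after = TimeRange {
        start_timestamp: problem_date,
        end_timestamp: problem_date + THIRTY_DAYS,
    };
    (before, after)
}
```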

Step 3: Compute Delta Metrics

pub struct DeltaMetrics {
    pub delta_ccr: f64,              // Cross-Change Rate change
    pub delta_aniso: f64,            // Anisotropy change
    pub new_necks: Vec<EdgeInfo>,    // New bottlenecks
    pub delta_hotspot_shift: f64,    // Hub concentration change
    pub delta_bridge_mass: f64,      // Bridge mass change
    pub new_cycles: usize,           // New circular dependencies
}
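A sketch of how the deltas could be derived from two FeatureMetrics values (fields as in the usage example below), assuming each scalar delta is after minus before and a new neck is a neck edge absent from the before set. `EdgeInfo` is reduced to from/to, and `delta_metrics` is a hypothetical stand-in for `compute_delta_metrics`.

```rust
/// Minimal hypothetical EdgeInfo; the real type likely carries more fields.
#[derive(Debug, Clone, PartialEq)]
pub struct EdgeInfo {
    pub from: String,
    pub to: String,
}

pub struct FeatureMetrics {
    pub cross_change_rate: f64,
    pub anisotropy: f64,
    pub hotspot_concentration: f64,
    pub bridge_mass: f64,
    pub neck_edges: Vec<EdgeInfo>,
    pub cycle_count: usize,
}

#[derive(Debug)]
pub struct DeltaMetrics {
    pub delta_ccr: f64,
    pub delta_aniso: f64,
    pub new_necks: Vec<EdgeInfo>,
    pub delta_hotspot_shift: f64,
    pub delta_bridge_mass: f64,
    pub new_cycles: usize,
}

pub fn delta_metrics(before: &FeatureMetrics, after: &FeatureMetrics) -> DeltaMetrics {
    // Necks present after the change but not before.
    let new_necks = after
        .neck_edges
        .iter()
        .filter(|e| !before.neck_edges.contains(e))
        .cloned()
        .collect();
    DeltaMetrics {
        delta_ccr: after.cross_change_rate - before.cross_change_rate,
        delta_aniso: after.anisotropy - before.anisotropy,
        new_necks,
        delta_hotspot_shift: after.hotspot_concentration - before.hotspot_concentration,
        delta_bridge_mass: after.bridge_mass - before.bridge_mass,
        // Count difference, floored at zero.
        new_cycles: after.cycle_count.saturating_sub(before.cycle_count),
    }
}
```

new_cycles here is a count difference, so a cycle removed in the same window another is added nets to zero.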

Key Metrics Explained:

Cross-Change Rate (CCR)

Fraction of the feature's changes (commits) that also touched other modules

delta_ccr > 0.1: Feature boundaries are eroding
delta_ccr > 0.2: Significant boundary violation
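A sketch of the CCR itself, assuming each commit is available as the list of paths it touched and that "in the feature" means sharing the feature's path prefix; `cross_change_rate` is a hypothetical name. Running it on the before and after windows and subtracting gives delta_ccr.

```rust
/// Among commits touching the feature, the fraction that also touch
/// at least one file outside it (hypothetical helper).
pub fn cross_change_rate(commits: &[Vec<&str>], prefix: &str) -> f64 {
    let mut touching = 0usize;
    let mut crossing = 0usize;
    for files in commits {
        if !files.iter().any(|f| f.starts_with(prefix)) {
            continue; // commit does not touch the feature at all
        }
        touching += 1;
        if files.iter().any(|f| !f.starts_with(prefix)) {
            crossing += 1;
        }
    }
    if touching == 0 {
        0.0
    } else {
        crossing as f64 / touching as f64
    }
}
```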

Anisotropy

Ratio of cross-module to within-module change cost

Healthy: anisotropy > 2.0 (crossing a boundary costs more than twice as much as an in-module change)
Problematic: anisotropy < 1.5 (boundaries provide no protection)
delta_aniso < 0: Boundaries weakening

New Necks

Edges that became thin bottlenecks (κ < 0, high betweenness)

Each new neck is a candidate for refactoring.

Hotspot Shift

Change in concentration of churn/centrality

delta_hotspot_shift > 0.1: Architecture collapsing onto hub files

Step 4: Root Cause Ranking

Top Root Cause Edges

Edges ranked by composite score:

rank_score = |κ| + (betweenness × 10) + (cochange_count × 0.1)

Example Output:

{
  "from": "src/features/payments/PaymentForm.tsx",
  "to": "src/shared/api/httpClient.ts",
  "rank_score": 8.4,
  "reasons": [
    "High negative curvature (bridge)",
    "High edge betweenness (bottleneck)",
    "High co-change count (15)"
  ]
}
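The composite score is a direct transcription of the formula above; only the function names and the tuple layout in `rank_edges` are hypothetical. As a worked check, hypothetical values κ = -0.9 and betweenness 0.6 with the 15 co-changes shown give 0.9 + 6.0 + 1.5 = 8.4; the actual κ and betweenness behind the example output are not stated.

```rust
/// rank_score = |κ| + 10 · betweenness + 0.1 · cochange_count
pub fn edge_rank_score(kappa: f64, betweenness: f64, cochange_count: usize) -> f64 {
    kappa.abs() + betweenness * 10.0 + cochange_count as f64 * 0.1
}

/// Sorts (from, to, score) tuples by descending score.
pub fn rank_edges(mut edges: Vec<(String, String, f64)>) -> Vec<(String, String, f64)> {
    edges.sort_by(|a, b| b.2.total_cmp(&a.2));
    edges
}
```

Sorting with `f64::total_cmp` gives a total order even if a score is NaN, so it cannot panic the way `partial_cmp(...).unwrap()` can.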

Top Root Cause Nodes

Nodes ranked by:

rank_score = curvature_mass_incident + delta_betweenness

Where curvature_mass_incident = sum of |κ| for incident edges with κ < 0
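The node score also transcribes directly; the sketch assumes the caller has already gathered κ for every edge incident to the node, and the function names are hypothetical.

```rust
/// Sum of |κ| over incident edges with κ < 0.
pub fn curvature_mass_incident(incident_curvatures: &[f64]) -> f64 {
    incident_curvatures
        .iter()
        .filter(|&&k| k < 0.0)
        .map(|k| k.abs())
        .sum()
}

/// rank_score = curvature_mass_incident + delta_betweenness
pub fn node_rank_score(incident_curvatures: &[f64], delta_betweenness: f64) -> f64 {
    curvature_mass_incident(incident_curvatures) + delta_betweenness
}
```

delta_betweenness may be negative, so a node that lost traffic ranks below its curvature mass alone.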

Step 5: Boundary Drifts

Pairs of modules that started co-changing frequently

pub struct BoundaryDrift {
    pub cluster1: String,
    pub cluster2: String,
    pub cochange_before: usize,
    pub cochange_after: usize,
    pub drift_score: f64,
}

Interpretation:

Indicates wrong boundaries or missing abstractions
High drift suggests need for interface extraction
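cochange_before and cochange_after can be obtained by counting, per window, how many commits touched both clusters of a pair. A sketch, assuming each commit is already mapped to the cluster names it touched; `cochange_pairs` is hypothetical, and drift_score is left out because the source does not define it.

```rust
use std::collections::{BTreeSet, HashMap};

/// For each unordered cluster pair, the number of commits touching both.
pub fn cochange_pairs(commits: &[Vec<&str>]) -> HashMap<(String, String), usize> {
    let mut counts = HashMap::new();
    for clusters in commits {
        // Deduplicate and sort so (a, b) and (b, a) share one key.
        let set: BTreeSet<&str> = clusters.iter().copied().collect();
        let names: Vec<&str> = set.into_iter().collect();
        for i in 0..names.len() {
            for j in (i + 1)..names.len() {
                *counts
                    .entry((names[i].to_string(), names[j].to_string()))
                    .or_insert(0) += 1;
            }
        }
    }
    counts
}
```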

Step 6: Diagnosis

Auto-generated summary:

"Significant increase in bridge dependencies;
Feature started changing with external modules more frequently;
3 new bottleneck edges appeared;
Architecture collapsing onto hub modules"
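A rule-based sketch of how such a summary could be assembled from delta metrics. The 0.1 thresholds for delta_ccr and delta_hotspot_shift come from Step 3; the bridge-mass threshold is a hypothetical placeholder, and `diagnose` is not the crate's API.

```rust
/// Builds a diagnosis string from delta metrics (sketch).
pub fn diagnose(
    delta_ccr: f64,
    delta_bridge_mass: f64,
    new_neck_count: usize,
    delta_hotspot_shift: f64,
) -> String {
    const BRIDGE_MASS_THRESHOLD: f64 = 0.03; // hypothetical, not from the source
    let mut findings: Vec<String> = Vec::new();
    if delta_bridge_mass > BRIDGE_MASS_THRESHOLD {
        findings.push("Significant increase in bridge dependencies".to_string());
    }
    if delta_ccr > 0.1 {
        findings.push("Feature started changing with external modules more frequently".to_string());
    }
    if new_neck_count > 0 {
        findings.push(format!("{} new bottleneck edges appeared", new_neck_count));
    }
    if delta_hotspot_shift > 0.1 {
        findings.push("Architecture collapsing onto hub modules".to_string());
    }
    findings.join(";\n")
}
```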

Usage Example

use crate::metrics::topology::ricci_curvature::feature_analysis::*;

// 1. Define feature
let feature_slice = compute_feature_slice(
    graph,
    &node_map,
    "src/features/checkout"
);

// 2. Compute metrics before/after
let before_metrics = FeatureMetrics {
    cross_change_rate: 0.10,
    anisotropy: 2.5,
    hotspot_concentration: 0.20,
    bridge_mass: 0.04,
    neck_edges: vec![],
    cycle_count: 1,
};

let after_metrics = FeatureMetrics {
    cross_change_rate: 0.28,  // ❌ Jumped 18 points
    anisotropy: 1.8,          // ❌ Weakened
    hotspot_concentration: 0.42, // ❌ Doubled
    bridge_mass: 0.09,        // ❌ More than doubled
    neck_edges: after_neck_edges, // 4 new necks (hypothetical Vec<EdgeInfo> computed elsewhere)
    cycle_count: 3,           // 2 new cycles
};

// 3. Generate root cause report
let report = generate_root_cause_map(
    &feature_slice,
    graph,
    &node_map,
    &curvatures,
    &edge_betweenness,
    Some(&cochange_data),
    &before_metrics,
    &after_metrics,
    problem_timestamp,
);

// 4. Act on top issues
for edge in &report.top_root_cause_edges {
    println!("Priority {}: {} -> {}",
        edge.rank_score,
        edge.edge_info.from,
        edge.edge_info.to
    );
    println!("Reasons: {:?}", edge.reasons);
}

Real-World Scenarios

Scenario 1: Layer Violation

Signals:

delta_ccr high
New necks between UI and infra
Low delta_aniso

Diagnosis: UI started directly importing infrastructure
Surgery: Introduce ports/adapters layer

Scenario 2: Shared Module Explosion

Signals:

delta_hotspot_shift high
Hub nodes in shared/ or utils/
High concentration_top10

Diagnosis: Everything depends on growing shared module
Surgery: Split shared by domain

Scenario 3: Feature Coupling

Signals:

High boundary drifts
new_cycles > 0
delta_bridge_mass high

Diagnosis: Two features became coupled through shared state
Surgery: Extract common contract/DTO module or merge features

Integration with Existing Metrics

Temporal Dashboard

Track over time:

Timestamp | bridge_mass | entropy_struct | trend_score
----------|-------------|----------------|------------
2024-01   | 0.05        | 0.75           | 1.00
2024-02   | 0.07        | 0.72           | 0.67 ⚠️
2024-03   | 0.06        | 0.74           | 1.00 ✅

Feature Health Report

For each feature:

Feature: src/features/checkout
Status: ⚠️ Degrading
Delta CCR: +0.18 (crossed threshold)
New Necks: 4
Surgery Priority: HIGH

Top Issues:
1. CheckoutForm -> httpClient (score: 8.4)
   → Create ports.ts interface
2. CheckoutFlow -> shared/state (score: 7.2)
   → Split shared state by domain

Configuration

Enable Priority 2 features in config:

metrics:
  - id: ricci_curvature
    enabled: true
    config:
      # Existing Priority 1 config...

      # Priority 2: Entropy tracking
      compute_entropy: true

      # Priority 2: Feature analysis (optional)
      feature_analysis:
        enabled: true
        features:
          - name: "checkout"
            path_prefix: "src/features/checkout"
            problem_date: "2024-03-15"
          - name: "payments"
            path_prefix: "src/features/payments"
            problem_date: "2024-03-20"

References

Paper 1 (RICCI_PERELMAN_1.md): Section 6 - Entropy and monotonicity
Paper BUGFIX (RICCI_PERELMAN_BUGFIX.md): Complete bugfix workflow
Paper MSR_2 (RICCI_PERELMAN_MSR_2.md): Delta metrics and temporal analysis

API Reference

Entropy Module

// Compute all entropy metrics
let entropy = EntropyMetrics::compute(graph, curvatures);

// Track monotonicity over snapshots
let report = compute_monotonicity(&snapshots);

Feature Analysis Module

// Define feature slice
let slice = compute_feature_slice(graph, &node_map, "src/features/X");

// Compute delta
let delta = compute_delta_metrics(&slice, &before, &after);

// Generate full root cause report
let report = generate_root_cause_map(
    &slice, graph, &node_map, &curvatures,
    &edge_betweenness, Some(&cochange_data),
    &before, &after, timestamp
);

Next Steps

After implementing Priority 2, consider:

Automated temporal tracking: CI pipeline that stores snapshots and computes monotonicity
Feature health dashboard: Visual timeline of feature metrics
Predictive alerts: “Feature X shows early degradation signs”
Surgery effectiveness tracking: Measure before/after surgery impact (Priority 3.6)