
Entropy Metrics and Feature Analysis (Priority 2)

This document describes Priority 2 features based on Perelman Papers 1, BUGFIX, and MSR_2:

  • Entropy Metrics: Anti-gaming health indicators and long-term architecture health tracking
  • Feature-Level Root Cause Analysis: Temporal debugging workflow for problematic features

These features provide temporal analysis and anti-gaming metrics that complement the core Ricci curvature analysis.

Entropy metrics serve as “anti-gaming” indicators that are difficult to manipulate with local changes. They measure the distribution and concentration of architectural properties, providing stable long-term health signals.

Shannon entropy of degree distribution (normalized to [0,1])

H = -(Σ p_i * log₂(p_i)) / log₂(n)

where p_i is the fraction of nodes with degree i; dividing by log₂(n) scales H to [0,1]

Interpretation:

  • High (> 0.7): Nodes have diverse roles - balanced architecture
  • Medium (0.4-0.7): Some specialization - typical for layered systems
  • Low (< 0.4): High concentration - few nodes dominate

Use case: Track whether architecture is becoming more concentrated over time
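The normalized degree entropy described above can be sketched as follows. This is a minimal illustration, not the project's implementation: the function name `degree_entropy` is hypothetical, and it assumes `n` in the formula is the number of distinct degree values observed in the graph.

```rust
use std::collections::HashMap;

/// Normalized Shannon entropy of a degree sequence, in [0, 1].
/// Assumption: `n` is the number of distinct degree values observed.
pub fn degree_entropy(degrees: &[usize]) -> f64 {
    if degrees.is_empty() {
        return 0.0;
    }
    // Histogram: degree value -> number of nodes with that degree.
    let mut counts: HashMap<usize, usize> = HashMap::new();
    for &d in degrees {
        *counts.entry(d).or_insert(0) += 1;
    }
    let n = counts.len();
    if n < 2 {
        // A single degree value carries no uncertainty, and log2(1) = 0
        // would make the normalization divide by zero.
        return 0.0;
    }
    let total = degrees.len() as f64;
    let h: f64 = counts
        .values()
        .map(|&c| {
            let p = c as f64 / total;
            -p * p.log2()
        })
        .sum();
    h / (n as f64).log2()
}
```

Under this assumption, a graph where half the nodes have degree 1 and half have degree 2 scores 1.0, while a graph where every node has the same degree scores 0.0.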

Shannon entropy of betweenness centrality distribution

Measures how evenly “flow” (paths) is distributed across nodes.

Interpretation:

  • High: Traffic distributed across many nodes - resilient
  • Low: Traffic concentrated through few bottlenecks - fragile

Use case: Detect architecture collapsing onto hub nodes
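One way to compute an entropy over betweenness scores is to treat each node's share of total betweenness as a probability. The sketch below does that; the name `flow_entropy` and the normalization by log₂ of the node count are assumptions, since the text only states that the metric is a Shannon entropy of the betweenness distribution.

```rust
/// Shannon entropy of non-negative per-node scores (e.g. betweenness centrality).
/// Assumption: the result is normalized by log2(node count) to land in [0, 1].
pub fn flow_entropy(betweenness: &[f64]) -> f64 {
    let total: f64 = betweenness.iter().sum();
    let n = betweenness.len();
    if n < 2 || total <= 0.0 {
        return 0.0;
    }
    let h: f64 = betweenness
        .iter()
        .filter(|&&b| b > 0.0) // zero-share nodes contribute nothing (0 * log 0 := 0)
        .map(|&b| {
            let p = b / total; // this node's share of all flow
            -p * p.log2()
        })
        .sum();
    h / (n as f64).log2()
}
```

Evenly spread flow gives a value of 1.0; flow routed entirely through a single node gives 0.0, which is the "collapsing onto hub nodes" case.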

Shannon entropy of edge curvature distribution

Measures diversity of edge types (positive vs negative curvature).

Interpretation:

  • High: Mixed edge types - complex boundaries
  • Low: Uniform edges - either all healthy or all problematic

Use case: Distinguish between “uniformly bad” and “mixed quality” architectures

Gini coefficient of degree distribution (0=equal, 1=concentrated)

Classic inequality measure applied to node degrees.

Interpretation:

  • < 0.3: Fairly equal - well-balanced modules
  • 0.3-0.6: Moderate inequality - some hubs expected
  • > 0.6: High inequality - architecture dominated by few nodes

Thresholds:

  • ✅ Good: < 0.4
  • ⚠️ Warning: 0.4-0.6
  • ❌ Critical: > 0.6
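As an illustration, the Gini coefficient of a degree sequence can be computed with the standard sorted-rank formula, and the thresholds above mapped to a status. The names `gini`, `GiniStatus`, and `classify_gini` are hypothetical; treating exactly 0.6 as a warning rather than critical is an assumption, since the listed ranges do not say which side the boundary falls on.

```rust
/// Gini coefficient of node degrees: 0 = perfectly equal, approaching 1 = concentrated.
pub fn gini(degrees: &[usize]) -> f64 {
    let n = degrees.len();
    let total: usize = degrees.iter().sum();
    if n == 0 || total == 0 {
        return 0.0;
    }
    let mut sorted = degrees.to_vec();
    sorted.sort_unstable();
    // G = 2 * sum(i * x_i) / (n * sum(x_i)) - (n + 1) / n,
    // with i = 1..n over degrees in ascending order.
    let weighted: f64 = sorted
        .iter()
        .enumerate()
        .map(|(i, &x)| (i as f64 + 1.0) * x as f64)
        .sum();
    let n = n as f64;
    2.0 * weighted / (n * total as f64) - (n + 1.0) / n
}

#[derive(Debug, PartialEq)]
pub enum GiniStatus {
    Good,
    Warning,
    Critical,
}

/// Maps a Gini value to the Good / Warning / Critical thresholds listed above.
pub fn classify_gini(g: f64) -> GiniStatus {
    if g < 0.4 {
        GiniStatus::Good
    } else if g <= 0.6 {
        GiniStatus::Warning
    } else {
        GiniStatus::Critical
    }
}
```

For a finite sequence of n nodes the maximum value is (n - 1) / n, so four nodes where one holds all the degree score 0.75.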

Share of total degree held by top 10% of nodes

Direct measure of power concentration.

Interpretation:

  • < 0.3: Power distributed
  • 0.3-0.5: Moderate concentration
  • > 0.5: High concentration - few nodes control architecture

Use case: Quick indicator of hub dominance
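A minimal sketch of the top-10% share, assuming the node count for the top decile is rounded up so that small graphs still include at least one node. The function name is hypothetical and the rounding rule is not specified in the text.

```rust
/// Share of total degree held by the top 10% of nodes (by degree).
/// Assumption: the top-10% node count is rounded up.
pub fn concentration_top10(degrees: &[usize]) -> f64 {
    let total: usize = degrees.iter().sum();
    if total == 0 {
        return 0.0;
    }
    let mut sorted = degrees.to_vec();
    sorted.sort_unstable_by(|a, b| b.cmp(a)); // descending: hubs first
    let k = (degrees.len() + 9) / 10; // ceil(len / 10)
    let top: usize = sorted.iter().take(k).sum();
    top as f64 / total as f64
}
```

A ten-node star (one hub of degree 9, nine leaves of degree 1) yields 0.5, which sits on the edge of the "high concentration" band above.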

Track whether metrics improve or degrade over releases.

pub struct MonotonicityReport {
    pub bridge_mass_increases: usize, // Times bridge_mass got worse
    pub neg_share_increases: usize,   // Times neg_share got worse
    pub hub_conc_increases: usize,    // Times concentration increased
    pub total_violations: usize,      // Total regressions
    pub trend_score: f64,             // 0.0 (all bad) to 1.0 (all good)
}

Usage:

let snapshots = vec![
    SnapshotMetrics {
        timestamp: Some(1704067200),
        bridge_mass: 0.05,
        neg_share: 0.15,
        hub_concentration: 0.25,
        structural_entropy: 0.80,
    },
    SnapshotMetrics {
        timestamp: Some(1706745600),
        bridge_mass: 0.08,       // ❌ Increased
        neg_share: 0.12,         // ✅ Decreased
        hub_concentration: 0.30, // ❌ Increased
        structural_entropy: 0.75,
    },
];
let report = compute_monotonicity(&snapshots);
// report.total_violations = 2
// report.trend_score = 0.67 (2 violations out of 6 comparisons)
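
The violation counting behind the report can be sketched by comparing each pair of consecutive snapshots. The `Snapshot` and `RegressionCounts` types and the `count_regressions` function below are hypothetical stand-ins for illustration. The sketch deliberately leaves out `trend_score`, because the text does not specify how it is normalized.

```rust
#[derive(Debug, Clone, Copy)]
pub struct Snapshot {
    pub bridge_mass: f64,
    pub neg_share: f64,
    pub hub_concentration: f64,
}

#[derive(Debug, Default, PartialEq)]
pub struct RegressionCounts {
    pub bridge_mass_increases: usize,
    pub neg_share_increases: usize,
    pub hub_conc_increases: usize,
    pub total_violations: usize,
}

/// Counts how often each tracked metric got worse (increased)
/// between consecutive snapshots.
pub fn count_regressions(snapshots: &[Snapshot]) -> RegressionCounts {
    let mut counts = RegressionCounts::default();
    for pair in snapshots.windows(2) {
        let (prev, next) = (pair[0], pair[1]);
        if next.bridge_mass > prev.bridge_mass {
            counts.bridge_mass_increases += 1;
        }
        if next.neg_share > prev.neg_share {
            counts.neg_share_increases += 1;
        }
        if next.hub_concentration > prev.hub_concentration {
            counts.hub_conc_increases += 1;
        }
    }
    counts.total_violations = counts.bridge_mass_increases
        + counts.neg_share_increases
        + counts.hub_conc_increases;
    counts
}
```

Fed the two snapshots from the usage example, this counts one `bridge_mass` increase and one `hub_concentration` increase, for two violations in total.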

Why these metrics resist gaming:

  1. Hard to fake locally: Improving entropy requires system-wide changes, not local hacks
  2. Scale-invariant: Normalized metrics work for small and large systems
  3. Multi-dimensional: Can’t optimize one without considering others
  4. Temporal signal: Monotonicity tracking catches regressions

Feature-Level Root Cause Analysis (Papers BUGFIX & MSR_2)

When a specific feature/screen/module starts having problems, this workflow helps you understand:

  1. What changed architecturally (delta metrics)
  2. Where the problem is (root cause edges/nodes)
  3. Why it happened (boundary drifts, new necks)

// Step 1: define the feature slice by path prefix
let feature_slice = compute_feature_slice(
    graph,
    node_map,
    "src/features/payments" // Path prefix
);
// Returns: FeatureSlice with nodes and boundary_edges

// Step 2: pick 30-day windows before and after the problem date.
// Timestamps are Unix seconds; the i64 type here is an assumption.
const THIRTY_DAYS: i64 = 30 * 24 * 60 * 60;
let before_window = TimeRange {
    start_timestamp: problem_date - THIRTY_DAYS,
    end_timestamp: problem_date,
};
let after_window = TimeRange {
    start_timestamp: problem_date,
    end_timestamp: problem_date + THIRTY_DAYS,
};

// Step 3: compare the two windows; the differences are reported as DeltaMetrics
pub struct DeltaMetrics {
    pub delta_ccr: f64,           // Cross-Change Rate change
    pub delta_aniso: f64,         // Anisotropy change
    pub new_necks: Vec<EdgeInfo>, // New bottlenecks
    pub delta_hotspot_shift: f64, // Hub concentration change
    pub delta_bridge_mass: f64,   // Bridge mass change
    pub new_cycles: usize,        // New circular dependencies
}

Key Metrics Explained:

Cross-Change Rate (delta_ccr): share of changes in the feature that also touched other modules

  • delta_ccr > 0.1: Feature boundaries are eroding
  • delta_ccr > 0.2: Significant boundary violation

Anisotropy (delta_aniso): ratio of cross-module to within-module change cost

  • Healthy: anisotropy > 2.0 (crossing a boundary costs more than 2x a change inside the module)
  • Problematic: anisotropy < 1.5 (boundaries provide little protection)
  • delta_aniso < 0: Boundaries weakening

New necks (new_necks): edges that became thin bottlenecks (κ < 0, high betweenness)

Each new neck is a candidate for refactoring.

Hotspot shift (delta_hotspot_shift): change in concentration of churn/centrality

  • delta_hotspot_shift > 0.1: Architecture collapsing onto hub files
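
The scalar deltas and the thresholds above can be sketched as plain after-minus-before differences. `FeatureSnapshot`, `ScalarDeltas`, `scalar_deltas`, and `delta_warnings` are hypothetical names used only for illustration. Neck detection is omitted because it needs the graph itself.

```rust
#[derive(Debug, Clone, Copy)]
pub struct FeatureSnapshot {
    pub cross_change_rate: f64,
    pub anisotropy: f64,
    pub hotspot_concentration: f64,
    pub bridge_mass: f64,
    pub cycle_count: usize,
}

#[derive(Debug)]
pub struct ScalarDeltas {
    pub delta_ccr: f64,
    pub delta_aniso: f64,
    pub delta_hotspot_shift: f64,
    pub delta_bridge_mass: f64,
    pub new_cycles: usize,
}

/// After-minus-before differences for the scalar feature metrics.
pub fn scalar_deltas(before: &FeatureSnapshot, after: &FeatureSnapshot) -> ScalarDeltas {
    ScalarDeltas {
        delta_ccr: after.cross_change_rate - before.cross_change_rate,
        delta_aniso: after.anisotropy - before.anisotropy,
        delta_hotspot_shift: after.hotspot_concentration - before.hotspot_concentration,
        delta_bridge_mass: after.bridge_mass - before.bridge_mass,
        // Only count cycles that were added; removed cycles are not "new".
        new_cycles: after.cycle_count.saturating_sub(before.cycle_count),
    }
}

/// Applies the thresholds listed above and returns the matching messages.
pub fn delta_warnings(d: &ScalarDeltas) -> Vec<&'static str> {
    let mut out = Vec::new();
    if d.delta_ccr > 0.2 {
        out.push("Significant boundary violation");
    } else if d.delta_ccr > 0.1 {
        out.push("Feature boundaries are eroding");
    }
    if d.delta_aniso < 0.0 {
        out.push("Boundaries weakening");
    }
    if d.delta_hotspot_shift > 0.1 {
        out.push("Architecture collapsing onto hub files");
    }
    out
}
```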

Edges ranked by composite score:

rank_score = |κ| + (betweenness × 10) + (cochange_count × 0.1)

Example Output:

{
  "from": "src/features/payments/PaymentForm.tsx",
  "to": "src/shared/api/httpClient.ts",
  "rank_score": 8.4,
  "reasons": [
    "High negative curvature (bridge)",
    "High edge betweenness (bottleneck)",
    "High co-change count (15)"
  ]
}
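
The composite edge score above can be sketched as a small scoring and sorting helper. `EdgeCandidate`, `edge_rank_score`, and `rank_edges` are hypothetical names, and the curvature and betweenness inputs used in any example call are illustrative, not taken from a real analysis.

```rust
#[derive(Debug, Clone)]
pub struct EdgeCandidate {
    pub from: String,
    pub to: String,
    pub curvature: f64,      // κ for this edge
    pub betweenness: f64,    // edge betweenness
    pub cochange_count: usize,
}

/// rank_score = |κ| + (betweenness × 10) + (cochange_count × 0.1)
pub fn edge_rank_score(e: &EdgeCandidate) -> f64 {
    e.curvature.abs() + e.betweenness * 10.0 + e.cochange_count as f64 * 0.1
}

/// Returns the edges sorted by descending rank score (highest priority first).
pub fn rank_edges(mut edges: Vec<EdgeCandidate>) -> Vec<EdgeCandidate> {
    edges.sort_by(|a, b| edge_rank_score(b).total_cmp(&edge_rank_score(a)));
    edges
}
```

With these weights, an edge with κ = -0.9, betweenness 0.6, and 15 co-changes scores 0.9 + 6.0 + 1.5 = 8.4, so betweenness dominates unless it is very small.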

Nodes ranked by:

rank_score = curvature_mass_incident + delta_betweenness

Where curvature_mass_incident = sum of |κ| for incident edges with κ < 0
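
A minimal sketch of the node score, assuming the caller has already collected the curvatures of a node's incident edges and its change in betweenness. The function names mirror the terms in the formula but are otherwise hypothetical.

```rust
/// Sum of |κ| over incident edges with negative curvature.
pub fn curvature_mass_incident(incident_curvatures: &[f64]) -> f64 {
    incident_curvatures
        .iter()
        .filter(|&&k| k < 0.0) // only negatively curved (bridge-like) edges count
        .map(|k| k.abs())
        .sum()
}

/// rank_score = curvature_mass_incident + delta_betweenness
pub fn node_rank_score(incident_curvatures: &[f64], delta_betweenness: f64) -> f64 {
    curvature_mass_incident(incident_curvatures) + delta_betweenness
}
```

Positive-curvature edges are ignored entirely, so a node surrounded by healthy edges only ranks high if its betweenness grew.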

Pairs of modules that started co-changing frequently

pub struct BoundaryDrift {
    pub cluster1: String,
    pub cluster2: String,
    pub cochange_before: usize,
    pub cochange_after: usize,
    pub drift_score: f64,
}

Interpretation:

  • Indicates wrong boundaries or missing abstractions
  • High drift suggests need for interface extraction

Auto-generated summary:

"Significant increase in bridge dependencies;
Feature started changing with external modules more frequently;
3 new bottleneck edges appeared;
Architecture collapsing onto hub modules"

use crate::metrics::topology::ricci_curvature::feature_analysis::*;

// 1. Define feature
let feature_slice = compute_feature_slice(
    graph,
    &node_map,
    "src/features/checkout"
);

// 2. Compute metrics before/after
let before_metrics = FeatureMetrics {
    cross_change_rate: 0.10,
    anisotropy: 2.5,
    hotspot_concentration: 0.20,
    bridge_mass: 0.04,
    neck_edges: vec![],
    cycle_count: 1,
};
let after_metrics = FeatureMetrics {
    cross_change_rate: 0.28,     // ❌ Jumped 18 points
    anisotropy: 1.8,             // ❌ Weakened
    hotspot_concentration: 0.42, // ❌ Doubled
    bridge_mass: 0.09,           // ❌ More than doubled
    neck_edges: vec![...],       // 4 new necks
    cycle_count: 3,              // 2 new cycles
};

// 3. Generate root cause report
let report = generate_root_cause_map(
    &feature_slice,
    graph,
    &node_map,
    &curvatures,
    &edge_betweenness,
    Some(&cochange_data),
    &before_metrics,
    &after_metrics,
    problem_timestamp,
);

// 4. Act on top issues
for edge in &report.top_root_cause_edges {
    println!("Priority {}: {} -> {}",
        edge.rank_score,
        edge.edge_info.from,
        edge.edge_info.to
    );
    println!("Reasons: {:?}", edge.reasons);
}

Signals:

  • delta_ccr high
  • New necks between UI and infra
  • Negative delta_aniso (boundaries weakening)

Diagnosis: UI started directly importing infrastructure
Surgery: Introduce ports/adapters layer

Signals:

  • delta_hotspot_shift high
  • Hub nodes in shared/ or utils/
  • High concentration_top10

Diagnosis: Everything depends on a growing shared module
Surgery: Split shared by domain

Signals:

  • High boundary drifts
  • new_cycles > 0
  • delta_bridge_mass high

Diagnosis: Two features became coupled through shared state
Surgery: Extract common contract/DTO module or merge features

Track over time:

Timestamp | bridge_mass | entropy_struct | trend_score
----------|-------------|----------------|------------
2024-01 | 0.05 | 0.75 | 1.00
2024-02 | 0.07 | 0.72 | 0.67 ⚠️
2024-03 | 0.06 | 0.74 | 1.00 ✅

For each feature:

Feature: src/features/checkout
Status: ⚠️ Degrading
Delta CCR: +0.18 (crossed threshold)
New Necks: 4
Surgery Priority: HIGH
Top Issues:
  1. CheckoutForm -> httpClient (score: 8.4)
     → Create ports.ts interface
  2. CheckoutFlow -> shared/state (score: 7.2)
     → Split shared state by domain

Enable Priority 2 features in config:

metrics:
  - id: ricci_curvature
    enabled: true
    config:
      # Existing Priority 1 config...
      # Priority 2: Entropy tracking
      compute_entropy: true
      # Priority 2: Feature analysis (optional)
      feature_analysis:
        enabled: true
        features:
          - name: "checkout"
            path_prefix: "src/features/checkout"
            problem_date: "2024-03-15"
          - name: "payments"
            path_prefix: "src/features/payments"
            problem_date: "2024-03-20"

  • Paper 1 (RICCI_PERELMAN_1.md): Section 6 - Entropy and monotonicity
  • Paper BUGFIX (RICCI_PERELMAN_BUGFIX.md): Complete bugfix workflow
  • Paper MSR_2 (RICCI_PERELMAN_MSR_2.md): Delta metrics and temporal analysis

// Compute all entropy metrics
let entropy = EntropyMetrics::compute(graph, curvatures);

// Track monotonicity over snapshots
let report = compute_monotonicity(&snapshots);

// Define feature slice
let slice = compute_feature_slice(graph, node_map, "src/features/X");

// Compute delta
let delta = compute_delta_metrics(&slice, &before, &after);

// Generate full root cause report
let report = generate_root_cause_map(
    &slice, graph, node_map, curvatures,
    edge_betweenness, cochange_data,
    &before, &after, timestamp
);

After implementing Priority 2, consider:

  1. Automated temporal tracking: CI pipeline that stores snapshots and computes monotonicity
  2. Feature health dashboard: Visual timeline of feature metrics
  3. Predictive alerts: “Feature X shows early degradation signs”
  4. Surgery effectiveness tracking: Measure before/after surgery impact (Priority 3.6)