Tutorial — Anomaly Detection with Topology

1

Data loading

The dataset: industrial bearing sensor signals

The dataset holds 2,000 samples from a bearing spinning at 1,480 rpm. Vibration on three axes, temperature, and RPM are recorded every millisecond; 60 of the samples correspond to real fault states.

Dataset loaded in InVariants

2,000 rows · 8 columns · no missing values. Numeric columns: vib_x, vib_y, vib_z, temperatura, rpm. Target: anomalia.

Total samples: 2,000 · Anomalies: 60 (3%)

Column	Normal μ	Fault μ	Δ
vib_x	0.009 g	0.089 g	+0.08 g
temperatura	61.97 °C	62.07 °C	+0.1 °C
rpm	1480.2	1480.2	0

The vibration difference is 0.08 g over a ±3 g range, the temperature difference is 0.1 °C, and RPM is identical. No threshold on any single variable will catch this.
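To put the vibration gap in perspective, it can be expressed in units of the signal's overall spread. The short calculation below uses the class means from the table and the overall standard deviation of about 1.98 g that is reported with the vib_x histogram in the next step. The variable names are illustrative only.

```python
# Class means for vib_x, taken from the table above (in g).
normal_mean = 0.009
fault_mean = 0.089

# Overall standard deviation of vib_x, as quoted with the histogram (in g).
overall_std = 1.98

# Shift between the two classes, in g and in standard deviations.
delta = fault_mean - normal_mean
standardized_shift = delta / overall_std

print(f"shift = {delta:.3f} g = {standardized_shift:.3f} standard deviations")
```

A shift of roughly 0.04 standard deviations is far below the 3 standard deviations that a typical alert threshold looks for.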

2

Classical statistics · fails

EDA: a bimodal distribution hiding two populations

The histogram of vib_x reveals two peaks — a hint that two populations coexist in the data. But they overlap so heavily that no classical method can exploit this structure.

Histogram of vib_x — bimodal, but inseparable by statistics

Mean ≈ 0.008 g · Std ≈ 1.98 g · kurtosis −1.48 (platykurtic: flatter than normal, consistent with two overlapping populations). Two peaks are clearly visible, yet fault samples are spread across the entire range alongside normal data. The distributions of temperatura and rpm are even harder to tell apart.

Two populations exist — but statistics cannot separate them

The bimodal shape is a structural clue, but classical first-order metrics (mean, std, range) are virtually identical between normal and fault samples. Any threshold on individual variable values will either miss most faults or generate unacceptable false-positive rates.
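The first-order metrics quoted for the histogram can be computed in a few lines. The sketch below assumes NumPy and runs on a synthetic sinusoid, not on the tutorial's dataset. The 2.8 g amplitude and 40-sample period are hypothetical values, chosen to sit inside the ±3 g range. For reference, a pure sinusoid has an excess kurtosis of exactly −1.5, close to the value reported here.

```python
import numpy as np

def summary_stats(x):
    """Return mean, standard deviation and excess kurtosis of a 1-D signal."""
    x = np.asarray(x, dtype=float)
    mean = x.mean()
    std = x.std()
    # Excess kurtosis: fourth standardized moment minus 3 (0 for a Gaussian).
    excess_kurtosis = np.mean(((x - mean) / std) ** 4) - 3.0
    return mean, std, excess_kurtosis

# Hypothetical stand-in for vib_x: a sinusoid sampled over 50 whole periods.
t = np.arange(2000)
signal = 2.8 * np.sin(2 * np.pi * t / 40.0)

mean, std, excess_kurtosis = summary_stats(signal)
print(f"std = {std:.2f} g, excess kurtosis = {excess_kurtosis:.2f}")
```

Running the same function separately on the normal and fault rows is one way to check how close the two sets of first-order metrics are.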

3

Z-Score · fails

Z-Score: 0.7% of rows flagged, about 98% of faults reach production undetected

Z-score is the most widely deployed method for sensor alert systems. With a ±3σ threshold, it flags only 13 rows out of 2,000, and most of those are not real faults.

Z-Score: 13 outliers detected out of 2000

EDA → Outliers → Z-Score (±3σ)

13 outliers detected out of 2,000 total rows (0.7%). The "No IQR outliers detected" message confirms there are also no extreme values by interquartile range.

Flagged by Z-score: 13 · Fraction of rows flagged: 0.7% · Actual faults: 60

"Z-score misses 98% of the anomalies. Topology detects them because it measures the SHAPE of the data, not its values."

Z-score detects deviations in the magnitude of each variable independently. When faults are subtle — 0.08 g difference in vibration — they fall within the normal percentile range. The method has no mechanism to detect changes in the geometric relationship between variables.
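A minimal sketch of a per-variable ±3σ rule makes this limitation concrete. It assumes NumPy and uses synthetic signals, not the tutorial's data. The amplitude, period, and the simulated fault (vib_y losing its 90° lag for a few samples) are hypothetical choices for illustration, and zscore_outliers is not an InVariants function.

```python
import numpy as np

def zscore_outliers(X, threshold=3.0):
    """Flag rows where any column lies more than `threshold` standard
    deviations from that column's mean."""
    X = np.asarray(X, dtype=float)
    z = (X - X.mean(axis=0)) / X.std(axis=0)
    return (np.abs(z) > threshold).any(axis=1)

# Hypothetical signals: two axes in quadrature, amplitude 2.8 g.
t = np.arange(2000)
vib_x = 2.8 * np.cos(2 * np.pi * t / 40.0)
vib_y = 2.8 * np.sin(2 * np.pi * t / 40.0)

# Simulated fault: vib_y copies vib_x for 15 samples, so the relationship
# between the axes changes while every value stays inside the usual range.
vib_y[500:515] = vib_x[500:515]

flags = zscore_outliers(np.column_stack([vib_x, vib_y]))
print("rows flagged:", int(flags.sum()))
```

Each column is scored on its own, so a change that only affects how the columns relate to each other leaves every z-score well inside the threshold.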

Z-Score: effective detection rate < 2%

Deploying z-score alerts on this dataset means more than 9 in 10 faults reach production without an alarm. This is not a threshold-tuning problem — it is a fundamental limitation of the method.

4

Dimensionality reduction · fails

PCA: anomalies lost inside the point cloud

Reducing to 2 principal components preserves maximum variance — but not the topological structure that identifies faults.

PCA 2D — anomalies mixed into normal cloud

PCA 2D colored by anomalia

Purple: normal operation · Yellow: faults. The PCA projection does not separate any of the 60 faults.

PCA projects data in the direction of maximum variance. If faults produce no extra variance — they only alter the internal correlation structure — the projection mixes them with normal data. That is exactly what happens here.
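The projection described above can be sketched with a singular value decomposition. This is a generic PCA under stated assumptions, not the InVariants implementation. It assumes NumPy, centers the columns without rescaling them, and runs on random data made up for the example.

```python
import numpy as np

def pca_2d(X):
    """Project the rows of X onto the two directions of largest variance."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # center each column
    # Rows of Vt are the principal directions, ordered by singular value.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:2].T
    explained = (S ** 2) / np.sum(S ** 2)  # variance share per component
    return scores, explained[:2]

# Made-up data: 200 rows, 5 columns with decreasing spread.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) * np.array([3.0, 2.0, 1.0, 0.5, 0.1])

scores, explained = pca_2d(X)
print(scores.shape, explained)
```

The directions are ranked only by how much variance they carry, which is why a fault that adds no variance has no influence on the projection.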

Separation in PCA: ≈ 0 · Fault overlap: ~50%

Yellow points (anomalies) are completely mixed into the purple cloud (normal data). No linear classifier — no distance threshold — can separate them in this space.

Linear reduction: blind to the geometry of the problem

PCA, like any projection chosen only to maximize variance, cannot distinguish the faults, because the difference between normal and fault states is not in the variance. It is in the curvature of the bearing's orbit.

5

Phase Portrait · reveals the problem

The orbit: the topological signature of a healthy bearing

Plotting vib_x against vib_y in time order reveals an ellipse. When the bearing spins normally, the two axes are 90° out of phase. Faults break that phase quadrature.

Phase Portrait vib_x vs vib_y — ellipse with anomalies outside

Phase Portrait: vib_x ↔ vib_y · colored by anomalia

The blue/purple cloud forms an ellipse — the vibration orbit during normal operation. The red points (anomalies) scatter outside the ellipse or deform its boundary.

Why an ellipse? In a healthy bearing, X and Y vibration are 90° out of phase, like the components of circular motion. Plotting one against the other traces the orbit: a closed loop that persistent homology records as a persistent one-dimensional feature, H₁.
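The link between phase quadrature and the ellipse can be checked numerically. In the sketch below, x = a·cos(t) and y = b·cos(t − 90°) satisfy (x/a)² + (y/b)² = 1 at every sample, while an in-phase y does not. The amplitudes a and b are hypothetical, and NumPy is assumed.

```python
import numpy as np

def ellipse_residual(x, y, a, b):
    """Largest deviation of (x/a)^2 + (y/b)^2 from 1 over all samples."""
    return float(np.max(np.abs((x / a) ** 2 + (y / b) ** 2 - 1.0)))

t = np.linspace(0.0, 2 * np.pi, 200)
a, b = 2.8, 2.0  # hypothetical amplitudes in g

x = a * np.cos(t)
# Healthy case: y lags x by 90 degrees, so (x, y) stays on the ellipse.
y_quadrature = b * np.cos(t - np.pi / 2)
# Quadrature lost: y moves in phase with x and the orbit collapses to a line.
y_in_phase = b * np.cos(t)

print("quadrature residual:", ellipse_residual(x, y_quadrature, a, b))
print("in-phase residual:  ", ellipse_residual(x, y_in_phase, a, b))
```

The first residual is zero up to rounding. The second is large because the in-phase points lie on a straight line through the origin, which encloses no hole.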

Topological feature: H₁ · Phase shift (healthy): 90°

This is the first visual confirmation that the faults are detectable — just not through individual variable values. The topology of the point cloud — the ellipse, the loop, the hole in phase space — is the structure that changes with a fault.

The insight: faults break the topological orbit

The difference between normal and faulty states is not in the values of vib_x or vib_y individually, but in their geometric relationship. Algebraic topology has the exact mathematical tools to quantify this difference robustly and without labeled examples.

6

Sliding Window TDA · detects all faults

Persistent Homology over time: all 4 fault events found

The algorithm slides an 80-sample window over the time series, computes the topology of each window, and tracks the persistence of the H₁ loop. When the bearing fails, the loop disappears.
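The sliding-window idea can be sketched in pure Python. This is not the InVariants implementation. It assumes a Vietoris-Rips filtration on the (vib_x, vib_y) points of each window and uses the textbook boundary-matrix reduction over Z/2 to find the longest-lived loop. The stride, the subsampling step, and the synthetic signal with a simulated loss of quadrature are all hypothetical choices made to keep the example small.

```python
import itertools
import math

def max_h1_persistence(points):
    """Longest-lived H1 (loop) feature of the Vietoris-Rips filtration."""
    n = len(points)
    # Edges sorted by length give the order in which they enter the filtration.
    edges = sorted(itertools.combinations(range(n), 2),
                   key=lambda e: math.dist(points[e[0]], points[e[1]]))
    index = {e: k for k, e in enumerate(edges)}
    length = [math.dist(points[i], points[j]) for i, j in edges]
    # A triangle enters with its longest edge; its boundary is its 3 edges.
    triangles = []
    for a, b, c in itertools.combinations(range(n), 3):
        boundary = {index[(a, b)], index[(a, c)], index[(b, c)]}
        triangles.append((max(boundary), boundary))
    triangles.sort(key=lambda tri: tri[0])
    # Column reduction over Z/2: a reduced triangle column whose youngest
    # edge is `low` fills in the loop that this edge created.
    reduced = {}
    best = 0.0
    for appears, boundary in triangles:
        column = set(boundary)
        while column:
            low = max(column)
            if low not in reduced:
                reduced[low] = column
                best = max(best, length[appears] - length[low])
                break
            column = column ^ reduced[low]
    return best

def sliding_h1(x, y, window=80, stride=40, subsample=4):
    """Maximum H1 persistence of the (x, y) point cloud in each window."""
    out = []
    for start in range(0, len(x) - window + 1, stride):
        pts = [(x[i], y[i]) for i in range(start, start + window, subsample)]
        out.append(max_h1_persistence(pts))
    return out

# Hypothetical signal: circular orbit, about 1.5 turns per 80-sample window,
# with quadrature lost (y = x) between samples 400 and 520.
period = 53.0
x = [2.8 * math.cos(2 * math.pi * t / period) for t in range(800)]
y = [2.8 * math.sin(2 * math.pi * t / period) for t in range(800)]
for t in range(400, 520):
    y[t] = x[t]

h1 = sliding_h1(x, y)
print([round(v, 2) for v in h1])
```

Windows that lie entirely inside the simulated fault contain only collinear points, so their loop persistence falls to zero, while healthy windows keep a long-lived loop. The pure-Python reduction is only practical for small windows; a dedicated persistent homology library would be the usual choice on real data.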

Sliding Window TDA — H1 drops at anomaly windows

Sliding Window TDA — H₁ persistence over time

Purple line: H₁ loop persistence in each window. Pink regions: windows with real anomalies (ground truth). The 4 sharp H₁ drops align exactly with the 4 fault events at t ≈ 2.5k, 5k, 10k, and 17k ms.

How the detector works

During normal operation, each 80-sample window contains ~1.5 complete orbits. High H₁ = stable loop.

When the bearing fails, the orbit deforms or collapses. H₁ drops sharply — the detector fires.

No training labels. No manual threshold. The H₁ level during normal operation is the automatic reference.
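The tutorial does not spell out how a drop below the normal-operation level is turned into an alarm. The sketch below shows one possible rule, offered as an assumption: take the median of the H₁ series as the reference, mark windows that fall below a fraction of it, and merge consecutive marked windows into events. The function name, the drop_fraction parameter, and the sample values are hypothetical.

```python
import statistics

def drop_events(h1, drop_fraction=0.5):
    """Group consecutive windows whose H1 persistence falls below a fraction
    of the series median into (first, last) window-index events."""
    reference = statistics.median(h1)  # stands in for the normal-operation level
    limit = drop_fraction * reference
    events, start = [], None
    for i, value in enumerate(h1):
        if value < limit and start is None:
            start = i  # a drop begins
        elif value >= limit and start is not None:
            events.append((start, i - 1))  # the drop ended at the previous window
            start = None
    if start is not None:
        events.append((start, len(h1) - 1))  # the series ends during a drop
    return events

# Made-up persistence values: two drops among otherwise stable windows.
h1_series = [3.6, 3.5, 0.2, 0.1, 3.6, 3.4, 3.5, 0.3, 3.6]
events = drop_events(h1_series)
print(events)  # [(2, 3), (7, 7)]
```

Using the median as the reference relies on faults being rare, so that most windows reflect normal operation. Counting events instead of individual windows matches how the results are reported here: fault events detected out of fault events present.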

Faults detected: 4 / 4 · Recall: 100% · False negatives: 0

"We don't need to label faults to calibrate this detector. The H₁ level during normal operation is the reference. Any drop below that level is an alarm. No manual threshold. No trained model. Pure mathematics."

Sliding Window TDA: 100% recall with zero labels

All 4 fault events in the sensor dataset are detected by the H₁ persistence drop. The same algorithm, without retraining, would generalize to any fault type that alters the bearing's orbital geometry — including fault modes never seen before.

Method comparison

Same dataset, radically different results

All methods run inside InVariants on identical data. The difference is the type of information each one extracts.

Method	Type	Detects the 60 faults?	Requires labels	Why it fails / works
Z-Score (±3σ)	Statistical	13 of 2,000 rows flagged (0.7%), most not real faults	No	Faults produce no extreme univariate values
IQR	Statistical	0 detected	No	Identical marginal distribution in both classes
PCA 2D	Linear reduction	No separation	No	Faults generate no additional variance
Phase Portrait	Geometric visualization	Visual only, no score	No	Reveals the orbit but gives no automated score
Sliding Window TDA	Topological	4/4 events (100%)	No	Tracks H₁ loop persistence — measures orbital shape

Statistics cannot see the fault.
Topology can.