An industrial bearing spinning at 1,480 rpm. Faults change the shape of the vibration signal, not its magnitude. Z-score misses 98% of anomalies. InVariants finds them all.
2,000 samples from a bearing spinning at 1,480 rpm, with vibrations on three axes, temperature and RPM recorded every millisecond. 60 samples correspond to real fault states.
2,000 rows · 8 columns · no missing values. Numeric columns: vib_x, vib_y, vib_z, temperatura, rpm. Target: anomalia.
| Column | Normal μ | Fault μ | Δ |
|---|---|---|---|
| vib_x | 0.009 g | 0.089 g | +0.08 g |
| temperatura | 61.97 °C | 62.07 °C | +0.1 °C |
| rpm | 1480.2 | 1480.2 | 0 |
The histogram of vib_x reveals two peaks, a hint that two populations coexist in the data. But they overlap so heavily that none of the classical methods tested below can exploit this structure.
Mean ≈ 0.008 g · Std ≈ 1.98 g · kurtosis −1.48 (platykurtic: flatter than normal, consistent with two overlapping populations). Two peaks are clearly visible, yet fault samples are spread across the entire range alongside normal data. The distributions of temperatura and rpm are even harder to tell apart.
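The summary figures above can be reproduced with a few lines of NumPy. This is a minimal sketch, not InVariants code: `summary_stats` is a hypothetical helper, and the synthetic sinusoid only stands in for the real `vib_x` column, which is not reproduced here.

```python
import numpy as np

def summary_stats(x):
    """Return mean, population std, and excess kurtosis of a 1-D signal."""
    x = np.asarray(x, dtype=float)
    mean = x.mean()
    centered = x - mean
    m2 = np.mean(centered ** 2)            # second central moment (variance)
    m4 = np.mean(centered ** 4)            # fourth central moment
    excess_kurtosis = m4 / m2 ** 2 - 3.0   # 0 for a Gaussian, negative if flatter
    return mean, np.sqrt(m2), excess_kurtosis

# Synthetic stand-in (assumption): five full periods of a unit sinusoid.
t = np.arange(1000)
signal = np.sin(2 * np.pi * 5 * t / 1000)
mean, std, kurt = summary_stats(signal)
```

A pure sinusoid has an excess kurtosis of −1.5, so the stand-in lands close to the −1.48 reported for vib_x.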
The bimodal shape is a structural clue, but classical first-order metrics (mean, std, range) are virtually identical between normal and fault samples. Any threshold on individual variable values will either miss most faults or generate unacceptable false-positive rates.
Z-score thresholding is the most widely deployed method for sensor alert systems. With a ±3σ threshold, it flags only 13 of the 2,000 rows, and most of those are not real faults.
13 outliers detected out of 2,000 total rows (0.7%). The "No IQR outliers detected" message confirms there are also no extreme values by interquartile range.
Z-score detects deviations in the magnitude of each variable independently. When faults are subtle, such as a 0.08 g shift in mean vibration, the affected samples stay well inside the normal range of values. The method has no mechanism to detect changes in the geometric relationship between variables.
Deploying z-score alerts on this dataset means more than 9 in 10 faults reach production without an alarm. This is not a threshold-tuning problem — it is a fundamental limitation of the method.
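The limitation is easy to reproduce on synthetic data. The sketch below is an illustration under stated assumptions, not the InVariants implementation: `zscore_outliers` and `iqr_outliers` are hypothetical helpers, the 1.5 × IQR fence is the conventional Tukey choice (the source does not state which fence was used), and the signal is a clean sinusoid whose 2.8 g amplitude is chosen to match a standard deviation of roughly 1.98 g.

```python
import numpy as np

def zscore_outliers(x, k=3.0):
    """Flag samples more than k standard deviations from the mean."""
    x = np.asarray(x, dtype=float)
    std = x.std()
    if std == 0:
        return np.zeros(x.shape, dtype=bool)
    return np.abs(x - x.mean()) / std > k

def iqr_outliers(x, k=1.5):
    """Flag samples outside [Q1 - k*IQR, Q3 + k*IQR]."""
    x = np.asarray(x, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    spread = q3 - q1
    return (x < q1 - k * spread) | (x > q3 + k * spread)

# Synthetic stand-in (assumption): about 1.5 orbits per 80 samples.
t = np.arange(2000)
period = 80 / 1.5
vib = 2.8 * np.sin(2 * np.pi * t / period)
vib[1000:1060] += 0.08   # subtle shift on 60 samples, as in the table above

flags_z = zscore_outliers(vib)
flags_iqr = iqr_outliers(vib)
```

On this stand-in neither rule flags a single sample: the largest |z| of a sinusoid is about √2, far below 3, and a 0.08 g shift does not change that.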
Reducing to 2 principal components preserves maximum variance — but not the topological structure that identifies faults.
Purple: normal operation · Yellow: faults. The PCA projection does not separate any of the 60 faults.
Yellow points (anomalies) are completely mixed into the purple cloud (normal data). No linear classifier — no distance threshold — can separate them in this space.
PCA, and any linear projection method, cannot distinguish the faults because the difference between normal and fault states is not in the variance — it is in the curvature of the bearing's orbit.
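For reference, a 2-component PCA can be sketched with a plain SVD. The helper name `pca_2d` is hypothetical, standardizing each column before projection is an assumption (the source does not say whether the data was scaled), and the random matrix is only a placeholder for the five numeric columns.

```python
import numpy as np

def pca_2d(data):
    """Project rows onto the first two principal components.

    Returns the 2-D scores and the fraction of variance each component explains.
    """
    data = np.asarray(data, dtype=float)
    centered = data - data.mean(axis=0)
    scale = centered.std(axis=0)
    scale[scale == 0] = 1.0                 # leave constant columns untouched
    standardized = centered / scale
    _, singular_values, vt = np.linalg.svd(standardized, full_matrices=False)
    scores = standardized @ vt[:2].T        # coordinates on PC1 and PC2
    explained = singular_values ** 2 / np.sum(singular_values ** 2)
    return scores, explained[:2]

# Placeholder data (assumption): 200 rows, 5 numeric columns.
rng = np.random.default_rng(0)
data = rng.standard_normal((200, 5))
scores, explained = pca_2d(data)
```

The projection keeps the directions of largest variance, which is exactly why it discards a fault signature that adds no variance.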
Plotting vib_x against vib_y in time order reveals an ellipse. When the bearing spins normally, the two axes are 90° out of phase. Faults break that phase quadrature.
The blue/purple cloud forms an ellipse — the vibration orbit during normal operation. The red points (anomalies) scatter outside the ellipse or deform its boundary.
This is the first visual confirmation that the faults are detectable — just not through individual variable values. The topology of the point cloud — the ellipse, the loop, the hole in phase space — is the structure that changes with a fault.
The difference between normal and faulty states is not in the values of vib_x or vib_y individually, but in their geometric relationship. Algebraic topology has the exact mathematical tools to quantify this difference robustly and without labeled examples.
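A small numerical check makes the geometric argument concrete. In this sketch the amplitudes `a` and `b`, the helper `ellipse_residual`, and the in-phase fault model are all assumptions chosen for illustration; the real fault signatures in the dataset may look different.

```python
import numpy as np

def ellipse_residual(x, y, a, b):
    """(x/a)^2 + (y/b)^2 - 1: zero for points on the axis-aligned ellipse."""
    return (np.asarray(x) / a) ** 2 + (np.asarray(y) / b) ** 2 - 1.0

theta = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
a, b = 2.8, 2.0   # assumed amplitudes in g, for illustration only

# Normal operation: the two axes are 90 degrees out of phase.
x = a * np.cos(theta)
y_normal = b * np.sin(theta)

# Hypothetical fault: quadrature is lost and both axes move in phase.
y_fault = b * np.cos(theta)

normal_dev = np.max(np.abs(ellipse_residual(x, y_normal, a, b)))
fault_dev = np.max(np.abs(ellipse_residual(x, y_fault, a, b)))
```

With quadrature intact every point sits on the ellipse, so `normal_dev` is zero up to rounding. Once the axes move in phase the orbit collapses onto a line and `fault_dev` grows to order 1, even though the range of each individual axis is unchanged.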
The algorithm slides an 80-sample window over the time series, computes the topology of each window, and tracks the persistence of the H₁ loop. When the bearing fails, the loop disappears.
Purple line: H₁ loop persistence in each window. Pink regions: windows with real anomalies (ground truth). The 4 sharp H₁ drops align exactly with the 4 fault events at t ≈ 2.5k, 5k, 10k, and 17k ms.
During normal operation, each 80-sample window contains ~1.5 complete orbits. High H₁ = stable loop.
When the bearing fails, the orbit deforms or collapses. H₁ drops sharply — the detector fires.
No training labels. No manual threshold. The H₁ level during normal operation is the automatic reference.
All 4 fault events in the sensor dataset are detected by the H₁ persistence drop. The same algorithm, without retraining, would generalize to any fault type that alters the bearing's orbital geometry — including fault modes never seen before.
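The sliding-window idea can be sketched end to end in plain Python. Everything below is an assumption-laden illustration rather than the InVariants algorithm, whose internals the source does not describe. `h1_max_persistence` is a textbook boundary-matrix reduction over a Vietoris-Rips filtration, practical only for a few dozen points. The `stride` subsampling, the `step` between windows, and the 0.5 fraction of the median used as a drop criterion are placeholders, and the fault is modeled as a loss of phase quadrature. A production system would more likely rely on a dedicated persistent-homology library such as ripser or GUDHI.

```python
import itertools
import numpy as np

def h1_max_persistence(points):
    """Lifetime of the longest-lived H1 (loop) feature of a Rips filtration."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

    # Each simplex is stored with the scale at which it enters the filtration.
    simplices = [((i,), 0.0) for i in range(n)]
    simplices += [(e, dist[e]) for e in itertools.combinations(range(n), 2)]
    simplices += [
        (t, max(dist[t[0], t[1]], dist[t[0], t[2]], dist[t[1], t[2]]))
        for t in itertools.combinations(range(n), 3)
    ]
    simplices.sort(key=lambda s: (s[1], len(s[0])))   # faces before cofaces
    index = {simplex: i for i, (simplex, _) in enumerate(simplices)}

    reduced = {}   # lowest boundary index -> reduced column that owns it
    best = 0.0
    for simplex, scale in simplices:
        if len(simplex) == 1:
            continue   # vertices have an empty boundary
        column = {index[f] for f in itertools.combinations(simplex, len(simplex) - 1)}
        while column and max(column) in reduced:
            column ^= reduced[max(column)]   # column addition over Z/2
        if column:
            low = max(column)
            reduced[low] = column
            if len(simplex) == 3:   # this triangle kills the loop born at edge `low`
                best = max(best, scale - simplices[low][1])
    return best

def sliding_h1(x, y, window=80, step=40, stride=4):
    """H1 persistence of the (x, y) orbit in each window, subsampled by `stride`."""
    orbit = np.column_stack([x, y])
    starts = range(0, len(orbit) - window + 1, step)
    return np.array([h1_max_persistence(orbit[s:s + window:stride]) for s in starts])

def flag_drops(h1, fraction=0.5):
    """Flag windows whose H1 falls below a fraction of the median level."""
    return h1 < fraction * np.median(h1)

# Synthetic stand-in (assumption): about 1.5 orbits per 80-sample window.
rng = np.random.default_rng(0)
t = np.arange(800)
phase = 2 * np.pi * t / (80 / 1.5)
x = np.cos(phase) + 0.01 * rng.standard_normal(t.size)
y = np.sin(phase) + 0.01 * rng.standard_normal(t.size)
y[400:560] = x[400:560]   # hypothetical fault: the orbit collapses onto a line

h1 = sliding_h1(x, y)
flags = flag_drops(h1)
```

Windows that lie entirely inside the simulated fault lose their loop and are flagged, while windows of normal operation keep a high H1 value. The median plays the role of the normal-operation reference described above; how far below it a window must fall to count as a drop is a design choice this sketch leaves as a parameter.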
All methods run inside InVariants on identical data. The difference is the type of information each one extracts.
| Method | Type | Detects the 60 faults? | Requires labels | Why it fails / works |
|---|---|---|---|---|
| Z-Score (±3σ) | Statistical | No: 13 of 2,000 rows flagged (0.7%), most not real faults | No | Faults produce no extreme univariate values |
| IQR | Statistical | 0 detected | No | Identical marginal distribution in both classes |
| PCA 2D | Linear reduction | No separation | No | Faults generate no additional variance |
| Phase Portrait | Geometric visualization | Visual only, no score | No | Reveals the orbit but gives no automated score |
| Sliding Window TDA | Topological | 4/4 events (100%) | No | Tracks H₁ loop persistence — measures orbital shape |
Upload any CSV with time-series sensor data. InVariants computes the topology and detects anomalies without labels in under 30 seconds.