Overview
Persistent homology is a method from topological data analysis (TDA) that characterizes the shape of data by tracking how topological features (connected components, loops, and voids) appear and disappear as a scale parameter varies. The result is a compact summary, called a persistence diagram, that captures multi-scale geometric structure in a way that is stable to noise and independent of coordinate systems.
Dispers brings a statistical perspective to this computation. Rather than attempting to process an entire dataset at once, it uses repeated bootstrap sampling to build up inferences about the topology of the full point cloud. This makes rigorous TDA tractable on datasets that would otherwise be out of reach.
The challenge
Computing persistent homology requires constructing and reducing a simplicial complex whose size grows combinatorially with the number of points. For a point cloud of n points, the number of edges alone is O(n²), and the boundary matrix reduction that produces the persistence diagram has worst-case complexity that is cubic in the number of simplices, not merely in the number of points. State-of-the-art single-machine libraries such as Ripser and GUDHI are highly optimized, but they remain fundamentally limited by available RAM and single-node compute.
In practice, point clouds with hundreds of thousands of points push single-machine tools, and even some distributed ones, to their limits or beyond, making routine TDA infeasible at full resolution. Datasets of this size are common in geospatial surveys, simulation outputs, and sensor networks.
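To make the combinatorial growth described above concrete, the following minimal sketch counts an upper bound on the number of simplices in a Vietoris-Rips complex. The function name rips_simplex_upper_bound is hypothetical and is not part of Dispers. The bound assumes the scale parameter is large enough that every possible simplex is present, which is the worst case. Note that computing loops (dimension 1) already requires triangles (dimension 2).

```python
from math import comb


def rips_simplex_upper_bound(n_points: int, max_dim: int) -> int:
    """Worst-case number of simplices of dimension <= max_dim on n_points.

    A k-dimensional simplex is a set of k + 1 points, so there are at most
    C(n, k + 1) of them. This is an upper bound, reached only when the
    filtration scale is large enough to include every simplex.
    """
    return sum(comb(n_points, k + 1) for k in range(max_dim + 1))


if __name__ == "__main__":
    for n in (100, 1_000, 100_000):
        edges = comb(n, 2)                        # grows as O(n^2)
        up_to_triangles = rips_simplex_upper_bound(n, 2)  # grows as O(n^3)
        print(f"n={n}: edges={edges}, simplices up to dim 2={up_to_triangles}")
```

Because the matrix reduction is cubic in the number of simplices in the worst case, even modest increases in n translate into very large increases in work, although optimized libraries typically do far better than the worst case in practice.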
How it works
Dispers sidesteps the scaling bottleneck through bootstrap sampling. Instead of computing persistent homology on the full dataset, it repeatedly draws random subsamples, computes a persistence diagram on each, and aggregates the results to make statistical inferences about the topology of the underlying distribution. Because each subsample is small enough to fit comfortably in memory, the individual computations are fast and can be distributed across workers in parallel. The bootstrap framework also provides uncertainty estimates alongside the topological summaries, which a single computation on the full dataset does not provide on its own.
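The subsample-compute-aggregate loop described above can be sketched as follows. This is an illustration under stated assumptions, not Dispers's actual API or algorithm: the function names are hypothetical, subsamples are drawn without replacement, only dimension-0 persistence (connected components) is computed, and the aggregated statistic is simply the largest finite death scale. Dimension 0 is used because it can be computed with the standard library alone: in a Vietoris-Rips filtration every component is born at scale 0 and dies at the length of a minimum-spanning-tree edge.

```python
import math
import random
from statistics import mean, stdev


def h0_deaths(points):
    """Finite death scales of H0 classes in a Vietoris-Rips filtration.

    Edges are processed in order of length with union-find (Kruskal's
    algorithm). Each merge of two components kills one H0 class, so the
    deaths are exactly the minimum-spanning-tree edge lengths.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i in range(n)
        for j in range(i + 1, n)
    )
    deaths = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(length)
    return deaths  # n - 1 values; one class never dies


def bootstrap_max_h0_death(points, sample_size, n_rounds, seed=0):
    """Mean and standard deviation of the largest H0 death over subsamples.

    Hypothetical aggregation step: each round is independent, so in a real
    system the rounds could be farmed out to parallel workers.
    """
    rng = random.Random(seed)
    stats = []
    for _ in range(n_rounds):
        sample = rng.sample(points, sample_size)  # without replacement
        stats.append(max(h0_deaths(sample)))
    return mean(stats), stdev(stats)


if __name__ == "__main__":
    # Synthetic data: two noisy clusters centered at x = 0 and x = 10.
    rng = random.Random(42)
    cloud = [
        (rng.gauss(cx, 0.5), rng.gauss(0.0, 0.5))
        for cx in (0.0, 10.0)
        for _ in range(1000)
    ]
    estimate, spread = bootstrap_max_h0_death(cloud, sample_size=40, n_rounds=50)
    print(f"largest H0 death: {estimate:.2f} +/- {spread:.2f}")
```

In this toy setting the largest death reflects the gap between the two clusters, and the spread across rounds gives a rough sense of how stable that feature is. A fuller pipeline would presumably compute higher-dimensional diagrams with a library such as Ripser or GUDHI and aggregate whole diagrams rather than a single number.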
Results
By trading a single expensive computation for many cheap ones, Dispers scales persistent homology to point clouds that exceed single-machine memory limits without sacrificing the statistical reliability of the output. The bootstrap approach also surfaces confidence information about which topological features are persistent signal and which are noise, making results more interpretable and actionable in downstream analysis.