
Topology and data. (English) Zbl 1172.62002

This long paper is a survey arguing for the use of algebraic topology in data analysis. It is elementary enough to be readable by an audience wider than algebraic topologists, and it contains many concrete examples and a large bibliography of the subject. In modern data analysis, qualitative information is needed because one wants to understand how the data are organized on a large scale. Numerical information such as metrics or coordinates is not relevant to this kind of analysis, and is sometimes simply meaningless. Topology is precisely the branch of mathematics that deals with qualitative geometric information and disregards such quantitative values. Moreover, the functoriality of the algebraic constructions makes it possible to relate local geometric information to global geometric information, which is very often encoded in an algebraic structure.
The subjects treated in this survey are persistence and homology (a way of attaching a homological signature to point clouds), an imaging technique called Mapper, the multidimensional generalization of persistence, and a chapter on clustering algorithms (methods taking as input a finite metric space and producing as output a partition of the underlying set into clusters), which starts by mentioning an interesting adaptation of Arrow's impossibility theorem.
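To make the input/output description of a clustering algorithm concrete, here is a minimal sketch in Python of one simple scheme, single-linkage clustering at a fixed scale r: two points end up in the same cluster when they can be joined by a chain of points whose consecutive distances are at most r. The function name `single_linkage_partition`, its parameters and the sample data are illustrative assumptions and are not taken from the paper under review. The clusters produced are the connected components of the Vietoris-Rips complex at scale r, so following how they merge as r grows gives the degree-zero part of the persistence picture.

```python
from itertools import combinations


def single_linkage_partition(points, dist, r):
    """Partition `points` into single-linkage clusters at scale `r`.

    Hypothetical helper for illustration: `dist` is any metric on the points,
    and two points are linked directly when their distance is at most `r`.
    """
    parent = list(range(len(points)))  # union-find forest over point indices

    def find(i):
        # Follow parent links to the root, halving the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge the components of every pair of points at distance <= r.
    for i, j in combinations(range(len(points)), 2):
        if dist(points[i], points[j]) <= r:
            parent[find(i)] = find(j)

    # Group the points by the root of their component.
    clusters = {}
    for i, p in enumerate(points):
        clusters.setdefault(find(i), []).append(p)
    return sorted(clusters.values())


# Assumed sample data: five points on the real line with the usual distance.
sample = [0.0, 1.0, 1.5, 5.0, 5.2]


def line_distance(a, b):
    return abs(a - b)


fine = single_linkage_partition(sample, line_distance, 0.6)
coarse = single_linkage_partition(sample, line_distance, 4.0)
```

On this sample, the scale 0.6 yields three clusters and the scale 4.0 merges everything into one, which shows how strongly the output partition depends on the choice of scale.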


62-07 Data analysis (statistics) (MSC2010)
55N35 Other homology theories in algebraic topology
62H30 Classification and discrimination; cluster analysis (statistical aspects)
55N05 Čech types
