An analysis of classical multidimensional scaling with applications to clustering. (English) Zbl 1508.94042
Summary: Classical multidimensional scaling is a widely used dimension reduction technique, yet few theoretical results characterize its statistical performance. This paper provides a theoretical framework for analyzing the quality of the embedded samples produced by classical multidimensional scaling. The framework lays a foundation for various downstream statistical analyses, and we focus on clustering noisy data. Our results provide scaling conditions on the signal-to-noise ratio under which classical multidimensional scaling followed by a distance-based clustering algorithm can recover the cluster labels of all samples. Simulation studies confirm that these scaling conditions are sharp. Applications to cancer gene-expression data, single-cell RNA sequencing data and natural language data lend strong support to the methodology and theory.
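The two-stage pipeline described in the summary (classical multidimensional scaling, then distance-based clustering of the embedded samples) can be illustrated with a minimal NumPy sketch. This is not the paper's code: the function names `classical_mds`, `kmeans`, `pairwise_distances` and `demo` are hypothetical, the simulated two-cluster Gaussian data and its parameters are assumptions chosen for illustration, and a basic Lloyd-style k-means stands in for the "distance-based clustering algorithm", which the summary does not specify.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance matrix between the rows of X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.sqrt(np.clip(d2, 0.0, None))  # clip tiny negatives from round-off


def classical_mds(D, r):
    """Classical MDS: embed n samples in R^r from an n x n distance matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    B = -0.5 * J @ (D ** 2) @ J               # double-centered squared distances
    B = (B + B.T) / 2.0                       # enforce exact symmetry
    evals, evecs = np.linalg.eigh(B)          # eigenvalues in ascending order
    top = np.argsort(evals)[::-1][:r]         # indices of the r largest
    lam = np.clip(evals[top], 0.0, None)      # drop negative eigenvalues
    return evecs[:, top] * np.sqrt(lam)       # rows are embedded samples


def kmeans(X, k, n_iter=100):
    """Plain Lloyd iterations with a deterministic farthest-point start.

    Assumption: used here only as one example of distance-based clustering.
    """
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        updated = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(updated, centers):
            break
        centers = updated
    return labels


def demo(seed=0, n_per=30, p=50, separation=10.0):
    """Simulate two noisy clusters (illustrative parameters), embed, cluster."""
    rng = np.random.default_rng(seed)
    truth = np.repeat([0, 1], n_per)
    shift = np.zeros(p)
    shift[0] = separation                     # clusters differ in one coordinate
    X = rng.normal(size=(2 * n_per, p)) + np.outer(truth, shift)
    Y = classical_mds(pairwise_distances(X), r=1)
    return truth, kmeans(Y, k=2)


if __name__ == "__main__":
    truth, labels = demo()
    agreement = max(np.mean(labels == truth), np.mean(labels == 1 - truth))
    print("label agreement up to relabeling:", agreement)
```

Lowering `separation` relative to the unit noise level in `demo` is one informal way to see the recovery degrade as the signal-to-noise ratio shrinks; the precise scaling conditions are the subject of the paper and are not reproduced by this sketch.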
MSC:
94A16 | Informational aspects of data analysis and big data
94A20 | Sampling theory in information and communication theory
62H25 | Factor analysis and principal components; correspondence analysis
62H30 | Classification and discrimination; cluster analysis (statistical aspects)
92C55 | Biomedical imaging and signal processing