Some results on the curse of dimensionality and sample size recommendations. (English) Zbl 1011.62057

Summary: Multivariate density estimation is well known to be a tremendously difficult problem due to the occurrence of phenomena commonly known as the corner effect and the curse of dimensionality. Specifically, histogram density estimation in high dimensions is plagued by the fact that sampled observations tend to reside with high probability in low density regions of the sample space. In this article we attempt to quantify two central things: in how many dimensions one starts to really feel the curse of dimensionality, and what sort of sample sizes are needed to do any kind of reasonable inference in various dimensions. These questions cannot be formulated in a unique way, so the attempt is to derive a broad spectrum of results, which are then illustrated by extensive computations.
A number of results may be of independent interest in combinatorics and applied probability. Our subjective conclusion after these extensive computations is that in 3 dimensions one often sees the most drastic effects relative to just one less dimension; in 5 dimensions one feels the curse of high dimensions rather strongly; in 10 dimensions, the feasibility of inference with realistic sample sizes basically vanishes. We also give a subjective minimum sample size recommendation based on the number of dimensions. These calculations are different in character from those of V. A. Epanechnikov [Teor. Veroyatn. Primen. 14, 156-162 (1969; Zbl 0175.17101)].
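
The two phenomena named in the summary can be illustrated with a short numerical sketch. It is not taken from the article and does not reproduce its computations. It assumes two textbook settings chosen here for illustration: a uniform distribution on a unit hypercube (corner effect) and a standard normal distribution in an even number of dimensions (mass in low density regions). The function names and the 1% density threshold are hypothetical choices.

```python
import math


def inscribed_ball_fraction(d: int) -> float:
    """Fraction of the unit cube [-1/2, 1/2]^d covered by the inscribed
    ball of radius 1/2. The remainder of the volume sits in the corners."""
    if d <= 0:
        raise ValueError("d must be a positive integer")
    # Volume of a d-ball of radius r: pi^(d/2) / Gamma(d/2 + 1) * r^d
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1) * 0.5 ** d


def high_density_probability(d: int, t: float) -> float:
    """P(f(X) >= t * f(0)) for X standard normal in d dimensions, d even.

    f(x) / f(0) = exp(-|x|^2 / 2), so the event is |X|^2 <= 2 * ln(1 / t).
    |X|^2 is chi-square with d = 2k degrees of freedom, whose CDF has the
    closed form 1 - exp(-x / 2) * sum_{j < k} (x / 2)^j / j!.
    """
    if d <= 0 or d % 2 != 0:
        raise ValueError("this closed form is only used for even d > 0")
    if not 0.0 < t <= 1.0:
        raise ValueError("t must lie in (0, 1]")
    lam = math.log(1.0 / t)  # equals x / 2, hence exp(-x / 2) == t
    k = d // 2
    partial = sum(lam ** j / math.factorial(j) for j in range(k))
    return 1.0 - t * partial


if __name__ == "__main__":
    for d in (2, 4, 6, 10, 20):
        print(
            d,
            f"ball fraction = {inscribed_ball_fraction(d):.3g}",
            f"P(density >= 1% of peak) = {high_density_probability(d, 0.01):.3g}",
        )
```

Under these assumptions both quantities shrink quickly as the dimension grows, which is the qualitative behaviour the summary describes; the specific thresholds and sample size recommendations of the article are not derived here.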

MSC:

62H12 Estimation in multivariate analysis
62G07 Density estimation
65C60 Computational problems in statistics (MSC2010)

Citations:

Zbl 0175.17101