×

The geometric median and applications to robust mean estimation. (English) Zbl 07892644

Summary: This paper is devoted to the statistical and numerical properties of the geometric median and its applications to the problem of robust mean estimation via the median of means principle. Our main theoretical results include (a) an upper bound for the distance between the mean and the median for general absolutely continuous distributions in \(\mathbb{R}^d\), and examples of specific classes of distributions for which these bounds do not depend on the ambient dimension \(d\); (b) exponential deviation inequalities for the distance between the sample and the population versions of the geometric median, which again depend only on the trace-type quantities and not on the ambient dimension. As a corollary, we deduce improved bounds for the (geometric) median of means estimator that hold for large classes of heavy-tailed distributions. Finally, we address the error of numerical approximation, which is an important practical aspect of any statistical estimation procedure. We demonstrate that the objective function minimized by the geometric median satisfies a “local quadratic growth” condition that allows one to translate suboptimality bounds for the objective function to the corresponding bounds for the numerical approximation to the median itself and propose a simple stopping rule applicable to any optimization method which yields explicit error guarantees. We conclude with the numerical experiments, including the application to estimation of mean values of log-returns for S&P 500 data.

MSC:

62G35 Nonparametric robustness
60E15 Inequalities; stochastic orderings

References:

[1] Alistarh, D., Allen-Zhu, Z., and Li, J., Byzantine stochastic gradient descent, Adv. Neural Inf. Process. Syst., 31 (2018).
[2] Alon, N., Matias, Y., and Szegedy, M., The space complexity of approximating the frequency moments, in Proceedings of the Twenty-Eighth Annual ACM Symposium on Theory of Computing, , 1996, pp. 20-29. · Zbl 0922.68057
[3] Bateni, A.-H., Minasyan, A., and Dalalyan, A. S., Nearly Minimax Robust Estimator of the Mean Vector by Iterative Spectral Dimension Reduction, preprint, arXiv:2204.02323, 2022.
[4] Beck, A. and Sabach, S., Weiszfeld’s method: Old and new results, J. Optim. Theory Appl., 164 (2015), pp. 1-40. · Zbl 1316.49001
[5] Boucheron, S., Lugosi, G., and Massart, P., Concentration Inequalities: A Nonasymptotic Theory of Independence, Oxford University Press, 2013. · Zbl 1279.60005
[6] Bouhata, D. and Moumen, H., Byzantine Fault Tolerance in Distributed Machine Learning: A Survey, preprint, arXiv:2205.02572, 2022.
[7] Cardot, H., Cénac, P., and Godichon-Baggioni, A., Online estimation of the geometric median in Hilbert spaces: Nonasymptotic confidence balls, Ann. Statist., 45 (2017), pp. 591-614. · Zbl 1371.62027
[8] Cardot, H., Cénac, P., and Zitt, P-A, Efficient and fast estimation of the geometric median in Hilbert spaces with an averaged stochastic gradient algorithm, Bernoulli, 19 (2013), pp. 18-43. · Zbl 1259.62068
[9] Chaudhuri, P., On a geometric notion of quantiles for multivariate data, J. Amer. Statist. Assoc., 91 (1996), pp. 862-872. · Zbl 0869.62040
[10] Chen, Y., Su, L., and Xu, J., Distributed statistical machine learning in adversarial settings: Byzantine gradient descent, in Proceedings of the ACM Conference on Measurement and Analysis of Computing Systems, , Vol. 1, 2017, pp. 1-25.
[11] Cherapanamjeri, Y., Flammarion, N., and Bartlett, P. L., Fast mean estimation with sub-Gaussian rates, in Proceedings of the Conference on Learning Theory, , PMLR, 2019, pp. 786-806.
[12] Cohen, M. B., Lee, Y. T., Miller, G., Pachocki, J., and Sidford, A., Geometric median in nearly linear time, in Proceedings of the 48th Annual ACM SIGACT Symposium on Theory of Computing, , 2016, pp. 9-21. · Zbl 1377.68267
[13] Depersin, J. and Lecué, G., Robust sub-Gaussian estimation of a mean vector in nearly linear time, Ann. Statist., 50 (2022), pp. 511-536. · Zbl 1486.62077
[14] Diakonikolas, I. and Kane, D., Recent advances in algorithmic high-dimensional robust statistics, in Beyond the Worst-Case Analysis of Algorithms, Cambridge University Press, 2021.
[15] Dirksen, S., Tail bounds via generic chaining, Electron. J. Probab., 20 (2015), 53. · Zbl 1327.60048
[16] Giné, E. and Nickl, R., Mathematical Foundations of Infinite-Dimensional Statistical Models, , Cambridge University Press, 2015.
[17] Gini, C. and Galvani, L., Di talune estensioni dei concetti di media ai caratteri qualitativi, Metron, 8 (1929), pp. 3-209. · JFM 56.1084.03
[18] Gluskin, E. and Milman, V., Note on the geometric-arithmetic mean inequality, in Geometric Aspects of Functional Analysis: Israel Seminar 2001-2002, Springer, 2003, pp. 131-135. · Zbl 1035.26020
[19] Haldane, J. B. S., Note on the median of a multivariate distribution, Biometrika, 35 (1948), pp. 414-417. · Zbl 0032.03601
[20] Hopkins, S. B., Mean estimation with sub-Gaussian rates in polynomial time, Ann. Statist., 48 (2020), pp. 1193-1213. · Zbl 1454.62162
[21] Hsu, D. and Sabato, S., Loss minimization and parameter estimation with heavy tails, J. Mach. Learn. Res., 17 (2016), 18. · Zbl 1360.62380
[22] Jerrum, M. R., Valiant, L. G., and Vazirani, V. V., Random generation of combinatorial structures from a uniform distribution, Theoret. Comput. Sci., 43 (1986), pp. 169-188. · Zbl 0597.68056
[23] Juškevičius, T. and Lee, J., Small Ball Probabilities, Maximum Density and Rearrangements, preprint, arXiv:1503.09190, 2015.
[24] Kärkkäinen, T. and Ayrämö, S., On computation of spatial median for robust data mining, in Evolutionary and Deterministic Methods for Design, Optimization and Control with Applications to Industrial and Societal Problems, EUROGEN, Munich, 2005.
[25] Kemperman, J., The median of a finite measure on a Banach space, in Statistical Data Analysis Based on the \(L_1\)-Norm and Related Methods, North-Holland, Amsterdam, 1987, pp. 217-230.
[26] Koltchinskii, V., Spatial quantiles and their Bahadur-Kiefer representations, in Asymptotic Statistics: Proceedings of the Fifth Prague Symposium, , Springer, 1994, pp. 361-367.
[27] Koltchinskii, V. I., \(M\)-estimation, convexity and quantiles, Ann. Statist., 25 (1997), pp. 435-477. · Zbl 0878.62037
[28] Konen, D., Recovering a Probability Measure From Its Multivariate Spatial Rank, preprint, arXiv:2208.11551, 2022.
[29] Latala, R. and Oleszkiewicz, K., Small ball probability estimates in terms of width, Studia Math., 169 (2005), pp. 305-314. · Zbl 1073.60043
[30] Lerasle, M. and Oliveira, R. I., Robust Empirical Mean Estimators, preprint, arXiv:1112.3914, 2011.
[31] Lugosi, G. and Mendelson, S., Sub-Gaussian estimators of the mean of a random vector, Ann. Statist., 47 (2019), pp. 783-794. · Zbl 1417.62192
[32] Lugosi, G. and Mendelson, S., Mean estimation and regression under heavy-tailed distributions: A survey, Found. Comput. Math., 19 (2019), pp. 1145-1190. · Zbl 1431.62123
[33] Lugosi, G. and Mendelson, S., Multivariate mean estimation with direction-dependent accuracy, J. Eur. Math. Soc. (JEMS), 26 (2024), pp. 2211-2247. · Zbl 07855544
[34] Madiman, M., Melbourne, J., and Xu, P., Rogozin’s Convolution Inequality for Locally Compact Groups, preprint, arXiv:1705.00642, 2017.
[35] Mendelson, S., Learning without concentration, J. ACM, 62 (2015), 21. · Zbl 1333.68232
[36] Mendelson, S. and Zhivotovskiy, N., Robust covariance estimation under \({L}_4-{L}_2\) norm equivalence, Ann. Statist., 48 (2020), pp. 1648-1664. · Zbl 1451.62084
[37] Minsker, S., Geometric median and robust estimation in Banach spaces, Bernoulli, 21 (2015), pp. 2308-2335. · Zbl 1348.60041
[38] Minsker, S., U-statistics of growing order and sub-Gaussian mean estimators with sharp constants, Math. Stat. Learn., 7 (2024), pp. 1-39. · Zbl 07854614
[39] Nemirovski, A. and Yudin, D., Problem Complexity and Method Efficiency in Optimization, John Wiley and Sons, 1983. · Zbl 0501.90062
[40] Nesterov, Y., Lectures on Convex Optimization, , Springer, 2018. · Zbl 1427.90003
[41] Oliveira, R. I., The lower tail of random quadratic forms with applications to ordinary least squares, Probab. Theory Related Fields, 166 (2016), pp. 1175-1194. · Zbl 1360.60075
[42] Ostresh, L. M., On the convergence of a class of iterative methods for solving the Weber location problem, Oper. Res., 26 (1978), pp. 597-609. · Zbl 0396.90073
[43] Overton, M. L., A quadratically convergent method for minimizing a sum of Euclidean norms, Math. Program., 27 (1983), pp. 34-63. · Zbl 0536.65053
[44] Pillutla, K., Kakade, S. M., and Harchaoui, Z., Robust aggregation for federated learning, IEEE Trans. Signal Process., 70 (2022), pp. 1142-1154. · Zbl 07911322
[45] Rio, E., Moment inequalities for sums of dependent random variables under projective conditions, J. Theoret. Probab., 22 (2009), pp. 146-163. · Zbl 1160.60312
[46] Romon, G., Statistical Properties of Approximate Geometric Quantiles in Infinite-Dimensional Banach Spaces, preprint, arXiv:2211.00035, 2022.
[47] Rudelson, M. and Vershynin, R., Small ball probabilities for linear images of high-dimensional distributions, Int. Math. Res. Not. IMRN, 2015 (2015), pp. 9594-9617. · Zbl 1330.60029
[48] Vardi, Y. and Zhang, C.-H., The multivariate \({L}_1\)-median and associated data depth, Proc. Natl. Acad. Sci. USA, 97 (2000), pp. 1423-1426. · Zbl 1054.62067
[49] Weber, A., Uber den standort der industrien (Alfred Weber’s Theory of the Location of Industries), University of Chicago, 1929.
[50] Weiszfeld, E., Sur un problème de minimum dans l’espace, Tohoku Math. J., 42 (1936), pp. 274-280. · JFM 62.0711.03
[51] Yang, T. and Lin, Q., RSG: Beating subgradient method without smoothness and strong convexity, J. Mach. Learn. Res., 19 (2018), 6. · Zbl 1444.90092
[52] Zhivotovskiy, N., Dimension-free bounds for sums of independent matrices and simple tensors via the variational principle, Electron. J. Probab., 29 (2024), 13. · Zbl 1531.60019
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.