×

Distributed nonparametric function estimation: optimal rate of convergence and cost of adaptation. (English) Zbl 1486.62099

Summary: Distributed minimax estimation and distributed adaptive estimation under communication constraints for Gaussian sequence model and white noise model are studied. The minimax rate of convergence for distributed estimation over a given Besov class, which serves as a benchmark for the cost of adaptation, is established. We then quantify the exact communication cost for adaptation and construct an optimally adaptive procedure for distributed estimation over a range of Besov classes.
The results demonstrate significant differences between nonparametric function estimation in the distributed setting and the conventional centralized setting. For global estimation, adaptation in general cannot be achieved for free in the distributed setting. The new technical tools to obtain the exact characterization for the cost of adaptation can be of independent interest.

MSC:

62G08 Nonparametric regression and quantile regression
62C20 Minimax procedures in statistical decision theory
62G07 Density estimation
62G20 Asymptotic properties of nonparametric inference

References:

[1] ABRAMOVICH, F., BENJAMINI, Y., DONOHO, D. L. and JOHNSTONE, I. M. (2006). Adapting to unknown sparsity by controlling the false discovery rate. Ann. Statist. 34 584-653. · Zbl 1092.62005 · doi:10.1214/009053606000000074
[2] BARNES, L. P., HAN, Y. and ÖZGÜR, A. (2019). Learning distributions from their samples under communication constraints. CoRR. Available at 1902.02890.
[3] Battey, H., Fan, J., Liu, H., Lu, J. and Zhu, Z. (2018). Distributed testing and estimation under sparse high dimensional models. Ann. Statist. 46 1352-1382. · Zbl 1392.62060 · doi:10.1214/17-AOS1587
[4] BLAHUT, R. E. and BLAHUT, R. E. (1987). Principles and Practice of Information Theory 1. Addison-Wesley, Reading, MA. · Zbl 0681.94001
[5] Braverman, M., Garg, A., Ma, T., Nguyen, H. L. and Woodruff, D. P. (2016). Communication lower bounds for statistical estimation problems via a distributed data processing inequality. In STOC’16—Proceedings of the 48th Annual ACM SIGACT Symposium on Theory of Computing 1011-1020. ACM, New York. · Zbl 1373.68235 · doi:10.1145/2897518.2897582
[6] Brown, L. D. and Low, M. G. (1996). Asymptotic equivalence of nonparametric regression and white noise. Ann. Statist. 24 2384-2398. · Zbl 0867.62022 · doi:10.1214/aos/1032181159
[7] BROWN, L. D., CAI, T. T., LOW, M. G. and ZHANG, C.-H. (2002). Asymptotic equivalence theory for nonparametric regression with random design. Ann. Statist. 30 688-707. · Zbl 1029.62044 · doi:10.1214/aos/1028674838
[8] BROWN, L. D., CARTER, A. V., LOW, M. G. and ZHANG, C.-H. (2004). Equivalence theory for density estimation, Poisson processes and Gaussian white noise with drift. Ann. Statist. 32 2074-2097. · Zbl 1062.62083 · doi:10.1214/009053604000000012
[9] BROWN, L., CAI, T., ZHANG, R., ZHAO, L. and ZHOU, H. (2010). The root-unroot algorithm for density estimation as implemented via wavelet block thresholding. Probab. Theory Related Fields 146 401-433. · Zbl 1180.62055 · doi:10.1007/s00440-008-0194-2
[10] CAI, T. T. (1999). Adaptive wavelet estimation: A block thresholding and oracle inequality approach. Ann. Statist. 27 898-924. · Zbl 0954.62047 · doi:10.1214/aos/1018031262
[11] Cai, T. T. (2008). On information pooling, adaptability and superefficiency in nonparametric function estimation. J. Multivariate Anal. 99 421-436. · Zbl 1148.62020 · doi:10.1016/j.jmva.2006.11.010
[12] CAI, T. T. and WEI, H. (2020a). Distributed gaussian mean estimation under communication constraints: Optimal rates and communication-efficient algorithms. ArXiv preprint. Available at arXiv:2001.08877.
[13] CAI, T. T. and WEI, H. (2022). Supplement to “Distributed nonparametric function estimation: Optimal rate of convergence and cost of adaptation.” https://doi.org/10.1214/21-AOS2124SUPP
[14] CAI, T. T. and ZHOU, H. H. (2009). A data-driven block thresholding approach to wavelet estimation. Ann. Statist. 37 569-595. · Zbl 1162.62032 · doi:10.1214/07-AOS538
[15] DEISENROTH, M. and NG, J. W. (2015). Distributed Gaussian processes. In Proceedings of the 32nd International Conference on Machine Learning (F. Bach and D. Blei, eds.). Proceedings of Machine Learning Research 37 1481-1490. PMLR, Lille, France.
[16] DIAKONIKOLAS, I., GRIGORESCU, E., LI, J., NATARAJAN, A., ONAK, K. and SCHMIDT, L. (2017). Communication-efficient distributed learning of discrete distributions. In Advances in Neural Information Processing Systems 6391-6401.
[17] Donoho, D. L. and Johnstone, I. M. (1994). Ideal spatial adaptation by wavelet shrinkage. Biometrika 81 425-455. · Zbl 0815.62019 · doi:10.1093/biomet/81.3.425
[18] Donoho, D. L. and Johnstone, I. M. (1995). Adapting to unknown smoothness via wavelet shrinkage. J. Amer. Statist. Assoc. 90 1200-1224. · Zbl 0869.62024
[19] Donoho, D. L. and Johnstone, I. M. (1998). Minimax estimation via wavelet shrinkage. Ann. Statist. 26 879-921. · Zbl 0935.62041 · doi:10.1214/aos/1024691081
[20] Fan, J., Wang, D., Wang, K. and Zhu, Z. (2019). Distributed estimation of principal eigenspaces. Ann. Statist. 47 3009-3031. · Zbl 1450.62067 · doi:10.1214/18-AOS1713
[21] GARG, A., MA, T. and NGUYEN, H. (2014). On communication cost of distributed statistical estimation and dimensionality. In Advances in Neural Information Processing Systems 2726-2734.
[22] Hall, P., Kerkyacharian, G. and Picard, D. (1999). On the minimax optimality of block thresholded wavelet estimators. Statist. Sinica 9 33-49. · Zbl 0915.62028
[23] HAN, Y., ÖZGÜR, A. and WEISSMAN, T. (2021). Geometric lower bounds for distributed parameter estimation under communication constraints. IEEE Trans. Inf. Theory 67 8248-8263. · Zbl 1489.94005 · doi:10.1109/TIT.2021.3108952
[24] IBRAGIMOV, I. and KHASMINSKII, R. (1997). Some estimation problems in infinite-dimensional Gaussian white noise. In Festschrift for Lucien Le Cam 259-274. Springer, New York. · Zbl 0908.62083
[25] JOHNSTONE, I. M. (2017). Gaussian Estimation: Sequence and Wavelet Models. Manuscript.
[26] Jordan, M. I., Lee, J. D. and Yang, Y. (2019). Communication-efficient distributed statistical inference. J. Amer. Statist. Assoc. 114 668-681. · Zbl 1420.62097 · doi:10.1080/01621459.2018.1429274
[27] Kleiner, A., Talwalkar, A., Sarkar, P. and Jordan, M. I. (2014). A scalable bootstrap for massive data. J. R. Stat. Soc. Ser. B. Stat. Methodol. 76 795-816. · Zbl 07555464 · doi:10.1111/rssb.12050
[28] LEE, J. D., LIU, Q., SUN, Y. and TAYLOR, J. E. (2017). Communication-efficient sparse regression. J. Mach. Learn. Res. 18 Paper No. 5. · Zbl 1434.62157
[29] Meyer, Y. (1992). Wavelets and Operators. Cambridge Studies in Advanced Mathematics 37. Cambridge Univ. Press, Cambridge. · Zbl 0776.42019
[30] MICHALOWICZ, J., NICHOLS, J. and BUCHOLTZ, F. (2008). Calculation of differential entropy for a mixed Gaussian distribution. Entropy 10 200-206.
[31] NUSSBAUM, M. (1996). Asymptotic equivalence of density estimation and Gaussian white noise. Ann. Statist. 24 2399-2430. · Zbl 0867.62035 · doi:10.1214/aos/1032181160
[32] SHANNON, C. E. (1948). A mathematical theory of communication. Bell Syst. Tech. J. 27 379-423. · Zbl 1154.94303 · doi:10.1002/j.1538-7305.1948.tb01338.x
[33] SZABÓ, B. and VAN ZANTEN, H. (2020). Adaptive distributed methods under communication constraints. Ann. Statist. 48 2347-2380. · Zbl 1455.62097 · doi:10.1214/19-AOS1890
[34] SZABO, B. and VAN ZANTEN, H. (2020). Distributed function estimation: Adaptation using minimal communication. ArXiv preprint. Available at arXiv:2003.12838. · Zbl 1455.62097
[35] ZHANG, Y., DUCHI, J., JORDAN, M. I. and WAINWRIGHT, M. J. (2013). Information-theoretic lower bounds for distributed statistical estimation with communication constraints. In Advances in Neural Information Processing Systems 2328-2336.
[36] ZHU, Y. and LAFFERTY, J. (2018). Distributed nonparametric regression under communication constraints. ArXiv preprint. Available at arXiv:1803.01302
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.