×

Isotonic regression with unknown permutations: statistics, computation and adaptation. (English) Zbl 1486.62119

Summary: Motivated by models for multiway comparison data, we consider the problem of estimating a coordinatewise isotonic function on the domain \([0,1]^d\) from noisy observations collected on a uniform lattice, but where the design points have been permuted along each dimension. While the univariate and bivariate versions of this problem have received significant attention, our focus is on the multivariate case \(d\ge 3\). We study both the minimax risk of estimation (in empirical \(L_2\) loss) and the fundamental limits of adaptation (quantified by the adaptivity index) to a family of piecewise constant functions. We provide a computationally efficient Mirsky partition estimator that is minimax optimal while also achieving the smallest adaptivity index possible for polynomial time procedures. Thus, from a worst-case perspective and in sharp contrast to the bivariate case, the latent permutations in the model do not introduce significant computational difficulties over and above vanilla isotonic regression. On the other hand, the fundamental limits of adaptation are significantly different with and without unknown permutations: Assuming a hardness conjecture from average-case complexity theory, a statistical-computational gap manifests in the former case. In a complementary direction, we show that natural modifications of existing estimators fail to satisfy at least one of the desiderata of optimal worst-case statistical performance, computational efficiency and fast adaptation. Along the way to showing our results, we improve adaptation results in the special case \(d=2\) and establish some properties of estimators for vanilla isotonic regression, both of which may be of independent interest.

MSC:

62G08 Nonparametric regression and quantile regression
62H12 Estimation in multivariate analysis
62G20 Asymptotic properties of nonparametric inference
62C20 Minimax procedures in statistical decision theory

References:

[1] ABBE, E. and MONTANARI, A. (2013). Conditional random fields, planted constraint satisfaction and entropy concentration. In Approximation, Randomization, and Combinatorial Optimization. Lecture Notes in Computer Science 8096 332-346. Springer, Heidelberg. · Zbl 1383.68057 · doi:10.1007/978-3-642-40328-6_24
[2] ABID, A. and ZOU, J. (2018). A stochastic expectation-maximization approach to shuffled linear regression. In 2018 56th Annual Allerton Conference on Communication, Control, and Computing (Allerton) 470-477. IEEE.
[3] Airoldi, E. M., Costa, T. B. and Chan, S. H. (2013). Stochastic blockmodel approximation of a graphon: Theory and consistent estimation. In Advances in Neural Information Processing Systems 692-700.
[4] ANDERSEN, C. M. and BRO, R. (2003). Practical aspects of PARAFAC modeling of fluorescence excitation-emission data. J. Chemom. 17 200-215.
[5] ANGELINI, M. C., CALTAGIRONE, F., KRZAKALA, F. and ZDEBOROVÁ, L. (2015). Spectral detection on sparse hypergraphs. In 2015 53rd Annual Allerton Conference on Communication, Control, and Computing (Allerton) 66-73. IEEE.
[6] BALABDAOUI, F., DOSS, C. R. and DUROT, C. (2020). Unlinked monotone regression. arXiv preprint. Available at arXiv:2007.00830. · Zbl 07415115
[7] BALLINGER, T. P. and WILCOX, N. T. (1997). Decisions, error and heterogeneity. Econ. J. 107 1090-1105.
[8] BALTRUNAS, L., MAKCINSKAS, T. and RICCI, F. (2010). Group recommendations with rank aggregation and collaborative filtering. In Proceedings of the Fourth ACM Conference on Recommender Systems, RecSys ’10 119-126. ACM, New York, USA.
[9] BARAK, B., HOPKINS, S., KELNER, J., KOTHARI, P. K., MOITRA, A. and POTECHIN, A. (2019). A nearly tight sum-of-squares lower bound for the planted clique problem. SIAM J. Comput. 48 687-735. · Zbl 1421.68056 · doi:10.1137/17M1138236
[10] BEHR, M. and MUNK, A. (2017). Minimax estimation in linear models with unknown finite alphabet design. arXiv preprint. Available at arXiv:1711.04145. · Zbl 1374.94040
[11] BELLEC, P. C. and TSYBAKOV, A. B. (2015). Sharp oracle bounds for monotone and convex regression through aggregation. J. Mach. Learn. Res. 16 1879-1892. · Zbl 1351.62088
[12] BEN-AKIVA, M. E., LERMAN, S. R. and LERMAN, S. R. (1985). Discrete Choice Analysis: Theory and Application to Travel Demand 9. MIT press, Cambridge.
[13] Berthet, Q. and Rigollet, P. (2013). Optimal detection of sparse principal components in high dimension. Ann. Statist. 41 1780-1815. · Zbl 1277.62155 · doi:10.1214/13-AOS1127
[14] Birgé, L. and Massart, P. (1993). Rates of convergence for minimum contrast estimators. Probab. Theory Related Fields 97 113-150. · Zbl 0805.62037 · doi:10.1007/BF01199316
[15] Bradley, R. A. and Terry, M. E. (1952). Rank analysis of incomplete block designs. I. The method of paired comparisons. Biometrika 39 324-345. · Zbl 0047.12903 · doi:10.2307/2334029
[16] BRENNAN, M. and BRESLER, G. (2020). Reducibility and statistical-computational gaps from secret leakage. In Proceedings of 33rd Conference on Learning Theory (J. Abernethy and S. Agarwal, eds.). Proceedings of Machine Learning Research, PMLR 125 648-847.
[17] BRENNAN, M., BRESLER, G. and HULEIHEL, W. (2018). Reducibility and computational lower bounds for problems with planted sparse structure. In Proceedings of the 31st Conference on Learning Theory (S. Bubeck, V. Perchet and P. Rigollet, eds.). Proceedings of Machine Learning Research, PMLR 75 48-166.
[18] Brunk, H. D. (1955). Maximum likelihood estimates of monotone parameters. Ann. Math. Stat. 26 607-616. · Zbl 0066.38503 · doi:10.1214/aoms/1177728420
[19] CARPENTIER, A. and SCHLUETER, T. (2016). Learning relationships between data obtained independently. In Proceedings of the 19th International Conference on Artificial Intelligence and Statistics 658-666.
[20] Chan, S. and Airoldi, E. (2014). A consistent histogram estimator for exchangeable graph models. In International Conference on Machine Learning 208-216.
[21] CHANDUKALA, S. R., KIM, J., OTTER, T. and ALLENBY, G. M. (2008). Choice Models in Marketing: Economic Assumptions, Challenges and Trends. Now Publishers, Hanover. · Zbl 1154.90524
[22] Chatterjee, S. (2015). Matrix estimation by universal singular value thresholding. Ann. Statist. 43 177-214. · Zbl 1308.62038 · doi:10.1214/14-AOS1272
[23] Chatterjee, S., Guntuboyina, A. and Sen, B. (2015). On risk bounds in isotonic and other shape restricted regression problems. Ann. Statist. 43 1774-1800. · Zbl 1317.62032 · doi:10.1214/15-AOS1324
[24] Chatterjee, S., Guntuboyina, A. and Sen, B. (2018). On matrix estimation under monotonicity constraints. Bernoulli 24 1072-1100. · Zbl 1419.62135 · doi:10.3150/16-BEJ865
[25] CHATTERJEE, S. and MUKHERJEE, S. (2019). Estimation in tournaments and graphs under monotonicity constraints. IEEE Trans. Inf. Theory 65 3525-3539. · Zbl 1432.62160 · doi:10.1109/TIT.2019.2893911
[26] CHEN, X., BENNETT, P. N., COLLINS-THOMPSON, K. and HORVITZ, E. (2013). Pairwise ranking aggregation in a crowdsourced setting. In Proceedings of the Sixth ACM International Conference on Web Search and Data Mining 193-202. ACM, New York.
[27] COLLIER, O. and DALALYAN, A. S. (2016). Minimax rates in permutation estimation for feature matching. J. Mach. Learn. Res. 17 6. · Zbl 1360.62262
[28] CRASWELL, N., ZOETER, O., TAYLOR, M. and RAMSEY, B. (2008). An experimental comparison of click position-bias models. In Proceedings of the 2008 International Conference on Web Search and Data Mining 87-94.
[29] DAWID, A. P. and SKENE, A. M. (1979). Maximum likelihood estimation of observer error-rates using the em algorithm. J. R. Stat. Soc. Ser. C. Appl. Stat. 28 20-28.
[30] DEGROOT, M. H. and GOEL, P. K. (1980). Estimation of the correlation coefficient from a broken random sample. Ann. Statist. 8 264-278. · Zbl 0446.62049
[31] Deng, H. and Zhang, C.-H. (2020). Isotonic regression in multi-dimensional spaces and graphs. Ann. Statist. 48 3672-3698. · Zbl 1462.62208 · doi:10.1214/20-AOS1947
[32] Dwork, C., Kumar, R., Naor, M. and Sivakumar, D. (2001). Rank aggregation methods for the web. In Proceedings of the 10th International Conference on World Wide Web 613-622. ACM, New York.
[33] DYKSTRA, R. L. and ROBERTSON, T. (1982). An algorithm for isotonic regression for two or more independent variables. Ann. Statist. 10 708-716. · Zbl 0485.65099
[34] FARIAS, V. F., JAGABATHULA, S. and SHAH, D. (2013). A nonparametric approach to modeling choice with limited data. Manage. Sci. 59 305-322.
[35] FEIGE, U. and KRAUTHGAMER, R. (2003). The probable value of the Lovász-Schrijver relaxations for maximum independent set. SIAM J. Comput. 32 345-370. · Zbl 1021.05073 · doi:10.1137/S009753970240118X
[36] FISHBURN, P. C. (1973). Binary choice probabilities: On the varieties of stochastic transitivity. J. Math. Psych. 10 327-352. · Zbl 0277.92008 · doi:10.1016/0022-2496(73)90021-7
[37] FLAMMARION, N., MAO, C. and RIGOLLET, P. (2019). Optimal rates of statistical seriation. Bernoulli 25 623-653. · Zbl 1442.62084 · doi:10.3150/17-bej1000
[38] Fokianos, K., Leucht, A. and Neumann, M. H. (2020). On integrated \[{L^1}\] convergence rate of an isotonic regression estimator for multivariate observations. IEEE Trans. Inf. Theory 66 6389-6402. · Zbl 1452.62282
[39] Gao, C., Han, F. and Zhang, C.-H. (2020). On estimation of isotonic piecewise constant signals. Ann. Statist. 48 629-654. · Zbl 1450.62034 · doi:10.1214/18-AOS1792
[40] Gao, C., Lu, Y. and Zhou, H. H. (2015). Rate-optimal graphon estimation. Ann. Statist. 43 2624-2652. · Zbl 1332.60050 · doi:10.1214/15-AOS1354
[41] GHOSHDASTIDAR, D. and DUKKIPATI, A. (2017). Consistency of spectral hypergraph partitioning under planted partition model. Ann. Statist. 45 289-315. · Zbl 1360.62330 · doi:10.1214/16-AOS1453
[42] HAN, Q. (2021+). Set structured global empirical risk minimizers are rate optimal in general dimensions. Ann. Statist. to appear. · Zbl 1478.62081
[43] Han, Q., Wang, T., Chatterjee, S. and Samworth, R. J. (2019). Isotonic regression in general dimensions. Ann. Statist. 47 2440-2471. · Zbl 1437.62124 · doi:10.1214/18-AOS1753
[44] HARDOUIN, J.-B. and MESBAH, M. (2004). Clustering binary variables in subscales using an extended Rasch model and Akaike information criterion. Comm. Statist. Theory Methods 33 1277-1294. · Zbl 1114.62368 · doi:10.1081/STA-120030149
[45] HOPKINS, S. (2018). Statistical Inference and the Sum of Squares Method. ProQuest LLC, Ann Arbor, MI. Thesis (Ph.D.)—Cornell University.
[46] HSU, D. J., SHI, K. and SUN, X. (2017). Linear regression without correspondence. In Advances in Neural Information Processing Systems 1531-1540.
[47] JERRUM, M. (1992). Large cliques elude the Metropolis process. Random Structures Algorithms 3 347-359. · Zbl 0754.60018 · doi:10.1002/rsa.3240030402
[48] Kim, A. K. H. and Samworth, R. J. (2016). Global rates of convergence in log-concave density estimation. Ann. Statist. 44 2756-2779. · Zbl 1360.62157 · doi:10.1214/16-AOS1480
[49] KÖK, A. G., FISHER, M. L. and VAIDYANATHAN, R. (2015). Assortment planning: Review of literature and industry practice. In Retail Supply Chain Management: Quantitative Models and Empirical Studies (N. Agrawal and S. A. Smith, eds.) 175-236. Springer US, Boston, MA.
[50] KOLDA, T. G., BADER, B. W. and KENNY, J. P. (2005). Higher-Order Web Link Analysis Using Multilinear Algebra. Fifth IEEE International Conference on Data Mining (ICDM’05). IEEE.
[51] KRANTZ, D. H. (1965). The Scaling of Small and Large Color Differences Ph.D. thesis, University of Pennsylvania.
[52] KUNISKY, D., WEIN, A. S. and BANDEIRA, A. S. (2019). Notes on computational hardness of hypothesis testing: Predictions using the low-degree likelihood ratio. arXiv preprint. Available at arXiv:1907.11636.
[53] KYNG, R., RAO, A. and SACHDEVA, S. (2015). Fast, provable algorithms for isotonic regression in all \[{\ell_p}\]-norms. In Advances in Neural Information Processing Systems 2719-2727.
[54] LEPSKI, O. (1991). On a problem of adaptive estimation in Gaussian white noise. Theory Probab. Appl. 35 454-466. · Zbl 0745.62083
[55] Lepski, O. V. and Spokoiny, V. G. (1997). Optimal pointwise adaptive methods in nonparametric estimation. Ann. Statist. 25 2512-2546. · Zbl 0894.62041 · doi:10.1214/aos/1030741083
[56] LIU, A. and MOITRA, A. (2020). Better algorithms for estimating non-parametric models in crowd-sourcing and rank aggregation. In Proceedings of 33rd Conference on Learning Theory (J. Abernethy and S. Agarwal, eds.). Proceedings of Machine Learning Research, PMLR 125 2780-2829.
[57] Luce, R. D. (1959). Individual Choice Behavior: A Theoretical Analysis. Wiley, New York. · Zbl 0093.31708
[58] LUO, Y. and ZHANG, A. R. (2020). Tensor clustering with planted structures: Statistical optimality and computational limits. arXiv preprint. Available at arXiv:2005.10743v2.
[59] MA, R., CAI, T. T. and LI, H. (2021+). Optimal estimation of bacterial growth rates based on a permuted monotone matrix. Biometrika. to appear. · Zbl 07459723
[60] Ma, Z. and Wu, Y. (2015). Computational barriers in minimax submatrix detection. Ann. Statist. 43 1089-1116. · Zbl 1328.62354 · doi:10.1214/14-AOS1300
[61] MAO, C., PANANJADY, A. and WAINWRIGHT, M. J. (2018). Breaking the \[1/\sqrt{n}\] barrier: Faster rates for permutation-based models in polynomial time. In Proceedings of the 31st Conference on Learning Theory (S. Bubeck, V. Perchet and P. Rigollet, eds.). Proceedings of Machine Learning Research, PMLR 75 2037-2042.
[62] MAO, C., PANANJADY, A. and WAINWRIGHT, M. J. (2020). Towards optimal estimation of bivariate isotonic matrices with unknown permutations. Ann. Statist. 48 3183-3205. · Zbl 1490.62129 · doi:10.1214/19-AOS1925
[63] MARSCHAK, J. and DAVIDSON, D. (1957). Experimental tests of stochastic decision theory Technical report, Cowles Foundation for Research in Economics, Yale University.
[64] McFadden, D. (1974). Conditional logit analysis of qualitative choice behavior. In Frontiers in Econometrics (P. Zarembka, ed.) 105-142. Academic Press, San Diego.
[65] MCFADDEN, D. (1981). Econometric models of probabilistic choice. In Structural Analysis of Discrete Data with Econometric Applications (C. Manski and D. McFadden, eds.) 198-272. MIT press, Cambridge. · Zbl 0598.62145
[66] MCLAUGHLIN, D. H. and LUCE, R. D. (1965). Stochastic transitivity and cancellation of preferences between bitter-sweet solutions. Psychon. Sci. 2 89-90.
[67] MIRSKY, L. (1971). A dual of Dilworth’s decomposition theorem. Amer. Math. Monthly 78 876-877. · Zbl 0263.06002 · doi:10.2307/2316481
[68] MÖCKS, J. (1988). Decomposing event-related potentials: A new topographic components model. Biol. Psychol. 26 199-215. · doi:10.1016/0301-0511(88)90020-8
[69] NEGAHBAN, S., OH, S., THEKUMPARAMPIL, K. K. and XU, J. (2018). Learning from comparisons and choices. J. Mach. Learn. Res. 19 40. · Zbl 1461.68192
[70] NEMIROVSKIĬ, A. S., POLYAK, B. T. and TSYBAKOV, A. B. (1985). The rate of convergence of nonparametric estimates of maximum likelihood type. Problemy Peredachi Informatsii 21 17-33. · Zbl 0616.62048
[71] NGUYEN, A., PIECH, C., HUANG, J. and GUIBAS, L. (2014). Codewebs: Scalable homework search for massive open online programming courses. In Proceedings of the 23rd International Conference on World Wide Web 491-502.
[72] PANANJADY, A. and SAMWORTH, R. J (2022). Supplement to “Isotonic regression with unknown permutations: Statistics, computation and adaptation.” https://doi.org/10.1214/21-AOS2107SUPP
[73] PANANJADY, A., WAINWRIGHT, M. J. and COURTADE, T. A. (2017). Denoising linear models with permuted data. In 2017 IEEE International Symposium on Information Theory (ISIT) 446-450. IEEE.
[74] PANANJADY, A., WAINWRIGHT, M. J. and COURTADE, T. A. (2018). Linear regression with shuffled data: Statistical and computational limits of permutation recovery. IEEE Trans. Inf. Theory 64 3286-3300. · Zbl 1395.62204 · doi:10.1109/TIT.2017.2776217
[75] PANANJADY, A., MAO, C., MUTHUKUMAR, V., WAINWRIGHT, M. J. and COURTADE, T. A. (2020). Worst-case versus average-case design for estimation from partial pairwise comparisons. Ann. Statist. 48 1072-1097. · Zbl 1452.62561 · doi:10.1214/19-AOS1838
[76] PITSOULIS, L. and PARDALOS, P. M. (2001). Quadratic assignment problem. In Encyclopedia of Optimization (C. A. Floudas and P. M. Pardalos, eds.) 2075-2107. Springer US, Boston, MA.
[77] PLACKETT, R. L. (1975). The analysis of permutations. J. Roy. Statist. Soc. Ser. C 24 193-202. · doi:10.2307/2346567
[78] RAKHLIN, A., SRIDHARAN, K. and TSYBAKOV, A. B. (2017). Empirical entropy, minimax regret and minimax risk. Bernoulli 23 789-824. · Zbl 1380.62176 · doi:10.3150/14-BEJ679
[79] Rigollet, P. and Weed, J. (2019). Uncoupled isotonic regression via minimum Wasserstein deconvolution. Inf. Inference 8 691-717. · Zbl 1471.62335 · doi:10.1093/imaiai/iaz006
[80] ROBERTSON, T. and WRIGHT, F. T. (1975). Consistency in generalized isotonic regression. Ann. Statist. 3 350-362. · Zbl 0305.62044
[81] SHAH, N. B., BALAKRISHNAN, S. and WAINWRIGHT, M. J. (2019a). Feeling the bern: Adaptive estimators for Bernoulli probabilities of pairwise comparisons. IEEE Trans. Inf. Theory 65 4854-5874. · Zbl 1432.62234 · doi:10.1109/TIT.2019.2903249
[82] SHAH, N. B., BALAKRISHNAN, S. and WAINWRIGHT, M. J. (2019b). Low permutation-rank matrices: Structural properties and noisy completion. J. Mach. Learn. Res. 20 101. · Zbl 1434.68451
[83] SHAH, N. B., BALAKRISHNAN, S. and WAINWRIGHT, M. J. (2021). A permutation-based model for crowd labeling: Optimal estimation and robustness. IEEE Trans. Inf. Theory 67 4162-4184. · Zbl 1475.62170
[84] SHAH, N. B., BALAKRISHNAN, S., GUNTUBOYINA, A. and WAINWRIGHT, M. J. (2017). Stochastically transitive models for pairwise comparisons: Statistical and computational issues. IEEE Trans. Inf. Theory 63 934-959. · Zbl 1364.94253 · doi:10.1109/TIT.2016.2634418
[85] STOUT, Q. F. (2015). Isotonic regression for multiple independent variables. Algorithmica 71 450-470. · Zbl 1312.62084 · doi:10.1007/s00453-013-9814-z
[86] TVERSKY, A. (1972). Elimination by aspects: A theory of choice. Psychol. Rev. 79 281. · Zbl 0249.92010
[87] UNNIKRISHNAN, J., HAGHIGHATSHOAR, S. and VETTERLI, M. (2018). Unlabeled sensing with random linear measurements. IEEE Trans. Inf. Theory 64 3237-3253. · Zbl 1395.94168 · doi:10.1109/TIT.2018.2809002
[88] VAN SCHUUR, W. H. (2003). Mokken scale analysis: Between the Guttman scale and parametric item response theory. Polit. Anal. 11 139-163.
[89] van de Geer, S. A. (2000). Applications of Empirical Process Theory. Cambridge Series in Statistical and Probabilistic Mathematics 6. Cambridge Univ. Press, Cambridge. · Zbl 0953.62049
[90] WANG, T., BERTHET, Q. and SAMWORTH, R. J. (2016). Statistical and computational trade-offs in estimation of sparse principal components. Ann. Statist. 44 1896-1930. · Zbl 1349.62254 · doi:10.1214/15-AOS1369
[91] Zhang, C.-H. (2002). Risk bounds in isotonic regression. Ann. Statist. 30 528-555. · Zbl 1012.62045 · doi:10.1214/aos/1021379864
[92] Zhang, A. and Xia, D. (2018). Tensor SVD: Statistical and computational limits. IEEE Trans. Inf. Theory 64 7311-7338. · Zbl 1432.62176 · doi:10.1109/TIT.2018.2841377
[93] ZHOU, D., HUANG, J. and SCHÖLKOPF, B. (2007). Learning with hypergraphs: Clustering, classification, and embedding. In Advances in Neural Information Processing Systems 1601-1608.
[94] Zhou, J., Bhattacharya, A., Herring, A. H. and Dunson, D. B. (2015). Bayesian factorizations of big sparse tensors. J. Amer. Statist. Assoc. 110 1562-1576 · Zbl 1373.62282 · doi:10.1080/01621459.2014.983233
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.