
Simplex-based multinomial logistic regression with diverging numbers of categories and covariates. (English) Zbl 07764892

Summary: Multinomial logistic regression models are popular in multicategory classification analysis, but existing models suffer from several intrinsic drawbacks. In particular, the parameters cannot be determined uniquely because of over-specification. Although additional constraints have been imposed to refine the model, such modifications can be inefficient and complicated. In this paper, we propose a novel and efficient simplex-based multinomial logistic regression technique, seamlessly connecting binomial and multinomial cases under a unified framework. Compared with existing models, our model has fewer parameters, is free of any constraints, and can be solved efficiently using the Fisher scoring algorithm. In addition, the proposed model enjoys several theoretical advantages, including Fisher consistency and a sharp comparison inequality. Under mild conditions, we establish the asymptotic normality and convergence for the new model, even when the numbers of categories and covariates increase with the sample size. The proposed framework is illustrated by means of extensive simulations and real applications.
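
As a rough illustration of the over-specification and the simplex coding mentioned in the summary, the following plain-Python sketch may help. It assumes a standard equiangular simplex construction (of the kind used in vertex discriminant analysis and angle-based classification) and a softmax over inner products with the simplex vertices. The record does not state the paper's exact link function, so `simplex_probs` and the other names here are hypothetical and this is not the authors' implementation.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of real scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def simplex_vertices(k):
    """Return k unit vectors in R^(k-1) with pairwise inner product -1/(k-1)."""
    d = k - 1
    a = -(1.0 + math.sqrt(k)) / d ** 1.5
    b = math.sqrt(k / d)
    vertices = [[1.0 / math.sqrt(d)] * d]  # first vertex: scaled all-ones vector
    for j in range(d):
        v = [a] * d
        v[j] += b  # remaining vertices: shifted coordinate directions
        vertices.append(v)
    return vertices

def simplex_probs(f, vertices):
    """Hypothetical link (an assumption): softmax of inner products <f, w_j>."""
    return softmax([sum(fi * wi for fi, wi in zip(f, w)) for w in vertices])

# Over-specification: shifting all k scores by the same constant leaves the
# class probabilities unchanged, so k separate score functions are not
# uniquely determined by the probabilities.
scores = [0.3, -1.2, 2.0]
shifted = [s + 5.0 for s in scores]
same = all(abs(p - q) < 1e-12
           for p, q in zip(softmax(scores), softmax(shifted)))

# Simplex coding needs only k-1 free coordinates. With k = 2 the vertices
# reduce to +1 and -1, which gives a binary logistic form.
binary_vertices = simplex_vertices(2)
p_binary = simplex_probs([0.7], binary_vertices)
```

Under these assumptions, `k = 2` yields a first-class probability equal to a logistic function of `2 * f`, which is one way to see how a binomial model can sit inside the same framework as the multinomial one.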

MSC:

62-XX Statistics

Authors include: Piao Chen; Yufeng Liu