
Is a classification procedure good enough? – A goodness-of-fit assessment tool for classification learning. (English) Zbl 07707227

Summary: In recent years, many nontraditional classification methods, such as random forests, boosting, and neural networks, have been widely used in applications. Their performance is typically measured in terms of classification accuracy. While the classification error rate and related measures are important, they do not address a fundamental question: is the classification method underfitted? To the best of our knowledge, no existing method can assess the goodness of fit of a general classification procedure. Indeed, the lack of a parametric assumption makes it challenging to construct proper tests. To overcome this difficulty, we propose a methodology called BAGofT that splits the data into a training set and a validation set. First, the classification procedure to be assessed is applied to the training set, which is also used to adaptively find a data grouping that reveals the regions where underfitting is most severe. Then, based on this grouping, we calculate a test statistic by comparing the estimated success probabilities with the responses actually observed in the validation set. The data splitting guarantees that the size of the test is controlled under the null hypothesis and that the power of the test goes to one as the sample size increases under the alternative hypothesis. For testing parametric classification models, the BAGofT has a broader scope than existing methods, since it is not restricted to specific parametric models (e.g., logistic regression). Extensive simulation studies show the utility of the BAGofT in assessing general classification procedures and its strengths over some existing methods in testing parametric classification models. Supplementary materials for this article are available online.
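The procedure described above (a train/validation split, a grouping of the validation data, and a Pearson-type comparison of estimated success probabilities with observed responses) can be illustrated with a minimal, hypothetical Python sketch. This is not the authors' BAGofT implementation: the adaptive grouping learned from the training set is replaced by a simple Hosmer-Lemeshow-style grouping on predicted probabilities, and the chi-square reference distribution, degrees of freedom, and the helper name `split_gof_test` are illustrative assumptions.

```python
# Hypothetical sketch of a split-based goodness-of-fit check for a classifier.
# NOT the BAGofT test itself: the grouping and reference distribution below are
# simplified stand-ins chosen purely for illustration.
import numpy as np
from scipy.stats import chi2
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def split_gof_test(clf, X, y, n_groups=10, seed=0):
    """Fit `clf` on a training split and check calibration on a validation split."""
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.5, random_state=seed)
    clf.fit(X_tr, y_tr)
    p_val = clf.predict_proba(X_val)[:, 1]  # estimated success probabilities
    # Simplified grouping: deciles of the predicted probability (a stand-in for
    # the adaptive grouping that BAGofT would learn from the training set).
    edges = np.quantile(p_val, np.linspace(0, 1, n_groups + 1))
    groups = np.searchsorted(edges[1:-1], p_val)
    stat = 0.0
    for g in range(n_groups):
        idx = groups == g
        if not idx.any():
            continue
        obs, exp = y_val[idx].sum(), p_val[idx].sum()
        var = (p_val[idx] * (1 - p_val[idx])).sum()
        if var > 0:
            stat += (obs - exp) ** 2 / var  # Pearson-type contribution of group g
    p_value = chi2.sf(stat, df=n_groups)  # illustrative reference distribution
    return stat, p_value

# Example: an underfitted logistic regression on data with a quadratic effect
# should tend to produce a small p-value, flagging lack of fit.
rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 3))
y = rng.binomial(1, 1 / (1 + np.exp(-(X[:, 0] + X[:, 1] ** 2 - 1))))
print(split_gof_test(LogisticRegression(max_iter=1000), X, y))
```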

MSC:

62-XX Statistics
