A computational approach to nonparametric regression: bootstrapping CMARS method. (English) Zbl 1343.62025

Summary: Bootstrapping is a computer-intensive statistical method which treats the data set as a population and draws samples from it with replacement. This resampling method is widely applied, especially to mathematically intractable problems. In this study, it is used to obtain the empirical distributions of the parameters, and thereby to determine whether they are statistically significant, in a special case of nonparametric regression: conic multivariate adaptive regression splines (CMARS), a statistical machine learning algorithm. CMARS is a modified version of the well-known nonparametric regression model multivariate adaptive regression splines (MARS), and it uses conic quadratic optimization. CMARS is at least as complex as MARS even though it performs better with respect to several criteria. To achieve a better performance of CMARS with a less complex model, three different bootstrapping regression methods, namely random-X, fixed-X and wild bootstrap, are applied to four data sets of different sizes and scales. Then, the performances of the models are compared using various criteria including accuracy, precision, complexity, stability, robustness and computational efficiency. The results imply that bootstrap methods give more precise parameter estimates although they are computationally inefficient, and that, among the three, random-X resampling produces better models, particularly for medium size and scale data sets.
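The three resampling schemes named in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: ordinary least squares stands in for CMARS (which requires a conic quadratic solver), the wild bootstrap uses Rademacher (random sign) weights as one common choice, and significance is judged by a percentile interval, which is an assumption rather than the paper's stated rule. The function names `ols_fit`, `bootstrap_coefs` and `percentile_interval` are hypothetical.

```python
import numpy as np


def ols_fit(X, y):
    """Least-squares coefficients; X is assumed to already hold an intercept column."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def bootstrap_coefs(X, y, scheme="random-x", n_boot=1000, seed=0):
    """Return the original estimate and an (n_boot, p) array of bootstrap estimates.

    OLS is a stand-in for CMARS here (an assumption made for illustration).
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    beta_hat = ols_fit(X, y)
    fitted = X @ beta_hat
    resid = y - fitted
    draws = np.empty((n_boot, X.shape[1]))
    for b in range(n_boot):
        if scheme == "random-x":
            # Resample whole observations (rows of X together with y).
            idx = rng.integers(0, n, size=n)
            Xb, yb = X[idx], y[idx]
        elif scheme == "fixed-x":
            # Keep X fixed; resample residuals with replacement.
            Xb = X
            yb = fitted + rng.choice(resid, size=n, replace=True)
        elif scheme == "wild":
            # Keep X fixed; flip each residual's sign at random (Rademacher weights).
            Xb = X
            yb = fitted + resid * rng.choice([-1.0, 1.0], size=n)
        else:
            raise ValueError(f"unknown scheme: {scheme!r}")
        draws[b] = ols_fit(Xb, yb)
    return beta_hat, draws


def percentile_interval(draws, level=0.95):
    """Percentile bootstrap interval for each coefficient."""
    alpha = (1.0 - level) / 2.0
    lower, upper = np.quantile(draws, [alpha, 1.0 - alpha], axis=0)
    return lower, upper


if __name__ == "__main__":
    # Synthetic data: the second predictor has a true coefficient of zero.
    rng = np.random.default_rng(42)
    x = rng.normal(size=(200, 2))
    X = np.column_stack([np.ones(200), x])
    y = 1.0 + 2.0 * x[:, 0] + rng.normal(scale=0.5, size=200)
    for scheme in ("random-x", "fixed-x", "wild"):
        _, draws = bootstrap_coefs(X, y, scheme=scheme, n_boot=500)
        lower, upper = percentile_interval(draws)
        # A coefficient is treated as significant when its interval excludes zero.
        print(scheme, (lower > 0) | (upper < 0))
```

Under this reading, a term whose interval covers zero would be a candidate for removal, which is one way a bootstrap step could lead to a less complex model.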

MSC:

62G08 Nonparametric regression and quantile regression
68T05 Learning and adaptive systems in artificial intelligence
