Optimizing logistic regression coefficients for discrimination and calibration using estimation of distribution algorithms. (English) Zbl 1154.62053

Summary: Logistic regression is a simple and efficient supervised learning algorithm for estimating the probability of an outcome or class variable. Despite its simplicity, it has shown very good performance in a range of fields, and it is widely accepted because its results are easy to interpret. The model is usually fitted by maximum likelihood, and the Newton-Raphson algorithm is the most common numerical approach for obtaining the coefficients that maximize the likelihood of the data.
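As an illustration of the standard fitting procedure, the following is a minimal sketch of Newton-Raphson maximum-likelihood estimation for logistic regression. It is not taken from the paper: the function name `fit_logistic_newton`, the synthetic data, and the coefficients in `true_beta` are assumptions made for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_newton(X, y, n_iter=25, tol=1e-8):
    """Maximum-likelihood logistic regression via Newton-Raphson (hypothetical helper)."""
    X1 = np.column_stack([np.ones(len(X)), X])   # prepend an intercept column
    beta = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X1 @ beta)                   # fitted probabilities
        grad = X1.T @ (y - p)                    # gradient of the log-likelihood
        info = X1.T @ (X1 * (p * (1 - p))[:, None])  # negative Hessian
        step = np.linalg.solve(info, grad)       # Newton step
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Synthetic data (assumed for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
true_beta = np.array([-0.5, 1.0, -2.0])
y = (rng.random(500) < sigmoid(true_beta[0] + X @ true_beta[1:])).astype(float)

beta_hat = fit_logistic_newton(X, y)
print(beta_hat)
```

At convergence the gradient of the log-likelihood is numerically zero, which is the condition the iteration is solving for.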
This work presents a novel approach for fitting the logistic regression model based on estimation of distribution algorithms (EDAs), a tool for evolutionary computation. EDAs are suitable not only for maximizing the likelihood, but also for maximizing the area under the receiver operating characteristic curve (AUC). Thus, we tackle the logistic regression problem from a double perspective: likelihood-based to calibrate the model and AUC-based to discriminate between the different classes. Under these two objectives of calibration and discrimination, the Pareto front can be obtained in our EDA framework. These fronts are compared with those yielded by a multiobjective EDA recently introduced in the literature.
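The idea of searching the coefficient space with an EDA under either objective can be sketched as follows. This is a simplified univariate Gaussian EDA written for illustration, not the authors' implementation: the function `gaussian_eda`, its population settings, and the synthetic data are all assumptions. The same search loop is run once with the log-likelihood as fitness (calibration) and once with the AUC as fitness (discrimination).

```python
import numpy as np

def sigmoid(z):
    # Clip the linear predictor so log(p) and log(1 - p) stay finite.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))

def log_likelihood(beta, X1, y):
    p = sigmoid(X1 @ beta)
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def auc(beta, X1, y):
    """AUC as the Mann-Whitney statistic; tied scores count one half."""
    s = X1 @ beta
    pos, neg = s[y == 1], s[y == 0]
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (len(pos) * len(neg))

def gaussian_eda(fitness, dim, pop_size=100, n_select=30, n_gen=60, seed=0):
    """Univariate Gaussian EDA (hypothetical helper): sample, select, re-estimate."""
    rng = np.random.default_rng(seed)
    mean, std = np.zeros(dim), np.full(dim, 2.0)
    best, best_fit = None, -np.inf
    for _ in range(n_gen):
        pop = rng.normal(mean, std, size=(pop_size, dim))  # sample candidates
        fit = np.array([fitness(ind) for ind in pop])
        order = np.argsort(fit)[::-1]                      # best first
        if fit[order[0]] > best_fit:
            best, best_fit = pop[order[0]].copy(), fit[order[0]]
        elite = pop[order[:n_select]]                      # truncation selection
        mean = elite.mean(axis=0)                          # re-estimate the model
        std = elite.std(axis=0) + 1e-6
    return best, best_fit

# Synthetic data (assumed for illustration only).
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2))
X1 = np.column_stack([np.ones(len(X)), X])
y = (rng.random(300) < sigmoid(X1 @ np.array([-0.5, 1.0, -2.0]))).astype(float)

beta_ll, ll = gaussian_eda(lambda b: log_likelihood(b, X1, y), dim=3)
beta_auc, a = gaussian_eda(lambda b: auc(b, X1, y), dim=3)
print("likelihood-based:", beta_ll, ll)
print("AUC-based:", beta_auc, a)
```

Because the AUC depends only on the ranking of the scores, the AUC-based coefficients are determined only up to a positive rescaling and a shift of the intercept, so they need not be well calibrated. This is one way to see why the two objectives can conflict and why a Pareto front is of interest.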

MSC:

62J12 Generalized linear models (logistic models)
65C60 Computational problems in statistics (MSC2010)
90C59 Approximation methods and heuristics in mathematical programming
90C29 Multi-objective and goal programming

Software:

UCI-ml; R
