
Robust weighted kernel logistic regression in imbalanced and rare events data. (English) Zbl 1247.62190

Summary: Recent developments in computing and technology, along with the availability of large amounts of raw data, have contributed to the creation of many effective techniques and algorithms in the fields of pattern recognition and machine learning. The main objectives for developing these algorithms include identifying patterns within the available data, making predictions, or both. Great success has been achieved with many classification techniques in real-life applications. With regard to binary data classification in particular, the analysis of data containing rare events or disproportionate class distributions poses a great challenge to industry and to the machine learning community. This study examines rare events (REs) with binary dependent variables containing many more non-events (zeros) than events (ones). These variables are difficult to predict and to explain, as has been evidenced in the literature. This research combines rare-event corrections to Logistic Regression (LR) with truncated Newton methods and applies these techniques to Kernel Logistic Regression (KLR). The resulting model, Rare Event Weighted Kernel Logistic Regression (RE-WKLR), combines weighting, regularization, approximate numerical methods, kernelization, bias correction, and efficient implementation, all of which are critical to making RE-WKLR an effective and powerful method for predicting rare events. Comparing RE-WKLR to SVM and TR-KLR on non-linearly separable, small and large binary rare-event datasets, we find that RE-WKLR is as fast as TR-KLR and much faster than SVM. In addition, according to the statistical significance test, RE-WKLR is more accurate than both SVM and TR-KLR.
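
To make the ingredients named in the summary concrete, the sketch below shows one way to combine a kernel (here an RBF kernel), rare-event weighting in the spirit of King and Zeng, and a ridge-penalized Newton fit of kernel logistic regression. It is a minimal illustration under stated assumptions, not the authors' RE-WKLR implementation: it uses a plain Newton solver rather than the paper's truncated Newton method, omits the bias correction, and all names (rbf_kernel, rare_event_weights, fit_wklr) and the assumed population event rate tau are hypothetical.

```python
import numpy as np

def rbf_kernel(X, Z, gamma=1.0):
    # Gaussian (RBF) kernel matrix between the rows of X and Z.
    sq = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq)

def rare_event_weights(y, tau):
    # King-Zeng style weighting: tau is an assumed (known) population event
    # rate, ybar the sample event rate; the weighted sample mimics the population.
    ybar = y.mean()
    w1 = tau / ybar                    # weight for events (y = 1)
    w0 = (1.0 - tau) / (1.0 - ybar)    # weight for non-events (y = 0)
    return np.where(y == 1, w1, w0)

def fit_wklr(K, y, w, lam=1.0, n_iter=25, tol=1e-8):
    # Weighted, ridge-penalized kernel logistic regression fit by a plain
    # Newton solver on the dual coefficients alpha, with p_i = sigmoid(K_i @ alpha)
    # and objective  sum_i w_i * NLL_i + (lam/2) * alpha' K alpha.
    n = K.shape[0]
    alpha = np.zeros(n)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-K @ alpha))
        grad = K @ (w * (p - y)) + lam * (K @ alpha)
        D = w * p * (1.0 - p)                      # diagonal of the IRLS weight matrix
        H = K @ (D[:, None] * K) + lam * K + 1e-8 * np.eye(n)
        step = np.linalg.solve(H, grad)
        alpha -= step
        if np.linalg.norm(step) < tol:
            break
    return alpha

# Toy usage on a synthetic rare-event problem (events are a small minority).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
p_true = 1.0 / (1.0 + np.exp(-(3.0 * X[:, 0] - 4.0)))
y = (rng.random(300) < p_true).astype(float)

tau = 0.02                       # hypothetical population event rate, below the sample rate
K = rbf_kernel(X, X, gamma=0.5)
w = rare_event_weights(y, tau)
alpha = fit_wklr(K, y, w, lam=1.0)
p_hat = 1.0 / (1.0 + np.exp(-K @ alpha))   # in-sample event probabilities
```

In the paper's setting the Newton system would be solved approximately by truncated (conjugate-gradient) iterations rather than the dense solve used here, which is what makes the method scale to larger kernel matrices.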

MSC:

62J12 Generalized linear models (logistic models)
62H30 Classification and discrimination; cluster analysis (statistical aspects)
68T05 Learning and adaptive systems in artificial intelligence

Software:

fda (R); LIBSVM; UCI-ml

References:

[1] Amemiya, T., Advanced Econometrics (1985), Harvard University Press
[2] Asuncion, A., Newman, D.J., 2007. UCI machine learning repository, University of California, Irvine. School of Information and Computer Sciences. http://www.ics.uci.edu/~mlearn/MLRepository.html
[3] Bai, S.B., Wang, J., Zhang, F.Y., Pozdnoukhov, A., Kanevski, M., 2008. Prediction of landslide susceptibility using logistic regression: a case study in Bailongjiang river basin, China. In: Fuzzy Systems and Knowledge Discovery, Fourth International Conference on, vol. 4, pp. 647-651.
[4] Ben-Akiva, M.; Lerman, S., Discrete Choice Analysis: Theory and Application to Travel Demand (1985), The MIT Press
[5] Berk, R., Statistical Learning from a Regression Perspective (2008), Springer · Zbl 1258.62047
[6] Busser, B., Daelemans, W., Bosch, A., 1999. Machine learning of word pronunciation: the case against abstraction. In: Proceedings of the Sixth European Conference on Speech Communication and Technology, Eurospeech99, pp. 2123-2126.
[7] Cameron, A. C.; Trivedi, P. K., Microeconometrics: Methods and Applications (2005), Cambridge University Press · Zbl 1156.62092
[8] Canu, S., Smola, A.J., 2005. Kernel methods and the exponential family. In: ESANN, pp. 447-454.
[9] Chang, C.-C., Lin, C.-J., 2001. LIBSVM: a library for support vector machines. Software available at: http://www.csie.ntu.edu.tw/~cjlin/libsvm
[10] Chan, P. K.; Stolfo, S. J., Toward scalable learning with non-uniform class and cost distributions: a case study in credit card fraud detection, (Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining (1998), AAAI Press), 164-168
[11] Cowan, G., Statistical Data Analysis (1998), Oxford University Press
[12] Cramer, J. S., Logit Models from Economics and Other Fields (2003), Cambridge University Press · Zbl 1027.62057
[13] Cristianini, N.; Shawe-Taylor, J., An Introduction to Support Vector Machines and other kernel-based learning methods (2000), Cambridge University Press
[14] Drummond, C., Holte, R.C., 2003. C4.5, class imbalance, and cost sensitivity: why under-sampling beats over-sampling, pp. 1-8.
[15] Eeckhaut, M. V.D.; Vanwalleghem, T.; Poesen, J.; Govers, G.; Verstraeten, G.; Vandekerckhove, L., Prediction of landslide susceptibility using rare events logistic regression: a case-study in the Flemish Ardennes (Belgium), Geomorphology, 76, 3-4, 392-410 (2006)
[16] Efron, B.; Tibshirani, R. J., An Introduction to the Bootstrap (1994), Chapman & Hall/CRC
[17] Ferraty, F.; Vieu, P., Nonparametric Functional Data Analysis: Theory and Practice (2006), Springer · Zbl 1119.62046
[18] Gorman, R. P.; Sejnowski, T. J., Analysis of hidden units in a layered network trained to classify sonar targets, Neural Networks, 1, 75-89 (1988)
[19] Haberman, S.J., 1976. Generalized residuals for log-linear models. In: Proceedings of the 9th International Biometrics Conference, Boston.
[20] Hastie, T.; Tibshirani, R.; Friedman, J., The Elements of Statistical Learning (2001), Springer-Verlag · Zbl 0973.62007
[21] Heinze, G.; Schemper, M., A solution to the problem of monotone likelihood in Cox regression, Biometrics, 57, 114-119 (2001) · Zbl 1209.62024
[22] Hosmer, D. W.; Lemeshow, S., Applied Logistic Regression (2000), Wiley · Zbl 0967.62045
[23] Imbens, G. W.; Lancaster, T., Efficient estimation and stratified sampling, Journal of Econometrics, 74, 289-318 (1996) · Zbl 0864.62084
[24] Jaakkola, T.; Haussler, D., Probabilistic Kernel Regression Models (1999)
[25] Japkowicz, N., 2000. Learning from imbalanced data sets: a comparison of various strategies, Tech. Rep. University of DalTech/Dalhousie.
[26] Jensen, D.; Cohen, P. R., Multiple comparison in induction algorithms, Machine Learning, 38, 309-338 (2000) · Zbl 0954.68083
[27] Karsmakers, P., Pelckmans, K., Suykens, J.A.K., 2007. Multi-class kernel logistic regression: a fixed-size implementation. In: International Joint Conference on Neural Networks, pp. 1756-1761.
[28] Keele, L. J., Semiparametric Regression for the Social Sciences (2008), Wiley · Zbl 1144.62109
[29] King, G.; Zeng, L., Explaining rare events in international relations, International Organization, 55, 3, 693-715 (2001)
[30] King, G.; Zeng, L., Improving forecasts of state failure, World Politics, 53, 4, 623-658 (2001)
[31] King, G.; Zeng, L., Logistic regression in rare events data, Political Analysis, 9, 137-163 (2001)
[32] Komarek, P., 2004. Logistic regression for data mining and high-dimensional classification. Ph.D. Thesis. Carnegie Mellon University.
[33] Komarek, P., Moore, A., 2005. Making logistic regression a core data mining tool: a practical investigation of accuracy, speed, and simplicity. Tech. Rep. Carnegie Mellon University.
[34] Kubat, M.; Holte, R. C.; Matwin, S., Machine learning for the detection of oil spills in satellite radar images, (Machine Learning (1998)), 195-215
[35] Kurgan, L. A.; Cios, K. J.; Tadeusiewicz, R.; Ogiela, M.; Goodenday, L. S., Knowledge discovery approach to automated cardiac SPECT diagnosis, Artificial Intelligence in Medicine, 32, 2, 149-169 (2001)
[36] Lakshmanan, V.; Stumpf, G.; Witts, A., A neural network for detecting and diagnosing tornadic circulations using the mesocyclone detection and near storm environment algorithms, (21st International Conference on Information Processing Systems (2005), American Meteorological Society: American Meteorological Society San Diego, CA), CD-ROM, J52.2
[37] Lewis, J. M.; Lakshmivarahan, S.; Dhall, S., Dynamic Data Assimilation: A Least Squares Approach (2006), Cambridge University Press · Zbl 1268.62003
[38] Lin, C.-J.; Weng, R. C.; Keerthi, S. S., Trust region Newton methods for large-scale logistic regression, (ICML ’07: Proceedings of the 24th International Conference on Machine Learning. ICML ’07: Proceedings of the 24th International Conference on Machine Learning, Corvalis, Oregon (2007), ACM: ACM New York, NY, USA), 561-568
[39] Maalouf, M.; Trafalis, T. B., Kernel logistic regression using truncated Newton method, (Dagli, C. H.; Enke, D. L.; Bryden, K. M.; Ceylan, H.; Gen, M., Intelligent Engineering Systems Through Artificial Neural Networks, Vol. 18 (2008), ASME Press: ASME Press New York, NY, USA), 455-462
[40] (Maimon, O.; Rokach, L., Data Mining and Knowledge Discovery Handbook (2005), Springer) · Zbl 1087.68029
[41] Maiti, T.; Pradhan, V., A comparative study of the bias corrected estimates in logistic regression, Statistical Methods in Medical Research, 17, 6, 621-634 (2008)
[42] Maloof, M.A., 2003. Learning when data sets are imbalanced and when costs are unequal and unknown. In: ICML-2003 Workshop on Learning from Imbalanced Data Sets II.
[43] Malouf, R., 2002. A comparison of algorithms for maximum entropy parameter estimation. In: Proceedings of Conference on Natural Language Learning, vol. 6.
[44] Manski, C. F.; Lerman, S. R., The estimation of choice probabilities from choice based samples, Econometrica, 45, 8, 1977-1988 (1977) · Zbl 0372.62094
[45] McCullagh, P.; Nelder, J., Generalized Linear Models (1989), Chapman and Hall/CRC · Zbl 0744.62098
[46] (Milgate, M.; Eatwell, J.; Newman, P. K., Econometrics (1990), W. W. Norton & Company)
[47] Minka, T.P., 2003. A comparison of numerical optimizers for logistic regression. Tech. Rep. Department of Statistics, Carnegie Mellon University.
[48] Park, M. Y.; Hastie, T., Penalized logistic regression for detecting gene interactions, Biostatistics, 9, 1, 30-50 (2008) · Zbl 1274.62853
[49] Prati, R.C., Batista, G.E.A.P.A., Monard, M.C., 2004. Learning with class skews and small disjuncts. In: SBIA, pp. 296-306. · Zbl 1105.68391
[50] Quigley, J.; Bedford, T.; Walls, L., Estimating rate of occurrence of rare events with empirical Bayes: a railway application, Reliability Engineering & System Safety, 92, 5, 619-627 (2007)
[51] Ramsay, J.; Silverman, B. W., Functional Data Analysis (2005), Springer · Zbl 1079.62006
[52] Seiffert, C.; Khoshgoftaar, T. M.; Hulse, J. V.; Napolitano, A., Mining data with rare events: a case study, (ICTAI ’07: Proceedings of the 19th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2007, vol. 2 (2007), IEEE Computer Society: IEEE Computer Society Washington, DC, USA), 132-139
[53] Shawe-Taylor, J.; Cristianini, N., Kernel Methods for Pattern Analysis (2004), Cambridge University Press
[54] Sigillito, V. G.; Wing, S. P.; Hutton, L. V.; Baker, K. B., Classification of radar returns from the ionosphere using neural networks, Johns Hopkins APL Technical Digest, 10, 262-266 (1989)
[55] Smith, J. W.; Everhart, J. E.; Dickson, W. C.; Johannes, R. S., Using the ADAP learning algorithm to forecast the onset of diabetes mellitus, (Proceedings of the Symposium on Computer Applications and Medical Care (1988), IEEE Computer Society Press)
[56] Trafalis, T.B., Ince, H., Richman, M.B., 2003. Tornado detection with support vector machines. International Conference on Computational Science. pp. 289-298.
[57] Tsoucas, P., Rare events in series of queues, Journal of Applied Probability, 29, 168-175 (1992) · Zbl 0765.60100
[58] Van-Hulse, J.; Khoshgoftaar, T. M.; Napolitano, A., Experimental perspectives on learning from imbalanced data, (ICML ’07: Proceedings of the 24th International Conference on Machine Learning (2007), ACM: ACM New York, NY, USA), 935-942
[59] Vapnik, V., The Nature of Statistical Learning Theory (1995), Springer: Springer NY · Zbl 0833.62008
[60] Wang, S.; Wang, T., Precision of Warm's weighted likelihood for a polytomous model in computerized adaptive testing, Applied Psychological Measurement, 25, 4, 317-331 (2001)
[61] Wang, Y., Witten, I.H., 2002. Modeling for optimal probability prediction. In: ICML, pp. 650-657.
[62] Weiss, G. M., Mining with rarity: a unifying framework, SIGKDD Explorations Newsletter, 6, 1, 7-19 (2004)
[63] Weiss, G. M.; Hirsh, H., Learning to predict extremely rare events, (AAAI Workshop on Learning from Imbalanced Data Sets (2000), AAAI Press), 64-68
[64] Xie, Y.; Manski, C. F., The logit model and response-based samples, Sociological Methods & Research, 17, 283-302 (1989)
[65] Zadrozny, B., Learning and evaluating classifiers under sample selection bias, (ICML ’04: Proceedings of the Twenty-First International Conference on Machine learning (2004), ACM: ACM New York, NY, USA), 114
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases, these data have been complemented or enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect match.