\(\text{ALR}^n\): accelerated higher-order logistic regression. (English) Zbl 1386.68147

Summary: This paper introduces Accelerated Logistic Regression: a hybrid generative-discriminative approach to training Logistic Regression with higher-order features. We present two main results: (1) that our combined generative-discriminative approach significantly improves the efficiency of Logistic Regression, and (2) that incorporating higher-order features (i.e., features that are Cartesian products of the original features) reduces the bias of Logistic Regression, which in turn significantly reduces its error on large datasets. We assess the efficacy of Accelerated Logistic Regression by conducting an extensive set of experiments on 75 standard datasets. We demonstrate its competitiveness, particularly on large datasets, by comparing against state-of-the-art classifiers including Random Forest and Averaged \(n\)-Dependence Estimators.
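To make the notion of higher-order features concrete, the following minimal Python sketch builds every order-1 and order-2 Cartesian-product feature over a toy discrete dataset and fits a purely discriminative logistic regression on the resulting one-hot encoding. This is an illustrative assumption on our part, not the authors' \(\text{ALR}^n\) code: the `higher_order_rows` helper, the attribute naming scheme, and the use of scikit-learn are all hypothetical, and the paper's generative acceleration of the optimisation is not reproduced here.

```python
# A minimal sketch only (not the authors' ALR^n implementation): it illustrates
# order-n features as Cartesian products of discrete attributes, fed to a
# purely discriminative logistic regression via scikit-learn. The generative
# acceleration step described in the paper is omitted.
from itertools import combinations

from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


def higher_order_rows(X, n=2):
    """Map each row of categorical attribute values to order-1..n features.

    An order-k feature is the joint value of k original attributes,
    e.g. "a0=r&a1=s" for k = 2 (as a binary indicator feature).
    """
    rows = []
    for row in X:
        feats = {}
        for k in range(1, n + 1):
            for idx in combinations(range(len(row)), k):
                key = "&".join(f"a{i}={row[i]}" for i in idx)
                feats[key] = 1.0
        rows.append(feats)
    return rows


# Toy categorical data: two attributes, binary class label.
X = [["r", "s"], ["r", "t"], ["b", "s"], ["b", "t"]]
y = [0, 0, 1, 1]

# One-hot encode the order-1 and order-2 features, then fit a standard LR.
model = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
model.fit(higher_order_rows(X, n=2), y)
print(model.predict(higher_order_rows(X, n=2)))
```

Since the number of order-\(k\) features grows combinatorially with the number of attributes, training such a model efficiently is the motivation for result (1) above.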

MSC:

68T05 Learning and adaptive systems in artificial intelligence
62H30 Classification and discrimination; cluster analysis (statistical aspects)
62J02 General nonlinear regression
Full Text: DOI

References:

[1] Bishop, C. (2006). Pattern recognition and machine learning. Berlin: Springer. · Zbl 1107.68072
[2] Boyd, S., & Vandenberghe, L. (2008). Convex optimization. Cambridge: Cambridge University Press. · Zbl 1058.90049
[3] Brain, D., & Webb, G. I. (2002). The need for low bias algorithms in classification learning from small data sets. In: PKDD, pp. 62-73. · Zbl 1020.68555
[4] Breiman, L. (2001). Random forests. Machine Learning, 45, 5-32. · Zbl 1007.68152 · doi:10.1023/A:1010933404324
[5] Byrd, R., Lu, P., Nocedal, J., & Zhu, C. (1995). A limited memory algorithm for bound constrained optimization. SIAM Journal on Scientific Computing, 16(5), 1190-1208. · Zbl 0836.65080 · doi:10.1137/0916069
[6] Fayyad, U. M., & Irani, K. B. (1992). On the handling of continuous-valued attributes in decision tree generation. Machine Learning, 8(1), 87-102. · Zbl 0767.68084
[7] Firth, D. (1993). Bias reduction of maximum likelihood estimates. Biometrika, 80, 27-38. · Zbl 0769.62021 · doi:10.1093/biomet/80.1.27
[8] Frank, A., & Asuncion, A. (2010). UCI machine learning repository. http://archive.ics.uci.edu/ml.
[9] Gantz, J., & Reinsel, D. (2012). The digital universe in 2020: Big data, bigger digital shadows, and biggest growth in the far east. Framingham: International Data Corporation. https://www.emc.com/collateral/analyst-reports/idc-the-digital-universe-in-2020.pdf.
[10] Genkin, A., Lewis, D., & Madigan, D. (2007). Large-scale Bayesian logistic regression for text categorization. Technometrics, 49, 291-304. · doi:10.1198/004017007000000245
[11] Greiner, R., Su, X., Shen, B., & Zhou, W. (2005). Structural extensions to logistic regression: Discriminative parameter learning of belief net classifiers. Machine Learning, 59, 297-322. · Zbl 1101.68759
[12] Greiner, R., & Zhou, W. (2002). Structural extension to logistic regression: Discriminative parameter learning of belief net classifiers. In Eighteenth Annual National Conference on Artificial Intelligence (AAAI), pp. 167-173.
[13] Hauck, W., Anderson, S., & Marcus, S. (1998). Should we adjust for covariates in nonlinear regression analysis of randomised trials? Controlled Clinical Trials, 19, 249-256. · doi:10.1016/S0197-2456(97)00147-5
[14] Hill, T., & Lewicki, P. (2013). Statistics: Methods and applications. Dell.
[15] Kohavi, R., & Wolpert, D. (1996). Bias plus variance decomposition for zero-one loss functions. In ICML, pp. 275-283.
[16] Langford, J., Li, L., & Strehl, A. (2007). Vowpal Wabbit online learning project. https://github.com/JohnLangford/vowpal_wabbit/wiki.
[17] Lauritzen, S. (1996). Graphical models. Oxford: Oxford University Press. · Zbl 0907.62001
[18] Lin, X., Wahba, G., Xiang, D., Gao, F., Klein, R., & Klein, B. (1998). Smoothing spline ANOVA models for large data sets with Bernoulli observations and the randomized GACV. Technical Report 998, Department of Statistics, University of Wisconsin, Madison, WI. · Zbl 1105.62358
[19] van Lint, J. H., & Wilson, R. M. (1992). A course in combinatorics. Cambridge: Cambridge University Press. · Zbl 0769.05001
[20] Liu, H., & Motoda, H. (1998). Feature extraction, construction and selection: A data mining perspective. Berlin: Springer. · Zbl 0912.00012 · doi:10.1007/978-1-4615-5725-8
[21] Martinez, A., Chen, S., Webb, G. I., & Zaidi, N. A. (2016). Scalable learning of Bayesian network classifiers. Journal of Machine Learning Research, 17, 1-35. · Zbl 1360.68694
[22] Mitchell, T. M. (1980). The need for biases in learning generalizations. Technical Report CBM-TR-117, Rutgers University, Department of Computer Science, New Brunswick, NJ.
[23] Neuhaus, J., & Jewell, N. (1993). A geometric approach to assess bias due to omitted covariates in generalized linear models. Biometrika, 80, 807-815. · Zbl 0800.62428 · doi:10.1093/biomet/80.4.807
[24] Pazzani, M. J. (1996). Constructive induction of Cartesian product attributes. In Proceedings of the information, statistics and induction in science conference (ISIS96), pp. 66-77.
[25] Pernkopf, F., & Bilmes, J. (2005). Discriminative versus generative parameter and structure learning of Bayesian network classifiers. In International Conference on Machine Learning, pp. 657-664.
[26] Pernkopf, F., & Wohlmayr, M. (2009). On discriminative parameter learning of Bayesian network classifiers. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 221-237. · Zbl 1295.68187
[27] Roos, T., Wettig, H., Grünwald, P., Myllymäki, P., & Tirri, H. (2005). On discriminative Bayesian network classifiers and logistic regression. Machine Learning, 59(3), 267-296. · Zbl 1101.68785
[28] Sahami, M. (1996). Learning limited dependence Bayesian classifiers. In: Proceedings of the second international conference on knowledge discovery and data mining, pp. 334-338. Menlo Park, CA: AAAI Press.
[29] Smola, A., & Schölkopf, B. (2000). Sparse greedy matrix approximation for machine learning. In International Conference on Machine Learning, pp. 911-918.
[30] Sonnenburg, S., & Franc, V. (2010). COFFIN: A computational framework for linear SVMs. In International Conference on Machine Learning, pp. 999-1006.
[31] Steinwart, I. (2004). Sparseness of support vector machines—Some asymptotically sharp bounds. In Advances in Neural Information Processing Systems 16. · Zbl 1094.68082
[32] Stinson, D. R. (2003). Combinatorial designs: Constructions and analysis. Berlin: Springer. · Zbl 1031.05001
[33] Su, J., Zhang, H., Ling, C., & Matwin, S. (2008). Discriminative parameter learning for Bayesian networks. In International Conference on Machine Learning, pp. 1016-1023.
[34] Nemes, S., Jonasson, J., Genell, A., & Steineck, G. (2009). Bias in odds ratios by logistic regression modelling and sample size. BMC Medical Research Methodology, 9(1), 1-5. · doi:10.1186/1471-2288-9-1
[35] Webb, G. I. (2000). Multiboosting: A technique for combining boosting and wagging. Machine Learning, 40(2), 159-196. · doi:10.1023/A:1007659514849
[36] Webb, G. I., Boughton, J., & Wang, Z. (2005). Not so naive Bayes: Averaged one-dependence estimators. Machine Learning, 58(1), 5-24. · Zbl 1075.68078 · doi:10.1007/s10994-005-4258-6
[37] Webb, G. I., Boughton, J., Zheng, F., Ting, K. M., & Salem, H. (2011). Learning by extrapolation from marginal to full-multivariate probability distributions: Decreasingly naive Bayesian classification. Machine Learning. doi:10.1007/s10994-011-5263-6. · Zbl 1238.68136
[38] Williams, C., & Seeger, M. (2001). Using the Nyström method to speed up kernel machines. In Advances in Neural Information Processing Systems 13, pp. 682-688.
[39] Zaidi, N. A., Carman, M. J., Cerquides, J., & Webb, G. I. (2014). Naive-Bayes inspired effective pre-conditioners for speeding-up logistic regression. In IEEE international conference on data mining, pp. 1097-1102.
[40] Zaidi, N. A., Cerquides, J., Carman, M. J., & Webb, G. I. (2013). Alleviating naive Bayes attribute independence assumption by attribute weighting. Journal of Machine Learning Research, 14, 1947-1988. · Zbl 1317.68199
[41] Zaidi, N. A., & Webb, G. I. (2012). Fast and efficient single pass Bayesian learning. In Advances in knowledge discovery and data mining, pp. 149-160.
[42] Zhu, J., & Hastie, T. (2001). Kernel logistic regression and the import vector machine. In NIPS, pp. 1081-1088.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases these data have been complemented or enhanced with data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible, without claiming completeness or perfect matching.