
Quadratic convergence of smoothing Newton’s method for 0/1 loss optimization. (English) Zbl 1483.90126

Summary: It has been widely recognized that the 0/1 loss function is one of the most natural choices for modelling classification errors, with a wide range of applications including support vector machines and 1-bit compressed sensing. Owing to the combinatorial nature of the 0/1 loss function, methods based on convex relaxations or smoothing approximations have dominated the existing research, and they are often able to provide approximate solutions of good quality. However, those methods do not optimize the 0/1 loss function directly, and hence no optimality has been established for the original problem. This paper studies the optimality conditions of 0/1 loss minimization and, for the first time, develops a smoothing Newton method that directly optimizes the 0/1 loss function with local quadratic convergence under reasonable conditions. Extensive numerical experiments demonstrate the superior performance one would expect from a Newton-type method.
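As a rough illustration of the smoothing idea sketched in the summary (not the authors' actual algorithm), the Python snippet below replaces the Heaviside step inside the 0/1 loss with a sigmoid \(\sigma(m/\mu)\) and runs a safeguarded Newton iteration on the smoothed empirical risk. The synthetic data, the smoothing parameter mu, the ridge weight lam, and the least-squares warm start are all assumptions made for this example.

```python
import numpy as np

def smoothed_01_risk(w, X, y, mu=1.0, lam=1e-2):
    """Sigmoid-smoothed empirical 0/1 risk plus a small ridge term."""
    z = np.clip(-y * (X @ w) / mu, -50.0, 50.0)    # positive z <-> misclassified
    return np.sum(1.0 / (1.0 + np.exp(-z))) + 0.5 * lam * (w @ w)

def grad_hess(w, X, y, mu=1.0, lam=1e-2):
    """Gradient and Hessian of the smoothed risk."""
    z = np.clip(-y * (X @ w) / mu, -50.0, 50.0)
    s = 1.0 / (1.0 + np.exp(-z))
    ds = s * (1.0 - s) / mu                        # d/dm of sigma(m/mu)
    d2s = ds * (1.0 - 2.0 * s) / mu                # second derivative; can be negative
    g = X.T @ (-y * ds) + lam * w
    H = (X * d2s[:, None]).T @ X + lam * np.eye(w.size)
    return g, H

def damped_newton_step(w, X, y, mu=1.0, lam=1e-2):
    """One safeguarded Newton step: gradient fallback plus backtracking."""
    g, H = grad_hess(w, X, y, mu, lam)
    try:
        p = np.linalg.solve(H, g)
    except np.linalg.LinAlgError:
        p = g.copy()
    if g @ p <= 0.0:                               # indefinite Hessian: fall back to gradient
        p = g.copy()
    t, f0 = 1.0, smoothed_01_risk(w, X, y, mu, lam)
    while t > 1e-10 and smoothed_01_risk(w - t * p, X, y, mu, lam) > f0 - 1e-4 * t * (g @ p):
        t *= 0.5                                   # Armijo backtracking
    return w - t * p

# Toy usage on synthetic, nearly separable data with a least-squares warm start.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
y = np.sign(X @ rng.standard_normal(5) + 0.1 * rng.standard_normal(200))
w = np.linalg.lstsq(X, y, rcond=None)[0]
for _ in range(30):
    w = damped_newton_step(w, X, y)
print("misclassified:", int(np.sum(np.sign(X @ w) != y)))
```

The smoothed risk is nonconvex, so the Hessian can be indefinite away from a solution; the gradient fallback and backtracking keep the sketch stable, while the fast local phase near a minimizer is where the quadratic convergence of Newton-type methods appears.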

MSC:

90C26 Nonconvex programming, global optimization
90C30 Nonlinear programming
65K05 Numerical mathematical programming methods

References:

[1] S. M. Bajgier and A. V. Hill, An experimental comparison of statistical and linear programming approaches to the discriminant problem, Decis. Sci., 13 (1982), pp. 604-618.
[2] A. Beck and Y. C. Eldar, Sparsity constrained nonlinear optimization: Optimality conditions and algorithms, SIAM J. Optim., 23 (2013), pp. 1480-1509. · Zbl 1295.90051
[3] A. Beck and N. Hallak, On the minimization over sparse symmetric sets: Projections, optimality conditions, and algorithms, Math. Oper. Res., 41 (2016), pp. 196-223. · Zbl 1334.90129
[4] S. Ben-David, N. Eiron, and P. M. Long, On the difficulty of approximately maximizing agreements, J. Comput. System Sci., 66 (2003), pp. 496-514. · Zbl 1053.68054
[5] P. T. Boufounos and R. G. Baraniuk, 1-bit compressive sensing, in Proceedings of the 42nd Annual Conference on Information Sciences and Systems, IEEE, 2008, pp. 16-21.
[6] J. P. Brooks, Support vector machines with the ramp loss and the hard margin loss, Oper. Res., 59 (2011), pp. 467-479. · Zbl 1228.90057
[7] J. P. Brooks and E. K. Lee, Analysis of the consistency of a mixed integer programming-based multi-category constrained discriminant model, Ann. Oper. Res., 174 (2010), pp. 147-168. · Zbl 1185.90156
[8] E. Carrizosa, B. Martin-Barragan, and D. R. Morales, Binarized support vector machines, INFORMS J. Comput., 22 (2010), pp. 154-167. · Zbl 1243.62088
[9] C. C. Chang and C. J. Lin, LIBSVM: A library for support vector machines, ACM Trans. Intell. Syst. Technol., 2 (2011), pp. 1-27.
[10] X. Chen, L. Qi, and D. Sun, Global and superlinear convergence of the smoothing Newton method and its application to general box constrained variational inequalities, Math. Comp., 67 (1998), pp. 519-540. · Zbl 0894.90143
[11] C. Cortes and V. Vapnik, Support-vector networks, Mach. Learn., 20 (1995), pp. 273-297. · Zbl 0831.68098
[12] D. Dai, L. Shen, Y. Xu, and N. Zhang, Noisy 1-bit compressive sensing: Models and algorithms, Appl. Comput. Harmon. Anal., 40 (2016), pp. 1-32. · Zbl 1331.65085
[13] T. Evgeniou, M. Pontil, and T. Poggio, Regularization networks and support vector machines, Adv. Comput. Math., 13 (2000), pp. 1-50. · Zbl 0939.68098
[14] Q. Fan, Y. Jiao, and X. Lu, A primal dual active set algorithm with continuation for compressed sensing, IEEE Trans. Signal Process., 62 (2014), pp. 6276-6285. · Zbl 1394.94175
[15] R. Fan, K. Chang, C. Hsieh, X. Wang, and C. Lin, LIBLINEAR: A library for large linear classification, J. Mach. Learn. Res., 9 (2008), pp. 1871-1874. · Zbl 1225.68175
[16] V. Feldman, V. Guruswami, P. Raghavendra, and Y. Wu, Agnostic learning of monomials by halfspaces is hard, SIAM J. Comput., 41 (2012), pp. 1558-1590. · Zbl 1261.68063
[17] J. H. Friedman, On bias, variance, 0/1 loss, and the curse-of-dimensionality, Data Min. Knowl. Discov., 1 (1997), pp. 55-77.
[18] A. K. Han, Non-parametric analysis of a generalized regression model: The maximum rank correlation estimator, J. Econometrics, 35 (1987), pp. 303-316. · Zbl 0638.62063
[19] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer, New York, 2009. · Zbl 1273.62005
[20] M. Hintermüller, K. Ito, and K. Kunisch, The primal-dual active set strategy as a semismooth Newton method, SIAM J. Optim., 13 (2002), pp. 865-888. · Zbl 1080.90074
[21] J. Huang, Y. Jiao, X. Lu, and L. Zhu, Robust decoding from 1-bit compressive sampling with ordinary and regularized least squares, SIAM J. Sci. Comput., 40 (2018), pp. A2062-A2086. · Zbl 1395.49033
[22] X. Huang, L. Shi, M. Yan, and J. A. Suykens, Pinball loss minimization for one-bit compressive sensing: Convex models and algorithms, Neurocomputing, 314 (2018), pp. 275-283.
[23] Z. Huang, D. Sun, and G. Zhao, A smoothing Newton-type algorithm of stronger convergence for the quadratically constrained convex quadratic programming, Comput. Optim. Appl., 35 (2006), pp. 199-237. · Zbl 1151.90509
[24] K. Ito and K. Kunisch, Semi-smooth Newton methods for state-constrained optimal control problems, Systems Control Lett., 50 (2003), pp. 221-228. · Zbl 1157.49311
[25] L. Jacques, J. N. Laska, P. T. Boufounos, and R. G. Baraniuk, Robust 1-bit compressive sensing via binary stable embeddings of sparse vectors, IEEE Trans. Inform. Theory, 59 (2013), pp. 2082-2102. · Zbl 1364.94130
[26] M. Lai, Y. Xu, and W. Yin, Improved iteratively reweighted least squares for unconstrained smoothed \(\ell_q\) minimization, SIAM J. Numer. Anal., 51 (2013), pp. 927-957. · Zbl 1268.49038
[27] L. Li and H.-T. Lin, Optimizing 0/1 loss for perceptrons by random coordinate descent, in Proceedings of the International Joint Conference on Neural Networks, IEEE, 2007, pp. 749-754.
[28] J. Liittschwager and C. Wang, Integer programming solution of a classification problem, Manag. Sci., 24 (1978), pp. 1515-1525. · Zbl 0491.90056
[29] Z. Lu, Optimization over Sparse Symmetric Sets via a Nonmonotone Projected Gradient Method, preprint, arXiv:1509.08581, 2015.
[30] H. Lütkepohl, Handbook of Matrices, Vol. 1, Wiley, Chichester, 1996. · Zbl 0856.15001
[31] S. Ma and J. Huang, Regularized ROC method for disease classification and biomarker selection with microarray data, Bioinformatics, 21 (2005), pp. 4356-4362.
[32] T. Nguyen and S. Sanner, Algorithms for direct 0-1 loss optimization in binary classification, in Proceedings of the International Conference on Machine Learning, 2013, pp. 1085-1093.
[33] K. Pelckmans, J. Suykens, T. Gestel, J. Brabanter, L. Lukas, B. Hamers, B. Moor, and J. Vandewalle, A MATLAB/C Toolbox for Least Squares Support Vector Machines, ESAT-SCD-SISTA Technical Report 02-145, 2002. · Zbl 1017.93004
[34] H.-D. Qi and L. Liao, A smoothing Newton method for general nonlinear complementarity problems, Comput. Optim. Appl., 17 (2000), pp. 231-253. · Zbl 0987.90078
[35] R. T. Rockafellar and R. J. Wets, Variational Analysis, Grundlehren Math. Wiss. 317, Springer, New York, 2009.
[36] P. A. Rubin, Solving mixed integer classification problems by decomposition, Ann. Oper. Res., 74 (1997), pp. 51-64. · Zbl 0888.90160
[37] J. A. Suykens and J. Vandewalle, Least squares support vector machine classifiers, Neural Process. Lett., 9 (1999), pp. 293-300.
[38] Y. Tang, X. Li, Y. Xu, S. Liu, and S. Ouyang, A mixed integer programming approach to maximum margin 0-1 loss classification, in Proceedings of the International Radar Conference, IEEE, 2014, pp. 1-6.
[39] B. Ustun and C. Rudin, Supersparse linear integer models for optimized medical scoring systems, Mach. Learn., 102 (2016), pp. 349-391. · Zbl 1406.62144
[40] H. Wang, Y. Shao, S. Zhou, C. Zhang, and N. Xiu, Support vector machine classifier via \(l_{0/1}\) soft-margin loss, IEEE Trans. Pattern Anal. Mach. Intell., 2021.
[41] E. W. Weisstein, Heaviside Step Function, https://mathworld.wolfram.com/, 2002.
[42] Y. Wu and Y. Liu, Robust truncated hinge loss support vector machines, J. Amer. Statist. Assoc., 102 (2007), pp. 974-983. · Zbl 1469.62293
[43] M. Xie, Y. Xue, and U. Roshan, Stochastic coordinate descent for 01 loss and its sensitivity to adversarial attacks, in Proceedings of the 18th IEEE International Conference on Machine Learning and Applications, IEEE, 2019, pp. 299-304.
[44] M. Yan, Y. Yang, and S. Osher, Robust \(1\)-bit compressive sensing using adaptive outlier pursuit, IEEE Trans. Signal Process., 60 (2012), pp. 3868-3875. · Zbl 1393.94723
[45] S. Zhai, T. Xia, M. Tan, and S. Wang, Direct \(0-1\) Loss Minimization and Margin Maximization with Boosting, in Proceedings of Advances in Neural Information Processing Systems, 2013, pp. 872-880.
[46] S. Zhou, N. Xiu, and H.-D. Qi, Global and quadratic convergence of Newton hard-thresholding pursuit, J. Mach. Learn. Res., 22 (2021), pp. 1-45. · Zbl 1539.65071