V-shaped interval insensitive loss for ordinal classification. (English) Zbl 1383.62175

Summary: We address the problem of learning ordinal classifiers from partially annotated examples. We introduce a V-shaped interval-insensitive loss function to measure the discrepancy between the predictions of an ordinal classifier and a partial annotation provided in the form of intervals of candidate labels. We show that, under reasonable assumptions on the annotation process, the Bayes risk of the ordinal classifier can be bounded by the expectation of an associated interval-insensitive loss. We propose several convex surrogates of the interval-insensitive loss, which are used to formulate convex learning problems. We describe a variant of the cutting plane method which can solve large instances of the learning problems. Experiments on a real-life application of human age estimation show that an ordinal classifier learned from cheap, partially annotated examples can match the accuracy of the supervised methods used so far, which require expensive, precisely annotated examples.
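
For illustration only, the idea of an interval-insensitive loss can be sketched as follows. The sketch assumes that the underlying V-shaped loss is the absolute error between labels and that the annotation is a closed integer interval; the function name and the example values are hypothetical and are not taken from the paper, whose exact definition and convex surrogates may differ.

```python
def interval_insensitive_loss(y_left, y_right, y_pred):
    """Illustrative interval-insensitive loss (assumed absolute-error base loss).

    Returns 0 when the prediction falls inside the annotated interval
    [y_left, y_right]; otherwise returns the distance from the prediction
    to the nearer end of the interval.
    """
    if y_left > y_right:
        raise ValueError("interval must satisfy y_left <= y_right")
    if y_pred < y_left:
        return y_left - y_pred   # prediction below the interval
    if y_pred > y_right:
        return y_pred - y_right  # prediction above the interval
    return 0                     # prediction inside the interval


# Hypothetical example: an age annotated only as "between 20 and 29".
losses = [interval_insensitive_loss(20, 29, y) for y in (15, 20, 25, 29, 33)]
print(losses)
```

When the interval collapses to a single label (y_left equal to y_right), this sketch reduces to the ordinary absolute error, which corresponds to the fully annotated case.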

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)

References:

[1] Antoniuk, K., Franc, V., & Hlavac, V. (2013). Mord: Multi-class classifier for ordinal regression. In Proceedings of European conference on machine learning and principles and practice of knowledge discovery in databases (ECML/PKDD) (pp. 96-111). · Zbl 1222.68321
[2] Boyd, S., & Vandenberghe, L. (2004). Convex optimization. New York, NY: Cambridge University Press. · Zbl 1058.90049 · doi:10.1017/CBO9780511804441
[3] Chang, K., Chen, C., & Hung, Y. (2011). Ordinal hyperplane ranker with cost sensitivities for age estimation. In Proceedings of computer vision and pattern recognition (CVPR).
[4] Chu, W., & Ghahramani, Z. (2005). Preference learning with Gaussian processes. In Proceedings of the international conference on machine learning (ICML). · Zbl 1222.68170
[5] Chu, W., & Keerthi, S.S. (2005). New approaches to support vector ordinal regression. In Proceedings of the international conference on machine learning (ICML) (pp. 145-152).
[6] Cour, T., Sapp, B., & Taskar, B. (2011). Learning from partial labels. Journal of Machine Learning Research, 12, 1225-1261. · Zbl 1280.68162
[7] Crammer, K., & Singer, Y. (2001). Pranking with ranking. In Advances in neural information processing systems (NIPS) (pp. 641-647). · Zbl 0483.62056
[8] Dembczyński, K., Kotlowski, W., & Slowinski, R. (2008). Ordinal classification with decision rules. In Mining complex data. Lecture notes in computer science, 4944, 169-181. · Zbl 1280.68162
[9] Dempster, A., Laird, N., & Rubin, D. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1), 1-38. · Zbl 0364.62022
[10] Do, T.-M.-T., & Artières, T. (2009). Large margin training for hidden Markov models with partially observed states. In Proceedings of the international conference on machine learning (ICML). · Zbl 1222.62079
[11] Franc, V., Sonnenburg, S., & Werner, T. (2012). Cutting-plane methods in machine learning (chapter 7, pp. 185-218). The MIT Press, Cambridge, USA.
[12] Fu, L., & Simpson, D.G. (2002). Conditional risk models for ordinal response data: Simultaneous logistic regression analysis and generalized score test. Journal of Statistical Planning and Inference, 108(1-2), 201-217. · Zbl 1015.62070
[13] Gondzio, J., du Merle, O., Sarkissian, R., & Vial, J.-P. (1996). ACCPM—A library for convex optimization based on an analytic center cutting plane method. European Journal of Operational Research, 94, 206-211.
[14] Guo, G., & Mu, G. (2010). Human age estimation: What is the influence across race and gender? In Proceedings of conference on computer vision and pattern recognition workshops (CVPRW).
[15] Huang, G.B., Ramesh, M., Berg, T., & Learned-Miller, E. (2007). Labeled faces in the wild: A database for studying face recognition in unconstrained environments. Technical report 07-49, University of Massachusetts, Amherst. · Zbl 1242.68253
[16] Jie, L., & Orabona, F. (2010). Learning from candidate labeling sets. In Proceedings of advances in neural information processing systems (NIPS).
[17] Kotlowski, W., Dembczynski, K., Greco, S., & Slowinski, R. (2008). Stochastic dominance-based rough set model for ordinal classification. Journal of Information Sciences, 178(21), 4019-4037. · Zbl 1173.68734 · doi:10.1016/j.ins.2008.06.013
[18] Kumar, N., Berg, A. C., Belhumeur, P. N., & Nayar, S. K. (2009). Attribute and simile classifiers for face verification. In Proceedings of international conference on computer vision (ICCV).
[19] Li, L., & Lin, H.-T. (2006). Ordinal regression by extended binary classification. In Proceedings of advances in neural information processing systems (NIPS).
[20] Lou, X., & Hamprecht, F.A. (2012). Structured learning from partial annotations. In Proceedings of the international conference on machine learning (ICML).
[21] McCullagh, P. (1980). Regression models for ordinal data. Journal of the Royal Statistical Society, 42(2), 109-142. · Zbl 0483.62056
[22] Minear, M., & Park, D. (2004). A lifespan database of adult facial stimuli. Behavior Research Methods, Instruments, & Computers: A Journal of the Psychonomic Society, 36, 630-633. · doi:10.3758/BF03206543
[23] Ramanathan, N., Chellappa, R., & Biswas, S. (2009). Computational methods for modeling facial aging: A survey. Journal of Visual Languages and Computing, 20, 131-144.
[24] Rennie, J. D., & Srebro, N. (2005). Loss functions for preference levels: Regression with discrete ordered labels. In Proceedings of the IJCAI multidisciplinary workshop on advances in preference handling.
[25] Ricanek, K., & Tesafaye, T. (2006). Morph: A longitudinal image database of normal adult age-progression. In Proceedings of automated face and gesture recognition. · Zbl 1222.62079
[26] Schlesinger, M. (1968). A connection between learning and self-learning in the pattern recognition (in Russian). Kibernetika, 2, 81-88.
[27] Shashua, A., & Levin, A. (2002). Ranking with large margin principle: Two approaches. In Proceedings of advances in neural information processing systems (NIPS).
[28] Sonnenburg, S., & Franc, V. (2010). Coffin: A computational framework for linear SVMs. In Proceedings of the international conference on machine learning (ICML).
[29] Sonnenburg, S., Rätsch, G., Henschel, S., Widmer, C., Behr, J., Zien, A., et al. (2010). The shogun machine learning toolbox. Journal of Machine Learning Research, 11, 1799-1802. · Zbl 1242.68003
[30] Teo, C. H., Vishwanathan, S., Smola, A. J., & Le, Q. V. (2010). Bundle methods for regularized risk minimization. Journal of Machine Learning Research, 11, 311-365. · Zbl 1242.68253
[31] Tewari, A., & Bartlett, P. (2007). On the consistency of multiclass classification methods. Journal of Machine Learning Research, 8, 1007-1025. · Zbl 1222.62079
[32] Tsochantaridis, I., Joachims, T., Hofmann, T., Altun, Y., & Singer, Y. (2005). Large margin methods for structured and interdependent output variables. Journal of Machine Learning Research, 6, 1453-1484. · Zbl 1222.68321
[33] Uřičář, M., Franc, V., & Hlaváč, V. (2012). Detector of facial landmarks learned by the structured output SVM. In Proceedings of the international conference on computer vision theory and applications (VISAPP) (Vol. 1, pp. 547-556).
[34] Vapnik, V. N. (1998). Statistical learning theory. Adaptive and learning systems. New York, New York: Wiley. · Zbl 0935.62007
[35] Zhang, T. (2004). Statistical behaviour and consistency of classification methods based on convex risk minimization. Annals of Statistics, 32(1), 56-134. · Zbl 1105.62323
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.