
Regularized margin-based conditional log-likelihood loss for prototype learning. (English) Zbl 1202.68348

Summary: The classification performance of nearest-prototype classifiers largely depends on the prototype learning algorithm. The Minimum Classification Error (MCE) method and the soft nearest prototype classifier (SNPC) method are two important algorithms that use misclassification loss. This paper proposes a new prototype learning algorithm based on the minimization of a conditional log-likelihood loss (CLL), called log-likelihood of margin (LOGM). A regularization term is added to avoid over-fitting in training as well as to maximize the hypothesis margin. The CLL in the LOGM algorithm is a convex function of the margin, and hence shows better convergence than the MCE loss. In addition, we show the effects of distance metric learning with both prototype-dependent weighting and prototype-independent weighting. Our empirical study on benchmark datasets demonstrates that the LOGM algorithm yields higher classification accuracies than the MCE method, generalized learning vector quantization (GLVQ), the SNPC, and robust soft learning vector quantization (RSLVQ); moreover, LOGM with prototype-dependent weighting achieves accuracies comparable to those of the support vector machine (SVM) classifier.
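To make the loss concrete, the following is a plausible sketch of the regularized LOGM objective, assuming a GLVQ-style hypothesis margin and a logistic link; the symbols \(d_+\), \(d_-\), \(\xi\), and \(\lambda\) are illustrative, and the paper's exact definitions may differ. For a sample \(x\) of class \(y\), let \(d_+(x)\) and \(d_-(x)\) denote the distances from \(x\) to the nearest prototype of class \(y\) and to the nearest prototype of any other class, respectively. Then

\[
\mu(x) = \frac{d_+(x) - d_-(x)}{d_+(x) + d_-(x)}, \qquad
\ell(x) = \log\bigl(1 + e^{\xi\,\mu(x)}\bigr), \qquad
L(\{m_j\}) = \sum_{i=1}^{N} \ell(x_i) + \lambda \sum_{j} \lVert m_j \rVert^2,
\]

where \(\mu(x) < 0\) exactly when \(x\) is correctly classified. Since \(\log(1 + e^{t})\) is convex and increasing, \(\ell\) is a convex, monotone function of the margin, which is the property credited above for LOGM's better convergence than the sigmoidal MCE loss; the quadratic term here is a generic weight-decay regularizer standing in for the paper's regularization.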
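A minimal stochastic-gradient sketch of prototype learning under this loss is given below. It is an illustration under the assumptions above (squared Euclidean distance, logistic loss, weight-decay regularizer), not the paper's exact update rule; all names (logm_sgd_step, xi, lam) are hypothetical.

```python
import numpy as np

def logm_sgd_step(prototypes, proto_labels, x, y, lr=0.05, xi=2.0, lam=1e-4):
    """One stochastic-gradient step on an illustrative LOGM-style loss:
    log(1 + exp(xi * mu)) + lam * ||m||^2, with the GLVQ hypothesis margin mu."""
    d = np.sum((prototypes - x) ** 2, axis=1)      # squared distances to all prototypes
    same = proto_labels == y
    i = np.argmin(np.where(same, d, np.inf))       # nearest prototype of the genuine class
    j = np.argmin(np.where(~same, d, np.inf))      # nearest prototype of a rival class
    dp, dm = d[i], d[j]
    mu = (dp - dm) / (dp + dm)                     # hypothesis margin in [-1, 1]
    g = xi / (1.0 + np.exp(-xi * mu))              # dloss/dmu = xi * sigmoid(xi * mu)
    denom = (dp + dm) ** 2
    # chain rule: dmu/ddp = 2*dm/denom, dmu/ddm = -2*dp/denom, and the gradient
    # of dp w.r.t. m_i is -2*(x - m_i) (similarly for dm and m_j); descent moves
    # the genuine prototype toward x and the rival prototype away from it
    prototypes[i] += lr * (g * (2 * dm / denom) * 2 * (x - prototypes[i]) - lam * prototypes[i])
    prototypes[j] -= lr * (g * (2 * dp / denom) * 2 * (x - prototypes[j]) + lam * prototypes[j])
    return prototypes
```

Called repeatedly over shuffled training samples with a decaying learning rate, such a step realizes the stochastic-approximation training scheme typical of LVQ-family algorithms.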

MSC:

68T10 Pattern recognition, speech recognition
68T05 Learning and adaptive systems in artificial intelligence

Software:

PRMLT; UCI-ml; LVQ_PAK; Torch
