
Robust supervised classification with mixture models: learning from data with uncertain labels. (English) Zbl 1175.68313

Summary: In the supervised classification framework, human supervision is required to label a set of learning data, which is then used to build the classifier. In many applications, however, human supervision is imprecise, difficult, or expensive. In this paper, the problem of learning a supervised multi-class classifier from data with uncertain labels is considered, and a model-based classification method is proposed to solve it. The idea of the proposed method is to confront an unsupervised modeling of the data with the supervised information carried by the labels of the learning data in order to detect inconsistencies. The method then builds a robust classifier that takes the detected label inconsistencies into account. Experiments on artificial and real data illustrate the main features of the proposed method, together with an application to object recognition under weak supervision.
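The core idea of the summary can be sketched in a few lines: fit a mixture model to the data without the labels, match each mixture component to the class label it most often co-occurs with, and flag points whose given label contradicts their component. This is only an illustrative approximation of the approach, not the authors' exact algorithm; the use of scikit-learn's `GaussianMixture` and the majority-vote matching rule are assumptions made here.

```python
# Illustrative sketch (not the paper's exact method): detect suspicious
# labels by confronting an unsupervised Gaussian mixture with the given,
# possibly noisy, supervision.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Two well-separated Gaussian classes, with 10% of the labels flipped
# to simulate uncertain supervision.
X = np.vstack([rng.normal(-2, 1, (100, 2)), rng.normal(2, 1, (100, 2))])
y = np.repeat([0, 1], 100)
flipped = rng.choice(200, size=20, replace=False)
y_noisy = y.copy()
y_noisy[flipped] = 1 - y_noisy[flipped]

# Unsupervised modeling: mixture components learned without the labels.
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
z = gmm.predict(X)

# Confront the components with the supervision: assign each component
# the label it most often co-occurs with (majority vote).
comp_to_label = {k: np.bincount(y_noisy[z == k]).argmax() for k in range(2)}
consistent = np.array([comp_to_label[k] for k in z]) == y_noisy

# Points whose label disagrees with their component are suspected
# mislabels; a robust classifier can down-weight or relabel them.
print(f"flagged {int(np.sum(~consistent))} suspicious labels out of {len(y_noisy)}")
```

With well-separated classes, most of the flagged points coincide with the artificially flipped labels; the robust classification step of the paper would then use this inconsistency information when estimating the final classifier.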

MSC:

68T05 Learning and adaptive systems in artificial intelligence
68T10 Pattern recognition, speech recognition

Software:

SIFT

References:

[1] Banfield, J.; Raftery, A., Model-based Gaussian and non-Gaussian clustering, Biometrics, 49, 803-821 (1993) · Zbl 0794.62034
[2] Bashir, S.; Carter, E., High breakdown mixture discriminant analysis, Journal of Multivariate Analysis, 93, 1, 102-111 (2005) · Zbl 1087.62076
[3] Bellman, R., Dynamic Programming (1957), Princeton University Press: Princeton University Press Princeton, NJ · Zbl 0077.13605
[4] Bensmail, H.; Celeux, G., Regularized Gaussian discriminant analysis through eigenvalue decomposition, Journal of the American Statistical Association, 91, 1743-1748 (1996) · Zbl 0885.62068
[5] Bouveyron, C.; Girard, S.; Schmid, C., High-dimensional data clustering, Computational Statistics and Data Analysis, 52, 1, 502-519 (2007) · Zbl 1452.62433
[6] Bouveyron, C.; Kannala, J.; Schmid, C.; Girard, S., Object localization by subspace clustering of local descriptors, (5th Indian Conference on Computer Vision, Graphics and Image Processing, India (2006)), 457-467
[7] Brodley, C.; Friedl, M., Identifying mislabeled training data, Journal of Artificial Intelligence Research, 11, 131-167 (1999) · Zbl 0924.68158
[8] Celeux, G.; Govaert, G., Parsimonious Gaussian models in cluster analysis, Pattern Recognition, 28, 781-793 (1995)
[9] d’Alche Buc, F.; Dagan, I.; Quinonero, J., The 2005 PASCAL Visual Object Classes Challenge, (Proceedings of the 1st PASCAL Challenges Workshop (2006), Springer: Springer Berlin)
[10] Dasarathy, B., Noising around the neighbourhood: a new system structure and classification rule for recognition in partially exposed environments, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2, 67-71 (1980)
[11] Gamberger, D.; Lavrac, N.; Groselj, C., Experiments with noise filtering in a medical domain, (16th International Conference on Machine Learning, USA (1999)), 143-151
[12] Gates, G., The reduced nearest neighbor rule, IEEE Transactions on Information Theory, 18, 3, 431-433 (1972)
[13] Guyon, I.; Matic, N.; Vapnik, V., Discovering informative patterns and data cleaning, (Advances in Knowledge Discovery and Data Mining (1996)), 181-203
[14] Hastie, T.; Tibshirani, R., Discriminant analysis by Gaussian mixtures, Journal of the Royal Statistical Society B, 58, 155-176 (1996) · Zbl 0850.62476
[15] Hastie, T.; Tibshirani, R.; Friedman, J., The Elements of Statistical Learning (2001), Springer: Springer New York · Zbl 0973.62007
[16] Hawkins, D.; McLachlan, G., High-breakdown linear discriminant analysis, Journal of the American Statistical Association, 92, 437, 136-143 (1997) · Zbl 0889.62052
[17] John, G., Robust decision trees: removing outliers from databases, (First conference on Knowledge Discovery and Data Mining (1995)), 174-179
[18] Lawrence, N.; Schölkopf, B., Estimating a kernel Fisher discriminant in the presence of label noise, (Proceedings of 18th International Conference on Machine Learning (2001), Morgan Kaufmann: Morgan Kaufmann San Francisco, CA), 306-313
[19] Li, Y.; Wessels, L.; de Ridder, D.; Reinders, M., Classification in the presence of class noise using a probabilistic kernel Fisher method, Pattern Recognition, 40, 12, 3349-3357 (2007) · Zbl 1123.68363
[20] Lowe, D., Distinctive image features from scale-invariant keypoints, International Journal of Computer Vision, 60, 2, 91-110 (2004)
[21] McLachlan, G., Discriminant Analysis and Statistical Pattern Recognition (1992), Wiley: Wiley New York · Zbl 0850.62481
[22] Mikolajczyk, K.; Schmid, C., Scale and affine invariant interest point detectors, International Journal of Computer Vision, 60, 1, 63-86 (2004)
[23] Mingers, J., An empirical comparison of pruning methods for decision tree induction, Machine Learning, 4, 2, 227-243 (1989)
[24] Quinlan, J., Bagging, boosting and C4.5, (13th National Conference on Artificial Intelligence, USA (1996)), 725-730
[25] Rousseeuw, P. J.; Leroy, A., Robust Regression and Outlier Detection (1987), Wiley: Wiley New York · Zbl 0711.62030
[26] Sakakibara, Y., Noise-tolerant Occam algorithms and their applications to learning decision trees, Machine Learning, 11, 1, 37-62 (1993) · Zbl 0770.68100
[27] Schapire, R., The strength of weak learnability, Machine Learning, 5, 197-227 (1990)
[28] Schwarz, G., Estimating the dimension of a model, The Annals of Statistics, 6, 461-464 (1978) · Zbl 0379.62005
[29] Vannoorenberghe, P.; Denoeux, T., Handling uncertain labels in multiclass problems using belief decision trees, (Proceedings of IPMU’2002 (2002))
[30] Wilson, D.; Martinez, T., Instance pruning techniques, (14th International Conference on Machine Learning, USA (1997)), 404-411
[31] Zeng, X.; Martinez, T., A noise filtering method using neural networks, (IEEE International Workshop on Soft Computing Techniques in Instrumentation, Measurement and Related Applications (2003)), 26-31
[32] Zhu, X.; Wu, X.; Chen, Q., Eliminating class noise in large datasets, (20th ICML International Conference on Machine Learning, USA (2003)), 920-927
[33] Dempster, A.; Laird, N.; Rubin, D., Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society, 39, 1, 1-38 (1977) · Zbl 0364.62022