A non-intrusive correction algorithm for classification problems with corrupted data. (English) Zbl 1476.62014

Summary: A novel correction algorithm is proposed for multi-class classification problems with corrupted training data. The algorithm is non-intrusive, in the sense that it post-processes a trained classification model by appending a correction procedure to the model's prediction. The correction procedure can be coupled with any approximator, such as logistic regression or neural networks of various architectures. When the training dataset is sufficiently large, we prove theoretically (in the limiting case) and show numerically that the corrected models deliver correct classification results as if there were no corruption in the training data. For datasets of finite size, the corrected models produce significantly better recovery results than models without the correction algorithm. All of the theoretical findings in the paper are verified by our numerical examples.
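To give a concrete flavor of what a non-intrusive, post-hoc correction can look like, the sketch below adjusts a trained model's predicted class probabilities under the standard class-conditional label-noise model, in the spirit of the backward loss correction of Patrini et al. [29] rather than the exact procedure of the paper under review. The corruption matrix C (with C[i, j] the probability that true class i is recorded as class j) and the function name are assumptions made for this illustration.

```python
import numpy as np

def correct_probabilities(noisy_probs, C):
    """Post-hoc correction of class probabilities (illustrative sketch).

    noisy_probs : (n_samples, n_classes) softmax outputs of a model
                  trained on label-corrupted data.
    C           : (n_classes, n_classes) row-stochastic corruption matrix,
                  C[i, j] = P(observed label j | true label i); assumed
                  known or estimated, and invertible.
    """
    # Under class-conditional noise, the noisy posterior satisfies
    # p_noisy = C^T p_true, so recover p_true by solving the linear system.
    corrected = np.linalg.solve(C.T, noisy_probs.T).T
    # Numerical clean-up: clip negatives, renormalize onto the simplex.
    corrected = np.clip(corrected, 0.0, None)
    return corrected / corrected.sum(axis=1, keepdims=True)

# Toy usage: 3 classes with symmetric 20% label noise.
C = np.full((3, 3), 0.1) + 0.7 * np.eye(3)   # diagonal 0.8, off-diagonal 0.1
noisy = np.array([[0.5, 0.3, 0.2]])          # model output on one sample
print(correct_probabilities(noisy, C))       # -> [[0.571, 0.286, 0.143]]
```

In such a setting the base classifier (logistic regression, a deep network, etc.) never needs to be retrained; only its output probabilities pass through the correction step, which is what makes this style of approach non-intrusive.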

MSC:

62-08 Computational methods for problems pertaining to statistics
68P30 Coding and information theory (compaction, compression, models of communication, encoding schemes, etc.) (aspects in computer science)
68R01 General topics of discrete mathematics in relation to computer science

References:

[1] Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G.S., Davis, A., Dean, J., Devin, M., Ghemawat, S., Goodfellow, I., Harp, A., Irving, G., Isard, M., Jia, Y., Jozefowicz, R., Kaiser, L., Kudlur, M., Levenberg, J., Mané, D., Monga, R., Moore, S., Murray, D., Olah, C., Schuster, M., Shlens, J., Steiner, B., Sutskever, I., Talwar, K., Tucker, P., Vanhoucke, V., Vasudevan, V., Viégas, F., Vinyals, O., Warden, P., Wattenberg, M., Wicke, M., Yu, Y., Zheng, X.: TensorFlow: large-scale machine learning on heterogeneous systems (2015). http://tensorflow.org/
[2] Brooks, JP, Support vector machines with the ramp loss and the hard margin loss, Oper. Res., 59, 467-479 (2011) · Zbl 1228.90057 · doi:10.1287/opre.1100.0854
[3] Chang, L.-B.: Partial order relations for classification comparisons. The Canadian Journal of Statistics. doi:10.1002/cjs.11524 (2019) · Zbl 1492.62034
[4] Chollet, F., et al.: Keras. https://keras.io (2015)
[5] Frénay, B.; Verleysen, M., Classification in the presence of label noise: a survey, IEEE Trans Neural Netw Learn Syst, 25, 845-869 (2014) · doi:10.1109/TNNLS.2013.2292894
[6] Ghosh, A., Kumar, H., Sastry, P.: Robust loss functions under label noise for deep neural networks. In: Thirty-First AAAI Conference on Artificial Intelligence, arXiv:1712.09482v1 (2017)
[7] Ghosh, A.; Manwani, N.; Sastry, P., Making risk minimization tolerant to label noise, Neurocomputing, 160, 93-107 (2015) · doi:10.1016/j.neucom.2014.09.081
[8] Graves, A., Mohamed, A.-r., Hinton, G.: Speech recognition with deep recurrent neural networks. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 6645-6649 (2013)
[9] Haber, E.; Ruthotto, L., Stable architectures for deep neural networks, Inverse Probl., 34, 014004 (2017) · Zbl 1426.68236 · doi:10.1088/1361-6420/aa9a90
[10] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition, In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 770-778 (2016)
[11] Hendrycks, D., Mazeika, M., Wilson, D., Gimpel, K.: Using trusted data to train deep networks on labels corrupted by severe noise. In: Advances in Neural Information Processing Systems 31, pp. 10477-10486 (2018)
[12] Jiang, L., Zhou, Z., Leung, T., Li, L.-J., Fei-Fei, L.: MentorNet: regularizing very deep neural networks on corrupted labels. arXiv:1712.05055 (2017)
[13] Khetan, A., Lipton, Z.C., Anandkumar, A.: Learning from noisy singly-labeled data. arXiv:1712.04577 (2017)
[14] Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv:1412.6980 (2014)
[15] Larsen, J., Nonboe, L., Hintz-Madsen, M., Hansen, L.K.: Design of robust neural network classifiers. In: Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP’98 (Cat. No. 98CH36181), vol. 2, IEEE, pp. 1205-1208 (1998)
[16] LeCun, Y., Boser, B.E., Denker, J.S., Henderson, D., Howard, R.E., Hubbard, W.E., Jackel, L.D.: Handwritten digit recognition with a back-propagation network. In: Advances in Neural Information Processing Systems 2 (NIPS 1990), pp. 396-404 (1990)
[17] Li, B., Wang, Y., Singh, A., Vorobeychik, Y.: Data poisoning attacks on factorization-based collaborative filtering. In: Advances in Neural Information Processing Systems 29 (NIPS 2016), pp. 1885-1893 (2016)
[18] Li, Y., Yang, J., Song, Y., Cao, L., Luo, J., Li, L.-J.: Learning from noisy labels with distillation. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1910-1918 (2017)
[19] Liu, T.; Tao, D., Classification with noisy labels by importance reweighting, IEEE Trans Pattern Anal Mach Intell, 38, 447-461 (2016) · doi:10.1109/TPAMI.2015.2456899
[20] Long, PM; Servedio, RA, Random classification noise defeats all convex potential boosters, Mach Learn, 78, 287-304 (2010) · Zbl 1470.68139 · doi:10.1007/s10994-009-5165-z
[21] Manwani, N.; Sastry, P., Noise tolerance under risk minimization, IEEE Trans Cybern, 43, 1146-1151 (2013) · doi:10.1109/TSMCB.2012.2223460
[22] Masnadi-Shirazi, H., Vasconcelos, N.: On the design of loss functions for classification: theory, robustness to outliers, and savageboost. In: Advances in Neural Information Processing Systems 21 (NIPS 2008), pp. 1049-1056 (2008)
[23] Menon, A., Van Rooyen, B., Ong, C.S., Williamson, B.: Learning from corrupted binary labels via class-probability estimation. In: International Conference on Machine Learning, pp. 125-134 (2015)
[24] Mnih, V., Hinton, G.E.: Learning to label aerial images from noisy data. In: Proceedings of the 29th International Conference on Machine Learning (ICML-12), pp. 567-574 (2012)
[25] Nair, V., Hinton, G.E.: Rectified linear units improve restricted Boltzmann machines. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 807-814 (2010)
[26] Natarajan, N., Dhillon, I.S., Ravikumar, P.K., Tewari, A.: Learning with noisy labels. In: Advances in Neural Information Processing Systems 26, pp. 1196-1204 (2013) · Zbl 1467.68151
[27] Nettleton, DF; Orriols-Puig, A.; Fornells, A., A study of the effect of different types of noise on the precision of supervised learning techniques, Artif Intell Rev, 33, 275-306 (2010) · doi:10.1007/s10462-010-9156-z
[28] Northcutt, C.G., Wu, T., Chuang, I.L.: Learning with confident examples: rank pruning for robust classification with noisy labels. arXiv:1705.01936 (2017)
[29] Patrini, G., Rozza, A., Krishna Menon, A., Nock, R., Qu, L.: Making deep neural networks robust to label noise: a loss correction approach. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1944-1952 (2017)
[30] Ren, M., Zeng, W., Yang, B., Urtasun, R.: Learning to reweight examples for robust deep learning. arXiv:1803.09050 (2018)
[31] Steinhardt, J., Koh, P.W.W., Liang, P.S.: Certified defenses for data poisoning attacks. In: Advances in Neural Information Processing Systems 30 (NIPS 2017), pp. 3520-3532 (2017)
[32] Sukhbaatar, S., Bruna, J., Paluri, M., Bourdev, L., Fergus, R.: Training convolutional networks with noisy labels. arXiv:1406.2080 (2014)
[33] Van Rooyen, B., Menon, A., Williamson, R.C.: Learning with symmetric label noise: the importance of being unhinged. In: Advances in Neural Information Processing Systems, pp. 10-18 (2015)
[34] Veit, A., Alldrin, N., Chechik, G., Krasin, I., Gupta, A., Belongie, S.: Learning from noisy large-scale datasets with minimal supervision. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 839-847 (2017)
[35] Xiao, H., Rasul, K., Vollgraf, R.: Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. arXiv:1708.07747 (2017)
[36] Xiao, T., Xia, T., Yang, Y., Huang, C., Wang, X.: Learning from massive noisy labeled data for image classification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2691-2699 (2015)
[37] Zhang, C., Bengio, S., Hardt, M., Recht, B., Vinyals, O.: Understanding deep learning requires rethinking generalization. arXiv:1611.03530 (2016)
[38] Zhang, J., Yang, Y.: Robustness of regularized linear classification methods in text categorization. In: Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, pp. 190-197 (2003)
[39] Zhang, Z., Sabuncu, M.: Generalized cross entropy loss for training deep neural networks with noisy labels. In: Advances in Neural Information Processing Systems 31, pp. 8792-8802 (2018)
[40] Zhu, X.; Wu, X., Class noise vs. attribute noise: a quantitative study, Artif Intell Rev, 22, 177-210 (2004) · Zbl 1069.68587 · doi:10.1007/s10462-004-0751-8