Abstract
Building an accurate classifier requires accurate labels, and accurate labeling requires domain knowledge, experience, and consistent criteria; that is, it requires experts. In practice, having experts label all the data we need is often impossible because of the high cost, and we sometimes have to use 'cheaper' data labeled by non-experts. In such cases, expert-labeled and non-expert-labeled data are usually not distinguished during learning, even though mislabeled non-expert data may degrade the resulting classifier. In this paper, we propose a classification method utilizing reliably labeled data. We use prior knowledge of how reliable the person who gave each label is, and set a degree of label confidence for each non-expert-labeled data point based on neighboring, reliably labeled expert data. These degrees of confidence are reflected in learning so that data with higher confidence contribute more to the classifier. Under these assumptions, experiments with publicly available data suggest that our method can yield a more precise classifier than the conventional method, which treats all data equally.
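The idea of neighbor-based label confidence can be illustrated with a minimal sketch. The abstract does not give the confidence formula or the base learner, so everything below is an assumption made for illustration: confidence is taken as the fraction of the k nearest expert-labeled points that agree with the non-expert label, and the classifier is a weighted k-nearest-neighbor vote. The function names `label_confidence` and `weighted_knn_predict` and the toy data are hypothetical and are not taken from the paper.

```python
import math

def label_confidence(x, label, expert_X, expert_y, k=2):
    """Assumed rule: fraction of the k nearest expert points that agree with `label`."""
    order = sorted(range(len(expert_X)), key=lambda i: math.dist(x, expert_X[i]))
    nearest = order[:k]
    agree = sum(1 for i in nearest if expert_y[i] == label)
    return agree / len(nearest)

def weighted_knn_predict(x, X, y, weights, k=3):
    """k-NN vote in which each neighbor contributes its confidence weight."""
    nearest = sorted(range(len(X)), key=lambda i: math.dist(x, X[i]))[:k]
    votes = {}
    for i in nearest:
        votes[y[i]] = votes.get(y[i], 0.0) + weights[i]
    return max(votes, key=votes.get)

# Toy data (hypothetical): two clusters labeled by experts.
expert_X = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (0.9, 1.1)]
expert_y = [0, 0, 1, 1]

# Non-expert labels; the first point sits in the class-0 cluster but is labeled 1.
nonexpert_X = [(0.2, 0.1), (0.95, 0.9)]
nonexpert_y = [1, 1]

# Confidence for each non-expert label, derived from neighboring expert data.
conf = [label_confidence(x, lab, expert_X, expert_y)
        for x, lab in zip(nonexpert_X, nonexpert_y)]

# Expert labels get full weight 1.0; non-expert labels get their confidence.
X = expert_X + nonexpert_X
y = expert_y + nonexpert_y
weights = [1.0] * len(expert_X) + conf

prediction = weighted_knn_predict((0.15, 0.1), X, y, weights)
```

In this toy setup the suspicious non-expert label receives zero weight, so it cannot influence the vote near the class-0 cluster. Any learner that accepts per-sample weights could take the place of the weighted vote.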
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Nakata, K., Sakurai, S., Orihara, R. (2008). Classification Method Utilizing Reliably Labeled Data. In: Lovrek, I., Howlett, R.J., Jain, L.C. (eds) Knowledge-Based Intelligent Information and Engineering Systems. KES 2008. Lecture Notes in Computer Science, vol 5177. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85563-7_20
DOI: https://doi.org/10.1007/978-3-540-85563-7_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85562-0
Online ISBN: 978-3-540-85563-7
eBook Packages: Computer Science, Computer Science (R0)