
A family of admissible heuristics for \(\mathrm{A}^*\) to perform inference in probabilistic classifier chains. (English) Zbl 1412.68189

Summary: Probabilistic classifier chains have recently gained interest in multi-label classification due to their ability to optimally estimate the joint probability of a set of labels. Their main hindrance is the excessive computational cost of performing inference at prediction time, which has motivated efficient inference alternatives that avoid exploring all possible solutions. The \(\epsilon \)-approximate algorithm, beam search and Monte Carlo sampling are suitable techniques, but only the \(\epsilon \)-approximate algorithm with \(\epsilon =0\) theoretically guarantees reaching an optimal solution in terms of subset 0/1 loss. This paper offers another alternative, based on heuristic search, that preserves this optimality. It consists of applying the \(\mathrm{A}^*\) algorithm with an admissible heuristic that is able to explore fewer nodes than the \(\epsilon \)-approximate algorithm with \(\epsilon =0\). A preliminary study already pursued this goal, but at the expense of a high computational cost for evaluating the heuristic, and only for linear models. In this paper, we propose a family of heuristics defined by a parameter that controls the trade-off between the number of nodes explored and the cost of computing the heuristic. Moreover, a certain value of the parameter yields a method that is also suitable for the non-linear case. The experiments reported over several benchmark datasets show that the number of nodes explored remains quite steady across different values of the parameter, although the computation time increases considerably for high values. Hence, low values of the parameter give heuristics that theoretically guarantee exploring fewer nodes than the \(\epsilon \)-approximate algorithm with \(\epsilon =0\) while showing competitive computational time. Finally, the results exhibit the good behavior of the \(\mathrm{A}^*\) algorithm using these heuristics in complex situations, such as in the presence of noise.
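The inference scheme the summary describes can be illustrated with a minimal sketch: a probabilistic classifier chain factorizes \(P(\mathbf{y}\mid x)=\prod_i P(y_i\mid x, y_1,\dots,y_{i-1})\), and finding the joint mode is a shortest-path search over partial label assignments with edge costs \(-\log P(y_i\mid \cdot)\). The heuristic below sums \(-\log\) of per-position upper bounds on the conditional probabilities; any valid upper bounds keep it admissible, so the first complete assignment popped is exact. This is only an illustrative sketch, not the paper's heuristic family: the toy conditional models `p1`, `p2`, `p3` and their probabilities are hypothetical, and the trivial bound 1.0 reduces the heuristic to zero (uniform-cost search).

```python
import heapq
from math import log

# Toy chain of 3 binary labels: each callable stands in for a trained
# chain classifier returning P(y_i = 1 | x, y_1..y_{i-1}).
# The probabilities are hypothetical, chosen only for illustration.
def p1(prev):            # P(y1 = 1 | x)
    return 0.7
def p2(prev):            # P(y2 = 1 | x, y1)
    return 0.8 if prev == (1,) else 0.3
def p3(prev):            # P(y3 = 1 | x, y1, y2)
    return 0.9 if prev[1] == 1 else 0.2

CHAIN = [p1, p2, p3]

def a_star_inference(chain, upper_bounds):
    """Maximize prod_i P(y_i | x, y_<i) via A* over -log probabilities.

    upper_bounds[i] must be >= the largest conditional probability
    achievable at position i; then the heuristic never overestimates the
    remaining cost (admissibility), and the first goal popped is the
    exact joint mode."""
    n = len(chain)

    def h(depth):
        # Optimistic (admissible) estimate of the remaining -log cost.
        return -sum(log(upper_bounds[i]) for i in range(depth, n))

    # Frontier entries: (g + h, g, partial label tuple).
    frontier = [(h(0), 0.0, ())]
    while frontier:
        f, g, partial = heapq.heappop(frontier)
        depth = len(partial)
        if depth == n:
            return partial, g          # optimal labels and -log P
        for value in (0, 1):
            p = chain[depth](partial)
            p = p if value == 1 else 1.0 - p
            if p > 0.0:
                g2 = g - log(p)
                heapq.heappush(frontier,
                               (g2 + h(depth + 1), g2, partial + (value,)))
    raise RuntimeError("no complete assignment found")

# Trivial bounds (1.0) give h = 0, i.e. uniform-cost search; tighter
# admissible bounds prune more nodes while preserving optimality.
labels, cost = a_star_inference(CHAIN, upper_bounds=[1.0, 1.0, 1.0])
```

With these toy conditionals the joint mode is \((1,1,1)\) with probability \(0.7 \cdot 0.8 \cdot 0.9 = 0.504\); passing the tighter admissible bounds \([0.7, 0.8, 0.9]\) returns the same mode while expanding fewer nodes, which is the trade-off the paper's parameterized heuristic family controls.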

MSC:

68T05 Learning and adaptive systems in artificial intelligence
62H30 Classification and discrimination; cluster analysis (statistical aspects)
68T20 Problem solving in the context of artificial intelligence (heuristics, search strategies, etc.)

Software:

ML-KNN
Full Text: DOI

References:

[1] Brier, G. W. (1950). Verification of forecasts expressed in terms of probability. Monthly Weather Review, 78(1), 1-3. · doi:10.1175/1520-0493(1950)078<0001:VOFEIT>2.0.CO;2
[2] Cheng, W., & Hüllermeier, E. (2009). Combining instance-based learning and logistic regression for multilabel classification. Machine Learning, 76(2-3), 211-225. · Zbl 1470.68091 · doi:10.1007/s10994-009-5127-5
[3] Dembczyński, K., Cheng, W., & Hüllermeier, E. (2010). Bayes optimal multilabel classification via probabilistic classifier chains. ICML, 2010, 279-286.
[4] Dembczyński, K., Waegeman, W., Cheng, W., & Hüllermeier, E. (2012). On label dependence and loss minimization in multi-label classification. Machine Learning, 88(1-2), 5-45. · Zbl 1243.68237 · doi:10.1007/s10994-012-5285-8
[5] Dembczyński, K., Waegeman, W., & Hüllermeier, E. (2012). An analysis of chaining in multi-label classification. In: De Raedt, L., Bessière, C., Dubois, D., Doherty, P., Frasconi, P., Heintz, F., & Lucas, P. J. F. (Eds.), ECAI 2012, No. 242 (pp. 294-299). Amsterdam. · Zbl 1327.68189
[6] Elisseeff, A., & Weston, J. (2005). A kernel method for multi-labelled classification. In: ACM conference on research and development in information retrieval (pp. 274-281).
[7] Hart, P. E., Nilsson, N. J., & Raphael, B. (1968). A formal basis for the heuristic determination of minimum cost paths. IEEE Transactions on Systems Science and Cybernetics, 4(2), 100-107. · doi:10.1109/TSSC.1968.300136
[8] Kumar, A., Vembu, S., Menon, A. K., & Elkan, C. (2012). Learning and inference in probabilistic classifier chains with beam search. ECML/PKDD, 2012, 665-680.
[9] Kumar, A., Vembu, S., Menon, A. K., & Elkan, C. (2013). Beam search algorithms for multilabel learning. Machine Learning, 92(1), 65-89. · Zbl 1273.68301 · doi:10.1007/s10994-013-5371-6
[10] Lin, C. J., Weng, R. C., & Keerthi, S. S. (2008). Trust region Newton method for logistic regression. Journal of Machine Learning Research, 9(Apr), 627-650. · Zbl 1225.68195
[11] McCallum, A. K. (1999). Multi-label text classification with a mixture model trained by EM. In: AAAI 99 workshop on text learning.
[12] Mena, D., Montañés, E., Quevedo, J. R., & del Coz, J. J. (2015). An overview of inference methods in probabilistic classifier chains for multi-label classification. Technical Report: Artificial Intelligence Center, University of Oviedo, Spain, Campus de Viesques, Gijón.
[13] Mena, D., Montañés, E., Quevedo, J. R., & del Coz, J. J. (2015). Using A* for inference in probabilistic classifier chains. In: IJCAI 2015, Proceedings of the 24th international joint conference on artificial intelligence (pp. 3707-3713), Buenos Aires, Argentina, July 25-31.
[14] Montañés, E., Quevedo, J. R., & del Coz, J. J. (2011). Aggregating independent and dependent models to learn multi-label classifiers. In: ECML/PKDD’11—Volume Part II (pp. 484-500). Springer-Verlag. · Zbl 1326.68251
[15] Montañés, E., Senge, R., Barranquero, J., Quevedo, J. R., del Coz, J. J., & Hüllermeier, E. (2014). Dependent binary relevance models for multi-label classification. Pattern Recognition, 47(3), 1494-1508. Handwriting Recognition and other PR Applications
[16] Pearl, J. (1984). Heuristics: Intelligent search strategies for computer problem solving. Boston, MA: Addison-Wesley Longman Publishing Co. Inc.
[17] Read, J., Martino, L., & Luengo, D. (2014). Efficient Monte Carlo methods for multi-dimensional learning with classifier chains. Pattern Recognition, 47(3), 1535-1546. · Zbl 1326.68251 · doi:10.1016/j.patcog.2013.10.006
[18] Read, J., Pfahringer, B., Holmes, G., & Frank, E. (2009). Classifier chains for multi-label classification. In: ECML/PKDD’09, LNCS (pp. 254-269). Springer.
[19] Read, J., Pfahringer, B., Holmes, G., & Frank, E. (2011). Classifier chains for multi-label classification. Machine Learning, 85(3), 333-359. · doi:10.1007/s10994-011-5256-5
[20] Russell, S. J., & Norvig, P. (2003). Artificial intelligence: A modern approach (2nd ed.). New York: Pearson Education. · Zbl 0835.68093
[21] Senge, R., Barranquero, J., del Coz, J. J., & Hüllermeier, E. (2012a). Rectifying classifier chains for multi-label classification. Technical Report.
[22] Senge, R., del Coz, J. J., & Hüllermeier, E. (2012b). On the problem of error propagation in classifier chains for multi-label classification. In: Conference of the German classification society on data analysis, machine learning and knowledge discovery.
[23] Tsoumakas, G., & Vlahavas, I. (2007). Random k-labelsets: An ensemble method for multilabel classification. In: ECML/PKDD’07, LNCS (pp. 406-417). Springer.
[24] Zhang, M. L., & Zhou, Z. H. (2007). ML-KNN: A lazy learning approach to multi-label learning. Pattern Recognition, 40(7), 2038-2048. · Zbl 1111.68629 · doi:10.1016/j.patcog.2006.12.019
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases, these data have been complemented/enhanced by data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible, without claiming completeness or perfect matching.