
Efficient parameter learning of Bayesian network classifiers. (English) Zbl 1454.62183

Summary: Recent advances have demonstrated substantial benefits from learning with both generative and discriminative parameters. On the one hand, generative approaches estimate the parameters of the joint distribution \(\mathrm{P}(y,\mathbf{x})\), which for most network types is very computationally efficient (Markov networks are a notable exception). On the other hand, discriminative approaches estimate the parameters of the posterior distribution and are more effective for classification, since they fit \(\mathrm{P}(y|\mathbf{x})\) directly. However, discriminative approaches are less computationally efficient, as the normalization factor in the conditional log-likelihood precludes closed-form parameter estimation. This paper introduces a new discriminative parameter learning method for Bayesian network classifiers that elegantly combines parameters learned using both generative and discriminative methods. The proposed method is discriminative in nature, but uses estimates of generative probabilities to speed up the optimization process. A second contribution is a simple framework that characterizes the parameter learning task for Bayesian network classifiers. We conduct an extensive set of experiments on 72 standard datasets and demonstrate that our proposed discriminative parameterization provides an efficient alternative to other state-of-the-art parameterizations.
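The contrast the summary draws can be illustrated with a minimal sketch (not the paper's actual algorithm): naive Bayes parameters are obtained in closed form from counts (the generative step), then reused as the starting point for gradient ascent on the conditional log-likelihood (the discriminative step). The dataset, smoothing constant, and learning rate below are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny synthetic dataset: n samples, d binary attributes, binary class y.
n, d = 400, 3
y = rng.integers(0, 2, size=n)
X = (rng.random((n, d)) < np.where(y[:, None] == 1, 0.7, 0.3)).astype(float)

# Generative step: closed-form maximum-likelihood estimates with
# Laplace smoothing -- no iterative optimization needed.
alpha = 1.0
prior = (np.bincount(y, minlength=2) + alpha) / (n + 2 * alpha)
theta = np.array([(X[y == c].sum(axis=0) + alpha) / ((y == c).sum() + 2 * alpha)
                  for c in (0, 1)])        # theta[c, j] = P(x_j = 1 | y = c)

# Rewrite log P(y, x) as a linear score in the binary features, so the
# generative estimates become a starting point for logistic-style tuning.
W = np.log(theta) - np.log1p(-theta)              # per-attribute weights, (2, d)
b = np.log(prior) + np.log1p(-theta).sum(axis=1)  # per-class bias, (2,)

def cll(W, b):
    """Mean conditional log-likelihood log P(y | x) under the linear score."""
    scores = X @ W.T + b
    log_norm = np.logaddexp(scores[:, 0], scores[:, 1])
    return (scores[np.arange(n), y] - log_norm).mean()

cll_before = cll(W, b)

# Discriminative step: gradient ascent on the (concave) conditional
# log-likelihood, initialized at the generative solution rather than zero.
Y = np.eye(2)[y]
for _ in range(500):
    scores = X @ W.T + b
    p = np.exp(scores - np.logaddexp(scores[:, 0], scores[:, 1])[:, None])
    W += 0.2 * (Y - p).T @ X / n
    b += 0.2 * (Y - p).mean(axis=0)

cll_after = cll(W, b)
```

Because the objective is concave, starting from the generative estimates rather than from zero simply means the ascent begins near a good solution, which is the intuition behind using generative probabilities to speed up discriminative optimization.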

MSC:

62H22 Probabilistic graphical models
68T05 Learning and adaptive systems in artificial intelligence
62H30 Classification and discrimination; cluster analysis (statistical aspects)
62H12 Estimation in multivariate analysis

References:

[1] Buntine, W. (1994). Operations for learning with graphical models. Journal of Artificial Intelligence Research, 2, 159-225.
[2] Byrd, R., Lu, P., & Nocedal, J. (1995). A limited memory algorithm for bound constrained optimization. SIAM Journal on Scientific and Statistical Computing, 16(5), 1190-1208. · Zbl 0836.65080 · doi:10.1137/0916069
[3] Carvalho, A., Roos, T., Oliveira, A., & Myllymaki, P. (2011). Discriminative learning of Bayesian networks via factorized conditional log-likelihood. Journal of Machine Learning Research. · Zbl 1280.68158
[4] Chow, C., & Liu, C. (1968). Approximating discrete probability distributions with dependence trees. IEEE Transactions on Information Theory, 14(3), 462-467. · Zbl 0165.22305 · doi:10.1109/TIT.1968.1054142
[5] Demšar, J. (2006). Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research, 7, 1-30. · Zbl 1222.68184
[6] Fayyad, U. M., & Irani, K. B. (1992). On the handling of continuous-valued attributes in decision tree generation. Machine Learning, 8(1), 87-102. · Zbl 0767.68084
[7] Frank, A., & Asuncion, A. (2010). UCI machine learning repository. http://archive.ics.uci.edu/ml.
[8] Friedman, N., Geiger, D., & Goldszmidt, M. (1997). Bayesian network classifiers. Machine Learning, 29(2), 131-163. · Zbl 0892.68077 · doi:10.1023/A:1007465528199
[9] Greiner, R., & Zhou, W. (2002). Structural extension to logistic regression: Discriminative parameter learning of belief net classifiers. In Annual national conference on artificial intelligence (AAAI), pp. 167-173.
[10] Greiner, R., Su, X., Shen, B., & Zhou, W. (2005). Structural extensions to logistic regression: Discriminative parameter learning of belief net classifiers. Machine Learning, 59, 297-322. · Zbl 1101.68759
[11] Grossman, D., & Domingos, P. (2004). Learning Bayesian network classifiers by maximizing conditional likelihood. In ICML.
[12] Heckerman, D., & Meek, C. (1997). Models and selection criteria for regression and classification. In International conference on uncertainty in artificial intelligence.
[13] Jebara, T. (2003). Machine Learning: Discriminative and Generative. Berlin: Springer. · Zbl 1030.68073
[14] Kohavi, R., & Wolpert, D. (1996). Bias plus variance decomposition for zero-one loss functions. In ICML (pp. 275-283).
[15] Langford, J., Li, L., & Strehl, A. (2007). Vowpal Wabbit online learning project. https://github.com/JohnLangford/vowpal_wabbit/wiki.
[16] Martinez, A., Chen, S., Webb, G. I., & Zaidi, N. A. (2016). Scalable learning of Bayesian network classifiers. Journal of Machine Learning Research, 17, 1-35. · Zbl 1360.68694
[17] Ng, A., & Jordan, M. (2002). On discriminative vs. generative classifiers: A comparison of logistic regression and naive Bayes. In Advances in neural information processing systems.
[18] Pernkopf, F., & Bilmes, J. (2005). Discriminative versus generative parameter and structure learning of Bayesian network classifiers. In ICML.
[19] Pernkopf, F., & Bilmes, J. A. (2010). Efficient heuristics for discriminative structure learning of Bayesian network classifiers. Journal of Machine Learning Research, 11, 2323-2360. · Zbl 1242.68294
[20] Pernkopf, F., & Wohlmayr, M. (2009). On discriminative parameter learning of Bayesian network classifiers. In ECML PKDD. · Zbl 1295.68187
[21] Ripley, B. D. (1996). Pattern Recognition and Neural Networks. Cambridge: Cambridge University Press. · Zbl 0853.62046 · doi:10.1017/CBO9780511812651
[22] Roos, T., Wettig, H., Grünwald, P., Myllymäki, P., & Tirri, H. (2005). On discriminative Bayesian network classifiers and logistic regression. Machine Learning, 59(3), 267-296. · Zbl 1101.68785
[23] Rubinstein, Y. D., & Hastie, T. (1997). Discriminative vs informative learning. In AAAI.
[24] Sahami, M. (1996). Learning limited dependence Bayesian classifiers. In Proceedings of the second international conference on knowledge discovery and data mining (pp. 335-338).
[25] Su, J., Zhang, H., Ling, C., & Matwin, S. (2008). Discriminative parameter learning for Bayesian networks. In ICML.
[26] Webb, G. I. (2000). Multiboosting: A technique for combining boosting and wagging. Machine Learning, 40(2), 159-196. · doi:10.1023/A:1007659514849
[27] Webb, G. I., Boughton, J., Zheng, F., Ting, K. M., & Salem, H. (2012). Learning by extrapolation from marginal to full-multivariate probability distributions: decreasingly naive Bayesian classification. Machine Learning, 86(2), 233-272. · Zbl 1238.68136
[28] Zaidi, N. A., Carman, M. J., Cerquides, J., & Webb, G. I. (2014). Naive-Bayes inspired effective pre-conditioners for speeding-up logistic regression. In IEEE international conference on data mining.
[29] Zaidi, N. A., Cerquides, J., Carman, M. J., & Webb, G. I. (2013). Alleviating naive Bayes attribute independence assumption by attribute weighting. Journal of Machine Learning Research, 14, 1947-1988. · Zbl 1317.68199
[30] Zaidi, N. A., Petitjean, F., & Webb, G. I. (2016). Preconditioning an artificial neural network using naive Bayes. In Proceedings of the 20th Pacific-Asia conference on knowledge discovery and data mining (PAKDD).
[31] Zaidi, N. A., Webb, G. I., Carman, M. J., & Petitjean, F. (2015). Deep Broad Learning—Big models for big data. arXiv:1509.01346.
[32] Zhu, C., Byrd, R. H., & Nocedal, J. (1997). L-BFGS-B: Fortran routines for large scale bound constrained optimization. ACM Transactions on Mathematical Software, 23(4), 550-560. · Zbl 0912.65057 · doi:10.1145/279232.279236
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.