Abstract
Probabilistic networks (also known as Bayesian belief networks) allow a compact description of complex stochastic relationships among several random variables. They are used widely for uncertain reasoning in artificial intelligence. In this paper, we investigate the problem of learning probabilistic networks with known structure and hidden variables. This is an important problem, because structure is much easier to elicit from experts than numbers, and the world is rarely fully observable. We present a gradient-based algorithm and show that the gradient can be computed locally, using information that is available as a byproduct of standard inference algorithms for probabilistic networks. Our experimental results demonstrate that using prior knowledge about the structure, even with hidden variables, can significantly improve the learning rate of probabilistic networks. We extend the method to networks in which the conditional probability tables are described using a small number of parameters. Examples include noisy-OR nodes and dynamic probabilistic networks. We show how this additional structure can be exploited by our algorithm to speed up the learning even further. We also outline an extension to hybrid networks, in which some of the nodes take on values in a continuous domain.
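To make the locality claim concrete: for a conditional probability table entry w = P(X_i = x | Pa(X_i) = u), the gradient of a case's log-likelihood with respect to w reduces to the posterior probability P(X_i = x, Pa(X_i) = u | e) divided by w, and that posterior is precisely the quantity standard inference algorithms already produce. The sketch below illustrates this on a toy two-node network; the network, data, and all names are illustrative assumptions of ours, not the authors' code, and inference is done by brute-force enumeration because the network is tiny.

```python
# Minimal sketch (not the paper's implementation) of the local
# gradient computation the abstract describes, on a toy network
# H -> X with binary H hidden and binary X observed.

def log_likelihood_gradient(p_h, p_x_given_h, data):
    """Gradient of sum_e log P(e) w.r.t. each CPT entry.

    Key identity: for a CPT entry w = P(X_i = x | Pa_i = u),
        d log P(e) / d w = P(X_i = x, Pa_i = u | e) / w,
    so each partial derivative is a posterior marginal that any
    standard inference algorithm computes as a byproduct.
    """
    grad_h = [0.0] * 2                      # d/d P(H=h)
    grad_x = [[0.0] * 2 for _ in range(2)]  # d/d P(X=x | H=h)
    for x in data:                          # each case observes only X
        joint = [p_h[h] * p_x_given_h[h][x] for h in range(2)]
        p_e = sum(joint)                    # likelihood of this case
        for h in range(2):
            post = joint[h] / p_e           # inference: P(H=h | X=x)
            grad_h[h] += post / p_h[h]
            grad_x[h][x] += post / p_x_given_h[h][x]
    return grad_h, grad_x

# Ten cases observing only X; start from uniform CPTs.
data = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
gh, gx = log_likelihood_gradient([0.5, 0.5], [[0.5, 0.5], [0.5, 0.5]], data)
print(gh, gx)
```

A full learner would also have to keep each table row on the probability simplex, for instance by renormalizing or reparameterizing after each step; the sketch only shows that the gradient itself falls out of ordinary posterior inference, with no global computation.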
Cite this article
Binder, J., Koller, D., Russell, S. et al. Adaptive Probabilistic Networks with Hidden Variables. Machine Learning 29, 213–244 (1997). https://doi.org/10.1023/A:1007421730016