-
Revisiting a Core-Jet Laboratory at High Redshift: Analysis of the Radio Jet in the Quasar PKS 2215+020 at z=3.572
Authors:
Sándor Frey,
Judit Fogasy,
Krisztina Perger,
Kateryna Kulish,
Petra Benke,
Dávid Koller,
Krisztina Éva Gabányi
Abstract:
The prominent radio quasar PKS 2215+020 (J2217+0220) was once labelled as a new laboratory for core--jet physics at redshift z=3.572 because of its exceptionally extended jet structure traceable with very long baseline interferometric (VLBI) observations up to a ~600 pc projected distance from the compact core and a hint of an arcsec-scale radio and an X-ray jet. While the presence of an X-ray jet…
▽ More
The prominent radio quasar PKS 2215+020 (J2217+0220) was once labelled as a new laboratory for core--jet physics at redshift z=3.572 because of its exceptionally extended jet structure traceable with very long baseline interferometric (VLBI) observations up to a ~600 pc projected distance from the compact core and a hint of an arcsec-scale radio and an X-ray jet. While the presence of an X-ray jet could not be confirmed later, this active galactic nucleus is still unique at high redshift with its long VLBI jet. Here, we analyse archival multi-epoch VLBI imaging data at five frequency bands from 1.7 to 15.4 GHz covering a period of more than 25 years from 1995 to 2020. We constrain apparent proper motions of jet components in PKS 2215+020 for the first time. Brightness distribution modeling at 8 GHz reveals a nearly 0.02 mas/yr proper motion (moderately superluminal with apparently two times the speed of light), and provides delta=11.5 for the Doppler-boosting factor in the inner relativistic jet that is inclined within 2 deg to the line of sight and has a Gamma=6 bulk Lorentz factor. These values qualify PKS 2215+020 as a blazar, with rather typical jet properties in a small sample of only about 20 objects at z>3.5 that have similar measurements to date. According to the 2-GHz VLBI data, the diffuse and extended outer emission feature at ~60 mas from the core, probably a place where the jet interacts with and decelerated by the ambient galactic medium, is consistent with being stationary, albeit slow motion cannot be excluded based on the presently available data.
△ Less
Submitted 16 February, 2024;
originally announced February 2024.
-
Efficient Spatially Adaptive Convolution and Correlation
Authors:
Thomas W. Mitchel,
Benedict Brown,
David Koller,
Tim Weyrich,
Szymon Rusinkiewicz,
Michael Kazhdan
Abstract:
Fast methods for convolution and correlation underlie a variety of applications in computer vision and graphics, including efficient filtering, analysis, and simulation. However, standard convolution and correlation are inherently limited to fixed filters: spatial adaptation is impossible without sacrificing efficient computation. In early work, Freeman and Adelson have shown how steerable filters…
▽ More
Fast methods for convolution and correlation underlie a variety of applications in computer vision and graphics, including efficient filtering, analysis, and simulation. However, standard convolution and correlation are inherently limited to fixed filters: spatial adaptation is impossible without sacrificing efficient computation. In early work, Freeman and Adelson have shown how steerable filters can address this limitation, providing a way for rotating the filter as it is passed over the signal. In this work, we provide a general, representation-theoretic, framework that allows for spatially varying linear transformations to be applied to the filter. This framework allows for efficient implementation of extended convolution and correlation for transformation groups such as rotation (in 2D and 3D) and scale, and provides a new interpretation for previous methods including steerable filters and the generalized Hough transform. We present applications to pattern matching, image feature description, vector field visualization, and adaptive image filtering.
△ Less
Submitted 28 July, 2020; v1 submitted 23 June, 2020;
originally announced June 2020.
-
Peptide-Spectra Matching from Weak Supervision
Authors:
Samuel S. Schoenholz,
Sean Hackett,
Laura Deming,
Eugene Melamud,
Navdeep Jaitly,
Fiona McAllister,
Jonathon O'Brien,
George Dahl,
Bryson Bennett,
Andrew M. Dai,
Daphne Koller
Abstract:
As in many other scientific domains, we face a fundamental problem when using machine learning to identify proteins from mass spectrometry data: large ground truth datasets mapping inputs to correct outputs are extremely difficult to obtain. Instead, we have access to imperfect hand-coded models crafted by domain experts. In this paper, we apply deep neural networks to an important step of the pro…
▽ More
As in many other scientific domains, we face a fundamental problem when using machine learning to identify proteins from mass spectrometry data: large ground truth datasets mapping inputs to correct outputs are extremely difficult to obtain. Instead, we have access to imperfect hand-coded models crafted by domain experts. In this paper, we apply deep neural networks to an important step of the protein identification problem, the pairing of mass spectra with short sequences of amino acids called peptides. We train our model to differentiate between top scoring results from a state-of-the art classical system and hard-negative second and third place results. Our resulting model is much better at identifying peptides with spectra than the model used to generate its training data. In particular, we achieve a 43% improvement over standard matching methods and a 10% improvement over a combination of the matching method and an industry standard cross-spectra reranking tool. Importantly, in a more difficult experimental regime that reflects current challenges facing biologists, our advantage over the previous state-of-the-art grows to 15% even after reranking. We believe this approach will generalize to other challenging scientific problems.
△ Less
Submitted 22 August, 2018; v1 submitted 20 August, 2018;
originally announced August 2018.
-
Inferring Multidimensional Rates of Aging from Cross-Sectional Data
Authors:
Emma Pierson,
Pang Wei Koh,
Tatsunori Hashimoto,
Daphne Koller,
Jure Leskovec,
Nicholas Eriksson,
Percy Liang
Abstract:
Modeling how individuals evolve over time is a fundamental problem in the natural and social sciences. However, existing datasets are often cross-sectional with each individual observed only once, making it impossible to apply traditional time-series methods. Motivated by the study of human aging, we present an interpretable latent-variable model that learns temporal dynamics from cross-sectional…
▽ More
Modeling how individuals evolve over time is a fundamental problem in the natural and social sciences. However, existing datasets are often cross-sectional with each individual observed only once, making it impossible to apply traditional time-series methods. Motivated by the study of human aging, we present an interpretable latent-variable model that learns temporal dynamics from cross-sectional data. Our model represents each individual's features over time as a nonlinear function of a low-dimensional, linearly-evolving latent state. We prove that when this nonlinear function is constrained to be order-isomorphic, the model family is identifiable solely from cross-sectional data provided the distribution of time-independent variation is known. On the UK Biobank human health dataset, our model reconstructs the observed data while learning interpretable rates of aging associated with diseases, mortality, and aging risk factors.
△ Less
Submitted 5 March, 2019; v1 submitted 12 July, 2018;
originally announced July 2018.
-
Simple way to apply nonlocal van der Waals functionals within all-electron methods
Authors:
Fabien Tran,
Julia Stelzl,
David Koller,
Thomas Ruh,
Peter Blaha
Abstract:
The method based on fast Fourier transforms proposed by G. Román-Pérez and J. M. Soler [Phys. Rev. Lett. 103, 096102 (2009)], which allows for a computationally fast implementation of the nonlocal van der Waals (vdW) functionals, has significantly contributed to making the vdW functionals popular in solid-state physics. However, the Román-Pérez-Soler method relies on a plane-wave expansion of the…
▽ More
The method based on fast Fourier transforms proposed by G. Román-Pérez and J. M. Soler [Phys. Rev. Lett. 103, 096102 (2009)], which allows for a computationally fast implementation of the nonlocal van der Waals (vdW) functionals, has significantly contributed to making the vdW functionals popular in solid-state physics. However, the Román-Pérez-Soler method relies on a plane-wave expansion of the electron density; therefore it can not be applied readily to all-electron densities for which an unaffordable number of plane waves would be required for an accurate expansion. In this work, we present the results for the lattice constant and binding energy of solids that were obtained by applying a smoothing procedure to the all-electron density calculated with the linearized augmented plane-wave method. The smoothing procedure has the advantages of being very simple to implement, basis-set independent, and allowing the calculation of the potential. It is also shown that the results agree very well with those from the literature that were obtained with the projector augmented wave method.
△ Less
Submitted 27 July, 2017; v1 submitted 22 May, 2017;
originally announced May 2017.
-
Tuned Models of Peer Assessment in MOOCs
Authors:
Chris Piech,
Jonathan Huang,
Zhenghao Chen,
Chuong Do,
Andrew Ng,
Daphne Koller
Abstract:
In massive open online courses (MOOCs), peer grading serves as a critical tool for scaling the grading of complex, open-ended assignments to courses with tens or hundreds of thousands of students. But despite promising initial trials, it does not always deliver accurate results compared to human experts. In this paper, we develop algorithms for estimating and correcting for grader biases and relia…
▽ More
In massive open online courses (MOOCs), peer grading serves as a critical tool for scaling the grading of complex, open-ended assignments to courses with tens or hundreds of thousands of students. But despite promising initial trials, it does not always deliver accurate results compared to human experts. In this paper, we develop algorithms for estimating and correcting for grader biases and reliabilities, showing significant improvement in peer grading accuracy on real data with 63,199 peer grades from Coursera's HCI course offerings --- the largest peer grading networks analysed to date. We relate grader biases and reliabilities to other student factors such as student engagement, performance as well as commenting style. We also show that our model can lead to more intelligent assignment of graders to gradees.
△ Less
Submitted 9 July, 2013;
originally announced July 2013.
-
Hybrid functionals for solids with an optimized Hartree-Fock mixing parameter
Authors:
David Koller,
Peter Blaha,
Fabien Tran
Abstract:
(Screened) hybrid functionals are being used more and more for solid-state calculations. Usually the fraction alpha of Hartree-Fock exchange is kept fixed during the calculation, however there is no single (universal) value for alpha which systematically leads to satisfying accuracy. Instead, one could use a property of the system under consideration to determine alpha and in this way the function…
▽ More
(Screened) hybrid functionals are being used more and more for solid-state calculations. Usually the fraction alpha of Hartree-Fock exchange is kept fixed during the calculation, however there is no single (universal) value for alpha which systematically leads to satisfying accuracy. Instead, one could use a property of the system under consideration to determine alpha and in this way the functional would be more flexible and potentially more accurate. Recently, it was proposed to use the static dielectric constant epsilon for the calculation of alpha [Shimazaki and Asai, Chem. Phys. Lett. 466, 91 (2008) and Marques et al., Phys. Rev. B 83, 035119 (2011)]. We explore this idea further and propose a scheme where the connection between epsilon and alpha is optimized based on experimental band gaps. epsilon, and thus alpha, is recalculated at each iteration of the self-consistent procedure. We present results for the band gap and lattice constant of various semiconductors and insulators with this procedure. In addition, we show that this approach can also be combined with a non-self-consistent hybrid approximation to speed up the calculations considerably, while retaining an excellent accuracy in most cases.
△ Less
Submitted 16 April, 2013;
originally announced April 2013.
-
Probability Estimation in Face of Irrelevant Information
Authors:
Adam J. Grove,
Daphne Koller
Abstract:
In this paper, we consider one aspect of the problem of applying decision theory to the design of agents that learn how to make decisions under uncertainty. This aspect concerns how an agent can estimate probabilities for the possible states of the world, given that it only makes limited observations before committing to a decision. We show that the naive application of statistical tools can be…
▽ More
In this paper, we consider one aspect of the problem of applying decision theory to the design of agents that learn how to make decisions under uncertainty. This aspect concerns how an agent can estimate probabilities for the possible states of the world, given that it only makes limited observations before committing to a decision. We show that the naive application of statistical tools can be improved upon if the agent can determine which of his observations are truly relevant to the estimation problem at hand. We give a framework in which such determinations can be made, and define an estimation procedure to use them. Our framework also suggests several extensions, which show how additional knowledge can be used to improve tile estimation procedure still further.
△ Less
Submitted 20 March, 2013;
originally announced March 2013.
-
Generating New Beliefs From Old
Authors:
Fahiem Bacchus,
Adam J. Grove,
Joseph Y. Halpern,
Daphne Koller
Abstract:
In previous work [BGHK92, BGHK93], we have studied the random-worlds approach -- a particular (and quite powerful) method for generating degrees of belief (i.e., subjective probabilities) from a knowledge base consisting of objective (first-order, statistical, and default) information. But allowing a knowledge base to contain only objective information is sometimes limiting. We occasionally wish…
▽ More
In previous work [BGHK92, BGHK93], we have studied the random-worlds approach -- a particular (and quite powerful) method for generating degrees of belief (i.e., subjective probabilities) from a knowledge base consisting of objective (first-order, statistical, and default) information. But allowing a knowledge base to contain only objective information is sometimes limiting. We occasionally wish to include information about degrees of belief in the knowledge base as well, because there are contexts in which old beliefs represent important information that should influence new beliefs. In this paper, we describe three quite general techniques for extending a method that generates degrees of belief from objective information to one that can make use of degrees of belief as well. All of our techniques are bloused on well-known approaches, such as cross-entropy. We discuss general connections between the techniques and in particular show that, although conceptually and technically quite different, all of the techniques give the same answer when applied to the random-worlds method.
△ Less
Submitted 27 February, 2013;
originally announced February 2013.
-
Stochastic Simulation Algorithms for Dynamic Probabilistic Networks
Authors:
Keiji Kanazawa,
Daphne Koller,
Stuart Russell
Abstract:
Stochastic simulation algorithms such as likelihood weighting often give fast, accurate approximations to posterior probabilities in probabilistic networks, and are the methods of choice for very large networks. Unfortunately, the special characteristics of dynamic probabilistic networks (DPNs), which are used to represent stochastic temporal processes, mean that standard simulation algorithms pe…
▽ More
Stochastic simulation algorithms such as likelihood weighting often give fast, accurate approximations to posterior probabilities in probabilistic networks, and are the methods of choice for very large networks. Unfortunately, the special characteristics of dynamic probabilistic networks (DPNs), which are used to represent stochastic temporal processes, mean that standard simulation algorithms perform very poorly. In essence, the simulation trials diverge further and further from reality as the process is observed over time. In this paper, we present simulation algorithms that use the evidence observed at each time step to push the set of trials back towards reality. The first algorithm, "evidence reversal" (ER) restructures each time slice of the DPN so that the evidence nodes for the slice become ancestors of the state variables. The second algorithm, called "survival of the fittest" sampling (SOF), "repopulates" the set of trials at each time step using a stochastic reproduction rate weighted by the likelihood of the evidence according to each trial. We compare the performance of each algorithm with likelihood weighting on the original network, and also investigate the benefits of combining the ER and SOF methods. The ER/SOF combination appears to maintain bounded error independent of the number of time steps in the simulation.
△ Less
Submitted 20 February, 2013;
originally announced February 2013.
-
Context-Specific Independence in Bayesian Networks
Authors:
Craig Boutilier,
Nir Friedman,
Moises Goldszmidt,
Daphne Koller
Abstract:
Bayesian networks provide a language for qualitatively representing the conditional independence properties of a distribution. This allows a natural and compact representation of the distribution, eases knowledge acquisition, and supports effective inference algorithms. It is well-known, however, that there are certain independencies that we cannot capture qualitatively within the Bayesian netwo…
▽ More
Bayesian networks provide a language for qualitatively representing the conditional independence properties of a distribution. This allows a natural and compact representation of the distribution, eases knowledge acquisition, and supports effective inference algorithms. It is well-known, however, that there are certain independencies that we cannot capture qualitatively within the Bayesian network structure: independencies that hold only in certain contexts, i.e., given a specific assignment of values to certain variables. In this paper, we propose a formal notion of context-specific independence (CSI), based on regularities in the conditional probability tables (CPTs) at a node. We present a technique, analogous to (and based on) d-separation, for determining when such independence holds in a given network. We then focus on a particular qualitative representation scheme - tree-structured CPTs - for capturing CSI. We suggest ways in which this representation can be used to support effective inference algorithms. In particular, we present a structural decomposition of the resulting network which can improve the performance of clustering algorithms, and an alternative algorithm based on cutset conditioning.
△ Less
Submitted 13 February, 2013;
originally announced February 2013.
-
Nonuniform Dynamic Discretization in Hybrid Networks
Authors:
Alexander V. Kozlov,
Daphne Koller
Abstract:
We consider probabilistic inference in general hybrid networks, which include continuous and discrete variables in an arbitrary topology. We reexamine the question of variable discretization in a hybrid network aiming at minimizing the information loss induced by the discretization. We show that a nonuniform partition across all variables as opposed to uniform partition of each variable separate…
▽ More
We consider probabilistic inference in general hybrid networks, which include continuous and discrete variables in an arbitrary topology. We reexamine the question of variable discretization in a hybrid network aiming at minimizing the information loss induced by the discretization. We show that a nonuniform partition across all variables as opposed to uniform partition of each variable separately reduces the size of the data structures needed to represent a continuous function. We also provide a simple but efficient procedure for nonuniform partition. To represent a nonuniform discretization in the computer memory, we introduce a new data structure, which we call a Binary Split Partition (BSP) tree. We show that BSP trees can be an exponential factor smaller than the data structures in the standard uniform discretization in multiple dimensions and show how the BSP trees can be used in the standard join tree algorithm. We show that the accuracy of the inference process can be significantly improved by adjusting discretization with evidence. We construct an iterative anytime algorithm that gradually improves the quality of the discretization and the accuracy of the answer on a query. We provide empirical evidence that the algorithm converges.
△ Less
Submitted 6 February, 2013;
originally announced February 2013.
-
Object-Oriented Bayesian Networks
Authors:
Daphne Koller,
Avi Pfeffer
Abstract:
Bayesian networks provide a modeling language and associated inference algorithm for stochastic domains. They have been successfully applied in a variety of medium-scale applications. However, when faced with a large complex domain, the task of modeling using Bayesian networks begins to resemble the task of programming using logical circuits. In this paper, we describe an object-oriented Bayesi…
▽ More
Bayesian networks provide a modeling language and associated inference algorithm for stochastic domains. They have been successfully applied in a variety of medium-scale applications. However, when faced with a large complex domain, the task of modeling using Bayesian networks begins to resemble the task of programming using logical circuits. In this paper, we describe an object-oriented Bayesian network (OOBN) language, which allows complex domains to be described in terms of inter-related objects. We use a Bayesian network fragment to describe the probabilistic relations between the attributes of an object. These attributes can themselves be objects, providing a natural framework for encoding part-of hierarchies. Classes are used to provide a reusable probabilistic model which can be applied to multiple similar objects. Classes also support inheritance of model fragments from a class to a subclass, allowing the common aspects of related classes to be defined only once. Our language has clear declarative semantics: an OOBN can be interpreted as a stochastic functional program, so that it uniquely specifies a probabilistic model. We provide an inference algorithm for OOBNs, and show that much of the structural information encoded by an OOBN--particularly the encapsulation of variables within an object and the reuse of model fragments in different contexts--can also be used to speed up the inference process.
△ Less
Submitted 6 February, 2013;
originally announced February 2013.
-
Update Rules for Parameter Estimation in Bayesian Networks
Authors:
Eric Bauer,
Daphne Koller,
Yoram Singer
Abstract:
This paper re-examines the problem of parameter estimation in Bayesian networks with missing values and hidden variables from the perspective of recent work in on-line learning [Kivinen & Warmuth, 1994]. We provide a unified framework for parameter estimation that encompasses both on-line learning, where the model is continuously adapted to new data cases as they arrive, and the more traditional b…
▽ More
This paper re-examines the problem of parameter estimation in Bayesian networks with missing values and hidden variables from the perspective of recent work in on-line learning [Kivinen & Warmuth, 1994]. We provide a unified framework for parameter estimation that encompasses both on-line learning, where the model is continuously adapted to new data cases as they arrive, and the more traditional batch learning, where a pre-accumulated set of samples is used in a one-time model selection process. In the batch case, our framework encompasses both the gradient projection algorithm and the EM algorithm for Bayesian networks. The framework also leads to new on-line and batch parameter update schemes, including a parameterized version of EM. We provide both empirical and theoretical results indicating that parameterized EM allows faster convergence to the maximum likelihood parameters than does standard EM.
△ Less
Submitted 6 February, 2013;
originally announced February 2013.
-
Tractable Inference for Complex Stochastic Processes
Authors:
Xavier Boyen,
Daphne Koller
Abstract:
The monitoring and control of any dynamic system depends crucially on the ability to reason about its current status and its future trajectory. In the case of a stochastic system, these tasks typically involve the use of a belief state- a probability distribution over the state of the process at a given point in time. Unfortunately, the state spaces of complex processes are very large, making an e…
▽ More
The monitoring and control of any dynamic system depends crucially on the ability to reason about its current status and its future trajectory. In the case of a stochastic system, these tasks typically involve the use of a belief state- a probability distribution over the state of the process at a given point in time. Unfortunately, the state spaces of complex processes are very large, making an explicit representation of a belief state intractable. Even in dynamic Bayesian networks (DBNs), where the process itself can be represented compactly, the representation of the belief state is intractable. We investigate the idea of maintaining a compact approximation to the true belief state, and analyze the conditions under which the errors due to the approximations taken over the lifetime of the process do not accumulate to make our answers completely irrelevant. We show that the error in a belief state contracts exponentially as the process evolves. Thus, even with multiple approximations, the error in our process remains bounded indefinitely. We show how the additional structure of a DBN can be used to design our approximation scheme, improving its performance significantly. We demonstrate the applicability of our ideas in the context of a monitoring task, showing that orders of magnitude faster inference can be achieved with only a small degradation in accuracy.
△ Less
Submitted 30 January, 2013;
originally announced January 2013.
-
SPOOK: A System for Probabilistic Object-Oriented Knowledge Representation
Authors:
Avi Pfeffer,
Daphne Koller,
Brian Milch,
Ken T. Takusagawa
Abstract:
In previous work, we pointed out the limitations of standard Bayesian networks as a modeling framework for large, complex domains. We proposed a new, richly structured modeling language, {em Object-oriented Bayesian Netorks}, that we argued would be able to deal with such domains. However, it turns out that OOBNs are not expressive enough to model many interesting aspects of complex domains: the…
▽ More
In previous work, we pointed out the limitations of standard Bayesian networks as a modeling framework for large, complex domains. We proposed a new, richly structured modeling language, {em Object-oriented Bayesian Netorks}, that we argued would be able to deal with such domains. However, it turns out that OOBNs are not expressive enough to model many interesting aspects of complex domains: the existence of specific named objects, arbitrary relations between objects, and uncertainty over domain structure. These aspects are crucial in real-world domains such as battlefield awareness. In this paper, we present SPOOK, an implemented system that addresses these limitations. SPOOK implements a more expressive language that allows it to represent the battlespace domain naturally and compactly. We present a new inference algorithm that utilizes the model structure in a fundamental way, and show empirically that it achieves orders of magnitude speedup over existing approaches.
△ Less
Submitted 23 January, 2013;
originally announced January 2013.
-
A General Algorithm for Approximate Inference and its Application to Hybrid Bayes Nets
Authors:
Daphne Koller,
Uri Lerner,
Dragomir Anguelov
Abstract:
The clique tree algorithm is the standard method for doing inference in Bayesian networks. It works by manipulating clique potentials - distributions over the variables in a clique. While this approach works well for many networks, it is limited by the need to maintain an exact representation of the clique potentials. This paper presents a new unified approach that combines approximate inference…
▽ More
The clique tree algorithm is the standard method for doing inference in Bayesian networks. It works by manipulating clique potentials - distributions over the variables in a clique. While this approach works well for many networks, it is limited by the need to maintain an exact representation of the clique potentials. This paper presents a new unified approach that combines approximate inference and the clique tree algorithm, thereby circumventing this limitation. Many known approximate inference algorithms can be viewed as instances of this approach. The algorithm essentially does clique tree propagation, using approximate inference to estimate the densities in each clique. In many settings, the computation of the approximate clique potential can be done easily using statistical importance sampling. Iterations are used to gradually improve the quality of the estimation.
△ Less
Submitted 23 January, 2013;
originally announced January 2013.
-
Discovering the Hidden Structure of Complex Dynamic Systems
Authors:
Xavier Boyen,
Nir Friedman,
Daphne Koller
Abstract:
Dynamic Bayesian networks provide a compact and natural representation for complex dynamic systems. However, in many cases, there is no expert available from whom a model can be elicited. Learning provides an alternative approach for constructing models of dynamic systems. In this paper, we address some of the crucial computational aspects of learning the structure of dynamic systems, particularly…
▽ More
Dynamic Bayesian networks provide a compact and natural representation for complex dynamic systems. However, in many cases, there is no expert available from whom a model can be elicited. Learning provides an alternative approach for constructing models of dynamic systems. In this paper, we address some of the crucial computational aspects of learning the structure of dynamic systems, particularly those where some relevant variables are partially observed or even entirely unknown. Our approach is based on the Structural Expectation Maximization (SEM) algorithm. The main computational cost of the SEM algorithm is the gathering of expected sufficient statistics. We propose a novel approximation scheme that allows these sufficient statistics to be computed efficiently. We also investigate the fundamental problem of discovering the existence of hidden variables without exhaustive and expensive search. Our approach is based on the observation that, in dynamic systems, ignoring a hidden variable typically results in a violation of the Markov property. Thus, our algorithm searches for such violations in the data, and introduces hidden variables to explain them. We provide empirical results showing that the algorithm is able to learn the dynamics of complex systems in a computationally tractable way.
△ Less
Submitted 23 January, 2013;
originally announced January 2013.
-
Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence (2001)
Authors:
John Breese,
Daphne Koller
Abstract:
This is the Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence, which was held in Seattle, WA, August 2-5 2001
This is the Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence, which was held in Seattle, WA, August 2-5 2001
△ Less
Submitted 28 August, 2014; v1 submitted 19 January, 2013;
originally announced January 2013.
-
Probabilistic Models for Agents' Beliefs and Decisions
Authors:
Brian Milch,
Daphne Koller
Abstract:
Many applications of intelligent systems require reasoning about the mental states of agents in the domain. We may want to reason about an agent's beliefs, including beliefs about other agents; we may also want to reason about an agent's preferences, and how his beliefs and preferences relate to his behavior. We define a probabilistic epistemic logic (PEL) in which belief statements are given a…
▽ More
Many applications of intelligent systems require reasoning about the mental states of agents in the domain. We may want to reason about an agent's beliefs, including beliefs about other agents; we may also want to reason about an agent's preferences, and how his beliefs and preferences relate to his behavior. We define a probabilistic epistemic logic (PEL) in which belief statements are given a formal semantics, and provide an algorithm for asserting and querying PEL formulas in Bayesian networks. We then show how to reason about an agent's behavior by modeling his decision process as an influence diagram and assuming that he behaves rationally. PEL can then be used for reasoning from an agent's observed actions to conclusions about other aspects of the domain, including unobserved domain variables and the agent's mental states.
△ Less
Submitted 16 January, 2013;
originally announced January 2013.
-
Policy Iteration for Factored MDPs
Authors:
Daphne Koller,
Ron Parr
Abstract:
Many large MDPs can be represented compactly using a dynamic Bayesian network. Although the structure of the value function does not retain the structure of the process, recent work has shown that value functions in factored MDPs can often be approximated well using a decomposed value function: a linear combination of <I>restricted</I> basis functions, each of which refers only to a small subset…
▽ More
Many large MDPs can be represented compactly using a dynamic Bayesian network. Although the structure of the value function does not retain the structure of the process, recent work has shown that value functions in factored MDPs can often be approximated well using a decomposed value function: a linear combination of <I>restricted</I> basis functions, each of which refers only to a small subset of variables. An approximate value function for a particular policy can be computed using approximate dynamic programming, but this approach (and others) can only produce an approximation relative to a distance metric which is weighted by the stationary distribution of the current policy. This type of weighted projection is ill-suited to policy improvement. We present a new approach to value determination, that uses a simple closed-form computation to directly compute a least-squares decomposed approximation to the value function <I>for any weights</I>. We then use this value determination algorithm as a subroutine in a policy iteration process. We show that, under reasonable restrictions, the policies induced by a factored value function are compactly represented, and can be manipulated efficiently in a policy iteration process. We also present a method for computing error bounds for decomposed value functions using a variable-elimination algorithm for function optimization. The complexity of all of our algorithms depends on the factorization of system dynamics and of the approximate value function.
△ Less
Submitted 16 January, 2013;
originally announced January 2013.
-
Being Bayesian about Network Structure
Authors:
Nir Friedman,
Daphne Koller
Abstract:
In many domains, we are interested in analyzing the structure of the underlying distribution, e.g., whether one variable is a direct parent of the other. Bayesian model-selection attempts to find the MAP model and use its structure to answer these questions. However, when the amount of available data is modest, there might be many models that have non-negligible posterior. Thus, we want compute…
▽ More
In many domains, we are interested in analyzing the structure of the underlying distribution, e.g., whether one variable is a direct parent of the other. Bayesian model-selection attempts to find the MAP model and use its structure to answer these questions. However, when the amount of available data is modest, there might be many models that have non-negligible posterior. Thus, we want compute the Bayesian posterior of a feature, i.e., the total posterior probability of all models that contain it. In this paper, we propose a new approach for this task. We first show how to efficiently compute a sum over the exponential number of networks that are consistent with a fixed ordering over network variables. This allows us to compute, for a given ordering, both the marginal probability of the data and the posterior of a feature. We then use this result as the basis for an algorithm that approximates the Bayesian posterior of a feature. Our approach uses a Markov Chain Monte Carlo (MCMC) method, but over orderings rather than over network structures. The space of orderings is much smaller and more regular than the space of structures, and has a smoother posterior `landscape'. We present empirical results on synthetic and real-life datasets that compare our approach to full model averaging (when possible), to MCMC over network structures, and to a non-Bayesian bootstrap approach.
△ Less
Submitted 16 January, 2013;
originally announced January 2013.
-
Utilities as Random Variables: Density Estimation and Structure Discovery
Authors:
Urszula Chajewska,
Daphne Koller
Abstract:
Decision theory does not traditionally include uncertainty over utility functions. We argue that the a person's utility value for a given outcome can be treated as we treat other domain attributes: as a random variable with a density function over its possible values. We show that we can apply statistical density estimation techniques to learn such a density function from a database of partially…
▽ More
Decision theory does not traditionally include uncertainty over utility functions. We argue that the a person's utility value for a given outcome can be treated as we treat other domain attributes: as a random variable with a density function over its possible values. We show that we can apply statistical density estimation techniques to learn such a density function from a database of partially elicited utility functions. In particular, we define a Bayesian learning framework for this problem, assuming the distribution over utilities is a mixture of Gaussians, where the mixture components represent statistically coherent subpopulations. We can also extend our techniques to the problem of discovering generalized additivity structure in the utility functions in the population. We define a Bayesian model selection criterion for utility function structure and a search procedure over structures. The factorization of the utilities in the learned model, and the generalization obtained from density estimation, allows us to provide robust estimates of utilities using a significantly smaller number of utility elicitation questions. We experiment with our technique on synthetic utility data and on a real database of utility functions in the domain of prenatal diagnosis.
△ Less
Submitted 16 January, 2013;
originally announced January 2013.
-
Exact Inference in Networks with Discrete Children of Continuous Parents
Authors:
Uri Lerner,
Eran Segal,
Daphne Koller
Abstract:
Many real life domains contain a mixture of discrete and continuous variables and can be modeled as hybrid Bayesian Networks. Animportant subclass of hybrid BNs are conditional linear Gaussian (CLG) networks, where the conditional distribution of the continuous variables given an assignment to the discrete variables is a multivariate Gaussian. Lauritzen's extension to the clique tree algorithm can…
▽ More
Many real life domains contain a mixture of discrete and continuous variables and can be modeled as hybrid Bayesian Networks. Animportant subclass of hybrid BNs are conditional linear Gaussian (CLG) networks, where the conditional distribution of the continuous variables given an assignment to the discrete variables is a multivariate Gaussian. Lauritzen's extension to the clique tree algorithm can be used for exact inference in CLG networks. However, many domains also include discrete variables that depend on continuous ones, and CLG networks do not allow such dependencies to berepresented. No exact inference algorithm has been proposed for these enhanced CLG networks. In this paper, we generalize Lauritzen's algorithm, providing the first "exact" inference algorithm for augmented CLG networks - networks where continuous nodes are conditional linear Gaussians but that also allow discrete children ofcontinuous parents. Our algorithm is exact in the sense that it computes the exact distributions over the discrete nodes, and the exact first and second moments of the continuous ones, up to the accuracy obtained by numerical integration used within thealgorithm. When the discrete children are modeled with softmax CPDs (as is the case in many real world domains) the approximation of the continuous distributions using the first two moments is particularly accurate. Our algorithm is simple to implement and often comparable in its complexity to Lauritzen's algorithm. We show empirically that it achieves substantially higher accuracy than previous approximate algorithms.
△ Less
Submitted 10 January, 2013;
originally announced January 2013.
-
Discriminative Probabilistic Models for Relational Data
Authors:
Ben Taskar,
Pieter Abbeel,
Daphne Koller
Abstract:
In many supervised learning tasks, the entities to be labeled are related to each other in complex ways and their labels are not independent. For example, in hypertext classification, the labels of linked pages are highly correlated. A standard approach is to classify each entity independently, ignoring the correlations between them. Recently, Probabilistic Relational Models, a relational versi…
▽ More
In many supervised learning tasks, the entities to be labeled are related to each other in complex ways and their labels are not independent. For example, in hypertext classification, the labels of linked pages are highly correlated. A standard approach is to classify each entity independently, ignoring the correlations between them. Recently, Probabilistic Relational Models, a relational version of Bayesian networks, were used to define a joint probabilistic model for a collection of related entities. In this paper, we present an alternative framework that builds on (conditional) Markov networks and addresses two limitations of the previous approach. First, undirected models do not impose the acyclicity constraint that hinders representation of many important relational dependencies in directed models. Second, undirected models are well suited for discriminative training, where we optimize the conditional likelihood of the labels given the features, which generally improves classification accuracy. We show how to train these models effectively, and how to use approximate probabilistic inference over the learned model for collective classification of multiple related entities. We provide experimental results on a webpage classification task, showing that accuracy can be significantly improved by modeling relational dependencies.
△ Less
Submitted 12 December, 2012;
originally announced January 2013.
-
Continuous Time Bayesian Networks
Authors:
Uri Nodelman,
Christian R. Shelton,
Daphne Koller
Abstract:
In this paper we present a language for finite state continuous time Bayesian networks (CTBNs), which describe structured stochastic processes that evolve over continuous time. The state of the system is decomposed into a set of local variables whose values change over time. The dynamics of the system are described by specifying the behavior of each local variable as a function of its parents in…
▽ More
In this paper we present a language for finite state continuous time Bayesian networks (CTBNs), which describe structured stochastic processes that evolve over continuous time. The state of the system is decomposed into a set of local variables whose values change over time. The dynamics of the system are described by specifying the behavior of each local variable as a function of its parents in a directed (possibly cyclic) graph. The model specifies, at any given point in time, the distribution over two aspects: when a local variable changes its value and the next value it takes. These distributions are determined by the variable s CURRENT value AND the CURRENT VALUES OF its parents IN the graph.More formally, each variable IS modelled AS a finite state continuous time Markov process whose transition intensities are functions OF its parents.We present a probabilistic semantics FOR the language IN terms OF the generative model a CTBN defines OVER sequences OF events.We list types OF queries one might ask OF a CTBN, discuss the conceptual AND computational difficulties associated WITH exact inference, AND provide an algorithm FOR approximate inference which takes advantage OF the structure within the process.
△ Less
Submitted 12 December, 2012;
originally announced January 2013.
-
Monitoring a Complez Physical System using a Hybrid Dynamic Bayes Net
Authors:
Uri Lerner,
Brooks Moses,
Maricia Scott,
Sheila McIlraith,
Daphne Koller
Abstract:
The Reverse Water Gas Shift system (RWGS) is a complex physical system designed to produce oxygen from the carbon dioxide atmosphere on Mars. If sent to Mars, it would operate without human supervision, thus requiring a reliable automated system for monitoring and control. The RWGS presents many challenges typical of real-world systems, including: noisy and biased sensors, nonlinear behavior, e…
▽ More
The Reverse Water Gas Shift system (RWGS) is a complex physical system designed to produce oxygen from the carbon dioxide atmosphere on Mars. If sent to Mars, it would operate without human supervision, thus requiring a reliable automated system for monitoring and control. The RWGS presents many challenges typical of real-world systems, including: noisy and biased sensors, nonlinear behavior, effects that are manifested over different time granularities, and unobservability of many important quantities. In this paper we model the RWGS using a hybrid (discrete/continuous) Dynamic Bayesian Network (DBN), where the state at each time slice contains 33 discrete and 184 continuous variables. We show how the system state can be tracked using probabilistic inference over the model. We discuss how to deal with the various challenges presented by the RWGS, providing a suite of techniques that are likely to be useful in a wide range of applications. In particular, we describe a general framework for dealing with nonlinear behavior using numerical integration techniques, extending the successful Unscented Filter. We also show how to use a fixed-point computation to deal with effects that develop at different time scales, specifically rapid changes occurring during slowly changing processes. We test our model using real data collected from the RWGS, demonstrating the feasibility of hybrid DBNs for monitoring complex real-world physical systems.
△ Less
Submitted 12 December, 2012;
originally announced January 2013.
-
Learning Hierarchical Object Maps Of Non-Stationary Environments with mobile robots
Authors:
Dragomir Anguelov,
Rahul Biswas,
Daphne Koller,
Benson Limketkai,
Sebastian Thrun
Abstract:
Building models, or maps, of robot environments is a highly active research area; however, most existing techniques construct unstructured maps and assume static environments. In this paper, we present an algorithm for learning object models of non-stationary objects found in office-type environments. Our algorithm exploits the fact that many objects found in office environments look alike (e.g.,…
▽ More
Building models, or maps, of robot environments is a highly active research area; however, most existing techniques construct unstructured maps and assume static environments. In this paper, we present an algorithm for learning object models of non-stationary objects found in office-type environments. Our algorithm exploits the fact that many objects found in office environments look alike (e.g., chairs, recycling bins). It does so through a two-level hierarchical representation, which links individual objects with generic shape templates of object classes. We derive an approximate EM algorithm for learning shape parameters at both levels of the hierarchy, using local occupancy grid maps for representing shape. Additionally, we develop a Bayesian model selection algorithm that enables the robot to estimate the total number of objects and object templates in the environment. Experimental results using a real robot equipped with a laser range finder indicate that our approach performs well at learning object-based maps of simple office environments. The approach outperforms a previously developed non-hierarchical algorithm that models objects but lacks class templates.
△ Less
Submitted 12 December, 2012;
originally announced January 2013.
-
Learning Module Networks
Authors:
Eran Segal,
Dana Pe'er,
Aviv Regev,
Daphne Koller,
Nir Friedman
Abstract:
Methods for learning Bayesian network structure can discover dependency structure between observed variables, and have been shown to be useful in many applications. However, in domains that involve a large number of variables, the space of possible network structures is enormous, making it difficult, for both computational and statistical reasons, to identify a good model. In this…
▽ More
Methods for learning Bayesian network structure can discover dependency structure between observed variables, and have been shown to be useful in many applications. However, in domains that involve a large number of variables, the space of possible network structures is enormous, making it difficult, for both computational and statistical reasons, to identify a good model. In this paper, we consider a solution to this problem, suitable for domains where many variables have similar behavior. Our method is based on a new class of models, which we call module networks. A module network explicitly represents the notion of a module - a set of variables that have the same parents in the network and share the same conditional probability distribution. We define the semantics of module networks, and describe an algorithm that learns a module network from data. The algorithm learns both the partitioning of the variables into modules and the dependency structure between the variables. We evaluate our algorithm on synthetic data, and on real data in the domains of gene expression and the stock market. Our results show that module networks generalize better than Bayesian networks, and that the learned module network structure reveals regularities that are obscured in learned Bayesian networks.
△ Less
Submitted 19 October, 2012;
originally announced December 2012.
-
Learning Continuous Time Bayesian Networks
Authors:
Uri Nodelman,
Christian R. Shelton,
Daphne Koller
Abstract:
Continuous time Bayesian networks (CTBNs) describe structured stochastic processes with finitely many states that evolve over continuous time. A CTBN is a directed (possibly cyclic) dependency graph over a set of variables, each of which represents a finite state continuous time Markov process whose transition model is a function of its parents. We address the problem of learning…
▽ More
Continuous time Bayesian networks (CTBNs) describe structured stochastic processes with finitely many states that evolve over continuous time. A CTBN is a directed (possibly cyclic) dependency graph over a set of variables, each of which represents a finite state continuous time Markov process whose transition model is a function of its parents. We address the problem of learning parameters and structure of a CTBN from fully observed data. We define a conjugate prior for CTBNs, and show how it can be used both for Bayesian parameter estimation and as the basis of a Bayesian score for structure learning. Because acyclicity is not a constraint in CTBNs, we can show that the structure learning problem is significantly easier, both in theory and in practice, than structure learning for dynamic Bayesian networks (DBNs). Furthermore, as CTBNs can tailor the parameters and dependency structure to the different time granularities of the evolution of different variables, they can provide a better fit to continuous-time processes than DBNs with a fixed time granularity.
△ Less
Submitted 19 October, 2012;
originally announced December 2012.
-
Application of screened hybrid functionals to the bulk transition metals Rh, Pd, and Pt
Authors:
Fabien Tran,
David Koller,
Peter Blaha
Abstract:
We present the results of calculations on bulk transition metals Rh, Pd, and Pt using the screened hybrid functional YS-PBE0 [F. Tran and P. Blaha, Phys. Rev. B \textbf{83}, 235118 (2011)]. The results for the equilibrium geometry are compared with those obtained from (semi)local functionals, namely, the local density approximation and the generalized gradient approximation PBE of Perdew \textit{e…
▽ More
We present the results of calculations on bulk transition metals Rh, Pd, and Pt using the screened hybrid functional YS-PBE0 [F. Tran and P. Blaha, Phys. Rev. B \textbf{83}, 235118 (2011)]. The results for the equilibrium geometry are compared with those obtained from (semi)local functionals, namely, the local density approximation and the generalized gradient approximation PBE of Perdew \textit{et al}. [J. P. Perdew, K. Burke, and M. Ernzerhof, Phys. Rev. Lett. \textbf{77}, 3865 (1996)]. It is shown that the screened hybrid functional yields more accurate equilibrium geometry than PBE, but, overall, it is not more accurate than LDA. However, in contradiction with experiment, we find that the screened hybrid functional favors a ferromagnetic state as the ground state for all three transition metals. Therefore, the use of hybrid functionals for, e.g., the study of catalytically active systems with correlated oxides on a metal support is questionable.
△ Less
Submitted 27 September, 2012;
originally announced September 2012.
-
Recovering Articulated Object Models from 3D Range Data
Authors:
Dragomir Anguelov,
Daphne Koller,
Hoi-Cheung Pang,
Praveen Srinivasan,
Sebastian Thrun
Abstract:
We address the problem of unsupervised learning of complex articulated object models from 3D range data. We describe an algorithm whose input is a set of meshes corresponding to different configurations of an articulated object. The algorithm automatically recovers a decomposition of the object into approximately rigid parts, the location of the parts in the different object instances, and the art…
▽ More
We address the problem of unsupervised learning of complex articulated object models from 3D range data. We describe an algorithm whose input is a set of meshes corresponding to different configurations of an articulated object. The algorithm automatically recovers a decomposition of the object into approximately rigid parts, the location of the parts in the different object instances, and the articulated object skeleton linking the parts. Our algorithm first registers allthe meshes using an unsupervised non-rigid technique described in a companion paper. It then segments the meshes using a graphical model that captures the spatial contiguity of parts. The segmentation is done using the EM algorithm, iterating between finding a decomposition of the object into rigid parts, and finding the location of the parts in the object instances. Although the graphical model is densely connected, the object decomposition step can be performed optimally and efficiently, allowing us to identify a large number of object parts while avoiding local maxima. We demonstrate the algorithm on real world datasets, recovering a 15-part articulated model of a human puppet from just 7 different puppet configurations, as well as a 4 part model of a fiexing arm where significant non-rigid deformation was present.
△ Less
Submitted 11 July, 2012;
originally announced July 2012.
-
Ordering-Based Search: A Simple and Effective Algorithm for Learning Bayesian Networks
Authors:
Marc Teyssier,
Daphne Koller
Abstract:
One of the basic tasks for Bayesian networks (BNs) is that of learning a network structure from data. The BN-learning problem is NP-hard, so the standard solution is heuristic search. Many approaches have been proposed for this task, but only a very small number outperform the baseline of greedy hill-climbing with tabu lists; moreover, many of the proposed algorithms are quite complex and hard to…
▽ More
One of the basic tasks for Bayesian networks (BNs) is that of learning a network structure from data. The BN-learning problem is NP-hard, so the standard solution is heuristic search. Many approaches have been proposed for this task, but only a very small number outperform the baseline of greedy hill-climbing with tabu lists; moreover, many of the proposed algorithms are quite complex and hard to implement. In this paper, we propose a very simple and easy-to-implement method for addressing this task. Our approach is based on the well-known fact that the best network (of bounded in-degree) consistent with a given node ordering can be found very efficiently. We therefore propose a search not over the space of structures, but over the space of orderings, selecting for each ordering the best network consistent with it. This search space is much smaller, makes more global search steps, has a lower branching factor, and avoids costly acyclicity checks. We present results for this algorithm on both synthetic and real data sets, evaluating both the score of the network found and in the running time. We show that ordering-based search outperforms the standard baseline, and is competitive with recent algorithms that are much harder to implement.
△ Less
Submitted 4 July, 2012;
originally announced July 2012.
-
Expectation Maximization and Complex Duration Distributions for Continuous Time Bayesian Networks
Authors:
Uri Nodelman,
Christian R. Shelton,
Daphne Koller
Abstract:
Continuous time Bayesian networks (CTBNs) describe structured stochastic processes with finitely many states that evolve over continuous time. A CTBN is a directed (possibly cyclic) dependency graph over a set of variables, each of which represents a finite state continuous time Markov process whose transition model is a function of its parents. We address the problem of learning the parameters an…
▽ More
Continuous time Bayesian networks (CTBNs) describe structured stochastic processes with finitely many states that evolve over continuous time. A CTBN is a directed (possibly cyclic) dependency graph over a set of variables, each of which represents a finite state continuous time Markov process whose transition model is a function of its parents. We address the problem of learning the parameters and structure of a CTBN from partially observed data. We show how to apply expectation maximization (EM) and structural expectation maximization (SEM) to CTBNs. The availability of the EM algorithm allows us to extend the representation of CTBNs to allow a much richer class of transition durations distributions, known as phase distributions. This class is a highly expressive semi-parametric representation, which can approximate any duration distribution arbitrarily closely. This extension to the CTBN framework addresses one of the main limitations of both CTBNs and DBNs - the restriction to exponentially / geometrically distributed duration. We present experimental results on a real data set of people's life spans, showing that our algorithm learns reasonable models - structure and parameters - from partially observed data, and, with the use of phase distributions, achieves better performance than DBNs.
△ Less
Submitted 4 July, 2012;
originally announced July 2012.
-
Expectation Propagation for Continuous Time Bayesian Networks
Authors:
Uri Nodelman,
Daphne Koller,
Christian R. Shelton
Abstract:
Continuous time Bayesian networks (CTBNs) describe structured stochastic processes with finitely many states that evolve over continuous time. A CTBN is a directed (possibly cyclic) dependency graph over a set of variables, each of which represents a finite state continuous time Markov process whose transition model is a function of its parents. As shown previously, exact inference in CTBNs is int…
▽ More
Continuous time Bayesian networks (CTBNs) describe structured stochastic processes with finitely many states that evolve over continuous time. A CTBN is a directed (possibly cyclic) dependency graph over a set of variables, each of which represents a finite state continuous time Markov process whose transition model is a function of its parents. As shown previously, exact inference in CTBNs is intractable. We address the problem of approximate inference, allowing for general queries conditioned on evidence over continuous time intervals and at discrete time points. We show how CTBNs can be parameterized within the exponential family, and use that insight to develop a message passing scheme in cluster graphs and allows us to apply expectation propagation to CTBNs. The clusters in our cluster graph do not contain distributions over the cluster variables at individual time points, but distributions over trajectories of the variables throughout a duration. Thus, unlike discrete time temporal models such as dynamic Bayesian networks, we can adapt the time granularity at which we reason for different variables and in different conditions.
△ Less
Submitted 4 July, 2012;
originally announced July 2012.
-
Learning Factor Graphs in Polynomial Time & Sample Complexity
Authors:
Pieter Abbeel,
Daphne Koller,
Andrew Y. Ng
Abstract:
We study computational and sample complexity of parameter and structure learning in graphical models. Our main result shows that the class of factor graphs with bounded factor size and bounded connectivity can be learned in polynomial time and polynomial number of samples, assuming that the data is generated by a network in this class. This result covers both parameter estimation for a known netwo…
▽ More
We study computational and sample complexity of parameter and structure learning in graphical models. Our main result shows that the class of factor graphs with bounded factor size and bounded connectivity can be learned in polynomial time and polynomial number of samples, assuming that the data is generated by a network in this class. This result covers both parameter estimation for a known network structure and structure learning. It implies as a corollary that we can learn factor graphs for both Bayesian networks and Markov networks of bounded degree, in polynomial time and sample complexity. Unlike maximum likelihood estimation, our method does not require inference in the underlying network, and so applies to networks where inference is intractable. We also show that the error of our learned model degrades gracefully when the generating distribution is not a member of the target class of networks.
△ Less
Submitted 4 July, 2012;
originally announced July 2012.
-
Continuous Time Markov Networks
Authors:
Tal El-Hay,
Nir Friedman,
Daphne Koller,
Raz Kupferman
Abstract:
A central task in many applications is reasoning about processes that change in a continuous time. The mathematical framework of Continuous Time Markov Processes provides the basic foundations for modeling such systems. Recently, Nodelman et al introduced continuous time Bayesian networks (CTBNs), which allow a compact representation of continuous-time processes over a factored state space. In thi…
▽ More
A central task in many applications is reasoning about processes that change in a continuous time. The mathematical framework of Continuous Time Markov Processes provides the basic foundations for modeling such systems. Recently, Nodelman et al introduced continuous time Bayesian networks (CTBNs), which allow a compact representation of continuous-time processes over a factored state space. In this paper, we introduce continuous time Markov networks (CTMNs), an alternative representation language that represents a different type of continuous-time dynamics. In many real life processes, such as biological and chemical systems, the dynamics of the process can be naturally described as an interplay between two forces - the tendency of each entity to change its state, and the overall fitness or energy function of the entire system. In our model, the first force is described by a continuous-time proposal process that suggests possible local changes to the state of the system at different rates. The second force is represented by a Markov network that encodes the fitness, or desirability, of different states; a proposed local change is then accepted with a probability that is a function of the change in the fitness distribution. We show that the fitness distribution is also the stationary distribution of the Markov process, so that this representation provides a characterization of a temporal process whose stationary distribution has a compact graphical representation. This allows us to naturally capture a different type of structure in complex dynamical processes, such as evolving biological sequences. We describe the semantics of the representation, its basic properties, and how it compares to CTBNs. We also provide algorithms for learning such models from data, and discuss its applicability to biological sequence evolution.
△ Less
Submitted 27 June, 2012;
originally announced June 2012.
-
Residual Belief Propagation: Informed Scheduling for Asynchronous Message Passing
Authors:
Gal Elidan,
Ian McGraw,
Daphne Koller
Abstract:
Inference for probabilistic graphical models is still very much a practical challenge in large domains. The commonly used and effective belief propagation (BP) algorithm and its generalizations often do not converge when applied to hard, real-life inference tasks. While it is widely recognized that the scheduling of messages in these algorithms may have significant consequences, this issue remains…
▽ More
Inference for probabilistic graphical models is still very much a practical challenge in large domains. The commonly used and effective belief propagation (BP) algorithm and its generalizations often do not converge when applied to hard, real-life inference tasks. While it is widely recognized that the scheduling of messages in these algorithms may have significant consequences, this issue remains largely unexplored. In this work, we address the question of how to schedule messages for asynchronous propagation so that a fixed point is reached faster and more often. We first show that any reasonable asynchronous BP converges to a unique fixed point under conditions similar to those that guarantee convergence of synchronous BP. In addition, we show that the convergence rate of a simple round-robin schedule is at least as good as that of synchronous propagation. We then propose residual belief propagation (RBP), a novel, easy-to-implement, asynchronous propagation algorithm that schedules messages in an informed way, that pushes down a bound on the distance from the fixed point. Finally, we demonstrate the superiority of RBP over state-of-the-art methods for a variety of challenging synthetic and real-life problems: RBP converges significantly more often than other methods; and it significantly reduces running time until convergence, even when other methods converge.
△ Less
Submitted 27 June, 2012;
originally announced June 2012.
-
Reasoning at the Right Time Granularity
Authors:
Suchi Saria,
Uri Nodelman,
Daphne Koller
Abstract:
Most real-world dynamic systems are composed of different components that often evolve at very different rates. In traditional temporal graphical models, such as dynamic Bayesian networks, time is modeled at a fixed granularity, generally selected based on the rate at which the fastest component evolves. Inference must then be performed at this fastest granularity, potentially at significant compu…
▽ More
Most real-world dynamic systems are composed of different components that often evolve at very different rates. In traditional temporal graphical models, such as dynamic Bayesian networks, time is modeled at a fixed granularity, generally selected based on the rate at which the fastest component evolves. Inference must then be performed at this fastest granularity, potentially at significant computational cost. Continuous Time Bayesian Networks (CTBNs) avoid time-slicing in the representation by modeling the system as evolving continuously over time. The expectation-propagation (EP) inference algorithm of Nodelman et al. (2005) can then vary the inference granularity over time, but the granularity is uniform across all parts of the system, and must be selected in advance. In this paper, we provide a new EP algorithm that utilizes a general cluster graph architecture where clusters contain distributions that can overlap in both space (set of variables) and time. This architecture allows different parts of the system to be modeled at very different time granularities, according to their current rate of evolution. We also provide an information-theoretic criterion for dynamically re-partitioning the clusters during inference to tune the level of approximation to the current rate of evolution. This avoids the need to hand-select the appropriate granularity, and allows the granularity to adapt as information is transmitted across the network. We present experiments demonstrating that this approach can result in significant computational savings.
△ Less
Submitted 20 June, 2012;
originally announced June 2012.
-
Modeling Latent Variable Uncertainty for Loss-based Learning
Authors:
M. Pawan Kumar,
Ben Packer,
Daphne Koller
Abstract:
We consider the problem of parameter estimation using weakly supervised datasets, where a training sample consists of the input and a partially specified annotation, which we refer to as the output. The missing information in the annotation is modeled using latent variables. Previous methods overburden a single distribution with two separate tasks: (i) modeling the uncertainty in the latent variab…
▽ More
We consider the problem of parameter estimation using weakly supervised datasets, where a training sample consists of the input and a partially specified annotation, which we refer to as the output. The missing information in the annotation is modeled using latent variables. Previous methods overburden a single distribution with two separate tasks: (i) modeling the uncertainty in the latent variables during training; and (ii) making accurate predictions for the output and the latent variables during testing. We propose a novel framework that separates the demands of the two tasks using two distributions: (i) a conditional distribution to model the uncertainty of the latent variables for a given input-output pair; and (ii) a delta distribution to predict the output and the latent variables for a given input. During learning, we encourage agreement between the two distributions by minimizing a loss-based dissimilarity coefficient. Our approach generalizes latent SVM in two important ways: (i) it models the uncertainty over latent variables instead of relying on a pointwise estimate; and (ii) it allows the use of loss functions that depend on latent variables, which greatly increases its applicability. We demonstrate the efficacy of our approach on two challenging problems---object detection and action detection---using publicly available datasets.
△ Less
Submitted 18 June, 2012;
originally announced June 2012.
-
Constrained Approximate Maximum Entropy Learning of Markov Random Fields
Authors:
Varun Ganapathi,
David Vickrey,
John Duchi,
Daphne Koller
Abstract:
Parameter estimation in Markov random fields (MRFs) is a difficult task, in which inference over the network is run in the inner loop of a gradient descent procedure. Replacing exact inference with approximate methods such as loopy belief propagation (LBP) can suffer from poor convergence. In this paper, we provide a different approach for combining MRF learning and Bethe approximation. We conside…
▽ More
Parameter estimation in Markov random fields (MRFs) is a difficult task, in which inference over the network is run in the inner loop of a gradient descent procedure. Replacing exact inference with approximate methods such as loopy belief propagation (LBP) can suffer from poor convergence. In this paper, we provide a different approach for combining MRF learning and Bethe approximation. We consider the dual of maximum likelihood Markov network learning - maximizing entropy with moment matching constraints - and then approximate both the objective and the constraints in the resulting optimization problem. Unlike previous work along these lines (Teh & Welling, 2003), our formulation allows parameter sharing between features in a general log-linear model, parameter regularization and conditional training. We show that piecewise training (Sutton & McCallum, 2005) is a very restricted special case of this formulation. We study two optimization strategies: one based on a single convex approximation and one that uses repeated convex approximations. We show results on several real-world networks that demonstrate that these algorithms can significantly outperform learning with loopy and piecewise. Our results also provide a framework for analyzing the trade-offs of different relaxations of the entropy objective and of the constraints.
△ Less
Submitted 13 June, 2012;
originally announced June 2012.
-
Convex Point Estimation using Undirected Bayesian Transfer Hierarchies
Authors:
Gal Elidan,
Ben Packer,
Geremy Heitz,
Daphne Koller
Abstract:
When related learning tasks are naturally arranged in a hierarchy, an appealing approach for coping with scarcity of instances is that of transfer learning using a hierarchical Bayes framework. As fully Bayesian computations can be difficult and computationally demanding, it is often desirable to use posterior point estimates that facilitate (relatively) efficient prediction. However, the hierarch…
▽ More
When related learning tasks are naturally arranged in a hierarchy, an appealing approach for coping with scarcity of instances is that of transfer learning using a hierarchical Bayes framework. As fully Bayesian computations can be difficult and computationally demanding, it is often desirable to use posterior point estimates that facilitate (relatively) efficient prediction. However, the hierarchical Bayes framework does not always lend itself naturally to this maximum aposteriori goal. In this work we propose an undirected reformulation of hierarchical Bayes that relies on priors in the form of similarity measures. We introduce the notion of "degree of transfer" weights on components of these similarity measures, and show how they can be automatically learned within a joint probabilistic framework. Importantly, our reformulation results in a convex objective for many learning problems, thus facilitating optimal posterior point estimation using standard optimization techniques. In addition, we no longer require proper priors, allowing for flexible and straightforward specification of joint distributions over transfer hierarchies. We show that our framework is effective for learning models that are part of transfer hierarchies for two real-life tasks: object shape modeling using Gaussian density estimation and document classification.
△ Less
Submitted 13 June, 2012;
originally announced June 2012.
-
Projected Subgradient Methods for Learning Sparse Gaussians
Authors:
John Duchi,
Stephen Gould,
Daphne Koller
Abstract:
Gaussian Markov random fields (GMRFs) are useful in a broad range of applications. In this paper we tackle the problem of learning a sparse GMRF in a high-dimensional space. Our approach uses the l1-norm as a regularization on the inverse covariance matrix. We utilize a novel projected gradient method, which is faster than previous methods in practice and equal to the best performing of these in a…
▽ More
Gaussian Markov random fields (GMRFs) are useful in a broad range of applications. In this paper we tackle the problem of learning a sparse GMRF in a high-dimensional space. Our approach uses the l1-norm as a regularization on the inverse covariance matrix. We utilize a novel projected gradient method, which is faster than previous methods in practice and equal to the best performing of these in asymptotic complexity. We also extend the l1-regularized objective to the problem of sparsifying entire blocks within the inverse covariance matrix. Our methods generalize fairly easily to this case, while other methods do not. We demonstrate that our extensions give better generalization performance on two real domains--biological network analysis and a 2D-shape modeling image task.
△ Less
Submitted 13 June, 2012;
originally announced June 2012.
-
MAP Estimation of Semi-Metric MRFs via Hierarchical Graph Cuts
Authors:
M. Pawan Kumar,
Daphne Koller
Abstract:
We consider the task of obtaining the maximum a posteriori estimate of discrete pairwise random fields with arbitrary unary potentials and semimetric pairwise potentials. For this problem, we propose an accurate hierarchical move making strategy where each move is computed efficiently by solving an st-MINCUT problem. Unlike previous move making approaches, e.g. the widely used a-expansion algorith…
▽ More
We consider the task of obtaining the maximum a posteriori estimate of discrete pairwise random fields with arbitrary unary potentials and semimetric pairwise potentials. For this problem, we propose an accurate hierarchical move making strategy where each move is computed efficiently by solving an st-MINCUT problem. Unlike previous move making approaches, e.g. the widely used a-expansion algorithm, our method obtains the guarantees of the standard linear programming (LP) relaxation for the important special case of metric labeling. Unlike the existing LP relaxation solvers, e.g. interior-point algorithms or tree-reweighted message passing, our method is significantly faster as it uses only the efficient st-MINCUT algorithm in its design. Using both synthetic and real data experiments, we show that our technique outperforms several commonly used algorithms.
△ Less
Submitted 9 May, 2012;
originally announced May 2012.
-
A Continuation Method for Nash Equilibria in Structured Games
Authors:
B. Blum,
D. Koller,
C. R. Shelton
Abstract:
Structured game representations have recently attracted interest as models for multi-agent artificial intelligence scenarios, with rational behavior most commonly characterized by Nash equilibria. This paper presents efficient, exact algorithms for computing Nash equilibria in structured game representations, including both graphical games and multi-agent influence diagrams (MAIDs). The algorith…
▽ More
Structured game representations have recently attracted interest as models for multi-agent artificial intelligence scenarios, with rational behavior most commonly characterized by Nash equilibria. This paper presents efficient, exact algorithms for computing Nash equilibria in structured game representations, including both graphical games and multi-agent influence diagrams (MAIDs). The algorithms are derived from a continuation method for normal-form and extensive-form games due to Govindan and Wilson; they follow a trajectory through a space of perturbed games and their equilibria, exploiting game structure through fast computation of the Jacobian of the payoff function. They are theoretically guaranteed to find at least one equilibrium of the game, and may find more. Our approach provides the first efficient algorithm for computing exact equilibria in graphical games with arbitrary topology, and the first algorithm to exploit fine-grained structural properties of MAIDs. Experimental results are presented demonstrating the effectiveness of the algorithms and comparing them to predecessors. The running time of the graphical game algorithm is similar to, and often better than, the running time of previous approximate algorithms. The algorithm for MAIDs can effectively solve games that are much larger than those solvable by previous methods.
△ Less
Submitted 29 September, 2011;
originally announced October 2011.
-
Efficient Solution Algorithms for Factored MDPs
Authors:
C. Guestrin,
D. Koller,
R. Parr,
S. Venkataraman
Abstract:
This paper addresses the problem of planning under uncertainty in large Markov Decision Processes (MDPs). Factored MDPs represent a complex state space using state variables and the transition model using a dynamic Bayesian network. This representation often allows an exponential reduction in the representation size of structured MDPs, but the complexity of exact solution algorithms for such MDPs…
▽ More
This paper addresses the problem of planning under uncertainty in large Markov Decision Processes (MDPs). Factored MDPs represent a complex state space using state variables and the transition model using a dynamic Bayesian network. This representation often allows an exponential reduction in the representation size of structured MDPs, but the complexity of exact solution algorithms for such MDPs can grow exponentially in the representation size. In this paper, we present two approximate solution algorithms that exploit structure in factored MDPs. Both use an approximate value function represented as a linear combination of basis functions, where each basis function involves only a small subset of the domain variables. A key contribution of this paper is that it shows how the basic operations of both algorithms can be performed efficiently in closed form, by exploiting both additive and context-specific structure in a factored MDP. A central element of our algorithms is a novel linear program decomposition technique, analogous to variable elimination in Bayesian networks, which reduces an exponentially large LP to a provably equivalent, polynomial-sized one. One algorithm uses approximate linear programming, and the second approximate dynamic programming. Our dynamic programming algorithm is novel in that it uses an approximation based on max-norm, a technique that more directly minimizes the terms that appear in error bounds for approximate MDP algorithms. We provide experimental results on problems with over 10^40 states, demonstrating a promising indication of the scalability of our approach, and compare our algorithm to an existing state-of-the-art approach, showing, in some problems, exponential gains in computation time.
△ Less
Submitted 9 June, 2011;
originally announced June 2011.
-
Discovering shared and individual latent structure in multiple time series
Authors:
Suchi Saria,
Daphne Koller,
Anna Penn
Abstract:
This paper proposes a nonparametric Bayesian method for exploratory data analysis and feature construction in continuous time series. Our method focuses on understanding shared features in a set of time series that exhibit significant individual variability. Our method builds on the framework of latent Diricihlet allocation (LDA) and its extension to hierarchical Dirichlet processes, which allows…
▽ More
This paper proposes a nonparametric Bayesian method for exploratory data analysis and feature construction in continuous time series. Our method focuses on understanding shared features in a set of time series that exhibit significant individual variability. Our method builds on the framework of latent Diricihlet allocation (LDA) and its extension to hierarchical Dirichlet processes, which allows us to characterize each series as switching between latent ``topics'', where each topic is characterized as a distribution over ``words'' that specify the series dynamics. However, unlike standard applications of LDA, we discover the words as we learn the model. We apply this model to the task of tracking the physiological signals of premature infants; our model obtains clinically significant insights as well as useful features for supervised learning tasks.
△ Less
Submitted 11 August, 2010;
originally announced August 2010.
-
Surface Plasmon Polariton microscope with Parabolic Reflectors
Authors:
Aurelien Drezet,
Daniel Koller,
Andreas Hohenau,
Alfred Leitner,
Franz R. Aussenegg,
Joachim R. Krenn
Abstract:
We report the realization of a two--dimensional optical microscope for surface plasmons polaritons (SPPs) based on parabolic Bragg mirrors. These mirrors are built from lithographically fabricated gold nanostructures on gold thin films. We show by direct imaging by leakage radiation microscopy that the magnification power of the SPP microscope follows basic predictions of geometrical optics. Spa…
▽ More
We report the realization of a two--dimensional optical microscope for surface plasmons polaritons (SPPs) based on parabolic Bragg mirrors. These mirrors are built from lithographically fabricated gold nanostructures on gold thin films. We show by direct imaging by leakage radiation microscopy that the magnification power of the SPP microscope follows basic predictions of geometrical optics. Spatial resolution down to the value set by the diffraction limit is demonstrated.
△ Less
Submitted 3 February, 2010;
originally announced February 2010.
-
Plasmonic crystal demultiplexer and multiports
Authors:
Aurelien Drezet,
Daniel Koller,
Andreas Hohenau,
Alfred Leitner,
Franz R. Aussenegg,
Joachim R. Krenn
Abstract:
Artificially built periodic optical structures in dielectric and metallic media have generated considerable interest due to their potential for optical device miniaturization. In this context plasmonics, i.e., optics based on surface plasmon polaritons (SPPs) offers new exciting prospects. SPPs are hybrid light/electron surface waves at the interface between a dielectric and a metal and as such…
▽ More
Artificially built periodic optical structures in dielectric and metallic media have generated considerable interest due to their potential for optical device miniaturization. In this context plasmonics, i.e., optics based on surface plasmon polaritons (SPPs) offers new exciting prospects. SPPs are hybrid light/electron surface waves at the interface between a dielectric and a metal and as such hold the potential for 2D optical functionality. Indeed, SPP elements as mirrors, splitters and interferometers have been recently demonstrated. However, for plasmonics to qualify at the information technology level requires necessarily the realization of wavelength division (demultiplexing) which constitutes a fundamental ingredient of optical communication. In the following we experimentally demonstrate 2D SPP demultiplexing in the visible spectral range by using photonic crystals for SPPs (plasmonic crystals). In addition, we demonstrate that plasmonic crystal are capable of realizing integrated linear multiports which could constitute building blocks of analog or quantum optical computing.
△ Less
Submitted 3 February, 2010;
originally announced February 2010.
-
Leakage radiation microscopy of surface plasmon polaritons
Authors:
A. Drezet,
A. Hohenau,
D. Koller,
A. Stepanov,
H. Ditlbacher,
B. Steinberger,
F. R. Aussenegg,
A. Leitner,
J. R. Krenn
Abstract:
We review the principle and methodology of leakage radiation microscopy (LRM) applied to surface plasmon polaritons (SPPs). Therefore we first analyse in detail the electromagnetic theory of leaky SPP waves. We show that LRM is a versatile optical far-field method allowing direct quantitative imaging and analysis of SPP propagation on thin metal films. We illustrate the LRM potentiality by analy…
▽ More
We review the principle and methodology of leakage radiation microscopy (LRM) applied to surface plasmon polaritons (SPPs). Therefore we first analyse in detail the electromagnetic theory of leaky SPP waves. We show that LRM is a versatile optical far-field method allowing direct quantitative imaging and analysis of SPP propagation on thin metal films. We illustrate the LRM potentiality by analyzing the propagation of SPP waves interacting with several two dimensional plasmonic devices realized and studied in the recent years.
△ Less
Submitted 3 February, 2010;
originally announced February 2010.