-
Linear Causal Disentanglement via Interventions
Authors:
Chandler Squires,
Anna Seigal,
Salil Bhate,
Caroline Uhler
Abstract:
Causal disentanglement seeks a representation of data involving latent variables that relate to one another via a causal model. A representation is identifiable if both the latent model and the transformation from latent to observed variables are unique. In this paper, we study observed variables that are a linear transformation of a linear latent causal model. Data from interventions are necessary for identifiability: if one latent variable is missing an intervention, we show that there exist distinct models that cannot be distinguished. Conversely, we show that a single intervention on each latent variable is sufficient for identifiability. Our proof uses a generalization of the RQ decomposition of a matrix that replaces the usual orthogonal and upper triangular conditions with analogues depending on a partial order on the rows of the matrix, with the partial order determined by the latent causal model. We corroborate our theoretical results with a method for causal disentanglement that accurately recovers a latent causal model.
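The generative setting described in this abstract, and the standard RQ decomposition that the proof generalizes, can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' method: the helper names `sample_linear_latent_model` and `rq` are hypothetical, the latent model is written as a linear structural equation model Z = A Z + eps with an assumed acyclic weight matrix `A`, the observations are X = G Z with an arbitrary mixing matrix `G`, and an intervention is modeled here as a perfect (edge-removing) intervention, which is only one possible choice. The `rq` helper computes the ordinary RQ decomposition (upper triangular R, orthonormal rows in Q), not the partial-order generalization used in the paper.

```python
import numpy as np

def sample_linear_latent_model(A, G, n, rng, intervene=None):
    """Hypothetical helper: draw n samples of X = G Z, where Z = A Z + eps.

    A[i, j] != 0 means latent j is a parent of latent i (A assumed acyclic).
    intervene: optional latent index; all edges into it are removed
    (a perfect intervention, assumed here for illustration).
    """
    A = A.copy()
    if intervene is not None:
        A[intervene, :] = 0.0  # cut every incoming edge of the target node
    d = A.shape[0]
    eps = rng.standard_normal((d, n))
    Z = np.linalg.solve(np.eye(d) - A, eps)  # Z = (I - A)^{-1} eps
    return G @ Z

def rq(M):
    """Hypothetical helper: ordinary RQ decomposition M = R @ Q (M is d x p, d <= p).

    R is d x d upper triangular; Q is d x p with orthonormal rows.
    Built from NumPy's QR by reversing the row order of M.
    """
    d = M.shape[0]
    P = np.eye(d)[::-1]                # reversal permutation matrix
    Q1, R1 = np.linalg.qr((P @ M).T)   # (P M)^T = Q1 R1 (reduced QR)
    R = P @ R1.T @ P                   # reversing a lower triangle gives an upper one
    Q = P @ Q1.T                       # still has orthonormal rows
    return R, Q

rng = np.random.default_rng(0)
A = np.array([[0.0, 0.0],
              [0.8, 0.0]])             # hypothetical latent DAG: Z1 -> Z2
G = rng.standard_normal((4, 2))        # hypothetical 4 x 2 mixing matrix
X_obs = sample_linear_latent_model(A, G, 1000, rng)
X_int = sample_linear_latent_model(A, G, 1000, rng, intervene=1)
R, Q = rq(G.T)                         # decomposing G.T is purely illustrative
print(X_obs.shape, np.allclose(R @ Q, G.T), np.allclose(np.tril(R, -1), 0))
```

Sampling X_obs and X_int from the same `G` mirrors the idea of pairing observational data with data from an intervention on one latent variable; the paper's identifiability result concerns having one such intervention per latent variable.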
Submitted 11 June, 2023; v1 submitted 29 November, 2022;
originally announced November 2022.
-
A multi-modal neural network for learning cis and trans regulation of stress response in yeast
Authors:
Boxiang Liu,
Nadine Hussami,
Avanti Shrikumar,
Tyler Shimko,
Salil Bhate,
Scott Longwell,
Stephen Montgomery,
Anshul Kundaje
Abstract:
Deciphering gene regulatory networks is a central problem in computational biology. Here, we explore the use of multi-modal neural networks to learn predictive models of gene expression that include cis and trans regulatory components. We learn models of stress response in the budding yeast Saccharomyces cerevisiae. Our models achieve high performance and substantially outperform other state-of-the-art methods, such as boosting algorithms that use pre-defined cis-regulatory features. Our models learn several cis and trans regulators, including well-known master stress response regulators. We use our models to perform in-silico transcription factor (TF) knockout experiments and demonstrate that in-silico predictions of target gene changes correlate with the results of the corresponding TF knockout microarray experiments.
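The abstract does not specify the network architecture, so the following is only a generic sketch of what an in-silico TF knockout means for any predictor that takes cis and trans inputs: set one TF's input level to zero and compare predicted target expression with the unperturbed prediction. The toy predictor, the helper names `predict_expression` and `in_silico_knockout`, and all numbers below are hypothetical stand-ins, not the authors' model or data.

```python
import numpy as np

# Hypothetical toy predictor standing in for a trained multi-modal model:
# each target gene's expression is a sigmoid of (cis score + W @ TF levels).
def predict_expression(cis_scores, tf_levels, W):
    """cis_scores: (genes,), tf_levels: (tfs,), W: (genes, tfs)."""
    return 1.0 / (1.0 + np.exp(-(cis_scores + W @ tf_levels)))

def in_silico_knockout(cis_scores, tf_levels, W, tf_index):
    """Predicted change in each target gene when one TF input is set to zero."""
    ko_levels = tf_levels.copy()
    ko_levels[tf_index] = 0.0          # simulate deleting the TF
    baseline = predict_expression(cis_scores, tf_levels, W)
    knocked = predict_expression(cis_scores, ko_levels, W)
    return knocked - baseline          # negative: the TF activates that gene in this toy model

cis = np.array([0.5, -1.0, 0.0])       # hypothetical per-gene cis scores
tfs = np.array([2.0, 1.0])             # hypothetical TF expression levels
W = np.array([[1.5, 0.0],
              [0.0, -0.7],
              [0.3, 0.3]])             # hypothetical trans weights
delta = in_silico_knockout(cis, tfs, W, tf_index=0)
print(np.round(delta, 3))
```

In a real evaluation, the vector of predicted changes would then be compared (for example, by correlation) with measured expression changes from the matching knockout experiment.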
Submitted 25 August, 2019;
originally announced August 2019.
-
Theoretical foundations for the Human Cell Atlas
Authors:
Salil Bhate
Abstract:
In Schiebinger et al. (2017), the authors use optimal transport of measures on empirical distributions arising from biological experiments to relate single-cell RNA sequencing profiles of differentiating induced pluripotent stem cells. But such algorithms could in principle be applied to any datasets from any collection of experiments. We consider here a natural question that arises: in a manner consistent with conventionally accepted assumptions about biology, in which cases can the results of two experiments be mapped to each other in this manner? The answer to this question is of fundamental practical importance in developing algorithms that use this method for analysing and integrating complex datasets collected as part of the Human Cell Atlas. Here, we develop a formulation of biology in terms of sheaves of $C^*(X)$-modules for a smooth manifold $X$ equipped with certain structures, which enables this question to be formally answered, leading to formal statements about experimental inference and phenotypic identifiability. These structures capture a perspective on biology that is consistent with a standard, widely accepted biological perspective and is mathematically intuitive. Our methods provide a framework in which to design complex experiments, and the algorithms to analyse them, so that their conclusions can be trusted.
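To make the starting point of this abstract concrete, here is a minimal sketch of optimal transport between two empirical distributions using plain balanced entropic regularization (Sinkhorn iterations). It illustrates the general technique only. It is not necessarily the variant used by Schiebinger et al. (2017), and it is not part of the sheaf-theoretic formulation developed here. The point clouds, uniform weights, regularization `eps`, and helper name `sinkhorn` are all assumptions chosen for illustration.

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.5, n_iter=500):
    """Hypothetical helper: entropic OT between histograms a and b with cost C.

    Returns a coupling whose row sums approximate a and column sums equal b.
    """
    K = np.exp(-C / eps)               # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):
        u = a / (K @ v)                # rescale to match row marginals
        v = b / (K.T @ u)              # rescale to match column marginals
    return u[:, None] * K * v[None, :]

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=(5, 2))  # hypothetical profiles from experiment 1
y = rng.normal(1.0, 1.0, size=(6, 2))  # hypothetical profiles from experiment 2
C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)  # squared Euclidean cost
a = np.full(5, 1 / 5)                  # uniform weight on each sample
b = np.full(6, 1 / 6)
plan = sinkhorn(a, b, C / C.max())     # normalize the cost to [0, 1]
print(plan.round(3))
```

Entry `plan[i, j]` is the mass sent from sample `i` of the first experiment to sample `j` of the second. Nothing in the computation checks whether such a mapping is biologically meaningful, which is exactly the gap the question above addresses.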
Submitted 20 October, 2017;
originally announced October 2017.