Showing 1–14 of 14 results for author: Kundaje, A

  1. arXiv:2209.12487  [pdf, other]

    cs.CE

    Tartarus: A Benchmarking Platform for Realistic And Practical Inverse Molecular Design

    Authors: AkshatKumar Nigam, Robert Pollice, Gary Tom, Kjell Jorner, John Willes, Luca A. Thiede, Anshul Kundaje, Alan Aspuru-Guzik

    Abstract: The efficient exploration of chemical space to design molecules with intended properties enables the accelerated discovery of drugs, materials, and catalysts, and is one of the most important outstanding challenges in chemistry. Encouraged by the recent surge in computer power and artificial intelligence development, many algorithms have been developed to tackle this problem. However, despite the…

    Submitted 11 October, 2023; v1 submitted 26 September, 2022; originally announced September 2022.

    Comments: 29+21 pages, 6+19 figures, 6+2 tables

  2. arXiv:2012.07421  [pdf, other]

    cs.LG

    WILDS: A Benchmark of in-the-Wild Distribution Shifts

    Authors: Pang Wei Koh, Shiori Sagawa, Henrik Marklund, Sang Michael Xie, Marvin Zhang, Akshay Balsubramani, Weihua Hu, Michihiro Yasunaga, Richard Lanas Phillips, Irena Gao, Tony Lee, Etienne David, Ian Stavness, Wei Guo, Berton A. Earnshaw, Imran S. Haque, Sara Beery, Jure Leskovec, Anshul Kundaje, Emma Pierson, Sergey Levine, Chelsea Finn, Percy Liang

    Abstract: Distribution shifts -- where the training distribution differs from the test distribution -- can substantially degrade the accuracy of machine learning (ML) systems deployed in the wild. Despite their ubiquity in the real-world deployments, these distribution shifts are under-represented in the datasets widely used in the ML community today. To address this gap, we present WILDS, a curated benchma…

    Submitted 16 July, 2021; v1 submitted 14 December, 2020; originally announced December 2020.

  3. The importance of transparency and reproducibility in artificial intelligence research

    Authors: Benjamin Haibe-Kains, George Alexandru Adam, Ahmed Hosny, Farnoosh Khodakarami, MAQC Society Board, Levi Waldron, Bo Wang, Chris McIntosh, Anshul Kundaje, Casey S. Greene, Michael M. Hoffman, Jeffrey T. Leek, Wolfgang Huber, Alvis Brazma, Joelle Pineau, Robert Tibshirani, Trevor Hastie, John P. A. Ioannidis, John Quackenbush, Hugo J. W. L. Aerts

    Abstract: In their study, McKinney et al. showed the high potential of artificial intelligence for breast cancer screening. However, the lack of detailed methods and computer code undermines its scientific value. We identify obstacles hindering transparent and reproducible AI research as faced by McKinney et al. and provide solutions with implications for the broader field.

    Submitted 7 March, 2020; v1 submitted 28 February, 2020; originally announced March 2020.

    Journal ref: Nature 586 (2020) E14-E16

  4. arXiv:1908.09426  [pdf, other]

    q-bio.GN

    A multi-modal neural network for learning cis and trans regulation of stress response in yeast

    Authors: Boxiang Liu, Nadine Hussami, Avanti Shrikumar, Tyler Shimko, Salil Bhate, Scott Longwell, Stephen Montgomery, Anshul Kundaje

    Abstract: Deciphering gene regulatory networks is a central problem in computational biology. Here, we explore the use of multi-modal neural networks to learn predictive models of gene expression that include cis and trans regulatory components. We learn models of stress response in the budding yeast Saccharomyces cerevisiae. Our models achieve high performance and substantially outperform other state-of-th…

    Submitted 25 August, 2019; originally announced August 2019.

    Comments: 5 pages, 2 figures; Presented at NIPS 2017 MLCB workshop

  5. arXiv:1901.06852  [pdf, other]

    cs.LG stat.ML

    Maximum Likelihood with Bias-Corrected Calibration is Hard-To-Beat at Label Shift Adaptation

    Authors: Amr Alexandari, Anshul Kundaje, Avanti Shrikumar

    Abstract: Label shift refers to the phenomenon where the prior class probability p(y) changes between the training and test distributions, while the conditional probability p(x|y) stays fixed. Label shift arises in settings like medical diagnosis, where a classifier trained to predict disease given symptoms must be adapted to scenarios where the baseline prevalence of the disease is different. Given estimat…

    Submitted 26 June, 2020; v1 submitted 21 January, 2019; originally announced January 2019.

    Comments: ICML 2020

  6. arXiv:1811.00416  [pdf, other]

    cs.LG q-bio.GN stat.ML

    Technical Note on Transcription Factor Motif Discovery from Importance Scores (TF-MoDISco) version 0.5.6.5

    Authors: Avanti Shrikumar, Katherine Tian, Žiga Avsec, Anna Shcherbina, Abhimanyu Banerjee, Mahfuza Sharmin, Surag Nair, Anshul Kundaje

    Abstract: TF-MoDISco (Transcription Factor Motif Discovery from Importance Scores) is an algorithm for identifying motifs from basepair-level importance scores computed on genomic sequence data. This technical note focuses on version v0.5.6.5. The implementation is available at https://github.com/kundajelab/tfmodisco/tree/v0.5.6.5

    Submitted 30 April, 2020; v1 submitted 31 October, 2018; originally announced November 2018.

  7. arXiv:1807.09946  [pdf, other]

    cs.LG cs.CV cs.NE stat.ML

    Computationally Efficient Measures of Internal Neuron Importance

    Authors: Avanti Shrikumar, Jocelin Su, Anshul Kundaje

    Abstract: The challenge of assigning importance to individual neurons in a network is of interest when interpreting deep learning models. In recent work, Dhamdhere et al. proposed Total Conductance, a "natural refinement of Integrated Gradients" for attributing importance to internal neurons. Unfortunately, the authors found that calculating conductance in tensorflow required the addition of several custom…

    Submitted 25 July, 2018; originally announced July 2018.

    Comments: 7 pages, 2 figures

  8. arXiv:1802.07024  [pdf, other]

    stat.ML cs.LG

    A General Framework for Abstention Under Label Shift

    Authors: Amr M. Alexandari, Anshul Kundaje, Avanti Shrikumar

    Abstract: In safety-critical applications of machine learning, it is often important to abstain from making predictions on low confidence examples. Standard abstention methods tend to be focused on optimizing top-k accuracy, but in many applications, accuracy is not the metric of interest. Further, label shift (a shift in class proportions between training time and prediction time) is ubiquitous in practica…

    Submitted 19 June, 2022; v1 submitted 20 February, 2018; originally announced February 2018.

  9. arXiv:1707.09587  [pdf, other]

    stat.AP q-bio.GN

    Network modelling of topological domains using Hi-C data

    Authors: Y. X. Rachel Wang, Purnamrita Sarkar, Oana Ursu, Anshul Kundaje, Peter J. Bickel

    Abstract: Chromosome conformation capture experiments such as Hi-C are used to map the three-dimensional spatial organization of genomes. One specific feature of the 3D organization is known as topologically associating domains (TADs), which are densely interacting, contiguous chromatin regions playing important roles in regulating gene expression. A few algorithms have been proposed to detect TADs. In part…

    Submitted 17 October, 2019; v1 submitted 30 July, 2017; originally announced July 2017.

    Journal ref: Annals of Applied Statistics 2019, Vol. 13, No. 3, 1511-1536

  10. arXiv:1704.02685  [pdf, other]

    cs.CV cs.LG cs.NE

    Learning Important Features Through Propagating Activation Differences

    Authors: Avanti Shrikumar, Peyton Greenside, Anshul Kundaje

    Abstract: The purported "black box" nature of neural networks is a barrier to adoption in applications where interpretability is essential. Here we present DeepLIFT (Deep Learning Important FeaTures), a method for decomposing the output prediction of a neural network on a specific input by backpropagating the contributions of all neurons in the network to every feature of the input. DeepLIFT compares the ac…

    Submitted 12 October, 2019; v1 submitted 9 April, 2017; originally announced April 2017.

    Comments: Updated to include changes present in the ICML camera-ready paper, and other small corrections

    Journal ref: PMLR 70:3145-3153, 2017

  11. arXiv:1605.01713  [pdf, other]

    cs.LG cs.CV cs.NE

    Not Just a Black Box: Learning Important Features Through Propagating Activation Differences

    Authors: Avanti Shrikumar, Peyton Greenside, Anna Shcherbina, Anshul Kundaje

    Abstract: Note: This paper describes an older version of DeepLIFT. See https://arxiv.org/abs/1704.02685 for the newer version. Original abstract follows: The purported "black box" nature of neural networks is a barrier to adoption in applications where interpretability is essential. Here we present DeepLIFT (Learning Important FeaTures), an efficient and effective method for computing importance scores in a…

    Submitted 11 April, 2017; v1 submitted 5 May, 2016; originally announced May 2016.

    Comments: 6 pages, 3 figures, this is an older version; see https://arxiv.org/abs/1704.02685 for the newer version

  12. Motif Discovery through Predictive Modeling of Gene Regulation

    Authors: Manuel Middendorf, Anshul Kundaje, Mihir Shah, Yoav Freund, Chris H. Wiggins, Christina Leslie

    Abstract: We present MEDUSA, an integrative method for learning motif models of transcription factor binding sites by incorporating promoter sequence and gene expression data. We use a modern large-margin machine learning approach, based on boosting, to enable feature selection from the high-dimensional search space of candidate binding sequences while avoiding overfitting. At each iteration of the algori…

    Submitted 14 January, 2007; originally announced January 2007.

    Comments: RECOMB 2005

    Journal ref: Research in Computational Molecular Biology 2005

  13. arXiv:q-bio/0411028  [pdf, ps, other]

    q-bio.QM

    Predicting Genetic Regulatory Response Using Classification

    Authors: Manuel Middendorf, Anshul Kundaje, Chris Wiggins, Yoav Freund, Christina Leslie

    Abstract: We present a novel classification-based method for learning to predict gene regulatory response. Our approach is motivated by the hypothesis that in simple organisms such as Saccharomyces cerevisiae, we can learn a decision rule for predicting whether a gene is up- or down-regulated in a particular experiment based on (1) the presence of binding site subsequences ("motifs") in the gene's regul…

    Submitted 12 November, 2004; originally announced November 2004.

    Comments: 8 pages, 4 figures, presented at Twelfth International Conference on Intelligent Systems for Molecular Biology (ISMB 2004), supplemental website: http://www.cs.columbia.edu/compbio/geneclass

    Journal ref: Proceedings of the Twelfth International Conference on Intelligent Systems for Molecular Biology (ISMB 2004), Bioinformatics 20 Suppl 1, I232-I240, 2004

  14. arXiv:q-bio/0406016  [pdf, ps, other]

    q-bio.QM

    Predicting Genetic Regulatory Response using Classification: Yeast Stress Response

    Authors: Manuel Middendorf, Anshul Kundaje, Chris Wiggins, Yoav Freund, Christina Leslie

    Abstract: We present a novel classification-based algorithm called GeneClass for learning to predict gene regulatory response. Our approach is motivated by the hypothesis that in simple organisms such as Saccharomyces cerevisiae, we can learn a decision rule for predicting whether a gene is up- or down-regulated in a particular experiment based on (1) the presence of binding site subsequences ("motifs")…

    Submitted 8 June, 2004; v1 submitted 7 June, 2004; originally announced June 2004.

    Comments: Supplementary website: http://www.cs.columbia.edu/compbio/geneclass

    Journal ref: Proceedings of the First Annual RECOMB Regulation Workshop 2004