-
Unraveling overoptimism and publication bias in ML-driven science
Authors:
Pouria Saidi,
Gautam Dasarathy,
Visar Berisha
Abstract:
Machine Learning (ML) is increasingly used across many disciplines with impressive reported results. However, recent studies suggest published performance of ML models are often overoptimistic. Validity concerns are underscored by findings of an inverse relationship between sample size and reported accuracy in published ML models, contrasting with the theory of learning curves where accuracy shoul…
▽ More
Machine Learning (ML) is increasingly used across many disciplines with impressive reported results. However, recent studies suggest published performance of ML models are often overoptimistic. Validity concerns are underscored by findings of an inverse relationship between sample size and reported accuracy in published ML models, contrasting with the theory of learning curves where accuracy should improve or remain stable with increasing sample size. This paper investigates factors contributing to overoptimism in ML-driven science, focusing on overfitting and publication bias. We introduce a novel stochastic model for observed accuracy, integrating parametric learning curves and the aforementioned biases. We construct an estimator that corrects for these biases in observed data. Theoretical and empirical results show that our framework can estimate the underlying learning curve, providing realistic performance assessments from published results. Applying the model to meta-analyses of classifications of neurological conditions, we estimate the inherent limits of ML-based prediction in each domain.
△ Less
Submitted 11 July, 2024; v1 submitted 23 May, 2024;
originally announced May 2024.
-
On Characterizations of Potential and Ordinal Potential Games
Authors:
Sina Arefizadeh,
Angelia Nedich,
Gautam Dasarathy
Abstract:
This paper investigates some necessary and sufficient conditions for a game to be a potential game. At first, we extend the classical results of Slade and Monderer and Shapley from games with one-dimensional action spaces to games with multi-dimensional action spaces, which require differentiable cost functions. Then, we provide a necessary and sufficient conditions for a game to have a potential…
▽ More
This paper investigates some necessary and sufficient conditions for a game to be a potential game. At first, we extend the classical results of Slade and Monderer and Shapley from games with one-dimensional action spaces to games with multi-dimensional action spaces, which require differentiable cost functions. Then, we provide a necessary and sufficient conditions for a game to have a potential function by investigating the structure of a potential function in terms of the players' cost differences, as opposed to differentials. This condition provides a systematic way for construction of a potential function, which is applied to network congestion games, as an example. Finally, we provide some sufficient conditions for a game to be ordinal potential and generalized ordinal potential.
△ Less
Submitted 10 May, 2024;
originally announced May 2024.
-
Differential Analysis for Networks Obeying Conservation Laws
Authors:
Anirudh Rayas,
Rajasekhar Anguluri,
Jiajun Cheng,
Gautam Dasarathy
Abstract:
Networked systems that occur in various domains, such as the power grid, the brain, and opinion networks, are known to obey conservation laws. For instance, electric networks obey Kirchoff's laws, and social networks display opinion consensus. Such conservation laws are often modeled as balance equations that relate appropriate injected flows and potentials at the nodes of the networks. A recent l…
▽ More
Networked systems that occur in various domains, such as the power grid, the brain, and opinion networks, are known to obey conservation laws. For instance, electric networks obey Kirchoff's laws, and social networks display opinion consensus. Such conservation laws are often modeled as balance equations that relate appropriate injected flows and potentials at the nodes of the networks. A recent line of work considers the problem of estimating the unknown structure of such networked systems from observations of node potentials (and only the knowledge of the statistics of injected flows). Given the dynamic nature of the systems under consideration, an equally important task is estimating the change in the structure of the network from data -- the so called differential network analysis problem. That is, given two sets of node potential observations, the goal is to estimate the structural differences between the underlying networks. We formulate this novel differential network analysis problem for systems obeying conservation laws and devise a convex estimator to learn the edge changes directly from node potentials. We derive conditions under which the estimate is unique in the high-dimensional regime and devise an efficient ADMM-based approach to perform the estimation. Finally, we demonstrate the performance of our approach on synthetic and benchmark power network data.
△ Less
Submitted 30 January, 2023;
originally announced February 2023.
-
Active Sequential Two-Sample Testing
Authors:
Weizhi Li,
Prad Kadambi,
Pouria Saidi,
Karthikeyan Natesan Ramamurthy,
Gautam Dasarathy,
Visar Berisha
Abstract:
A two-sample hypothesis test is a statistical procedure used to determine whether the distributions generating two samples are identical. We consider the two-sample testing problem in a new scenario where the sample measurements (or sample features) are inexpensive to access, but their group memberships (or labels) are costly. To address the problem, we devise the first \emph{active sequential two…
▽ More
A two-sample hypothesis test is a statistical procedure used to determine whether the distributions generating two samples are identical. We consider the two-sample testing problem in a new scenario where the sample measurements (or sample features) are inexpensive to access, but their group memberships (or labels) are costly. To address the problem, we devise the first \emph{active sequential two-sample testing framework} that not only sequentially but also \emph{actively queries}. Our test statistic is a likelihood ratio where one likelihood is found by maximization over all class priors, and the other is provided by a probabilistic classification model. The classification model is adaptively updated and used to predict where the (unlabelled) features have a high dependency on labels; labeling the ``high-dependency'' features leads to the increased power of the proposed testing framework. In theory, we provide the proof that our framework produces an \emph{anytime-valid} $p$-value. In addition, we characterize the proposed framework's gain in testing power by analyzing the mutual information between the feature and label variables in asymptotic and finite-sample scenarios. In practice, we introduce an instantiation of our framework and evaluate it using several experiments; the experiments on the synthetic, MNIST, and application-specific datasets demonstrate that the testing power of the instantiated active sequential test significantly increases while the Type I error is under control.
△ Less
Submitted 27 June, 2024; v1 submitted 29 January, 2023;
originally announced January 2023.
-
Robust Model Selection of Gaussian Graphical Models
Authors:
Abrar Zahin,
Rajasekhar Anguluri,
Lalitha Sankar,
Oliver Kosut,
Gautam Dasarathy
Abstract:
In Gaussian graphical model selection, noise-corrupted samples present significant challenges. It is known that even minimal amounts of noise can obscure the underlying structure, leading to fundamental identifiability issues. A recent line of work addressing this "robust model selection" problem narrows its focus to tree-structured graphical models. Even within this specific class of models, exac…
▽ More
In Gaussian graphical model selection, noise-corrupted samples present significant challenges. It is known that even minimal amounts of noise can obscure the underlying structure, leading to fundamental identifiability issues. A recent line of work addressing this "robust model selection" problem narrows its focus to tree-structured graphical models. Even within this specific class of models, exact structure recovery is shown to be impossible. However, several algorithms have been developed that are known to provably recover the underlying tree-structure up to an (unavoidable) equivalence class.
In this paper, we extend these results beyond tree-structured graphs. We first characterize the equivalence class up to which general graphs can be recovered in the presence of noise. Despite the inherent ambiguity (which we prove is unavoidable), the structure that can be recovered reveals local clustering information and global connectivity patterns in the underlying model. Such information is useful in a range of real-world problems, including power grids, social networks, protein-protein interactions, and neural structures. We then propose an algorithm which provably recovers the underlying graph up to the identified ambiguity. We further provide finite sample guarantees in the high-dimensional regime for our algorithm and validate our results through numerical simulations.
△ Less
Submitted 7 May, 2024; v1 submitted 10 November, 2022;
originally announced November 2022.
-
Transmission Line Parameter Estimation Under Non-Gaussian Measurement Noise
Authors:
Antos Cheeramban Varghese,
Anamitra Pal,
Gautam Dasarathy
Abstract:
Accurate knowledge of transmission line parameters is essential for a variety of power system monitoring, protection, and control applications. The use of phasor measurement unit (PMU) data for transmission line parameter estimation (TLPE) is well-documented. However, existing literature on PMU-based TLPE implicitly assumes the measurement noise to be Gaussian. Recently, it has been shown that the…
▽ More
Accurate knowledge of transmission line parameters is essential for a variety of power system monitoring, protection, and control applications. The use of phasor measurement unit (PMU) data for transmission line parameter estimation (TLPE) is well-documented. However, existing literature on PMU-based TLPE implicitly assumes the measurement noise to be Gaussian. Recently, it has been shown that the noise in PMU measurements (especially in the current phasors) is better represented by Gaussian mixture models (GMMs), i.e., the noises are non-Gaussian. We present a novel approach for TLPE that can handle non-Gaussian noise in the PMU measurements. The measurement noise is expressed as a GMM, whose components are identified using the expectation-maximization (EM) algorithm. Subsequently, noise and parameter estimation is carried out by solving a maximum likelihood estimation problem iteratively until convergence. The superior performance of the proposed approach over traditional approaches such as least squares and total least squares as well as the more recently proposed minimum total error entropy approach is demonstrated by performing simulations using the IEEE 118-bus system as well as proprietary PMU data obtained from a U.S. power utility.
△ Less
Submitted 27 August, 2022;
originally announced August 2022.
-
Controllability of Coarsely Measured Networked Linear Dynamical Systems (Extended Version)
Authors:
Nafiseh Ghoroghchian,
Rajasekhar Anguluri,
Gautam Dasarathy,
Stark C. Draper
Abstract:
We consider the controllability of large-scale linear networked dynamical systems when complete knowledge of network structure is unavailable and knowledge is limited to coarse summaries. We provide conditions under which average controllability of the fine-scale system can be well approximated by average controllability of the (synthesized, reduced-order) coarse-scale system. To this end, we requ…
▽ More
We consider the controllability of large-scale linear networked dynamical systems when complete knowledge of network structure is unavailable and knowledge is limited to coarse summaries. We provide conditions under which average controllability of the fine-scale system can be well approximated by average controllability of the (synthesized, reduced-order) coarse-scale system. To this end, we require knowledge of some inherent parametric structure of the fine-scale network that makes this type of approximation possible. Therefore, we assume that the underlying fine-scale network is generated by the stochastic block model (SBM) -- often studied in community detection. We then provide an algorithm that directly estimates the average controllability of the fine-scale system using a coarse summary of SBM. Our analysis indicates the necessity of underlying structure (e.g., in-built communities) to be able to quantify accurately the controllability from coarsely characterized networked dynamics. We also compare our method to that of the reduced-order method and highlight the regimes where both can outperform each other. Finally, we provide simulations to confirm our theoretical results for different scalings of network size and density, and the parameter that captures how much community-structure is retained in the coarse summary.
△ Less
Submitted 21 June, 2022;
originally announced June 2022.
-
Learning the Structure of Large Networked Systems Obeying Conservation Laws
Authors:
Anirudh Rayas,
Rajasekhar Anguluri,
Gautam Dasarathy
Abstract:
Many networked systems such as electric networks, the brain, and social networks of opinion dynamics are known to obey conservation laws. Examples of this phenomenon include the Kirchoff laws in electric networks and opinion consensus in social networks. Conservation laws in networked systems may be modeled as balance equations of the form $X = B^{*} Y$, where the sparsity pattern of $B^{*}$ captu…
▽ More
Many networked systems such as electric networks, the brain, and social networks of opinion dynamics are known to obey conservation laws. Examples of this phenomenon include the Kirchoff laws in electric networks and opinion consensus in social networks. Conservation laws in networked systems may be modeled as balance equations of the form $X = B^{*} Y$, where the sparsity pattern of $B^{*}$ captures the connectivity of the network, and $Y, X \in \mathbb{R}^p$ are vectors of "potentials" and "injected flows" at the nodes respectively. The node potentials $Y$ cause flows across edges and the flows $X$ injected at the nodes are extraneous to the network dynamics. In several practical systems, the network structure is often unknown and needs to be estimated from data. Towards this, one has access to samples of the node potentials $Y$, but only the statistics of the node injections $X$. Motivated by this important problem, we study the estimation of the sparsity structure of the matrix $B^{*}$ from $n$ samples of $Y$ under the assumption that the node injections $X$ follow a Gaussian distribution with a known covariance $Σ_X$. We propose a new $\ell_{1}$-regularized maximum likelihood estimator for this problem in the high-dimensional regime where the size of the network $p$ is larger than sample size $n$. We show that this optimization problem is convex in the objective and admits a unique solution. Under a new mutual incoherence condition, we establish sufficient conditions on the triple $(n,p,d)$ for which exact sparsity recovery of $B^{*}$ is possible with high probability; $d$ is the degree of the graph. We also establish guarantees for the recovery of $B^{*}$ in the element-wise maximum, Frobenius, and operator norms. Finally, we complement these theoretical results with experimental validation of the performance of the proposed estimator on synthetic and real-world data.
△ Less
Submitted 14 June, 2022;
originally announced June 2022.
-
A Machine Learning Framework for Event Identification via Modal Analysis of PMU Data
Authors:
Nima T. Bazargani,
Gautam Dasarathy,
Lalitha Sankar,
Oliver Kosut
Abstract:
Power systems are prone to a variety of events (e.g. line trips and generation loss) and real-time identification of such events is crucial in terms of situational awareness, reliability, and security. Using measurements from multiple synchrophasors, i.e., phasor measurement units (PMUs), we propose to identify events by extracting features based on modal dynamics. We combine such traditional phys…
▽ More
Power systems are prone to a variety of events (e.g. line trips and generation loss) and real-time identification of such events is crucial in terms of situational awareness, reliability, and security. Using measurements from multiple synchrophasors, i.e., phasor measurement units (PMUs), we propose to identify events by extracting features based on modal dynamics. We combine such traditional physics-based feature extraction methods with machine learning to distinguish different event types. Including all measurement channels at each PMU allows exploiting diverse features but also requires learning classification models over a high-dimensional space. To address this issue, various feature selection methods are implemented to choose the best subset of features. Using the obtained subset of features, we investigate the performance of two well-known classification models, namely, logistic regression (LR) and support vector machines (SVM) to identify generation loss and line trip events in two datasets. The first dataset is obtained from simulated generation loss and line trip events in the Texas 2000-bus synthetic grid. The second is a proprietary dataset with labeled events obtained from a large utility in the USA involving measurements from nearly 500 PMUs. Our results indicate that the proposed framework is promising for identifying the two types of events.
△ Less
Submitted 3 October, 2022; v1 submitted 14 February, 2022;
originally announced February 2022.
-
A label-efficient two-sample test
Authors:
Weizhi Li,
Gautam Dasarathy,
Karthikeyan Natesan Ramamurthy,
Visar Berisha
Abstract:
Two-sample tests evaluate whether two samples are realizations of the same distribution (the null hypothesis) or two different distributions (the alternative hypothesis). We consider a new setting for this problem where sample features are easily measured whereas sample labels are unknown and costly to obtain. Accordingly, we devise a three-stage framework in service of performing an effective two…
▽ More
Two-sample tests evaluate whether two samples are realizations of the same distribution (the null hypothesis) or two different distributions (the alternative hypothesis). We consider a new setting for this problem where sample features are easily measured whereas sample labels are unknown and costly to obtain. Accordingly, we devise a three-stage framework in service of performing an effective two-sample test with only a small number of sample label queries: first, a classifier is trained with samples uniformly labeled to model the posterior probabilities of the labels; second, a novel query scheme dubbed \emph{bimodal query} is used to query labels of samples from both classes, and last, the classical Friedman-Rafsky (FR) two-sample test is performed on the queried samples. Theoretical analysis and extensive experiments performed on several datasets demonstrate that the proposed test controls the Type I error and has decreased Type II error relative to uniform querying and certainty-based querying. Source code for our algorithms and experimental results is available at \url{https://github.com/wayne0908/Label-Efficient-Two-Sample}.
△ Less
Submitted 19 July, 2022; v1 submitted 16 November, 2021;
originally announced November 2021.
-
Maximizing and Satisficing in Multi-armed Bandits with Graph Information
Authors:
Parth K. Thaker,
Mohit Malu,
Nikhil Rao,
Gautam Dasarathy
Abstract:
Pure exploration in multi-armed bandits has emerged as an important framework for modeling decision-making and search under uncertainty. In modern applications, however, one is often faced with a tremendously large number of options. Even obtaining one observation per option may be too costly rendering traditional pure exploration algorithms ineffective. Fortunately, one often has access to simila…
▽ More
Pure exploration in multi-armed bandits has emerged as an important framework for modeling decision-making and search under uncertainty. In modern applications, however, one is often faced with a tremendously large number of options. Even obtaining one observation per option may be too costly rendering traditional pure exploration algorithms ineffective. Fortunately, one often has access to similar relationships amongst the options that can be leveraged. In this paper, we consider the pure exploration problem in stochastic multi-armed bandits where the similarities between the arms are captured by a graph and the rewards may be represented as a smooth signal on this graph. In particular, we consider the problem of finding the arm with the maximum reward (i.e., the maximizing problem) or one with a sufficiently high reward (i.e., the satisficing problem) under this model. We propose novel algorithms \textbf{\algoname{}} (GRaph-based UcB) and $ζ$-\textbf{\algoname{}} for these problems and provide a theoretical characterization of their performance which specifically elicits the benefit of the graph side information. We also prove a lower bound on the data requirement, showing a large class of problems where these algorithms are near-optimal. We complement our theory with experimental results that show the benefit of capitalizing on such side information.
△ Less
Submitted 20 November, 2022; v1 submitted 2 August, 2021;
originally announced August 2021.
-
State and Topology Estimation for Unobservable Distribution Systems using Deep Neural Networks
Authors:
Behrouz Azimian,
Reetam Sen Biswas,
Shiva Moshtagh,
Anamitra Pal,
Lang Tong,
Gautam Dasarathy
Abstract:
Time-synchronized state estimation for reconfigurable distribution networks is challenging because of limited real-time observability. This paper addresses this challenge by formulating a deep learning (DL)-based approach for topology identification (TI) and unbalanced three-phase distribution system state estimation (DSSE). Two deep neural networks (DNNs) are trained for time-synchronized DNN-bas…
▽ More
Time-synchronized state estimation for reconfigurable distribution networks is challenging because of limited real-time observability. This paper addresses this challenge by formulating a deep learning (DL)-based approach for topology identification (TI) and unbalanced three-phase distribution system state estimation (DSSE). Two deep neural networks (DNNs) are trained for time-synchronized DNN-based TI and DSSE, respectively, for systems that are incompletely observed by synchrophasor measurement devices (SMDs) in real-time. A data-driven approach for judicious SMD placement to facilitate reliable TI and DSSE is also provided. Robustness of the proposed methodology is demonstrated by considering non-Gaussian noise in the SMD measurements. A comparison of the DNN-based DSSE with more conventional approaches indicates that the DL-based approach gives better accuracy with smaller number of SMDs.
△ Less
Submitted 26 March, 2022; v1 submitted 14 April, 2021;
originally announced April 2021.
-
Graph Community Detection from Coarse Measurements: Recovery Conditions for the Coarsened Weighted Stochastic Block Model
Authors:
Nafiseh Ghoroghchian,
Gautam Dasarathy,
Stark C. Draper
Abstract:
We study the problem of community recovery from coarse measurements of a graph. In contrast to the problem of community recovery of a fully observed graph, one often encounters situations when measurements of a graph are made at low-resolution, each measurement integrating across multiple graph nodes. Such low-resolution measurements effectively induce a coarse graph with its own communities. Our…
▽ More
We study the problem of community recovery from coarse measurements of a graph. In contrast to the problem of community recovery of a fully observed graph, one often encounters situations when measurements of a graph are made at low-resolution, each measurement integrating across multiple graph nodes. Such low-resolution measurements effectively induce a coarse graph with its own communities. Our objective is to develop conditions on the graph structure, the quantity, and properties of measurements, under which we can recover the community organization in this coarse graph. In this paper, we build on the stochastic block model by mathematically formalizing the coarsening process, and characterizing its impact on the community members and connections. Through this novel setup and modeling, we characterize an error bound for community recovery. The error bound yields simple and closed-form asymptotic conditions to achieve the perfect recovery of the coarse graph communities.
△ Less
Submitted 25 February, 2021;
originally announced February 2021.
-
Finding the Homology of Decision Boundaries with Active Learning
Authors:
Weizhi Li,
Gautam Dasarathy,
Karthikeyan Natesan Ramamurthy,
Visar Berisha
Abstract:
Accurately and efficiently characterizing the decision boundary of classifiers is important for problems related to model selection and meta-learning. Inspired by topological data analysis, the characterization of decision boundaries using their homology has recently emerged as a general and powerful tool. In this paper, we propose an active learning algorithm to recover the homology of decision b…
▽ More
Accurately and efficiently characterizing the decision boundary of classifiers is important for problems related to model selection and meta-learning. Inspired by topological data analysis, the characterization of decision boundaries using their homology has recently emerged as a general and powerful tool. In this paper, we propose an active learning algorithm to recover the homology of decision boundaries. Our algorithm sequentially and adaptively selects which samples it requires the labels of. We theoretically analyze the proposed framework and show that the query complexity of our active learning algorithm depends naturally on the intrinsic complexity of the underlying manifold. We demonstrate the effectiveness of our framework in selecting best-performing machine learning models for datasets just using their respective homological summaries. Experiments on several standard datasets show the sample complexity improvement in recovering the homology and demonstrate the practical utility of the framework for model selection. Source code for our algorithms and experimental results is available at https://github.com/wayne0908/Active-Learning-Homology.
△ Less
Submitted 18 November, 2020;
originally announced November 2020.
-
Differentiable Programming for Hyperspectral Unmixing using a Physics-based Dispersion Model
Authors:
John Janiczek,
Parth Thaker,
Gautam Dasarathy,
Christopher S. Edwards,
Philip Christensen,
Suren Jayasuriya
Abstract:
Hyperspectral unmixing is an important remote sensing task with applications including material identification and analysis. Characteristic spectral features make many pure materials identifiable from their visible-to-infrared spectra, but quantifying their presence within a mixture is a challenging task due to nonlinearities and factors of variation. In this paper, spectral variation is considere…
▽ More
Hyperspectral unmixing is an important remote sensing task with applications including material identification and analysis. Characteristic spectral features make many pure materials identifiable from their visible-to-infrared spectra, but quantifying their presence within a mixture is a challenging task due to nonlinearities and factors of variation. In this paper, spectral variation is considered from a physics-based approach and incorporated into an end-to-end spectral unmixing algorithm via differentiable programming. The dispersion model is introduced to simulate realistic spectral variation, and an efficient method to fit the parameters is presented. Then, this dispersion model is utilized as a generative model within an analysis-by-synthesis spectral unmixing algorithm. Further, a technique for inverse rendering using a convolutional neural network to predict parameters of the generative model is introduced to enhance performance and speed when training data is available. Results achieve state-of-the-art on both infrared and visible-to-near-infrared (VNIR) datasets, and show promise for the synergy between physics-based models and deep learning in hyperspectral unmixing in the future.
△ Less
Submitted 12 July, 2020;
originally announced July 2020.
-
On the alpha-loss Landscape in the Logistic Model
Authors:
Tyler Sypherd,
Mario Diaz,
Lalitha Sankar,
Gautam Dasarathy
Abstract:
We analyze the optimization landscape of a recently introduced tunable class of loss functions called $α$-loss, $α\in (0,\infty]$, in the logistic model. This family encapsulates the exponential loss ($α= 1/2$), the log-loss ($α= 1$), and the 0-1 loss ($α= \infty$) and contains compelling properties that enable the practitioner to discern among a host of operating conditions relevant to emerging l…
▽ More
We analyze the optimization landscape of a recently introduced tunable class of loss functions called $α$-loss, $α\in (0,\infty]$, in the logistic model. This family encapsulates the exponential loss ($α= 1/2$), the log-loss ($α= 1$), and the 0-1 loss ($α= \infty$) and contains compelling properties that enable the practitioner to discern among a host of operating conditions relevant to emerging learning methods. Specifically, we study the evolution of the optimization landscape of $α$-loss with respect to $α$ using tools drawn from the study of strictly-locally-quasi-convex functions in addition to geometric techniques. We interpret these results in terms of optimization complexity via normalized gradient descent.
△ Less
Submitted 22 June, 2020;
originally announced June 2020.
-
On the Sample Complexity and Optimization Landscape for Quadratic Feasibility Problems
Authors:
Parth Thaker,
Gautam Dasarathy,
Angelia Nedić
Abstract:
We consider the problem of recovering a complex vector $\mathbf{x}\in \mathbb{C}^n$ from $m$ quadratic measurements $\{\langle A_i\mathbf{x}, \mathbf{x}\rangle\}_{i=1}^m$. This problem, known as quadratic feasibility, encompasses the well known phase retrieval problem and has applications in a wide range of important areas including power system state estimation and x-ray crystallography. In gener…
▽ More
We consider the problem of recovering a complex vector $\mathbf{x}\in \mathbb{C}^n$ from $m$ quadratic measurements $\{\langle A_i\mathbf{x}, \mathbf{x}\rangle\}_{i=1}^m$. This problem, known as quadratic feasibility, encompasses the well known phase retrieval problem and has applications in a wide range of important areas including power system state estimation and x-ray crystallography. In general, not only is the the quadratic feasibility problem NP-hard to solve, but it may in fact be unidentifiable. In this paper, we establish conditions under which this problem becomes {identifiable}, and further prove isometry properties in the case when the matrices $\{A_i\}_{i=1}^m$ are Hermitian matrices sampled from a complex Gaussian distribution. Moreover, we explore a nonconvex {optimization} formulation of this problem, and establish salient features of the associated optimization landscape that enables gradient algorithms with an arbitrary initialization to converge to a \emph{globally optimal} point with a high probability. Our results also reveal sample complexity requirements for successfully identifying a feasible solution in these contexts.
△ Less
Submitted 14 December, 2020; v1 submitted 3 February, 2020;
originally announced February 2020.
-
Regularization via Structural Label Smoothing
Authors:
Weizhi Li,
Gautam Dasarathy,
Visar Berisha
Abstract:
Regularization is an effective way to promote the generalization performance of machine learning models. In this paper, we focus on label smoothing, a form of output distribution regularization that prevents overfitting of a neural network by softening the ground-truth labels in the training data in an attempt to penalize overconfident outputs. Existing approaches typically use cross-validation to…
▽ More
Regularization is an effective way to promote the generalization performance of machine learning models. In this paper, we focus on label smoothing, a form of output distribution regularization that prevents overfitting of a neural network by softening the ground-truth labels in the training data in an attempt to penalize overconfident outputs. Existing approaches typically use cross-validation to impose this smoothing, which is uniform across all training data. In this paper, we show that such label smoothing imposes a quantifiable bias in the Bayes error rate of the training data, with regions of the feature space with high overlap and low marginal likelihood having a lower bias and regions of low overlap and high marginal likelihood having a higher bias. These theoretical results motivate a simple objective function for data-dependent smoothing to mitigate the potential negative consequences of the operation while maintaining its desirable properties as a regularizer. We call this approach Structural Label Smoothing (SLS). We implement SLS and empirically validate on synthetic, Higgs, SVHN, CIFAR-10, and CIFAR-100 datasets. The results confirm our theoretical insights and demonstrate the effectiveness of the proposed method in comparison to traditional label smoothing.
△ Less
Submitted 4 July, 2020; v1 submitted 7 January, 2020;
originally announced January 2020.
-
Graph quilting: graphical model selection from partially observed covariances
Authors:
Giuseppe Vinci,
Gautam Dasarathy,
Genevera I. Allen
Abstract:
Graphical model selection is a seemingly impossible task when many pairs of variables are never jointly observed; this requires inference of conditional dependencies with no observations of corresponding marginal dependencies. This under-explored statistical problem arises in neuroimaging, for example, when different partially overlapping subsets of neurons are recorded in non-simultaneous session…
▽ More
Graphical model selection is a seemingly impossible task when many pairs of variables are never jointly observed; this requires inference of conditional dependencies with no observations of corresponding marginal dependencies. This under-explored statistical problem arises in neuroimaging, for example, when different partially overlapping subsets of neurons are recorded in non-simultaneous sessions. We call this statistical challenge the "Graph Quilting" problem. We study this problem in the context of sparse inverse covariance learning, and focus on Gaussian graphical models where we show that missing parts of the covariance matrix yields an unidentifiable precision matrix specifying the graph. Nonetheless, we show that, under mild conditions, it is possible to correctly identify edges connecting the observed pairs of nodes. Additionally, we show that we can recover a minimal superset of edges connecting variables that are never jointly observed. Thus, one can infer conditional relationships even when marginal relationships are unobserved, a surprising result! To accomplish this, we propose an $\ell_1$-regularized partially observed likelihood-based graph estimator and provide performance guarantees in population and in high-dimensional finite-sample settings. We illustrate our approach using synthetic data, as well as for learning functional neural connectivity from calcium imaging data.
△ Less
Submitted 15 February, 2023; v1 submitted 11 December, 2019;
originally announced December 2019.
-
A Tunable Loss Function for Robust Classification: Calibration, Landscape, and Generalization
Authors:
Tyler Sypherd,
Mario Diaz,
John Kevin Cava,
Gautam Dasarathy,
Peter Kairouz,
Lalitha Sankar
Abstract:
We introduce a tunable loss function called $α$-loss, parameterized by $α\in (0,\infty]$, which interpolates between the exponential loss ($α= 1/2$), the log-loss ($α= 1$), and the 0-1 loss ($α= \infty$), for the machine learning setting of classification. Theoretically, we illustrate a fundamental connection between $α$-loss and Arimoto conditional entropy, verify the classification-calibration o…
▽ More
We introduce a tunable loss function called $α$-loss, parameterized by $α\in (0,\infty]$, which interpolates between the exponential loss ($α= 1/2$), the log-loss ($α= 1$), and the 0-1 loss ($α= \infty$), for the machine learning setting of classification. Theoretically, we illustrate a fundamental connection between $α$-loss and Arimoto conditional entropy, verify the classification-calibration of $α$-loss in order to demonstrate asymptotic optimality via Rademacher complexity generalization techniques, and build-upon a notion called strictly local quasi-convexity in order to quantitatively characterize the optimization landscape of $α$-loss. Practically, we perform class imbalance, robustness, and classification experiments on benchmark image datasets using convolutional-neural-networks. Our main practical conclusion is that certain tasks may benefit from tuning $α$-loss away from log-loss ($α= 1$), and to this end we provide simple heuristics for the practitioner. In particular, navigating the $α$ hyperparameter can readily provide superior model robustness to label flips ($α> 1$) and sensitivity to imbalanced classes ($α< 1$).
△ Less
Submitted 21 December, 2022; v1 submitted 5 June, 2019;
originally announced June 2019.
-
Thresholding Graph Bandits with GrAPL
Authors:
Daniel LeJeune,
Gautam Dasarathy,
Richard G. Baraniuk
Abstract:
In this paper, we introduce a new online decision making paradigm that we call Thresholding Graph Bandits. The main goal is to efficiently identify a subset of arms in a multi-armed bandit problem whose means are above a specified threshold. While traditionally in such problems, the arms are assumed to be independent, in our paradigm we further suppose that we have access to the similarity between…
▽ More
In this paper, we introduce a new online decision making paradigm that we call Thresholding Graph Bandits. The main goal is to efficiently identify a subset of arms in a multi-armed bandit problem whose means are above a specified threshold. While traditionally in such problems, the arms are assumed to be independent, in our paradigm we further suppose that we have access to the similarity between the arms in the form of a graph, allowing us gain information about the arm means in fewer samples. Such settings play a key role in a wide range of modern decision making problems where rapid decisions need to be made in spite of the large number of options available at each time. We present GrAPL, a novel algorithm for the thresholding graph bandit problem. We demonstrate theoretically that this algorithm is effective in taking advantage of the graph structure when available and the reward function homophily (that strongly connected arms have similar rewards) when favorable. We confirm these theoretical findings via experiments on both synthetic and real data.
△ Less
Submitted 24 March, 2020; v1 submitted 22 May, 2019;
originally announced May 2019.
-
IdeoTrace: A Framework for Ideology Tracing with a Case Study on the 2016 U.S. Presidential Election
Authors:
Indu Manickam,
Andrew S. Lan,
Gautam Dasarathy,
Richard G. Baraniuk
Abstract:
The 2016 United States presidential election has been characterized as a period of extreme divisiveness that was exacerbated on social media by the influence of fake news, trolls, and social bots. However, the extent to which the public became more polarized in response to these influences over the course of the election is not well understood. In this paper we propose IdeoTrace, a framework for (…
▽ More
The 2016 United States presidential election has been characterized as a period of extreme divisiveness that was exacerbated on social media by the influence of fake news, trolls, and social bots. However, the extent to which the public became more polarized in response to these influences over the course of the election is not well understood. In this paper we propose IdeoTrace, a framework for (i) jointly estimating the ideology of social media users and news websites and (ii) tracing changes in user ideology over time. We apply this framework to the last two months of the election period for a group of 47508 Twitter users and demonstrate that both liberal and conservative users became more polarized over time.
△ Less
Submitted 30 May, 2019; v1 submitted 21 May, 2019;
originally announced May 2019.
-
MISSION: Ultra Large-Scale Feature Selection using Count-Sketches
Authors:
Amirali Aghazadeh,
Ryan Spring,
Daniel LeJeune,
Gautam Dasarathy,
Anshumali Shrivastava,
Richard G. Baraniuk
Abstract:
Feature selection is an important challenge in machine learning. It plays a crucial role in the explainability of machine-driven decisions that are rapidly permeating throughout modern society. Unfortunately, the explosion in the size and dimensionality of real-world datasets poses a severe challenge to standard feature selection algorithms. Today, it is not uncommon for datasets to have billions…
▽ More
Feature selection is an important challenge in machine learning. It plays a crucial role in the explainability of machine-driven decisions that are rapidly permeating throughout modern society. Unfortunately, the explosion in the size and dimensionality of real-world datasets poses a severe challenge to standard feature selection algorithms. Today, it is not uncommon for datasets to have billions of dimensions. At such scale, even storing the feature vector is impossible, causing most existing feature selection methods to fail. Workarounds like feature hashing, a standard approach to large-scale machine learning, helps with the computational feasibility, but at the cost of losing the interpretability of features. In this paper, we present MISSION, a novel framework for ultra large-scale feature selection that performs stochastic gradient descent while maintaining an efficient representation of the features in memory using a Count-Sketch data structure. MISSION retains the simplicity of feature hashing without sacrificing the interpretability of the features while using only O(log^2(p)) working memory. We demonstrate that MISSION accurately and efficiently performs feature selection on real-world, large-scale datasets with billions of dimensions.
△ Less
Submitted 11 June, 2018;
originally announced June 2018.
-
Coalescent-based species tree estimation: a stochastic Farris transform
Authors:
Gautam Dasarathy,
Elchanan Mossel,
Robert Nowak,
Sebastien Roch
Abstract:
The reconstruction of a species phylogeny from genomic data faces two significant hurdles: 1) the trees describing the evolution of each individual gene--i.e., the gene trees--may differ from the species phylogeny and 2) the molecular sequences corresponding to each gene often provide limited information about the gene trees themselves. In this paper we consider an approach to species tree reconst…
▽ More
The reconstruction of a species phylogeny from genomic data faces two significant hurdles: 1) the trees describing the evolution of each individual gene--i.e., the gene trees--may differ from the species phylogeny and 2) the molecular sequences corresponding to each gene often provide limited information about the gene trees themselves. In this paper we consider an approach to species tree reconstruction that addresses both these hurdles. Specifically, we propose an algorithm for phylogeny reconstruction under the multispecies coalescent model with a standard model of site substitution. The multispecies coalescent is commonly used to model gene tree discordance due to incomplete lineage sorting, a well-studied population-genetic effect.
In previous work, an information-theoretic trade-off was derived in this context between the number of loci, $m$, needed for an accurate reconstruction and the length of the locus sequences, $k$. It was shown that to reconstruct an internal branch of length $f$, one needs $m$ to be of the order of $1/[f^{2} \sqrt{k}]$. That previous result was obtained under the molecular clock assumption, i.e., under the assumption that mutation rates (as well as population sizes) are constant across the species phylogeny.
Here we generalize this result beyond the restrictive molecular clock assumption, and obtain a new reconstruction algorithm that has the same data requirement (up to log factors). Our main contribution is a novel reduction to the molecular clock case under the multispecies coalescent. As a corollary, we also obtain a new identifiability result of independent interest: for any species tree with $n \geq 3$ species, the rooted species tree can be identified from the distribution of its unrooted weighted gene trees even in the absence of a molecular clock.
△ Less
Submitted 13 July, 2017;
originally announced July 2017.
-
DeepCodec: Adaptive Sensing and Recovery via Deep Convolutional Neural Networks
Authors:
Ali Mousavi,
Gautam Dasarathy,
Richard G. Baraniuk
Abstract:
In this paper we develop a novel computational sensing framework for sensing and recovering structured signals. When trained on a set of representative signals, our framework learns to take undersampled measurements and recover signals from them using a deep convolutional neural network. In other words, it learns a transformation from the original signals to a near-optimal number of undersampled m…
▽ More
In this paper we develop a novel computational sensing framework for sensing and recovering structured signals. When trained on a set of representative signals, our framework learns to take undersampled measurements and recover signals from them using a deep convolutional neural network. In other words, it learns a transformation from the original signals to a near-optimal number of undersampled measurements and the inverse transformation from measurements to signals. This is in contrast to traditional compressive sensing (CS) systems that use random linear measurements and convex optimization or iterative algorithms for signal recovery. We compare our new framework with $\ell_1$-minimization from the phase transition point of view and demonstrate that it outperforms $\ell_1$-minimization in the regions of phase transition plot where $\ell_1$-minimization cannot recover the exact solution. In addition, we experimentally demonstrate how learning measurements enhances the overall recovery performance, speeds up training of recovery framework, and leads to having fewer parameters to learn.
△ Less
Submitted 11 July, 2017;
originally announced July 2017.
-
Multi-fidelity Bayesian Optimisation with Continuous Approximations
Authors:
Kirthevasan Kandasamy,
Gautam Dasarathy,
Jeff Schneider,
Barnabas Poczos
Abstract:
Bandit methods for black-box optimisation, such as Bayesian optimisation, are used in a variety of applications including hyper-parameter tuning and experiment design. Recently, \emph{multi-fidelity} methods have garnered considerable attention since function evaluations have become increasingly expensive in such applications. Multi-fidelity methods use cheap approximations to the function of inte…
▽ More
Bandit methods for black-box optimisation, such as Bayesian optimisation, are used in a variety of applications including hyper-parameter tuning and experiment design. Recently, \emph{multi-fidelity} methods have garnered considerable attention since function evaluations have become increasingly expensive in such applications. Multi-fidelity methods use cheap approximations to the function of interest to speed up the overall optimisation process. However, most multi-fidelity methods assume only a finite number of approximations. In many practical applications however, a continuous spectrum of approximations might be available. For instance, when tuning an expensive neural network, one might choose to approximate the cross validation performance using less data $N$ and/or few training iterations $T$. Here, the approximations are best viewed as arising out of a continuous two dimensional space $(N,T)$. In this work, we develop a Bayesian optimisation method, BOCA, for this setting. We characterise its theoretical properties and show that it achieves better regret than than strategies which ignore the approximations. BOCA outperforms several other baselines in synthetic and real experiments.
△ Less
Submitted 17 March, 2017;
originally announced March 2017.
-
The Multi-fidelity Multi-armed Bandit
Authors:
Kirthevasan Kandasamy,
Gautam Dasarathy,
Jeff Schneider,
Barnabás Póczos
Abstract:
We study a variant of the classical stochastic $K$-armed bandit where observing the outcome of each arm is expensive, but cheap approximations to this outcome are available. For example, in online advertising the performance of an ad can be approximated by displaying it for shorter time periods or to narrower audiences. We formalise this task as a multi-fidelity bandit, where, at each time step, t…
▽ More
We study a variant of the classical stochastic $K$-armed bandit where observing the outcome of each arm is expensive, but cheap approximations to this outcome are available. For example, in online advertising the performance of an ad can be approximated by displaying it for shorter time periods or to narrower audiences. We formalise this task as a multi-fidelity bandit, where, at each time step, the forecaster may choose to play an arm at any one of $M$ fidelities. The highest fidelity (desired outcome) expends cost $λ^{(m)}$. The $m^{\text{th}}$ fidelity (an approximation) expends $λ^{(m)} < λ^{(M)}$ and returns a biased estimate of the highest fidelity. We develop MF-UCB, a novel upper confidence bound procedure for this setting and prove that it naturally adapts to the sequence of available approximations and costs thus attaining better regret than naive strategies which ignore the approximations. For instance, in the above online advertising example, MF-UCB would use the lower fidelities to quickly eliminate suboptimal ads and reserve the larger expensive experiments on a small set of promising candidates. We complement this result with a lower bound and show that MF-UCB is nearly optimal under certain conditions.
△ Less
Submitted 30 October, 2016;
originally announced October 2016.
-
Multi-fidelity Gaussian Process Bandit Optimisation
Authors:
Kirthevasan Kandasamy,
Gautam Dasarathy,
Junier B. Oliva,
Jeff Schneider,
Barnabas Poczos
Abstract:
In many scientific and engineering applications, we are tasked with the maximisation of an expensive to evaluate black box function $f$. Traditional settings for this problem assume just the availability of this single function. However, in many cases, cheap approximations to $f$ may be obtainable. For example, the expensive real world behaviour of a robot can be approximated by a cheap computer s…
▽ More
In many scientific and engineering applications, we are tasked with the maximisation of an expensive to evaluate black box function $f$. Traditional settings for this problem assume just the availability of this single function. However, in many cases, cheap approximations to $f$ may be obtainable. For example, the expensive real world behaviour of a robot can be approximated by a cheap computer simulation. We can use these approximations to eliminate low function value regions cheaply and use the expensive evaluations of $f$ in a small but promising region and speedily identify the optimum. We formalise this task as a \emph{multi-fidelity} bandit problem where the target function and its approximations are sampled from a Gaussian process. We develop MF-GP-UCB, a novel method based on upper confidence bound techniques. In our theoretical analysis we demonstrate that it exhibits precisely the above behaviour, and achieves better regret than strategies which ignore multi-fidelity information. Empirically, MF-GP-UCB outperforms such naive strategies and other multi-fidelity methods on several synthetic and real experiments.
△ Less
Submitted 15 March, 2019; v1 submitted 20 March, 2016;
originally announced March 2016.
-
Active Learning Algorithms for Graphical Model Selection
Authors:
Gautam Dasarathy,
Aarti Singh,
Maria-Florina Balcan,
Jong Hyuk Park
Abstract:
The problem of learning the structure of a high dimensional graphical model from data has received considerable attention in recent years. In many applications such as sensor networks and proteomics it is often expensive to obtain samples from all the variables involved simultaneously. For instance, this might involve the synchronization of a large number of sensors or the tagging of a large numbe…
▽ More
The problem of learning the structure of a high dimensional graphical model from data has received considerable attention in recent years. In many applications such as sensor networks and proteomics it is often expensive to obtain samples from all the variables involved simultaneously. For instance, this might involve the synchronization of a large number of sensors or the tagging of a large number of proteins. To address this important issue, we initiate the study of a novel graphical model selection problem, where the goal is to optimize the total number of scalar samples obtained by allowing the collection of samples from only subsets of the variables. We propose a general paradigm for graphical model selection where feedback is used to guide the sampling to high degree vertices, while obtaining only few samples from the ones with the low degrees. We instantiate this framework with two specific active learning algorithms, one of which makes mild assumptions but is computationally expensive, while the other is more computationally efficient but requires stronger (nevertheless standard) assumptions. Whereas the sample complexity of passive algorithms is typically a function of the maximum degree of the graph, we show that the sample complexity of our algorithms is provable smaller and that it depends on a novel local complexity measure that is akin to the average degree of the graph. We finally demonstrate the efficacy of our framework via simulations.
△ Less
Submitted 7 April, 2016; v1 submitted 31 January, 2016;
originally announced February 2016.
-
S2: An Efficient Graph Based Active Learning Algorithm with Application to Nonparametric Classification
Authors:
Gautam Dasarathy,
Robert Nowak,
Xiaojin Zhu
Abstract:
This paper investigates the problem of active learning for binary label prediction on a graph. We introduce a simple and label-efficient algorithm called S2 for this task. At each step, S2 selects the vertex to be labeled based on the structure of the graph and all previously gathered labels. Specifically, S2 queries for the label of the vertex that bisects the *shortest shortest* path between any…
▽ More
This paper investigates the problem of active learning for binary label prediction on a graph. We introduce a simple and label-efficient algorithm called S2 for this task. At each step, S2 selects the vertex to be labeled based on the structure of the graph and all previously gathered labels. Specifically, S2 queries for the label of the vertex that bisects the *shortest shortest* path between any pair of oppositely labeled vertices. We present a theoretical estimate of the number of queries S2 needs in terms of a novel parametrization of the complexity of binary functions on graphs. We also present experimental results demonstrating the performance of S2 on both real and synthetic data. While other graph-based active learning algorithms have shown promise in practice, our algorithm is the first with both good performance and theoretical guarantees. Finally, we demonstrate the implications of the S2 algorithm to the theory of nonparametric active learning. In particular, we show that S2 achieves near minimax optimal excess risk for an important class of nonparametric classification problems.
△ Less
Submitted 29 June, 2015;
originally announced June 2015.
-
Data Requirement for Phylogenetic Inference from Multiple Loci: A New Distance Method
Authors:
Gautam Dasarathy,
Robert Nowak,
Sebastien Roch
Abstract:
We consider the problem of estimating the evolutionary history of a set of species (phylogeny or species tree) from several genes. It is known that the evolutionary history of individual genes (gene trees) might be topologically distinct from each other and from the underlying species tree, possibly confounding phylogenetic analysis. A further complication in practice is that one has to estimate g…
▽ More
We consider the problem of estimating the evolutionary history of a set of species (phylogeny or species tree) from several genes. It is known that the evolutionary history of individual genes (gene trees) might be topologically distinct from each other and from the underlying species tree, possibly confounding phylogenetic analysis. A further complication in practice is that one has to estimate gene trees from molecular sequences of finite length. We provide the first full data-requirement analysis of a species tree reconstruction method that takes into account estimation errors at the gene level. Under that criterion, we also devise a novel reconstruction algorithm that provably improves over all previous methods in a regime of interest.
△ Less
Submitted 30 June, 2014; v1 submitted 28 April, 2014;
originally announced April 2014.
-
Sketching Sparse Matrices
Authors:
Gautam Dasarathy,
Parikshit Shah,
Badri Narayan Bhaskar,
Robert Nowak
Abstract:
This paper considers the problem of recovering an unknown sparse p\times p matrix X from an m\times m matrix Y=AXB^T, where A and B are known m \times p matrices with m << p.
The main result shows that there exist constructions of the "sketching" matrices A and B so that even if X has O(p) non-zeros, it can be recovered exactly and efficiently using a convex program as long as these non-zeros ar…
▽ More
This paper considers the problem of recovering an unknown sparse p\times p matrix X from an m\times m matrix Y=AXB^T, where A and B are known m \times p matrices with m << p.
The main result shows that there exist constructions of the "sketching" matrices A and B so that even if X has O(p) non-zeros, it can be recovered exactly and efficiently using a convex program as long as these non-zeros are not concentrated in any single row/column of X. Furthermore, it suffices for the size of Y (the sketch dimension) to scale as m = O(\sqrt{# nonzeros in X} \times log p). The results also show that the recovery is robust and stable in the sense that if X is equal to a sparse matrix plus a perturbation, then the convex program we propose produces an approximation with accuracy proportional to the size of the perturbation. Unlike traditional results on sparse recovery, where the sensing matrix produces independent measurements, our sensing operator is highly constrained (it assumes a tensor product structure). Therefore, proving recovery guarantees require non-standard techniques. Indeed our approach relies on a novel result concerning tensor products of bipartite graphs, which may be of independent interest.
This problem is motivated by the following application, among others. Consider a p\times n data matrix D, consisting of n observations of p variables. Assume that the correlation matrix X:=DD^{T} is (approximately) sparse in the sense that each of the p variables is significantly correlated with only a few others. Our results show that these significant correlations can be detected even if we have access to only a sketch of the data S=AD with A \in R^{m\times p}.
△ Less
Submitted 26 March, 2013;
originally announced March 2013.
-
Active Clustering: Robust and Efficient Hierarchical Clustering using Adaptively Selected Similarities
Authors:
Brian Eriksson,
Gautam Dasarathy,
Aarti Singh,
Robert Nowak
Abstract:
Hierarchical clustering based on pairwise similarities is a common tool used in a broad range of scientific applications. However, in many problems it may be expensive to obtain or compute similarities between the items to be clustered. This paper investigates the hierarchical clustering of N items based on a small subset of pairwise similarities, significantly less than the complete set of N(N-1)…
▽ More
Hierarchical clustering based on pairwise similarities is a common tool used in a broad range of scientific applications. However, in many problems it may be expensive to obtain or compute similarities between the items to be clustered. This paper investigates the hierarchical clustering of N items based on a small subset of pairwise similarities, significantly less than the complete set of N(N-1)/2 similarities. First, we show that if the intracluster similarities exceed intercluster similarities, then it is possible to correctly determine the hierarchical clustering from as few as 3N log N similarities. We demonstrate this order of magnitude savings in the number of pairwise similarities necessitates sequentially selecting which similarities to obtain in an adaptive fashion, rather than picking them at random. We then propose an active clustering method that is robust to a limited fraction of anomalous similarities, and show how even in the presence of these noisy similarity values we can resolve the hierarchical clustering using only O(N log^2 N) pairwise similarities.
△ Less
Submitted 18 February, 2011;
originally announced February 2011.