-
One variable Generalization of five entries of Ramanujan and their finite analogue
Authors:
Archit Agarwal
Abstract:
Ramanujan recorded five $q$-series identities at the end of his second notebook and an unified generalization of these identities obtained by Bhoria, Eyyunni and Maji. Recently, Dixit and Patel gave a finite analogue of the identity of Bhoria et. al. which in turn gives finite analogue of all the aforementioned identities of Ramanujan. In this paper, one of our main goals is to obtain a one-variab…
▽ More
Ramanujan recorded five $q$-series identities at the end of his second notebook and an unified generalization of these identities obtained by Bhoria, Eyyunni and Maji. Recently, Dixit and Patel gave a finite analogue of the identity of Bhoria et. al. which in turn gives finite analogue of all the aforementioned identities of Ramanujan. In this paper, one of our main goals is to obtain a one-variable generalization of the identity of Bhoria et. al. along with its finite analogue, which naturally generalizes the result of Dixit and Patel. Utilizing these newly established identities, we derive one-variable generalizations for each of the five entries by Ramanujan and their corresponding finite analogues.
△ Less
Submitted 17 October, 2024;
originally announced October 2024.
-
Second-Order Optimization via Quiescence
Authors:
Aayushya Agarwal,
Larry Pileggi,
Ronald Rohrer
Abstract:
Second-order optimization methods exhibit fast convergence to critical points, however, in nonconvex optimization, these methods often require restrictive step-sizes to ensure a monotonically decreasing objective function. In the presence of highly nonlinear objective functions with large Lipschitz constants, increasingly small step-sizes become a bottleneck to fast convergence. We propose a secon…
▽ More
Second-order optimization methods exhibit fast convergence to critical points, however, in nonconvex optimization, these methods often require restrictive step-sizes to ensure a monotonically decreasing objective function. In the presence of highly nonlinear objective functions with large Lipschitz constants, increasingly small step-sizes become a bottleneck to fast convergence. We propose a second-order optimization method that utilizes a dynamic system model to represent the trajectory of optimization variables as an ODE. We then follow the quasi-steady state trajectory by forcing variables with the fastest rise time into a state known as quiescence. This optimization via quiescence allows us to adaptively select large step-sizes that sequentially follow each optimization variable to a quasi-steady state until all state variables reach the actual steady state, coinciding with the optimum. The result is a second-order method that utilizes large step-sizes and does not require a monotonically decreasing objective function to reach a critical point. Experimentally, we demonstrate the fast convergence of this approach for optimizing nonconvex problems in power systems and compare them to existing state-of-the-art second-order methods, including damped Newton-Raphson, BFGS, and SR1.
△ Less
Submitted 10 October, 2024;
originally announced October 2024.
-
Crystallizations of small covers over the $n$-simplex $Δ^n$ and the prism $Δ^{n-1} \times I$
Authors:
Anshu Agarwal,
Biplab Basak
Abstract:
A small cover is a closed manifold $M^n$ with a locally standard $\mathbb{Z}_2^n$-action such that its orbit space is a simple convex polytope $P^n$. In this article, we study the crystallizations of small covers over the $n$-simplex $Δ^n$ and the prism $Δ^{n-1} \times I$. It is known that the small cover over the $n$-simplex $Δ^n$ is $\mathbb{RP}^n$. For every $n\geq 2$, we prove that…
▽ More
A small cover is a closed manifold $M^n$ with a locally standard $\mathbb{Z}_2^n$-action such that its orbit space is a simple convex polytope $P^n$. In this article, we study the crystallizations of small covers over the $n$-simplex $Δ^n$ and the prism $Δ^{n-1} \times I$. It is known that the small cover over the $n$-simplex $Δ^n$ is $\mathbb{RP}^n$. For every $n\geq 2$, we prove that $\mathbb{RP}^n$ has a unique $2^n$-vertex crystallization. We also demonstrate that there are exactly $1 + 2^{n-1}$ D-J equivalence classes of small covers over the prism $Δ^{n-1} \times I$, where $n\geq 3$. For each $\mathbb{Z}_2$-characteristic function of $Δ^{n-1} \times I$, we construct a $2^{n-1}(n+1)$-vertex crystallization of the small cover $M^n(λ)$ with regular genus $1 + 2^{n-4}(n^2 - 2n - 3)$, where $n\geq 4$. In particular, we construct four orientable and four non-orientable $\mathbb{RP}^3$-bundles over $\mathbb{S}^1$ up to D-J equivalence with the regular genus 6.
△ Less
Submitted 12 August, 2024;
originally announced August 2024.
-
Numerical approximations of McKean Anticipative Backward Stochastic Differential Equations arising in Initial Margin requirements
Authors:
A. Agarwal,
S. De Marco,
E. Gobet,
J. G. Lopez-Salas,
F. Noubiagain,
A. Zhou
Abstract:
We introduce a new class of anticipative backward stochastic differential equations with a dependence of McKean type on the law of the solution, that we name MKABSDE. We provide existence and uniqueness results in a general framework with relatively general regularity assumptions on the coefficients. We show how such stochastic equations arise within the modern paradigm of derivative pricing where…
▽ More
We introduce a new class of anticipative backward stochastic differential equations with a dependence of McKean type on the law of the solution, that we name MKABSDE. We provide existence and uniqueness results in a general framework with relatively general regularity assumptions on the coefficients. We show how such stochastic equations arise within the modern paradigm of derivative pricing where a central counterparty (CCP) requires the members to deposit variation and initial margins to cover their exposure. In the case when the initial margin is proportional to the Conditional Value-at-Risk (CVaR) of the contract price, we apply our general result to define the price as a solution of a MKABSDE. We provide several linear and non-linear simpler approximations, which we solve using different numerical (deterministic and Monte-Carlo) methods.
△ Less
Submitted 2 August, 2024;
originally announced August 2024.
-
Minimal simplicial degree $d$ self-maps of $\mathbb{S}^{n-1}\times \mathbb{S}^1$
Authors:
Anshu Agarwal,
Biplab Basak,
Sourav Sarkar
Abstract:
The degree of a map between orientable manifolds is a fundamental concept in topology that aids in understanding the structure and properties of the manifolds and the maps between them. Numerous studies have been conducted on the degree of maps between orientable topological spaces. For each $d \in \mathbb{Z}$, we construct a degree $ d$ simplicial map from a $(2(n+1) \max\{|d|,1\})$-facet colored…
▽ More
The degree of a map between orientable manifolds is a fundamental concept in topology that aids in understanding the structure and properties of the manifolds and the maps between them. Numerous studies have been conducted on the degree of maps between orientable topological spaces. For each $d \in \mathbb{Z}$, we construct a degree $ d$ simplicial map from a $(2(n+1) \max\{|d|,1\})$-facet colored triangulation of $\mathbb{S}^{n-1} \times \mathbb{S}^1$ to the standard $ 2(n+1) $-facet colored triangulation of $ \mathbb{S}^{n-1} \times \mathbb{S}^1 $. We demonstrate that these are the minimal possible colored triangulations for a degree $d $ simplicial self-map of $\mathbb{S}^{n-1} \times \mathbb{S}^1 $, where $n \geq 2 $. Additionally, we construct a minimal degree $d $ simplicial map from a closed orientable $ n$-manifold to $ \mathbb{S}^n $, where $n \geq 1 $.
△ Less
Submitted 14 July, 2024;
originally announced July 2024.
-
A classification of semi-equivelar gems of PL $d$-manifolds on the surface with Euler characteristic $-1$
Authors:
Anshu Agarwal,
Biplab Basak
Abstract:
A semi-equivelar gem of a PL $d$-manifold is a regular colored graph that represents the PL $d$-manifold and regularly embeds on a surface, with the property that the cyclic sequence of degrees of faces in the embedding around each vertex is identical. In \cite{bb24}, the authors classified semi-equivelar gems of PL $d$-manifolds embedded on surfaces with Euler characteristics greater than or equa…
▽ More
A semi-equivelar gem of a PL $d$-manifold is a regular colored graph that represents the PL $d$-manifold and regularly embeds on a surface, with the property that the cyclic sequence of degrees of faces in the embedding around each vertex is identical. In \cite{bb24}, the authors classified semi-equivelar gems of PL $d$-manifolds embedded on surfaces with Euler characteristics greater than or equal to zero. In this article, we focus on classifying semi-equivelar gems of PL $d$-manifolds embedded on the surface with Euler characteristic $-1$. We prove that if a semi-equivelar gem embeds regularly on the surface with Euler characteristic $-1$, then it belongs to one of the following types: $(8^3), (6^2,8), (6^2,12), (10^2,4), (12^2,4),$ $ (4,6,14), (4,6,16), (4,6,18), (4,6,24), (4,8,10), (4,8,12),$ or $(4,8,16)$. Furthermore, we provide constructions that demonstrate the existence of such gems for each of the aforementioned types.
△ Less
Submitted 21 October, 2024; v1 submitted 7 May, 2024;
originally announced May 2024.
-
A divisor generating q-series identity and its applications to probability theory and random graphs
Authors:
Archit Agarwal,
Subhash Chand Bhoria,
Pramod Eyyunni,
Bibekananda Maji
Abstract:
In I981, Uchimura studied a divisor generating $q$-series that has applications in probability theory and in the analysis of data structures, called heaps. Mainly, he proved the following identity. For $|q|<1$, \begin{equation*} \sum_{n=1}^\infty n q^n (q^{n+1})_\infty =\sum_{n=1}^{\infty} \frac{(-1)^{n-1} q^{\frac{n(n+1)}{2} } }{(1-q^n) ( q)_n } = \sum_{n=1}^{\infty} \frac{ q^n }{1-q^n}. \end{equ…
▽ More
In I981, Uchimura studied a divisor generating $q$-series that has applications in probability theory and in the analysis of data structures, called heaps. Mainly, he proved the following identity. For $|q|<1$, \begin{equation*} \sum_{n=1}^\infty n q^n (q^{n+1})_\infty =\sum_{n=1}^{\infty} \frac{(-1)^{n-1} q^{\frac{n(n+1)}{2} } }{(1-q^n) ( q)_n } = \sum_{n=1}^{\infty} \frac{ q^n }{1-q^n}. \end{equation*} Over the years, this identity has been generalized by many mathematicians in different directions. Uchimura himself in 1987, Dilcher (1995), Andrews-Crippa-Simon (1997), and recently Gupta-Kumar (2021) found a generalization of the aforementioned identity. Any generalization of the right most expression of the above identity, we name as divisor-type sum, whereas a generalization of the middle expression we say Ramanujan-type sum, and any generalization of the left most expression we refer it as Uchimura-type sum. Quite surprisingly, Simon, Crippa and Collenberg (1993) showed that the same divisor generating function has a connection with random acyclic digraphs. One of the main themes of this paper is to study these different generalizations and present a unified theory. We also discuss applications of these generalized identities in probability theory for the analysis of heaps and random acyclic digraphs.
△ Less
Submitted 3 May, 2024;
originally announced May 2024.
-
Numerical approximation of McKean-Vlasov SDEs via stochastic gradient descent
Authors:
Ankush Agarwal,
Andrea Amato,
Goncalo dos Reis,
Stefano Pagliarani
Abstract:
We propose a novel approach to numerically approximate McKean-Vlasov stochastic differential equations (MV-SDE) using stochastic gradient descent (SGD) while avoiding the use of interacting particle systems. The technique of SGD is deployed to solve a Euclidean minimization problem, which is obtained by first representing the MV-SDE as a minimization problem over the set of continuous functions of…
▽ More
We propose a novel approach to numerically approximate McKean-Vlasov stochastic differential equations (MV-SDE) using stochastic gradient descent (SGD) while avoiding the use of interacting particle systems. The technique of SGD is deployed to solve a Euclidean minimization problem, which is obtained by first representing the MV-SDE as a minimization problem over the set of continuous functions of time, and then by approximating the domain with a finite-dimensional subspace. Convergence is established by proving certain intermediate stability and moment estimates of the relevant stochastic processes (including the tangent ones). Numerical experiments illustrate the competitive performance of our SGD based method compared to the IPS benchmarks. This work offers a theoretical foundation for using the SGD method in the context of numerical approximation of MV-SDEs, and provides analytical tools to study its stability and convergence.
△ Less
Submitted 20 October, 2023;
originally announced October 2023.
-
Infinite families of solutions for $A^3 + B^3 = C^3 + D^3$ and $A^4 + B^4 + C^4 + D^4 + E^4 = F^4$
Authors:
Archit Agarwal,
Meghali Garg
Abstract:
Ramanujan, in his lost notebook, gave an interesting identity, which generates infinite families of solutions to Euler's Diophantine equation $A^3 + B^3 = C^3 + D^3$. In this paper, we produce a few infinite families of solutions to the aforementioned Diophantine equation as well as for the Diophantine equation $A^4 + B^4 + C^4 + D^4 + E^4 = F^4$ in the spirit of Ramanujan.
Ramanujan, in his lost notebook, gave an interesting identity, which generates infinite families of solutions to Euler's Diophantine equation $A^3 + B^3 = C^3 + D^3$. In this paper, we produce a few infinite families of solutions to the aforementioned Diophantine equation as well as for the Diophantine equation $A^4 + B^4 + C^4 + D^4 + E^4 = F^4$ in the spirit of Ramanujan.
△ Less
Submitted 1 September, 2023;
originally announced September 2023.
-
Representation of convex geometries of convex dimension 3 by spheres
Authors:
Kira Adaricheva,
Arav Agarwal,
Na'ama Nevo
Abstract:
A convex geometry is a closure system satisfying the anti-exchange property. This paper, following the work of Adaricheva and Bolat (2019) and the Polymath REU (2020), continues to investigate representations of convex geometries with small convex dimension by convex shapes on the plane and in spaces of higher dimension. In particular, we answer in the negative the question raised by Polymath REU…
▽ More
A convex geometry is a closure system satisfying the anti-exchange property. This paper, following the work of Adaricheva and Bolat (2019) and the Polymath REU (2020), continues to investigate representations of convex geometries with small convex dimension by convex shapes on the plane and in spaces of higher dimension. In particular, we answer in the negative the question raised by Polymath REU (2020): whether every convex geometry of $cdim=3$ is representable by the circles on the plane. We show there are geometries of $cdim=3$ that cannot be represented by spheres in any $\mathbb{R}^k$, and this connects to posets not representable by spheres from the paper of Felsner, Fishburn and Trotter (1999). On the positive side, we use the result of Kincses (2015) to show that every finite poset is an ellipsoid order.
△ Less
Submitted 14 August, 2023;
originally announced August 2023.
-
An Equivalent Circuit Approach to Distributed Optimization
Authors:
Aayushya Agarwal,
Larry Pileggi
Abstract:
Distributed optimization is an essential paradigm to solve large-scale optimization problems in modern applications where big-data and high-dimensionality creates a computational bottleneck. Distributed optimization algorithms that exhibit fast convergence allow us to fully utilize computing resources and effectively scale to larger optimization problems in a myriad of areas ranging from machine l…
▽ More
Distributed optimization is an essential paradigm to solve large-scale optimization problems in modern applications where big-data and high-dimensionality creates a computational bottleneck. Distributed optimization algorithms that exhibit fast convergence allow us to fully utilize computing resources and effectively scale to larger optimization problems in a myriad of areas ranging from machine learning to power systems. In this work, we introduce a new centralized distributed optimization algorithm (ECADO) inspired by an equivalent circuit model of the distributed problem. The equivalent circuit (EC) model provides a physical analogy to derive new insights to develop a fast-convergent algorithm. The main contributions of this approach are: 1) a weighting scheme based on a circuit-inspired aggregate sensitivity analysis, and 2) an adaptive step-sizing derived from a stable, Backward-Euler numerical integration. We demonstrate that ECADO exhibits faster convergence compared to state-of-the art distributed optimization methods and provably converges for nonconvex problems. We leverage the ECADO features to solve convex and nonconvex optimization problems with large datasets such as: distributing data for logistic regression, training a deep neural network model for classification, and solving a high-dimensional problem security-constrained optimal power flow problem. Compared to state-of-the-art centralized methods, including ADMM, centralized gradient descent, and DANE, this new ECADO approach is shown to converge in fewer iterations.
△ Less
Submitted 23 May, 2023;
originally announced May 2023.
-
An Equivalent Circuit Workflow for Unconstrained Optimization
Authors:
Aayushya Agarwal,
Carmel Fiscko,
Soummya Kar,
Larry Pileggi,
Bruno Sinopoli
Abstract:
We introduce a new workflow for unconstrained optimization whereby objective functions are mapped onto a physical domain to more easily design algorithms that are robust to hyperparameters and achieve fast convergence rates. Specifically, we represent optimization problems as an equivalent circuit that are then solved solely as nonlinear circuits using robust solution methods. The equivalent circu…
▽ More
We introduce a new workflow for unconstrained optimization whereby objective functions are mapped onto a physical domain to more easily design algorithms that are robust to hyperparameters and achieve fast convergence rates. Specifically, we represent optimization problems as an equivalent circuit that are then solved solely as nonlinear circuits using robust solution methods. The equivalent circuit models the trajectory of component-wise scaled gradient flow problem as the transient response of the circuit for which the steady-state coincides with a critical point of the objective function. The equivalent circuit model leverages circuit domain knowledge to methodically design new optimization algorithms that would likely not be developed without a physical model. We incorporate circuit knowledge into optimization methods by 1) enhancing the underlying circuit model for fast numerical analysis, 2) controlling the optimization trajectory by designing the nonlinear circuit components, and 3) solving for step sizes using well-known methods from the circuit simulation. We first establish the necessary conditions that the controls must fulfill for convergence. We show that existing descent algorithms can be re-derived as special cases of this approach and derive new optimization algorithms that are developed with insights from a circuit-based model. The new algorithms can be designed to be robust to hyperparameters, achieve convergence rates comparable or faster than state of the art methods, and are applicable to optimizing a variety of both convex and nonconvex problems.
△ Less
Submitted 23 May, 2023;
originally announced May 2023.
-
Bressoud-Subbarao type weighted partition identities for a generalized divisor function
Authors:
Archit Agarwal,
Subhash Chand Bhoria,
Pramod Eyyunni,
Bibekananda Maji
Abstract:
In 1984, Bressoud and Subbarao obtained an interesting weighted partition identity for a generalized divisor function, by means of combinatorial arguments. Recently, the last three named authors found an analytic proof of the aforementioned identity of Bressoud and Subbarao starting from a $q$-series identity of Ramanujan. In the present paper, we revisit the combinatorial arguments of Bressoud an…
▽ More
In 1984, Bressoud and Subbarao obtained an interesting weighted partition identity for a generalized divisor function, by means of combinatorial arguments. Recently, the last three named authors found an analytic proof of the aforementioned identity of Bressoud and Subbarao starting from a $q$-series identity of Ramanujan. In the present paper, we revisit the combinatorial arguments of Bressoud and Subbarao, and derive a more general weighted partition identity. Furthermore, with the help of a fractional differential operator, we establish a few more Bressoud-Subbarao type weighted partition identities beginning from an identity of Andrews, Garvan and Liang. We also found a one-variable generalization of an identity of Uchimura related to Bell polynomials.
△ Less
Submitted 7 October, 2022;
originally announced October 2022.
-
Riesz-type criteria for the Riemann hypothesis
Authors:
Archit Agarwal,
Meghali Garg,
Bibekananda Maji
Abstract:
In 1916, Riesz proved that the Riemann hypothesis is equivalent to the bound $\sum_{n=1}^\infty \frac{μ(n)}{n^2} \exp\left( - \frac{x}{n^2} \right) = O_ε \left( x^{-\frac{3}{4} + ε} \right)$, as $x \rightarrow\infty$, for any $ε>0$. Around the same time, Hardy and Littlewood gave another equivalent criteria for the Riemann hypothesis while correcting an identity of Ramanujan. In the present paper,…
▽ More
In 1916, Riesz proved that the Riemann hypothesis is equivalent to the bound $\sum_{n=1}^\infty \frac{μ(n)}{n^2} \exp\left( - \frac{x}{n^2} \right) = O_ε \left( x^{-\frac{3}{4} + ε} \right)$, as $x \rightarrow\infty$, for any $ε>0$. Around the same time, Hardy and Littlewood gave another equivalent criteria for the Riemann hypothesis while correcting an identity of Ramanujan. In the present paper, we establish a one-variable generalization of the identity of Hardy and Littlewood and as an application, we provide Riesz-type criteria for the Riemann hypothesis. In particular, we obtain the bound given by Riesz as well as the bound of Hardy and Littlewood.
△ Less
Submitted 31 January, 2022;
originally announced February 2022.
-
Adversarially Robust Learning for Security-Constrained Optimal Power Flow
Authors:
Priya L. Donti,
Aayushya Agarwal,
Neeraj Vijay Bedmutha,
Larry Pileggi,
J. Zico Kolter
Abstract:
In recent years, the ML community has seen surges of interest in both adversarially robust learning and implicit layers, but connections between these two areas have seldom been explored. In this work, we combine innovations from these areas to tackle the problem of N-k security-constrained optimal power flow (SCOPF). N-k SCOPF is a core problem for the operation of electrical grids, and aims to s…
▽ More
In recent years, the ML community has seen surges of interest in both adversarially robust learning and implicit layers, but connections between these two areas have seldom been explored. In this work, we combine innovations from these areas to tackle the problem of N-k security-constrained optimal power flow (SCOPF). N-k SCOPF is a core problem for the operation of electrical grids, and aims to schedule power generation in a manner that is robust to potentially k simultaneous equipment outages. Inspired by methods in adversarially robust training, we frame N-k SCOPF as a minimax optimization problem - viewing power generation settings as adjustable parameters and equipment outages as (adversarial) attacks - and solve this problem via gradient-based techniques. The loss function of this minimax problem involves resolving implicit equations representing grid physics and operational decisions, which we differentiate through via the implicit function theorem. We demonstrate the efficacy of our framework in solving N-3 SCOPF, which has traditionally been considered as prohibitively expensive to solve given that the problem size depends combinatorially on the number of potential outages.
△ Less
Submitted 12 November, 2021;
originally announced November 2021.
-
Two-Stage Homotopy Method to Incorporate Discrete Control Variables into AC-OPF
Authors:
Timothy McNamara,
Amritanshu Pandey,
Aayushya Agarwal,
Larry Pileggi
Abstract:
Alternating-Current Optimal Power Flow (AC-OPF) is an optimization problem critical for planning and operating the power grid. The problem is traditionally formulated using only continuous variables. Typically, control devices with discrete-valued settings, which provide valuable flexibility to the network and improve resilience, are omitted from AC-OPF formulations due to the difficulty of integr…
▽ More
Alternating-Current Optimal Power Flow (AC-OPF) is an optimization problem critical for planning and operating the power grid. The problem is traditionally formulated using only continuous variables. Typically, control devices with discrete-valued settings, which provide valuable flexibility to the network and improve resilience, are omitted from AC-OPF formulations due to the difficulty of integrality constraints. We propose a two-stage homotopy algorithm to solve the AC-OPF problem with discrete-valued control settings. This method does not rely on prior knowledge of control settings or other initial conditions. The first stage relaxes the discrete settings to continuous variables and solves the optimization using a robust homotopy technique. Once the solution has been obtained using relaxed models, second homotopy problem gradually transforms the relaxed settings to their nearest feasible discrete values. We test the proposed algorithm on several large networks with switched shunts and adjustable transformers and show it can outperform a similar state-of-the-art solver.
△ Less
Submitted 14 October, 2021;
originally announced October 2021.
-
Causal Matrix Completion
Authors:
Anish Agarwal,
Munther Dahleh,
Devavrat Shah,
Dennis Shen
Abstract:
Matrix completion is the study of recovering an underlying matrix from a sparse subset of noisy observations. Traditionally, it is assumed that the entries of the matrix are "missing completely at random" (MCAR), i.e., each entry is revealed at random, independent of everything else, with uniform probability. This is likely unrealistic due to the presence of "latent confounders", i.e., unobserved…
▽ More
Matrix completion is the study of recovering an underlying matrix from a sparse subset of noisy observations. Traditionally, it is assumed that the entries of the matrix are "missing completely at random" (MCAR), i.e., each entry is revealed at random, independent of everything else, with uniform probability. This is likely unrealistic due to the presence of "latent confounders", i.e., unobserved factors that determine both the entries of the underlying matrix and the missingness pattern in the observed matrix. For example, in the context of movie recommender systems -- a canonical application for matrix completion -- a user who vehemently dislikes horror films is unlikely to ever watch horror films. In general, these confounders yield "missing not at random" (MNAR) data, which can severely impact any inference procedure that does not correct for this bias. We develop a formal causal model for matrix completion through the language of potential outcomes, and provide novel identification arguments for a variety of causal estimands of interest. We design a procedure, which we call "synthetic nearest neighbors" (SNN), to estimate these causal estimands. We prove finite-sample consistency and asymptotic normality of our estimator. Our analysis also leads to new theoretical results for the matrix completion literature. In particular, we establish entry-wise, i.e., max-norm, finite-sample consistency and asymptotic normality results for matrix completion with MNAR data. As a special case, this also provides entry-wise bounds for matrix completion with MCAR data. Across simulated and real data, we demonstrate the efficacy of our proposed estimator.
△ Less
Submitted 30 September, 2021;
originally announced September 2021.
-
Causal Inference with Corrupted Data: Measurement Error, Missing Values, Discretization, and Differential Privacy
Authors:
Anish Agarwal,
Rahul Singh
Abstract:
The US Census Bureau will deliberately corrupt data sets derived from the 2020 US Census, enhancing the privacy of respondents while potentially reducing the precision of economic analysis. To investigate whether this trade-off is inevitable, we formulate a semiparametric model of causal inference with high dimensional corrupted data. We propose a procedure for data cleaning, estimation, and infer…
▽ More
The US Census Bureau will deliberately corrupt data sets derived from the 2020 US Census, enhancing the privacy of respondents while potentially reducing the precision of economic analysis. To investigate whether this trade-off is inevitable, we formulate a semiparametric model of causal inference with high dimensional corrupted data. We propose a procedure for data cleaning, estimation, and inference with data cleaning-adjusted confidence intervals. We prove consistency and Gaussian approximation by finite sample arguments, with a rate of $n^{ 1/2}$ for semiparametric estimands that degrades gracefully for nonparametric estimands. Our key assumption is that the true covariates are approximately low rank, which we interpret as approximate repeated measurements and empirically validate. Our analysis provides nonasymptotic theoretical contributions to matrix completion, statistical learning, and semiparametric statistics. Calibrated simulations verify the coverage of our data cleaning adjusted confidence intervals and demonstrate the relevance of our results for Census-derived data.
△ Less
Submitted 12 February, 2024; v1 submitted 6 July, 2021;
originally announced July 2021.
-
Towards a Dimension-Free Understanding of Adaptive Linear Control
Authors:
Juan C. Perdomo,
Max Simchowitz,
Alekh Agarwal,
Peter Bartlett
Abstract:
We study the problem of adaptive control of the linear quadratic regulator for systems in very high, or even infinite dimension. We demonstrate that while sublinear regret requires finite dimensional inputs, the ambient state dimension of the system need not be bounded in order to perform online control. We provide the first regret bounds for LQR which hold for infinite dimensional systems, replac…
▽ More
We study the problem of adaptive control of the linear quadratic regulator for systems in very high, or even infinite dimension. We demonstrate that while sublinear regret requires finite dimensional inputs, the ambient state dimension of the system need not be bounded in order to perform online control. We provide the first regret bounds for LQR which hold for infinite dimensional systems, replacing dependence on ambient dimension with more natural notions of problem complexity. Our guarantees arise from a novel perturbation bound for certainty equivalence which scales with the prediction error in estimating the system parameters, without requiring consistent parameter recovery in more stringent measures like the operator norm. When specialized to finite dimensional settings, our bounds recover near optimal dimension and time horizon dependence.
△ Less
Submitted 15 July, 2021; v1 submitted 18 March, 2021;
originally announced March 2021.
-
Incremental Model Building Homotopy Approach for Solving Exact AC-Constrained Optimal Power Flow
Authors:
Amritanshu Pandey,
Aayushya Agarwal,
Larry Pileggi
Abstract:
Alternating-Current Optimal Power Flow (AC-OPF) is framed as a NP-hard non-convex optimization problem that solves for the most economical dispatch of grid generation given the AC-network and device constraints. Although there are no standard methodologies for obtaining the global optimum for the problem, there is considerable interest from planning and operational engineers in finding a local opt…
▽ More
Alternating-Current Optimal Power Flow (AC-OPF) is framed as a NP-hard non-convex optimization problem that solves for the most economical dispatch of grid generation given the AC-network and device constraints. Although there are no standard methodologies for obtaining the global optimum for the problem, there is considerable interest from planning and operational engineers in finding a local optimum. Nonetheless, solving for the local optima of a large AC-OPF problem is challenging and time-intensive, as none of the leading non-linear optimization toolboxes can provide any timely guarantees of convergence. To provide robust local convergence for large complex systems, we introduce a homotopy-based approach that solves a sequence of primal-dual interior point problems. We utilize the physics of the grid to develop the proposed homotopy method and demonstrate the efficacy of this approach on U.S. Eastern Interconnection sized test networks.
△ Less
Submitted 1 November, 2020;
originally announced November 2020.
-
On Model Identification and Out-of-Sample Prediction of Principal Component Regression: Applications to Synthetic Controls
Authors:
Anish Agarwal,
Devavrat Shah,
Dennis Shen
Abstract:
We analyze principal component regression (PCR) in a high-dimensional error-in-variables setting with fixed design. Under suitable conditions, we show that PCR consistently identifies the unique model with minimum $\ell_2$-norm. These results enable us to establish non-asymptotic out-of-sample prediction guarantees that improve upon the best known rates. In the course of our analysis, we introduce…
▽ More
We analyze principal component regression (PCR) in a high-dimensional error-in-variables setting with fixed design. Under suitable conditions, we show that PCR consistently identifies the unique model with minimum $\ell_2$-norm. These results enable us to establish non-asymptotic out-of-sample prediction guarantees that improve upon the best known rates. In the course of our analysis, we introduce a natural linear algebraic condition between the in- and out-of-sample covariates, which allows us to avoid distributional assumptions for out-of-sample predictions. Our simulations illustrate the importance of this condition for generalization, even under covariate shifts. Accordingly, we construct a hypothesis test to check when this conditions holds in practice. As a byproduct, our results also lead to novel results for the synthetic controls literature, a leading approach for policy evaluation. To the best of our knowledge, our prediction guarantees for the fixed design setting have been elusive in both the high-dimensional error-in-variables and synthetic controls literatures.
△ Less
Submitted 25 August, 2023; v1 submitted 27 October, 2020;
originally announced October 2020.
-
An Efficient Algorithm for Latin Squares in a Bipartite Min-Max-Plus System
Authors:
Mubasher Umer,
Umar Hayat,
Fazal Abbas,
Anurag Agarwal,
Petko Kitanov
Abstract:
In this paper, we consider the eigenproblems for Latin squares in a bipartite min-max-plus system. The focus is upon developing a new algorithm to compute the eigenvalue and eigenvectors (trivial and non-trivial) for Latin squares in a bipartite min-max-plus system. We illustrate the algorithm using some examples. Furthermore, we compare the results of our algorithm with some of the existing algor…
▽ More
In this paper, we consider the eigenproblems for Latin squares in a bipartite min-max-plus system. The focus is upon developing a new algorithm to compute the eigenvalue and eigenvectors (trivial and non-trivial) for Latin squares in a bipartite min-max-plus system. We illustrate the algorithm using some examples. Furthermore, we compare the results of our algorithm with some of the existing algorithms which shows that the propose method is more efficient.
△ Less
Submitted 22 August, 2019;
originally announced August 2019.
-
Model-Based Reinforcement Learning with a Generative Model is Minimax Optimal
Authors:
Alekh Agarwal,
Sham Kakade,
Lin F. Yang
Abstract:
This work considers the sample and computational complexity of obtaining an $ε$-optimal policy in a discounted Markov Decision Process (MDP), given only access to a generative model. In this work, we study the effectiveness of the most natural plug-in approach to model-based planning: we build the maximum likelihood estimate of the transition model in the MDP from observations and then find an opt…
▽ More
This work considers the sample and computational complexity of obtaining an $ε$-optimal policy in a discounted Markov Decision Process (MDP), given only access to a generative model. In this work, we study the effectiveness of the most natural plug-in approach to model-based planning: we build the maximum likelihood estimate of the transition model in the MDP from observations and then find an optimal policy in this empirical MDP. We ask arguably the most basic and unresolved question in model based planning: is the naive "plug-in" approach, non-asymptotically, minimax optimal in the quality of the policy it finds, given a fixed sample size? Here, the non-asymptotic regime refers to when the sample size is sublinear in the model size.
With access to a generative model, we resolve this question in the strongest possible sense: our main result shows that \emph{any} high accuracy solution in the plug-in model constructed with $N$ samples, provides an $ε$-optimal policy in the true underlying MDP (where $ε$ is the minimax accuracy with $N$ samples at every state, action pair). In comparison, all prior (non-asymptotically) minimax optimal results use model free approaches, such as the Variance Reduced Q-value iteration algorithm (Sidford et al 2018), while the best known model-based results (e.g. Azar et al 2013) require larger sample sizes in their dependence on the planning horizon or the state space. Notably, we show that the model-based approach allows the use of \emph{any} efficient planning algorithm in the empirical MDP, which simplifies algorithm design as this approach does not tie the algorithm to the sampling procedure. The core of our analysis is avnovel "absorbing MDP" construction to address the statistical dependency issues that arise in the analysis of model-based planning approaches, a construction which may be helpful more generally.
△ Less
Submitted 4 April, 2020; v1 submitted 10 June, 2019;
originally announced June 2019.
-
Multi-echelon Supply Chain Inventory Planning using Simulation-Optimization with Data Resampling
Authors:
Anshul Agarwal
Abstract:
Modeling and optimization of multi-echelon supply chain systems is challenging as it requires a holistic approach that exploits synergies and interactions between echelons while accurately accounting for variability observed by these systems. We develop a simulation-optimization framework that minimizes average inventory while maintaining desired average $β$ service level at stocking locations. We…
▽ More
Modeling and optimization of multi-echelon supply chain systems is challenging as it requires a holistic approach that exploits synergies and interactions between echelons while accurately accounting for variability observed by these systems. We develop a simulation-optimization framework that minimizes average inventory while maintaining desired average $β$ service level at stocking locations. We use a discrete-event simulation framework to accurately capture system interactions. Instead of a parametric estimation approach, the demand and the lead time variability are quantified by bootstrap sampling the historical data, thus preserving the true nature of the variability. We compare three different open source simulation-optimization libraries - the derivative free methods from SciPy.Optimize, a Bayesian optimization algorithm Scikit-Optimize, and a radial basis function based black-box optimizer RBFOpt. The experiments demonstrate practical applicability of our approach. While we observe substantially lower inventory levels and computationally superior results from RBFOpt, depending on the problem, search strategy, and the random start states, close enough good solution can be obtained from both RBFOpt and Scikit-Optimize. The optimization results demonstrate a preference for a centralized inventory planning scheme that help with risk pooling. Moreover, with no order placement cost, the optimal solution tends to order more frequently in order to lower inventory.
△ Less
Submitted 31 December, 2018;
originally announced January 2019.
-
A Fourier-based Picard-iteration approach for a class of McKean-Vlasov SDEs with Lévy jumps
Authors:
Ankush Agarwal,
Stefano Pagliarani
Abstract:
We consider a class of Lévy-driven stochastic differential equations (SDEs) with McKean-Vlasov (MK-V) interaction in the drift coefficient. It is assumed that the coefficient is bounded, affine in the state variable, and only measurable in the law of the solution. We study the equivalent functional fixed-point equation for the unknown time-dependent coefficients of the associated Markovian SDE. By…
▽ More
We consider a class of Lévy-driven stochastic differential equations (SDEs) with McKean-Vlasov (MK-V) interaction in the drift coefficient. It is assumed that the coefficient is bounded, affine in the state variable, and only measurable in the law of the solution. We study the equivalent functional fixed-point equation for the unknown time-dependent coefficients of the associated Markovian SDE. By proving a contraction property for the functional map in a suitable normed space, we infer existence and uniqueness results for the MK-V SDE, and derive a discretized Picard iteration scheme that approximates the law of the solution through its characteristic function. Numerical illustrations show the effectiveness of our method, which appears to be appropriate to handle the multi-dimensional setting.
△ Less
Submitted 12 December, 2018;
originally announced December 2018.
-
Validation of Inventory models for Single-echelon Supply Chain using Discrete-event Simulation
Authors:
Anshul Agarwal
Abstract:
Inventory decision frameworks proposed in the literature for single-echelon supply chain systems rely on assumptions to obtain closed form expressions. In particular, two such frameworks - one conventional and the other with a demand undershoot - determine optimal reorder point for a desired $β$ or Type-II service level. In this work we assess the accuracy and applicability of these frameworks wit…
▽ More
Inventory decision frameworks proposed in the literature for single-echelon supply chain systems rely on assumptions to obtain closed form expressions. In particular, two such frameworks - one conventional and the other with a demand undershoot - determine optimal reorder point for a desired $β$ or Type-II service level. In this work we assess the accuracy and applicability of these frameworks with the help of a discrete event simulation framework developed in SimPy. While in several cases the closed form literature models under-predict the service level, and thus result in a higher reorder point and safety stock, we observe some situations where model predicted service levels match the simulated results with statistical significance.
△ Less
Submitted 19 June, 2018;
originally announced June 2018.
-
Differential posets and restriction in critical groups
Authors:
Ayush Agarwal,
Christian Gaetz
Abstract:
In recent work, Benkart, Klivans, and Reiner defined the critical group of a faithful representation of a finite group $G$, which is analogous to the critical group of a graph. In this paper we study maps between critical groups induced by injective group homomorphisms and in particular the map induced by restriction of the representation to a subgroup. We show that in the abelian group case the c…
▽ More
In recent work, Benkart, Klivans, and Reiner defined the critical group of a faithful representation of a finite group $G$, which is analogous to the critical group of a graph. In this paper we study maps between critical groups induced by injective group homomorphisms and in particular the map induced by restriction of the representation to a subgroup. We show that in the abelian group case the critical groups are isomorphic to the critical groups of a certain Cayley graph and that the restriction map corresponds to a graph covering map. We also show that when $G$ is an element in a differential tower of groups, critical groups of certain representations are closely related to words of up-down maps in the associated differential poset. We use this to generalize an explicit formula for the critical group of the permutation representation of the symmetric group given by the second author, and to enumerate the factors in such critical groups.
△ Less
Submitted 3 January, 2020; v1 submitted 23 October, 2017;
originally announced October 2017.
-
Branching diffusion representation of semi-linear elliptic PDEs and estimation using Monte Carlo method
Authors:
Ankush Agarwal,
Julien Claisse
Abstract:
We study semi-linear elliptic PDEs with polynomial non-linearity and provide a probabilistic representation of their solution using branching diffusion processes. When the non-linearity involves the unknown function but not its derivatives, we extend previous results in the literature by showing that our probabilistic representation provides a solution to the PDE without assuming its existence. In…
▽ More
We study semi-linear elliptic PDEs with polynomial non-linearity and provide a probabilistic representation of their solution using branching diffusion processes. When the non-linearity involves the unknown function but not its derivatives, we extend previous results in the literature by showing that our probabilistic representation provides a solution to the PDE without assuming its existence. In the general case, we derive a new representation of the solution by using marked branching diffusion processes and automatic differentiation formulas to account for the non-linear gradient term. In both cases, we develop new theoretical tools to provide explicit sufficient conditions under which our probabilistic representations hold. As an application, we consider several examples including multi-dimensional semi-linear elliptic PDEs and estimate their solution by using the Monte Carlo method.
△ Less
Submitted 14 February, 2018; v1 submitted 2 April, 2017;
originally announced April 2017.
-
On the Inefficiency of Forward Markets in Leader-Follower Competition
Authors:
Desmond Cai,
Anish Agarwal,
Adam Wierman
Abstract:
Motivated by electricity markets, this paper studies the impact of forward contracting in situations where firms have capacity constraints and heterogeneous production lead times. We consider a model with two types of firms - leaders and followers - that choose production at two different times. Followers choose productions in the second stage but can sell forward contracts in the first stage. Our…
▽ More
Motivated by electricity markets, this paper studies the impact of forward contracting in situations where firms have capacity constraints and heterogeneous production lead times. We consider a model with two types of firms - leaders and followers - that choose production at two different times. Followers choose productions in the second stage but can sell forward contracts in the first stage. Our main result is an explicit characterization of the equilibrium outcomes. Classic results on forward contracting suggest that it can mitigate market power in simple settings; however the results in this paper show that the impact of forward markets in this setting is delicate - forward contracting can enhance or mitigate market power. In particular, our results show that leader-follower interactions created by heterogeneous production lead times may cause forward markets to be inefficient, even when there are a large number of followers. In fact, symmetric equilibria do not necessarily exist due to differences in market power among the leaders and followers.
△ Less
Submitted 28 June, 2016;
originally announced June 2016.
-
A Novel MINLP Reformulation for Nonlinear Generalized Disjunctive Programming (GDP) Problems
Authors:
Anshul Agarwal
Abstract:
In optimization problems, often equations and inequalities are represented using if-else (implication) construct which is known to be equivalent to a disjunction. Such statements are modeled and incorporated in an optimization problem using Generalized Disjunctive Programming (GDP). GDP provides a systematic methodology to model optimization problems involving logic disjunctions, logic proposition…
▽ More
In optimization problems, often equations and inequalities are represented using if-else (implication) construct which is known to be equivalent to a disjunction. Such statements are modeled and incorporated in an optimization problem using Generalized Disjunctive Programming (GDP). GDP provides a systematic methodology to model optimization problems involving logic disjunctions, logic propositions, and algebraic equations. In order to take advantage of the existing MINLP solvers, GDP problems can be reformulated as the standard MINLP problems. In this work we propose a novel reformulation methodology for general GDP problems with nonlinear equality and inequality constraints. The proposed methodology provides an exact reformulation, maintains feasibility and convexity of the constraints, and, most importantly, does not require choosing a tolerance level and a Big-M parameter. We also demonstrate how the new reformulation approach can be used to convert the logic proposition represented using if-else (implication) construct into equations in the standard MINLP format. The conversion methodology is extended for variations of implication constructs that include implicit else blocks, sequential implication logic, multiple testing conditions, and nested implication blocks. The proposed approach is utilized to model physical and mechanical properties in a mathematical optimization tool that solves an MINLP problem to design commercial products.
△ Less
Submitted 6 October, 2015;
originally announced October 2015.
-
A Lower Bound for the Optimization of Finite Sums
Authors:
Alekh Agarwal,
Leon Bottou
Abstract:
This paper presents a lower bound for optimizing a finite sum of $n$ functions, where each function is $L$-smooth and the sum is $μ$-strongly convex. We show that no algorithm can reach an error $ε$ in minimizing all functions from this class in fewer than $Ω(n + \sqrt{n(κ-1)}\log(1/ε))$ iterations, where $κ=L/μ$ is a surrogate condition number. We then compare this lower bound to upper bounds for…
▽ More
This paper presents a lower bound for optimizing a finite sum of $n$ functions, where each function is $L$-smooth and the sum is $μ$-strongly convex. We show that no algorithm can reach an error $ε$ in minimizing all functions from this class in fewer than $Ω(n + \sqrt{n(κ-1)}\log(1/ε))$ iterations, where $κ=L/μ$ is a surrogate condition number. We then compare this lower bound to upper bounds for recently developed methods specializing to this setting. When the functions involved in this sum are not arbitrary, but based on i.i.d. random data, then we further contrast these complexity results with those for optimal first-order methods to directly optimize the sum. The conclusion we draw is that a lot of caution is necessary for an accurate comparison, and identify machine learning scenarios where the new methods help computationally.
△ Less
Submitted 3 October, 2015; v1 submitted 2 October, 2014;
originally announced October 2014.
-
Learning Sparsely Used Overcomplete Dictionaries via Alternating Minimization
Authors:
Alekh Agarwal,
Animashree Anandkumar,
Prateek Jain,
Praneeth Netrapalli
Abstract:
We consider the problem of sparse coding, where each sample consists of a sparse linear combination of a set of dictionary atoms, and the task is to learn both the dictionary elements and the mixing coefficients. Alternating minimization is a popular heuristic for sparse coding, where the dictionary and the coefficients are estimated in alternate steps, keeping the other fixed. Typically, the coef…
▽ More
We consider the problem of sparse coding, where each sample consists of a sparse linear combination of a set of dictionary atoms, and the task is to learn both the dictionary elements and the mixing coefficients. Alternating minimization is a popular heuristic for sparse coding, where the dictionary and the coefficients are estimated in alternate steps, keeping the other fixed. Typically, the coefficients are estimated via $\ell_1$ minimization, keeping the dictionary fixed, and the dictionary is estimated through least squares, keeping the coefficients fixed. In this paper, we establish local linear convergence for this variant of alternating minimization and establish that the basin of attraction for the global optimum (corresponding to the true dictionary and the coefficients) is $\order{1/s^2}$, where $s$ is the sparsity level in each sample and the dictionary satisfies RIP. Combined with the recent results of approximate dictionary estimation, this yields provable guarantees for exact recovery of both the dictionary elements and the coefficients, when the dictionary elements are incoherent.
△ Less
Submitted 28 July, 2014; v1 submitted 29 October, 2013;
originally announced October 2013.
-
A Clustering Approach to Learn Sparsely-Used Overcomplete Dictionaries
Authors:
Alekh Agarwal,
Animashree Anandkumar,
Praneeth Netrapalli
Abstract:
We consider the problem of learning overcomplete dictionaries in the context of sparse coding, where each sample selects a sparse subset of dictionary elements. Our main result is a strategy to approximately recover the unknown dictionary using an efficient algorithm. Our algorithm is a clustering-style procedure, where each cluster is used to estimate a dictionary element. The resulting solution…
▽ More
We consider the problem of learning overcomplete dictionaries in the context of sparse coding, where each sample selects a sparse subset of dictionary elements. Our main result is a strategy to approximately recover the unknown dictionary using an efficient algorithm. Our algorithm is a clustering-style procedure, where each cluster is used to estimate a dictionary element. The resulting solution can often be further cleaned up to obtain a high accuracy estimate, and we provide one simple scenario where $\ell_1$-regularized regression can be used for such a second stage.
△ Less
Submitted 7 July, 2014; v1 submitted 8 September, 2013;
originally announced September 2013.
-
Stochastic optimization and sparse statistical recovery: An optimal algorithm for high dimensions
Authors:
Alekh Agarwal,
Sahand Negahban,
Martin J. Wainwright
Abstract:
We develop and analyze stochastic optimization algorithms for problems in which the expected loss is strongly convex, and the optimum is (approximately) sparse. Previous approaches are able to exploit only one of these two structures, yielding an $\order(\pdim/T)$ convergence rate for strongly convex objectives in $\pdim$ dimensions, and an $\order(\sqrt{(\spindex \log \pdim)/T})$ convergence rate…
▽ More
We develop and analyze stochastic optimization algorithms for problems in which the expected loss is strongly convex, and the optimum is (approximately) sparse. Previous approaches are able to exploit only one of these two structures, yielding an $\order(\pdim/T)$ convergence rate for strongly convex objectives in $\pdim$ dimensions, and an $\order(\sqrt{(\spindex \log \pdim)/T})$ convergence rate when the optimum is $\spindex$-sparse. Our algorithm is based on successively solving a series of $\ell_1$-regularized optimization problems using Nesterov's dual averaging algorithm. We establish that the error of our solution after $T$ iterations is at most $\order((\spindex \log\pdim)/T)$, with natural extensions to approximate sparsity. Our results apply to locally Lipschitz losses including the logistic, exponential, hinge and least-squares losses. By recourse to statistical minimax results, we show that our convergence rates are optimal up to multiplicative constant factors. The effectiveness of our approach is also confirmed in numerical simulations, in which we compare to several baselines on a least-squares regression problem.
△ Less
Submitted 18 July, 2012;
originally announced July 2012.
-
Efficient simulation of density and probability of large deviations of sum of random vectors using saddle point representations
Authors:
Santanu Dey,
Sandeep Juneja,
Ankush Agarwal
Abstract:
We consider the problem of efficient simulation estimation of the density function at the tails, and the probability of large deviations for a sum of independent, identically distributed, light-tailed and non-lattice random vectors. The latter problem besides being of independent interest, also forms a building block for more complex rare event problems that arise, for instance, in queueing and fi…
▽ More
We consider the problem of efficient simulation estimation of the density function at the tails, and the probability of large deviations for a sum of independent, identically distributed, light-tailed and non-lattice random vectors. The latter problem besides being of independent interest, also forms a building block for more complex rare event problems that arise, for instance, in queueing and financial credit risk modelling. It has been extensively studied in literature where state independent exponential twisting based importance sampling has been shown to be asymptotically efficient and a more nuanced state dependent exponential twisting has been shown to have a stronger bounded relative error property. We exploit the saddle-point based representations that exist for these rare quantities, which rely on inverting the characteristic functions of the underlying random vectors. These representations reduce the rare event estimation problem to evaluating certain integrals, which may via importance sampling be represented as expectations. Further, it is easy to identify and approximate the zero-variance importance sampling distribution to estimate these integrals. We identify such importance sampling measures and show that they possess the asymptotically vanishing relative error property that is stronger than the bounded relative error property. To illustrate the broader applicability of the proposed methodology, we extend it to similarly efficiently estimate the practically important expected overshoot of sums of iid random variables.
△ Less
Submitted 16 October, 2012; v1 submitted 5 March, 2012;
originally announced March 2012.
-
The Generalization Ability of Online Algorithms for Dependent Data
Authors:
Alekh Agarwal,
John C. Duchi
Abstract:
We study the generalization performance of online learning algorithms trained on samples coming from a dependent source of data. We show that the generalization error of any stable online algorithm concentrates around its regret--an easily computable statistic of the online performance of the algorithm--when the underlying ergodic process is $β$- or $φ$-mixing. We show high probability error bound…
▽ More
We study the generalization performance of online learning algorithms trained on samples coming from a dependent source of data. We show that the generalization error of any stable online algorithm concentrates around its regret--an easily computable statistic of the online performance of the algorithm--when the underlying ergodic process is $β$- or $φ$-mixing. We show high probability error bounds assuming the loss function is convex, and we also establish sharp convergence rates and deviation bounds for strongly convex losses and several linear prediction problems such as linear and logistic regression, least-squares SVM, and boosting on dependent data. In addition, our results have straightforward applications to stochastic optimization with dependent data, and our analysis requires only martingale convergence arguments; we need not rely on more powerful statistical tools such as empirical process theory.
△ Less
Submitted 6 June, 2012; v1 submitted 11 October, 2011;
originally announced October 2011.
-
Stochastic convex optimization with bandit feedback
Authors:
Alekh Agarwal,
Dean P. Foster,
Daniel Hsu,
Sham M. Kakade,
Alexander Rakhlin
Abstract:
This paper addresses the problem of minimizing a convex, Lipschitz function $f$ over a convex, compact set $\xset$ under a stochastic bandit feedback model. In this model, the algorithm is allowed to observe noisy realizations of the function value $f(x)$ at any query point $x \in \xset$. The quantity of interest is the regret of the algorithm, which is the sum of the function values at algorithm'…
▽ More
This paper addresses the problem of minimizing a convex, Lipschitz function $f$ over a convex, compact set $\xset$ under a stochastic bandit feedback model. In this model, the algorithm is allowed to observe noisy realizations of the function value $f(x)$ at any query point $x \in \xset$. The quantity of interest is the regret of the algorithm, which is the sum of the function values at algorithm's query points minus the optimal function value. We demonstrate a generalization of the ellipsoid algorithm that incurs $\otil(\poly(d)\sqrt{T})$ regret. Since any algorithm has regret at least $Ω(\sqrt{T})$ on this problem, our algorithm is optimal in terms of the scaling with $T$.
△ Less
Submitted 8 October, 2011; v1 submitted 8 July, 2011;
originally announced July 2011.
-
Ergodic Mirror Descent
Authors:
John C. Duchi,
Alekh Agarwal,
Mikael Johansson,
Michael I. Jordan
Abstract:
We generalize stochastic subgradient descent methods to situations in which we do not receive independent samples from the distribution over which we optimize, but instead receive samples that are coupled over time. We show that as long as the source of randomness is suitably ergodic---it converges quickly enough to a stationary distribution---the method enjoys strong convergence guarantees, both…
▽ More
We generalize stochastic subgradient descent methods to situations in which we do not receive independent samples from the distribution over which we optimize, but instead receive samples that are coupled over time. We show that as long as the source of randomness is suitably ergodic---it converges quickly enough to a stationary distribution---the method enjoys strong convergence guarantees, both in expectation and with high probability. This result has implications for stochastic optimization in high-dimensional spaces, peer-to-peer distributed optimization schemes, decision problems with dependent data, and stochastic optimization problems over combinatorial spaces.
△ Less
Submitted 1 August, 2012; v1 submitted 24 May, 2011;
originally announced May 2011.
-
Distributed Delayed Stochastic Optimization
Authors:
Alekh Agarwal,
John C. Duchi
Abstract:
We analyze the convergence of gradient-based optimization algorithms that base their updates on delayed stochastic gradient information. The main application of our results is to the development of gradient-based distributed optimization algorithms where a master node performs parameter updates while worker nodes compute stochastic gradients based on local information in parallel, which may give r…
▽ More
We analyze the convergence of gradient-based optimization algorithms that base their updates on delayed stochastic gradient information. The main application of our results is to the development of gradient-based distributed optimization algorithms where a master node performs parameter updates while worker nodes compute stochastic gradients based on local information in parallel, which may give rise to delays due to asynchrony. We take motivation from statistical problems where the size of the data is so large that it cannot fit on one computer; with the advent of huge datasets in biology, astronomy, and the internet, such problems are now common. Our main contribution is to show that for smooth stochastic problems, the delays are asymptotically negligible and we can achieve order-optimal convergence results. In application to distributed optimization, we develop procedures that overcome communication bottlenecks and synchronization requirements. We show $n$-node architectures whose optimization error in stochastic problems---in spite of asynchronous delays---scales asymptotically as $\order(1 / \sqrt{nT})$ after $T$ iterations. This rate is known to be optimal for a distributed system with $n$ nodes even in the absence of delays. We additionally complement our theoretical results with numerical experiments on a statistical machine learning task.
△ Less
Submitted 28 April, 2011;
originally announced April 2011.
-
Information-theoretic lower bounds on the oracle complexity of stochastic convex optimization
Authors:
Alekh Agarwal,
Peter L. Bartlett,
Pradeep Ravikumar,
Martin J. Wainwright
Abstract:
Relative to the large literature on upper bounds on complexity of convex optimization, lesser attention has been paid to the fundamental hardness of these problems. Given the extensive use of convex optimization in machine learning and statistics, gaining an understanding of these complexity-theoretic issues is important. In this paper, we study the complexity of stochastic convex optimization in…
▽ More
Relative to the large literature on upper bounds on complexity of convex optimization, lesser attention has been paid to the fundamental hardness of these problems. Given the extensive use of convex optimization in machine learning and statistics, gaining an understanding of these complexity-theoretic issues is important. In this paper, we study the complexity of stochastic convex optimization in an oracle model of computation. We improve upon known results and obtain tight minimax complexity estimates for various function classes.
△ Less
Submitted 20 November, 2011; v1 submitted 2 September, 2010;
originally announced September 2010.
-
Dual Averaging for Distributed Optimization: Convergence Analysis and Network Scaling
Authors:
John Duchi,
Alekh Agarwal,
Martin Wainwright
Abstract:
The goal of decentralized optimization over a network is to optimize a global objective formed by a sum of local (possibly nonsmooth) convex functions using only local computation and communication. It arises in various application domains, including distributed tracking and localization, multi-agent co-ordination, estimation in sensor networks, and large-scale optimization in machine learning. We…
▽ More
The goal of decentralized optimization over a network is to optimize a global objective formed by a sum of local (possibly nonsmooth) convex functions using only local computation and communication. It arises in various application domains, including distributed tracking and localization, multi-agent co-ordination, estimation in sensor networks, and large-scale optimization in machine learning. We develop and analyze distributed algorithms based on dual averaging of subgradients, and we provide sharp bounds on their convergence rates as a function of the network size and topology. Our method of analysis allows for a clear separation between the convergence of the optimization algorithm itself and the effects of communication constraints arising from the network structure. In particular, we show that the number of iterations required by our algorithm scales inversely in the spectral gap of the network. The sharpness of this prediction is confirmed both by theoretical lower bounds and simulations for various networks. Our approach includes both the cases of deterministic optimization and communication, as well as problems with stochastic optimization and/or communication.
△ Less
Submitted 10 April, 2011; v1 submitted 12 May, 2010;
originally announced May 2010.
-
$n$-Colour self-inverse compositions
Authors:
Geetika Narang,
A K Agarwal
Abstract:
MacMahon's definition of self-inverse composition is extended to $n$-colour self-inverse composition. This introduces four new sequences which satisfy the same recurrence relation with different initial conditions like the famous Fibonacci and Lucas sequences. For these new sequences explicit formulas, recurrence relations, generating functions and a summation formula are obtained. Two new binom…
▽ More
MacMahon's definition of self-inverse composition is extended to $n$-colour self-inverse composition. This introduces four new sequences which satisfy the same recurrence relation with different initial conditions like the famous Fibonacci and Lucas sequences. For these new sequences explicit formulas, recurrence relations, generating functions and a summation formula are obtained. Two new binomial identities with combinatorial meaning are also given.
△ Less
Submitted 31 August, 2006;
originally announced August 2006.