Skip to main content

Showing 1–12 of 12 results for author: Madden, L

  1. arXiv:2405.13738  [pdf, ps, other

    cs.LG math.OC

    Interpolation with deep neural networks with non-polynomial activations: necessary and sufficient numbers of neurons

    Authors: Liam Madden

    Abstract: The minimal number of neurons required for a feedforward neural network to interpolate $n$ generic input-output pairs from $\mathbb{R}^d\times \mathbb{R}^{d'}$ is $Θ(\sqrt{nd'})$. While previous results have shown that $Θ(\sqrt{nd'})$ neurons are sufficient, they have been limited to sigmoid, Heaviside, and rectified linear unit (ReLU) as the activation function. Using a different approach, we pro… ▽ More

    Submitted 16 September, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

    Comments: V2: reframed in terms of number of neurons, proved the necessary condition, and extended the sufficient condition beyond three layers

    MSC Class: 15A03; 26B10

  2. arXiv:2405.13718  [pdf, other

    cs.LG math.OC

    Next-token prediction capacity: general upper bounds and a lower bound for transformers

    Authors: Liam Madden, Curtis Fox, Christos Thrampoulidis

    Abstract: Given a sequence of tokens, such as words, the task of next-token prediction is to predict the next-token conditional probability distribution. Decoder-only transformers have become effective models for this task, but their properties are still not fully understood. In particular, the largest number of distinct context sequences that a decoder-only transformer can interpolate next-token distributi… ▽ More

    Submitted 16 September, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

    Comments: V2: added a section axiomatizing the probabilistic assumptions of next-token prediction

    MSC Class: 15A03; 26B35

  3. arXiv:2308.02001  [pdf, other

    cs.LG math.OC

    Memory capacity of two layer neural networks with smooth activations

    Authors: Liam Madden, Christos Thrampoulidis

    Abstract: Determining the memory capacity of two layer neural networks with $m$ hidden neurons and input dimension $d$ (i.e., $md+2m$ total trainable parameters), which refers to the largest size of general data the network can memorize, is a fundamental machine learning question. For activations that are real analytic at a point and, if restricting to a polynomial there, have sufficiently high degree, we e… ▽ More

    Submitted 1 May, 2024; v1 submitted 3 August, 2023; originally announced August 2023.

    Comments: V3: the result was generalized to activations which are real analytic at a point by including a bias vector. The presentation and rigor were also improved

    MSC Class: 15A03; 26B10

    Journal ref: SIAM Journal on Mathematics of Data Science, 6(3):679-702, 2024

  4. Infrared Spectroscopic Survey of the Quiescent Medium of Nearby Clouds: II. Ice Formation and Grain Growth in Perseus and Serpens

    Authors: M. C. L. Madden, A. C. A. Boogert, J. E. Chiar, C. Knez, Y. J. Pendleton, A. G. G. M. Tielens, A. Yip

    Abstract: The properties of dust change during the transition from diffuse to dense clouds as a result of ice formation and dust coagulation, but much is still unclear about this transformation. We present 2-20 micron spectra of 49 field stars behind the Perseus and Serpens Molecular Clouds and establish relationships between the near-infrared continuum extinction (AK) and the depths of the 9.7 micron silic… ▽ More

    Submitted 23 October, 2022; originally announced October 2022.

    Comments: 29 pages+17 pages appendix. Published in The Astrophysical Journal, 930, 2, 2022

    Journal ref: The Astrophysical Journal, Volume 930, Issue 1, id.2, 17 pp., 2022

  5. Sketching the Best Approximate Quantum Compiling Problem

    Authors: Liam Madden, Albert Akhriev, Andrea Simonetto

    Abstract: This paper considers the problem of quantum compilation from an optimization perspective by fixing a circuit structure of CNOTs and rotation gates then optimizing over the rotation angles. We solve the optimization problem classically and consider algorithmic tools to scale it to higher numbers of qubits. We investigate stochastic gradient descent and two sketch-and-solve algorithms. For all three… ▽ More

    Submitted 9 May, 2022; originally announced May 2022.

    Comments: 10 pages, 4 figures, 1 table

    Journal ref: 2022 IEEE International Conference on Quantum Computing and Engineering (QCE), 492-502, 2022

  6. Online Stochastic Gradient Methods Under Sub-Weibull Noise and the Polyak-Łojasiewicz Condition

    Authors: Seunghyun Kim, Liam Madden, Emiliano Dall'Anese

    Abstract: This paper focuses on the online gradient and proximal-gradient methods with stochastic gradient errors. In particular, we examine the performance of the online gradient descent method when the cost satisfies the Polyak-Łojasiewicz (PL) inequality. We provide bounds in expectation and in high probability (that hold iteration-wise), with the latter derived by leveraging a sub-Weibull model for the… ▽ More

    Submitted 31 March, 2022; v1 submitted 6 August, 2021; originally announced August 2021.

    Journal ref: 2022 IEEE 61st Conference on Decision and Control (CDC), 3499-3506, 2022

  7. arXiv:2106.05649  [pdf, other

    quant-ph math.OC

    Best Approximate Quantum Compiling Problems

    Authors: Liam Madden, Andrea Simonetto

    Abstract: We study the problem of finding the best approximate circuit that is the closest (in some pertinent metric) to a target circuit, and which satisfies a number of hardware constraints, like gate alphabet and connectivity. We look at the problem in the CNOT+rotation gate set from a mathematical programming standpoint, offering contributions both in terms of understanding the mathematics of the proble… ▽ More

    Submitted 26 November, 2021; v1 submitted 10 June, 2021; originally announced June 2021.

    Comments: 25 pages, 9 figures, 3 tables

  8. A Stochastic Operator Framework for Optimization and Learning with Sub-Weibull Errors

    Authors: Nicola Bastianello, Liam Madden, Ruggero Carli, Emiliano Dall'Anese

    Abstract: This paper proposes a framework to study the convergence of stochastic optimization and learning algorithms. The framework is modeled over the different challenges that these algorithms pose, such as (i) the presence of random additive errors (e.g. due to stochastic gradients), and (ii) random coordinate updates (e.g. due to asynchrony in distributed set-ups). The paper covers both convex and stro… ▽ More

    Submitted 24 June, 2024; v1 submitted 20 May, 2021; originally announced May 2021.

    Comments: To appear in IEEE Transactions on Automatic Control

  9. arXiv:2006.05610  [pdf, other

    math.OC

    High Probability Convergence Bounds for Non-convex Stochastic Gradient Descent with Sub-Weibull Noise

    Authors: Liam Madden, Emiliano Dall'Anese, Stephen Becker

    Abstract: Stochastic gradient descent is one of the most common iterative algorithms used in machine learning and its convergence analysis is a rich area of research. Understanding its convergence properties can help inform what modifications of it to use in different settings. However, most theoretical results either assume convexity or only provide convergence results in mean. This paper, on the other han… ▽ More

    Submitted 14 July, 2024; v1 submitted 9 June, 2020; originally announced June 2020.

    Comments: V5: reorganization. No new analysis, but the generalization analysis was removed, the post-processing algorithm was emphasized, and new numerical experiments were run

  10. Bounds for the tracking error of first-order online optimization methods

    Authors: Liam Madden, Stephen Becker, Emiliano Dall'Anese

    Abstract: This paper investigates online algorithms for smooth time-varying optimization problems, focusing first on methods with constant step-size, momentum, and extrapolation-length. Assuming strong convexity, precise results for the tracking iterate error (the limit supremum of the norm of the difference between the optimal solution and the iterates) for online gradient descent are derived. The paper th… ▽ More

    Submitted 6 October, 2020; v1 submitted 4 March, 2020; originally announced March 2020.

    Comments: Main changes: added discussion on regret literature in section 2 and added discussion on lower bounds in section 4

    Journal ref: Journal of Optimization Theory and Applications, 189:437-457, 2021

  11. arXiv:1910.08123  [pdf, other

    math.OC cs.LG eess.SP

    Optimization and Learning with Information Streams: Time-varying Algorithms and Applications

    Authors: Emiliano Dall'Anese, Andrea Simonetto, Stephen Becker, Liam Madden

    Abstract: There is a growing cross-disciplinary effort in the broad domain of optimization and learning with streams of data, applied to settings where traditional batch optimization techniques cannot produce solutions at time scales that match the inter-arrival times of the data points due to computational and/or communication bottlenecks. Special types of online algorithms can handle this situation, and t… ▽ More

    Submitted 16 January, 2020; v1 submitted 17 October, 2019; originally announced October 2019.

    Comments: Accepted for publication in IEEE Signal Processing Magazine. Limit of 40 references

    Journal ref: IEEE Signal Processing Magazine, May 2020

  12. Online Sparse Subspace Clustering

    Authors: Liam Madden, Stephen Becker, Emiliano Dall'Anese

    Abstract: This paper focuses on the sparse subspace clustering problem, and develops an online algorithmic solution to cluster data points on-the-fly, without revisiting the whole dataset. The strategy involves an online solution of a sparse representation (SR) problem to build a (sparse) dictionary of similarities where points in the same subspace are considered "similar," followed by a spectral clustering… ▽ More

    Submitted 3 June, 2019; v1 submitted 27 February, 2019; originally announced February 2019.

    Comments: 4 pages, 4 figures. Copyright 2019 IEEE. Published in the 2019 IEEE Data Science Workshop (DSW 2019), scheduled for June 4-6, 2019 in Minneapolis, Minnesota

    Journal ref: 2019 IEEE Data Science Workshop (DSW), 248--252, 2019