subscribe to arXiv mailings

On a new measure on the Levi-Civita field $\mathcal{R}$

Authors: Mateo Restrepo Borrero, Vatsal Srivastava, Khodr Shamseddine

Abstract: The Levi-Civita field $\mathcal{R}$ is the smallest non-Archimidean ordered field extension of the real numbers that is real closed and Cauchy complete in the topology induced by the order. In an earlier paper [Shamseddine-Berz-2003], a measure was defined on $\mathcal{R}$ in terms of the limit of the sums of the lengths of inner and outer covers of a set by countable unions of intervals as those… ▽ More The Levi-Civita field $\mathcal{R}$ is the smallest non-Archimidean ordered field extension of the real numbers that is real closed and Cauchy complete in the topology induced by the order. In an earlier paper [Shamseddine-Berz-2003], a measure was defined on $\mathcal{R}$ in terms of the limit of the sums of the lengths of inner and outer covers of a set by countable unions of intervals as those inner and outer sums get closer together. That definition proved useful in developing an integration theory over $\mathcal{R}$ in which the integral satisfies many of the essential properties of the Lebesgue integral of real analysis. Nevertheless, that measure theory lacks some intuitive results that one would expect in any reasonable definition for a measure; for example, the complement of a measurable set within another measurable set need not be measurable. In this paper, we will give a characterization for the measurable sets defined in [Shamseddine-Berz-2003]. Then we will introduce the notion of an outer measure on $\mathcal{R}$ and show some key properties the outer measure has. Finally, we will use the notion of outer measure to define a new measure on $\mathcal{R}$ that proves to be a better generalization of the Lebesgue measure from $\mathbb{R}$ to $\mathcal{R}$ and that leads to a family of measurable sets in $\mathcal{R}$ that strictly contains the family of measurable sets from [Shamseddine-Berz-2003], and for which most of the classic results for Lebesgue measurable sets in $\mathbb{R}$ hold. △ Less

Submitted 9 November, 2022; originally announced November 2022.

arXiv:2203.15260 [pdf, other]

Efficient Convex Optimization Requires Superlinear Memory

Authors: Annie Marsden, Vatsal Sharan, Aaron Sidford, Gregory Valiant

Abstract: We show that any memory-constrained, first-order algorithm which minimizes $d$-dimensional, $1$-Lipschitz convex functions over the unit ball to $1/\mathrm{poly}(d)$ accuracy using at most $d^{1.25 - δ}$ bits of memory must make at least $\tildeΩ(d^{1 + (4/3)δ})$ first-order queries (for any constant $δ\in [0, 1/4]$). Consequently, the performance of such memory-constrained algorithms are a polyno… ▽ More We show that any memory-constrained, first-order algorithm which minimizes $d$-dimensional, $1$-Lipschitz convex functions over the unit ball to $1/\mathrm{poly}(d)$ accuracy using at most $d^{1.25 - δ}$ bits of memory must make at least $\tildeΩ(d^{1 + (4/3)δ})$ first-order queries (for any constant $δ\in [0, 1/4]$). Consequently, the performance of such memory-constrained algorithms are a polynomial factor worse than the optimal $\tilde{O}(d)$ query bound for this problem obtained by cutting plane methods that use $\tilde{O}(d^2)$ memory. This resolves a COLT 2019 open problem of Woodworth and Srebro. △ Less

Submitted 24 July, 2024; v1 submitted 29 March, 2022; originally announced March 2022.

Comments: 33 pages, 1 figure

arXiv:2201.04315 [pdf, other]

On the Statistical Complexity of Sample Amplification

Authors: Brian Axelrod, Shivam Garg, Yanjun Han, Vatsal Sharan, Gregory Valiant

Abstract: The ``sample amplification'' problem formalizes the following question: Given $n$ i.i.d. samples drawn from an unknown distribution $P$, when is it possible to produce a larger set of $n+m$ samples which cannot be distinguished from $n+m$ i.i.d. samples drawn from $P$? In this work, we provide a firm statistical foundation for this problem by deriving generally applicable amplification procedures,… ▽ More The ``sample amplification'' problem formalizes the following question: Given $n$ i.i.d. samples drawn from an unknown distribution $P$, when is it possible to produce a larger set of $n+m$ samples which cannot be distinguished from $n+m$ i.i.d. samples drawn from $P$? In this work, we provide a firm statistical foundation for this problem by deriving generally applicable amplification procedures, lower bound techniques and connections to existing statistical notions. Our techniques apply to a large class of distributions including the exponential family, and establish a rigorous connection between sample amplification and distribution learning. △ Less

Submitted 17 September, 2024; v1 submitted 12 January, 2022; originally announced January 2022.

Comments: To appear in the Annals of Statistics

arXiv:2111.14304 [pdf, ps, other]

doi 10.1007/s40687-023-00388-w

Iwasawa Invariants for Symmetric Square Representations

Authors: Anwesh Ray, R. Sujatha, Vinayak Vatsal

Abstract: Let $p\geq 5$ be a prime, and $\mathfrak{p}$ a prime of $\bar{\mathbb{Q}}$ above $p$. Let $g_1$ and $g_2$ be $\mathfrak{p}$-ordinary, $\mathfrak{p}$-distinguished and $p$-stabilized cuspidal newforms of nebentype characters $ε_1, ε_2$ respectively, and weight $k\geq 2$, whose associated newforms have level prime to $p$. Assume that the residual representations at $\mathfrak{p}$ associated to… ▽ More Let $p\geq 5$ be a prime, and $\mathfrak{p}$ a prime of $\bar{\mathbb{Q}}$ above $p$. Let $g_1$ and $g_2$ be $\mathfrak{p}$-ordinary, $\mathfrak{p}$-distinguished and $p$-stabilized cuspidal newforms of nebentype characters $ε_1, ε_2$ respectively, and weight $k\geq 2$, whose associated newforms have level prime to $p$. Assume that the residual representations at $\mathfrak{p}$ associated to $g_1$ and $g_2$ are absolutely irreducible and isomorphic. Then, the imprimitive $p$-adic L-functions associated with the symmetric square representations are shown to exhibit a congruence modulo $\mathfrak{p}$. Furthermore, the analytic and algebraic Iwasawa invariants associated to these representations of the $g_i$ are shown to be related. Along the way, we give a complete proof of the integrality of the $\mathfrak{p}$-adic L-function, normalized with Hida's canonical period. This fills a gap in the literature, since, despite the result being widely accepted, no complete proof seems to ever have been written down. On the algebraic side, we establish the corresponding congruence for Greenberg's Selmer groups, and verify that the Iwasawa main conjectures for the twisted symmetric square representations for $g_1$ and $g_2$ are compatible with the congruences. △ Less

Submitted 11 June, 2023; v1 submitted 28 November, 2021; originally announced November 2021.

Comments: Accepted for publication in Research in the Mathematical Sciences

MSC Class: 11R23

Journal ref: Research in Mathematical Sciences (2023), Vol 10, issue 1, 27

arXiv:2111.03137 [pdf, other]

Big-Step-Little-Step: Efficient Gradient Methods for Objectives with Multiple Scales

Authors: Jonathan Kelner, Annie Marsden, Vatsal Sharan, Aaron Sidford, Gregory Valiant, Honglin Yuan

Abstract: We provide new gradient-based methods for efficiently solving a broad class of ill-conditioned optimization problems. We consider the problem of minimizing a function $f : \mathbb{R}^d \rightarrow \mathbb{R}$ which is implicitly decomposable as the sum of $m$ unknown non-interacting smooth, strongly convex functions and provide a method which solves this problem with a number of gradient evaluatio… ▽ More We provide new gradient-based methods for efficiently solving a broad class of ill-conditioned optimization problems. We consider the problem of minimizing a function $f : \mathbb{R}^d \rightarrow \mathbb{R}$ which is implicitly decomposable as the sum of $m$ unknown non-interacting smooth, strongly convex functions and provide a method which solves this problem with a number of gradient evaluations that scales (up to logarithmic factors) as the product of the square-root of the condition numbers of the components. This complexity bound (which we prove is nearly optimal) can improve almost exponentially on that of accelerated gradient methods, which grow as the square root of the condition number of $f$. Additionally, we provide efficient methods for solving stochastic, quadratic variants of this multiscale optimization problem. Rather than learn the decomposition of $f$ (which would be prohibitively expensive), our methods apply a clean recursive "Big-Step-Little-Step" interleaving of standard methods. The resulting algorithms use $\tilde{\mathcal{O}}(d m)$ space, are numerically stable, and open the door to a more fine-grained understanding of the complexity of convex optimization beyond condition number. △ Less

Submitted 4 November, 2021; originally announced November 2021.

Comments: 95 pages, 4 figures; authors are listed in alphabetical order

arXiv:2110.14119 [pdf, other]

doi 10.1142/S0218216522500742

Vertex distortion detects the unknot

Authors: Marion Campisi, Nicholas Cazet, David Crncevic, Tasha Fellman, Phillip Kessler, Nikolas Rieke, Vatsal Srivastava, Luis Torres

Abstract: The first two authors introduced vertex distortion and showed that the vertex distortion of the unknot is trivial. It was conjectured that the vertex distortion of a knot is trivial if and only if the knot is trivial. We will use Denne-Sullivan's bound on Gromov distortion to bound the vertex distortion of nontrivial lattice knots. We will then conclude that trivial vertex distortion implies the u… ▽ More The first two authors introduced vertex distortion and showed that the vertex distortion of the unknot is trivial. It was conjectured that the vertex distortion of a knot is trivial if and only if the knot is trivial. We will use Denne-Sullivan's bound on Gromov distortion to bound the vertex distortion of nontrivial lattice knots. We will then conclude that trivial vertex distortion implies the unknot, proving the conjecture. Additionally, the first conjecture in vertex distortion's debut article is proven and a vertex distortion calculator is given. △ Less

Submitted 30 August, 2022; v1 submitted 26 October, 2021; originally announced October 2021.

Comments: 12 pages, 6 figures

Journal ref: J. Knot Theory Ramif., Vol. 31, No. 11, 2250074 (2022)

arXiv:1912.05518 [pdf, ps, other]

doi 10.1112/jlms.12428

On the $\mathcal L$-invariant of the adjoint of a weight one modular form

Authors: Marti Roset, Victor Rotger, Vinayak Vatsal

Abstract: The purpose of this article is proving the equality of two natural $\mathcal L$-invariants attached to the adjoint representation of a weigth one cusp form, each defined by purely analytic, respectively algebraic means. The proof departs from Greenberg's definition of the algebraic $\mathcal L$-invariant as a universal norm of a canonical $\mathbb{Z}_p$-extension of $\mathbb{Q}_p$ associated to th… ▽ More The purpose of this article is proving the equality of two natural $\mathcal L$-invariants attached to the adjoint representation of a weigth one cusp form, each defined by purely analytic, respectively algebraic means. The proof departs from Greenberg's definition of the algebraic $\mathcal L$-invariant as a universal norm of a canonical $\mathbb{Z}_p$-extension of $\mathbb{Q}_p$ associated to the representation. We relate it to a certain $2\times 2$ regulator of $p$-adic logarithms of global units by means of class field theory, which we then show to be equal to the analytic $\mathcal L$-invariant computed by Rivero and the second author. △ Less

Submitted 11 December, 2019; originally announced December 2019.

arXiv:1904.12053 [pdf, other]

Sample Amplification: Increasing Dataset Size even when Learning is Impossible

Authors: Brian Axelrod, Shivam Garg, Vatsal Sharan, Gregory Valiant

Abstract: Given data drawn from an unknown distribution, $D$, to what extent is it possible to ``amplify'' this dataset and output an even larger set of samples that appear to have been drawn from $D$? We formalize this question as follows: an $(n,m)$ $\text{amplification procedure}$ takes as input $n$ independent draws from an unknown distribution $D$, and outputs a set of $m > n$ ``samples''. An amplifica… ▽ More Given data drawn from an unknown distribution, $D$, to what extent is it possible to ``amplify'' this dataset and output an even larger set of samples that appear to have been drawn from $D$? We formalize this question as follows: an $(n,m)$ $\text{amplification procedure}$ takes as input $n$ independent draws from an unknown distribution $D$, and outputs a set of $m > n$ ``samples''. An amplification procedure is valid if no algorithm can distinguish the set of $m$ samples produced by the amplifier from a set of $m$ independent draws from $D$, with probability greater than $2/3$. Perhaps surprisingly, in many settings, a valid amplification procedure exists, even when the size of the input dataset, $n$, is significantly less than what would be necessary to learn $D$ to non-trivial accuracy. Specifically we consider two fundamental settings: the case where $D$ is an arbitrary discrete distribution supported on $\le k$ elements, and the case where $D$ is a $d$-dimensional Gaussian with unknown mean, and fixed covariance. In the first case, we show that an $\left(n, n + Θ(\frac{n}{\sqrt{k}})\right)$ amplifier exists. In particular, given $n=O(\sqrt{k})$ samples from $D$, one can output a set of $m=n+1$ datapoints, whose total variation distance from the distribution of $m$ i.i.d. draws from $D$ is a small constant, despite the fact that one would need quadratically more data, $n=Θ(k)$, to learn $D$ up to small constant total variation distance. In the Gaussian case, we show that an $\left(n,n+Θ(\frac{n}{\sqrt{d}} )\right)$ amplifier exists, even though learning the distribution to small constant total variation distance requires $Θ(d)$ samples. In both the discrete and Gaussian settings, we show that these results are tight, to constant factors. Beyond these results, we formalize a number of curious directions for future research along this vein. △ Less

Submitted 25 August, 2024; v1 submitted 26 April, 2019; originally announced April 2019.

Comments: ICML 2020 (this version includes edits to clarify minor missing or incorrect details)

arXiv:1806.07856 [pdf, ps, other]

Test vectors for some ramified representations

Authors: V. Vatsal

Abstract: We give an explicit construction of test vectors for $T$-equivariant linear functionals on representations $Π$ of $GL_2$ of a $p$-adic field $F$, where $T$ is a non-split torus. Of particular interest is the case when both the representations are ramified; we completely solve this problem for principal series and Steinberg representations of $GL_2$, as well as for depth zero supercuspidals over… ▽ More We give an explicit construction of test vectors for $T$-equivariant linear functionals on representations $Π$ of $GL_2$ of a $p$-adic field $F$, where $T$ is a non-split torus. Of particular interest is the case when both the representations are ramified; we completely solve this problem for principal series and Steinberg representations of $GL_2$, as well as for depth zero supercuspidals over $\mathbf{Q}_p$. A key ingredient is a theorem of Casselman and Silberger, which allows us to quickly reduce almost all cases to that of the principal series, which can be analyzed directly. Our method shows that the only genuinely difficult cases are the characters of $T$ which occur in the primitive part (or "type") of $Π$ when $Π$ is supercuspidal. The method to handle the depth zero case is based on modular representation theory, motivated by considerations from Deligne-Lusztig theory and the de Rham cohomology of Deligne-Lusztig-Drinfeld curves. The proof also reveals some interesting features related to the Langlands correspondence in characteristic $p$. We show in particular that the test vector problem has an obstruction in characteristic $p$ beyond the root number criterion of Waldspurger and Tunnell, and exhibits an unexpected dichotomy related to the weights in Serre's conjecture and the signs of standard Gauss sums. △ Less

Submitted 20 June, 2018; originally announced June 2018.

arXiv:1806.05659 [pdf, ps, other]

Iwasawa Theory for Artin Representations, I

Authors: R. Greenberg, V. Vatsal

Abstract: This article is the first of a pair of articles dealing with the Iwasawa theory of modular forms of weight 1 and, more generally, of Artin representations satisfying certain conditions. The main results in this part analyze the structure of certain Selmer groups for the Artin representation. In particular, it is shown that the Selmer groups are co-torsion as $Λ$-modules. For each Selmer group, we… ▽ More This article is the first of a pair of articles dealing with the Iwasawa theory of modular forms of weight 1 and, more generally, of Artin representations satisfying certain conditions. The main results in this part analyze the structure of certain Selmer groups for the Artin representation. In particular, it is shown that the Selmer groups are co-torsion as $Λ$-modules. For each Selmer group, we consider a generator of the characteristic ideal of its Pontrjagin dual. We call that the algebraic p-adic L-function. We also construct an analytic $p$-adic L-function via deformation to higher weights. Both of these $p$-adic L-functions should conjecturally have an interpolation property involving $p$-adic logarithms of global units analogous to Stark units. The proof of the interpolation property for the algebraic $p$-adic L-function will be given in Part 2 of this work. △ Less

Submitted 14 June, 2018; originally announced June 2018.

arXiv:1603.06861 [pdf, other]

Trading-off variance and complexity in stochastic gradient descent

Authors: Vatsal Shah, Megasthenis Asteris, Anastasios Kyrillidis, Sujay Sanghavi

Abstract: Stochastic gradient descent is the method of choice for large-scale machine learning problems, by virtue of its light complexity per iteration. However, it lags behind its non-stochastic counterparts with respect to the convergence rate, due to high variance introduced by the stochastic updates. The popular Stochastic Variance-Reduced Gradient (SVRG) method mitigates this shortcoming, introducing… ▽ More Stochastic gradient descent is the method of choice for large-scale machine learning problems, by virtue of its light complexity per iteration. However, it lags behind its non-stochastic counterparts with respect to the convergence rate, due to high variance introduced by the stochastic updates. The popular Stochastic Variance-Reduced Gradient (SVRG) method mitigates this shortcoming, introducing a new update rule which requires infrequent passes over the entire input dataset to compute the full-gradient. In this work, we propose CheapSVRG, a stochastic variance-reduction optimization scheme. Our algorithm is similar to SVRG but instead of the full gradient, it uses a surrogate which can be efficiently computed on a small subset of the input data. It achieves a linear convergence rate ---up to some error level, depending on the nature of the optimization problem---and features a trade-off between the computational complexity and the convergence rate. Empirical evaluation shows that CheapSVRG performs at least competitively compared to the state of the art. △ Less

Submitted 22 March, 2016; originally announced March 2016.

Comments: 14 pages, 13 figures, first edition on 9th of October 2015

arXiv:math/9906215 [pdf, ps]

doi 10.1007/s002220000080

On the Iwasawa invariants of elliptic curves

Authors: Ralph Greenberg, Vinayak Vatsal

Abstract: Let p be an odd prime. Suppose that E is a modular elliptic curve/Q with good ordinary reduction at p. Let Q_{oo} denote the cyclotomic Z_p-extension of Q. It is conjectured that Sel_E(Q_{oo}) is a cotorsion Lambda-module and that its characteristic ideal is related to the p-adic L-function associated to E. Under certain hypotheses we prove that the validity of these conjectures is preserved by… ▽ More Let p be an odd prime. Suppose that E is a modular elliptic curve/Q with good ordinary reduction at p. Let Q_{oo} denote the cyclotomic Z_p-extension of Q. It is conjectured that Sel_E(Q_{oo}) is a cotorsion Lambda-module and that its characteristic ideal is related to the p-adic L-function associated to E. Under certain hypotheses we prove that the validity of these conjectures is preserved by congruences between the Fourier expansions of the associated modular forms. △ Less

Submitted 18 June, 1999; originally announced June 1999.

Comments: Abstract added in migration (taken from Greenberg's web site)

Report number: AIM 1999-11

Showing 1–12 of 12 results for author: Vatsal