-
Noise is All You Need: Private Second-Order Convergence of Noisy SGD
Authors:
Dmitrii Avdiukhin,
Michael Dinitz,
Chenglin Fan,
Grigory Yaroslavtsev
Abstract:
Private optimization is a topic of major interest in machine learning, with differentially private stochastic gradient descent (DP-SGD) playing a key role in both theory and practice. Furthermore, DP-SGD is known to be a powerful tool in contexts beyond privacy, including robustness, machine unlearning, etc. Existing analyses of DP-SGD either make relatively strong assumptions (e.g., Lipschitz con…
▽ More
Private optimization is a topic of major interest in machine learning, with differentially private stochastic gradient descent (DP-SGD) playing a key role in both theory and practice. Furthermore, DP-SGD is known to be a powerful tool in contexts beyond privacy, including robustness, machine unlearning, etc. Existing analyses of DP-SGD either make relatively strong assumptions (e.g., Lipschitz continuity of the loss function, or even convexity) or prove only first-order convergence (and thus might end at a saddle point in the non-convex setting). At the same time, there has been progress in proving second-order convergence of the non-private version of ``noisy SGD'', as well as progress in designing algorithms that are more complex than DP-SGD and do guarantee second-order convergence. We revisit DP-SGD and show that ``noise is all you need'': the noise necessary for privacy already implies second-order convergence under the standard smoothness assumptions, even for non-Lipschitz loss functions. Hence, we get second-order convergence essentially for free: DP-SGD, the workhorse of modern private optimization, under minimal assumptions can be used to find a second-order stationary point.
△ Less
Submitted 9 October, 2024;
originally announced October 2024.
-
Optimal Sample Complexity of Contrastive Learning
Authors:
Noga Alon,
Dmitrii Avdiukhin,
Dor Elboim,
Orr Fischer,
Grigory Yaroslavtsev
Abstract:
Contrastive learning is a highly successful technique for learning representations of data from labeled tuples, specifying the distance relations within the tuple. We study the sample complexity of contrastive learning, i.e. the minimum number of labeled tuples sufficient for getting high generalization accuracy. We give tight bounds on the sample complexity in a variety of settings, focusing on a…
▽ More
Contrastive learning is a highly successful technique for learning representations of data from labeled tuples, specifying the distance relations within the tuple. We study the sample complexity of contrastive learning, i.e. the minimum number of labeled tuples sufficient for getting high generalization accuracy. We give tight bounds on the sample complexity in a variety of settings, focusing on arbitrary distance functions, both general $\ell_p$-distances, and tree metrics. Our main result is an (almost) optimal bound on the sample complexity of learning $\ell_p$-distances for integer $p$. For any $p \ge 1$ we show that $\tilde Θ(\min(nd,n^2))$ labeled tuples are necessary and sufficient for learning $d$-dimensional representations of $n$-point datasets. Our results hold for an arbitrary distribution of the input samples and are based on giving the corresponding bounds on the Vapnik-Chervonenkis/Natarajan dimension of the associated problems. We further show that the theoretical bounds on sample complexity obtained via VC/Natarajan dimension can have strong predictive power for experimental results, in contrast with the folklore belief about a substantial gap between the statistical learning theory and the practice of deep learning.
△ Less
Submitted 1 December, 2023;
originally announced December 2023.
-
Tree Learning: Optimal Algorithms and Sample Complexity
Authors:
Dmitrii Avdiukhin,
Grigory Yaroslavtsev,
Danny Vainstein,
Orr Fischer,
Sauman Das,
Faraz Mirza
Abstract:
We study the problem of learning a hierarchical tree representation of data from labeled samples, taken from an arbitrary (and possibly adversarial) distribution. Consider a collection of data tuples labeled according to their hierarchical structure. The smallest number of such tuples required in order to be able to accurately label subsequent tuples is of interest for data collection in machine l…
▽ More
We study the problem of learning a hierarchical tree representation of data from labeled samples, taken from an arbitrary (and possibly adversarial) distribution. Consider a collection of data tuples labeled according to their hierarchical structure. The smallest number of such tuples required in order to be able to accurately label subsequent tuples is of interest for data collection in machine learning. We present optimal sample complexity bounds for this problem in several learning settings, including (agnostic) PAC learning and online learning. Our results are based on tight bounds of the Natarajan and Littlestone dimensions of the associated problem. The corresponding tree classifiers can be constructed efficiently in near-linear time.
△ Less
Submitted 9 February, 2023;
originally announced February 2023.
-
HOUDINI: Escaping from Moderately Constrained Saddles
Authors:
Dmitrii Avdiukhin,
Grigory Yaroslavtsev
Abstract:
We give the first polynomial time algorithms for escaping from high-dimensional saddle points under a moderate number of constraints. Given gradient access to a smooth function $f \colon \mathbb R^d \to \mathbb R$ we show that (noisy) gradient descent methods can escape from saddle points under a logarithmic number of inequality constraints. This constitutes the first tangible progress (without re…
▽ More
We give the first polynomial time algorithms for escaping from high-dimensional saddle points under a moderate number of constraints. Given gradient access to a smooth function $f \colon \mathbb R^d \to \mathbb R$ we show that (noisy) gradient descent methods can escape from saddle points under a logarithmic number of inequality constraints. This constitutes the first tangible progress (without reliance on NP-oracles or altering the definitions to only account for certain constraints) on the main open question of the breakthrough work of Ge et al. who showed an analogous result for unconstrained and equality-constrained problems. Our results hold for both regular and stochastic gradient descent.
△ Less
Submitted 20 April, 2023; v1 submitted 26 May, 2022;
originally announced May 2022.
-
Escaping Saddle Points with Compressed SGD
Authors:
Dmitrii Avdiukhin,
Grigory Yaroslavtsev
Abstract:
Stochastic gradient descent (SGD) is a prevalent optimization technique for large-scale distributed machine learning. While SGD computation can be efficiently divided between multiple machines, communication typically becomes a bottleneck in the distributed setting. Gradient compression methods can be used to alleviate this problem, and a recent line of work shows that SGD augmented with gradient…
▽ More
Stochastic gradient descent (SGD) is a prevalent optimization technique for large-scale distributed machine learning. While SGD computation can be efficiently divided between multiple machines, communication typically becomes a bottleneck in the distributed setting. Gradient compression methods can be used to alleviate this problem, and a recent line of work shows that SGD augmented with gradient compression converges to an $\varepsilon$-first-order stationary point. In this paper we extend these results to convergence to an $\varepsilon$-second-order stationary point ($\varepsilon$-SOSP), which is to the best of our knowledge the first result of this type. In addition, we show that, when the stochastic gradient is not Lipschitz, compressed SGD with RandomK compressor converges to an $\varepsilon$-SOSP with the same number of iterations as uncompressed SGD [Jin et al.,2021] (JACM), while improving the total communication by a factor of $\tilde Θ(\sqrt{d} \varepsilon^{-3/4})$, where $d$ is the dimension of the optimization problem. We present additional results for the cases when the compressor is arbitrary and when the stochastic gradient is Lipschitz.
△ Less
Submitted 20 May, 2021;
originally announced May 2021.
-
Objective-Based Hierarchical Clustering of Deep Embedding Vectors
Authors:
Stanislav Naumov,
Grigory Yaroslavtsev,
Dmitrii Avdiukhin
Abstract:
We initiate a comprehensive experimental study of objective-based hierarchical clustering methods on massive datasets consisting of deep embedding vectors from computer vision and NLP applications. This includes a large variety of image embedding (ImageNet, ImageNetV2, NaBirds), word embedding (Twitter, Wikipedia), and sentence embedding (SST-2) vectors from several popular recent models (e.g. Res…
▽ More
We initiate a comprehensive experimental study of objective-based hierarchical clustering methods on massive datasets consisting of deep embedding vectors from computer vision and NLP applications. This includes a large variety of image embedding (ImageNet, ImageNetV2, NaBirds), word embedding (Twitter, Wikipedia), and sentence embedding (SST-2) vectors from several popular recent models (e.g. ResNet, ResNext, Inception V3, SBERT). Our study includes datasets with up to $4.5$ million entries with embedding dimensions up to $2048$.
In order to address the challenge of scaling up hierarchical clustering to such large datasets we propose a new practical hierarchical clustering algorithm B++&C. It gives a 5%/20% improvement on average for the popular Moseley-Wang (MW) / Cohen-Addad et al. (CKMM) objectives (normalized) compared to a wide range of classic methods and recent heuristics. We also introduce a theoretical algorithm B2SAT&C which achieves a $0.74$-approximation for the CKMM objective in polynomial time. This is the first substantial improvement over the trivial $2/3$-approximation achieved by a random binary tree. Prior to this work, the best poly-time approximation of $\approx 2/3 + 0.0004$ was due to Charikar et al. (SODA'19).
△ Less
Submitted 8 June, 2022; v1 submitted 15 December, 2020;
originally announced December 2020.
-
Bisect and Conquer: Hierarchical Clustering via Max-Uncut Bisection
Authors:
Sara Ahmadian,
Vaggos Chatziafratis,
Alessandro Epasto,
Euiwoong Lee,
Mohammad Mahdian,
Konstantin Makarychev,
Grigory Yaroslavtsev
Abstract:
Hierarchical Clustering is an unsupervised data analysis method which has been widely used for decades. Despite its popularity, it had an underdeveloped analytical foundation and to address this, Dasgupta recently introduced an optimization viewpoint of hierarchical clustering with pairwise similarity information that spurred a line of work shedding light on old algorithms (e.g., Average-Linkage),…
▽ More
Hierarchical Clustering is an unsupervised data analysis method which has been widely used for decades. Despite its popularity, it had an underdeveloped analytical foundation and to address this, Dasgupta recently introduced an optimization viewpoint of hierarchical clustering with pairwise similarity information that spurred a line of work shedding light on old algorithms (e.g., Average-Linkage), but also designing new algorithms. Here, for the maximization dual of Dasgupta's objective (introduced by Moseley-Wang), we present polynomial-time .4246 approximation algorithms that use Max-Uncut Bisection as a subroutine. The previous best worst-case approximation factor in polynomial time was .336, improving only slightly over Average-Linkage which achieves 1/3. Finally, we complement our positive results by providing APX-hardness (even for 0-1 similarities), under the Small Set Expansion hypothesis.
△ Less
Submitted 15 December, 2019;
originally announced December 2019.
-
Fast Fourier Sparsity Testing
Authors:
Grigory Yaroslavtsev,
Samson Zhou
Abstract:
A function $f : \mathbb{F}_2^n \to \mathbb{R}$ is $s$-sparse if it has at most $s$ non-zero Fourier coefficients. Motivated by applications to fast sparse Fourier transforms over $\mathbb{F}_2^n$, we study efficient algorithms for the problem of approximating the $\ell_2$-distance from a given function to the closest $s$-sparse function. While previous works (e.g., Gopalan et al. SICOMP 2011) stud…
▽ More
A function $f : \mathbb{F}_2^n \to \mathbb{R}$ is $s$-sparse if it has at most $s$ non-zero Fourier coefficients. Motivated by applications to fast sparse Fourier transforms over $\mathbb{F}_2^n$, we study efficient algorithms for the problem of approximating the $\ell_2$-distance from a given function to the closest $s$-sparse function. While previous works (e.g., Gopalan et al. SICOMP 2011) study the problem of distinguishing $s$-sparse functions from those that are far from $s$-sparse under Hamming distance, to the best of our knowledge no prior work has explicitly focused on the more general problem of distance estimation in the $\ell_2$ setting, which is particularly well-motivated for noisy Fourier spectra. Given the focus on efficiency, our main result is an algorithm that solves this problem with query complexity $\mathcal{O}(s)$ for constant accuracy and error parameters, which is only quadratically worse than applicable lower bounds.
△ Less
Submitted 13 October, 2019;
originally announced October 2019.
-
"Bring Your Own Greedy"+Max: Near-Optimal $1/2$-Approximations for Submodular Knapsack
Authors:
Dmitrii Avdiukhin,
Grigory Yaroslavtsev,
Samson Zhou
Abstract:
The problem of selecting a small-size representative summary of a large dataset is a cornerstone of machine learning, optimization and data science. Motivated by applications to recommendation systems and other scenarios with query-limited access to vast amounts of data, we propose a new rigorous algorithmic framework for a standard formulation of this problem as a submodular maximization subject…
▽ More
The problem of selecting a small-size representative summary of a large dataset is a cornerstone of machine learning, optimization and data science. Motivated by applications to recommendation systems and other scenarios with query-limited access to vast amounts of data, we propose a new rigorous algorithmic framework for a standard formulation of this problem as a submodular maximization subject to a linear (knapsack) constraint. Our framework is based on augmenting all partial Greedy solutions with the best additional item. It can be instantiated with negligible overhead in any model of computation, which allows the classic \greedy algorithm and its variants to be implemented. We give such instantiations in the offline (Greedy+Max), multi-pass streaming (Sieve+Max) and distributed (Distributed+Max) settings. Our algorithms give ($1/2-ε$)-approximation with most other key parameters of interest being near-optimal. Our analysis is based on a new set of first-order linear differential inequalities and their robust approximate versions. Experiments on typical datasets (movie recommendations, influence maximization) confirm scalability and high quality of solutions obtained via our framework. Instance-specific approximations are typically in the 0.6-0.7 range and frequently beat even the $(1-1/e) \approx 0.63$ worst-case barrier for polynomial-time algorithms.
△ Less
Submitted 12 October, 2019;
originally announced October 2019.
-
Approximate $\mathbb{F}_2$-Sketching of Valuation Functions
Authors:
Grigory Yaroslavtsev,
Samson Zhou
Abstract:
We study the problem of constructing a linear sketch of minimum dimension that allows approximation of a given real-valued function $f \colon \mathbb{F}_2^n \rightarrow \mathbb R$ with small expected squared error. We develop a general theory of linear sketching for such functions through which we analyze their dimension for most commonly studied types of valuation functions: additive, budget-addi…
▽ More
We study the problem of constructing a linear sketch of minimum dimension that allows approximation of a given real-valued function $f \colon \mathbb{F}_2^n \rightarrow \mathbb R$ with small expected squared error. We develop a general theory of linear sketching for such functions through which we analyze their dimension for most commonly studied types of valuation functions: additive, budget-additive, coverage, $α$-Lipschitz submodular and matroid rank functions. This gives a characterization of how many bits of information have to be stored about the input $x$ so that one can compute $f$ under additive updates to its coordinates.
Our results are tight in most cases and we also give extensions to the distributional version of the problem where the input $x \in \mathbb{F}_2^n$ is generated uniformly at random. Using known connections with dynamic streaming algorithms, both upper and lower bounds on dimension obtained in our work extend to the space complexity of algorithms evaluating $f(x)$ under long sequences of additive updates to the input $x$ presented as a stream. Similar results hold for simultaneous communication in a distributed setting.
△ Less
Submitted 30 June, 2019;
originally announced July 2019.
-
Adversarially Robust Submodular Maximization under Knapsack Constraints
Authors:
Dmitrii Avdiukhin,
Slobodan Mitrović,
Grigory Yaroslavtsev,
Samson Zhou
Abstract:
We propose the first adversarially robust algorithm for monotone submodular maximization under single and multiple knapsack constraints with scalable implementations in distributed and streaming settings. For a single knapsack constraint, our algorithm outputs a robust summary of almost optimal (up to polylogarithmic factors) size, from which a constant-factor approximation to the optimal solution…
▽ More
We propose the first adversarially robust algorithm for monotone submodular maximization under single and multiple knapsack constraints with scalable implementations in distributed and streaming settings. For a single knapsack constraint, our algorithm outputs a robust summary of almost optimal (up to polylogarithmic factors) size, from which a constant-factor approximation to the optimal solution can be constructed. For multiple knapsack constraints, our approximation is within a constant-factor of the best known non-robust solution.
We evaluate the performance of our algorithms by comparison to natural robustifications of existing non-robust algorithms under two objectives: 1) dominating set for large social network graphs from Facebook and Twitter collected by the Stanford Network Analysis Project (SNAP), 2) movie recommendations on a dataset from MovieLens. Experimental results show that our algorithms give the best objective for a majority of the inputs and show strong performance even compared to offline algorithms that are given the set of removals in advance.
△ Less
Submitted 7 May, 2019;
originally announced May 2019.
-
Multi-Dimensional Balanced Graph Partitioning via Projected Gradient Descent
Authors:
Dmitrii Avdiukhin,
Sergey Pupyrev,
Grigory Yaroslavtsev
Abstract:
Motivated by performance optimization of large-scale graph processing systems that distribute the graph across multiple machines, we consider the balanced graph partitioning problem. Compared to the previous work, we study the multi-dimensional variant when balance according to multiple weight functions is required. As we demonstrate by experimental evaluation, such multi-dimensional balance is im…
▽ More
Motivated by performance optimization of large-scale graph processing systems that distribute the graph across multiple machines, we consider the balanced graph partitioning problem. Compared to the previous work, we study the multi-dimensional variant when balance according to multiple weight functions is required. As we demonstrate by experimental evaluation, such multi-dimensional balance is important for achieving performance improvements for typical distributed graph processing workloads. We propose a new scalable technique for the multidimensional balanced graph partitioning problem. The method is based on applying randomized projected gradient descent to a non-convex continuous relaxation of the objective. We show how to implement the new algorithm efficiently in both theory and practice utilizing various approaches for projection. Experiments with large-scale social networks containing up to hundreds of billions of edges indicate that our algorithm has superior performance compared with the state-of-the-art approaches.
△ Less
Submitted 15 February, 2019; v1 submitted 9 February, 2019;
originally announced February 2019.
-
Hierarchical Clustering for Euclidean Data
Authors:
Moses Charikar,
Vaggos Chatziafratis,
Rad Niazadeh,
Grigory Yaroslavtsev
Abstract:
Recent works on Hierarchical Clustering (HC), a well-studied problem in exploratory data analysis, have focused on optimizing various objective functions for this problem under arbitrary similarity measures. In this paper we take the first step and give novel scalable algorithms for this problem tailored to Euclidean data in R^d and under vector-based similarity measures, a prevalent model in seve…
▽ More
Recent works on Hierarchical Clustering (HC), a well-studied problem in exploratory data analysis, have focused on optimizing various objective functions for this problem under arbitrary similarity measures. In this paper we take the first step and give novel scalable algorithms for this problem tailored to Euclidean data in R^d and under vector-based similarity measures, a prevalent model in several typical machine learning applications. We focus primarily on the popular Gaussian kernel and other related measures, presenting our results through the lens of the objective introduced recently by Moseley and Wang [2017]. We show that the approximation factor in Moseley and Wang [2017] can be improved for Euclidean data. We further demonstrate both theoretically and experimentally that our algorithms scale to very high dimension d, while outperforming average-linkage and showing competitive results against other less scalable approaches.
△ Less
Submitted 26 December, 2018;
originally announced December 2018.
-
Optimality of Linear Sketching under Modular Updates
Authors:
Kaave Hosseini,
Shachar Lovett,
Grigory Yaroslavtsev
Abstract:
We study the relation between streaming algorithms and linear sketching algorithms, in the context of binary updates. We show that for inputs in $n$ dimensions, the existence of efficient streaming algorithms which can process $Ω(n^2)$ updates implies efficient linear sketching algorithms with comparable cost. This improves upon the previous work of Li, Nguyen and Woodruff [LNW14] and Ai, Hu, Li a…
▽ More
We study the relation between streaming algorithms and linear sketching algorithms, in the context of binary updates. We show that for inputs in $n$ dimensions, the existence of efficient streaming algorithms which can process $Ω(n^2)$ updates implies efficient linear sketching algorithms with comparable cost. This improves upon the previous work of Li, Nguyen and Woodruff [LNW14] and Ai, Hu, Li and Woodruff [AHLW16] which required a triple-exponential number of updates to achieve a similar result for updates over integers. We extend our results to updates modulo $p$ for integers $p \ge 2$, and to approximation instead of exact computation.
△ Less
Submitted 24 September, 2018;
originally announced September 2018.
-
Massively Parallel Algorithms and Hardness for Single-Linkage Clustering Under $\ell_p$-Distances
Authors:
Grigory Yaroslavtsev,
Adithya Vadapalli
Abstract:
We present massively parallel (MPC) algorithms and hardness of approximation results for computing Single-Linkage Clustering of $n$ input $d$-dimensional vectors under Hamming, $\ell_1, \ell_2$ and $\ell_\infty$ distances. All our algorithms run in $O(\log n)$ rounds of MPC for any fixed $d$ and achieve $(1+ε)$-approximation for all distances (except Hamming for which we show an exact algorithm).…
▽ More
We present massively parallel (MPC) algorithms and hardness of approximation results for computing Single-Linkage Clustering of $n$ input $d$-dimensional vectors under Hamming, $\ell_1, \ell_2$ and $\ell_\infty$ distances. All our algorithms run in $O(\log n)$ rounds of MPC for any fixed $d$ and achieve $(1+ε)$-approximation for all distances (except Hamming for which we show an exact algorithm). We also show constant-factor inapproximability results for $o(\log n)$-round algorithms under standard MPC hardness assumptions (for sufficiently large dimension depending on the distance used). Efficiency of implementation of our algorithms in Apache Spark is demonstrated through experiments on a variety of datasets exhibiting speedups of several orders of magnitude.
△ Less
Submitted 25 March, 2018; v1 submitted 3 October, 2017;
originally announced October 2017.
-
Linear Sketching over $\mathbb F_2$
Authors:
Sampath Kannan,
Elchanan Mossel,
Grigory Yaroslavtsev
Abstract:
We initiate a systematic study of linear sketching over $\mathbb F_2$. For a given Boolean function $f \colon \{0,1\}^n \to \{0,1\}$ a randomized $\mathbb F_2$-sketch is a distribution $\mathcal M$ over $d \times n$ matrices with elements over $\mathbb F_2$ such that $\mathcal Mx$ suffices for computing $f(x)$ with high probability. We study a connection between $\mathbb F_2$-sketching and a two-p…
▽ More
We initiate a systematic study of linear sketching over $\mathbb F_2$. For a given Boolean function $f \colon \{0,1\}^n \to \{0,1\}$ a randomized $\mathbb F_2$-sketch is a distribution $\mathcal M$ over $d \times n$ matrices with elements over $\mathbb F_2$ such that $\mathcal Mx$ suffices for computing $f(x)$ with high probability. We study a connection between $\mathbb F_2$-sketching and a two-player one-way communication game for the corresponding XOR-function. Our results show that this communication game characterizes $\mathbb F_2$-sketching under the uniform distribution (up to dependence on error). Implications of this result include: 1) a composition theorem for $\mathbb F_2$-sketching complexity of a recursive majority function, 2) a tight relationship between $\mathbb F_2$-sketching complexity and Fourier sparsity, 3) lower bounds for a certain subclass of symmetric functions. We also fully resolve a conjecture of Montanaro and Osborne regarding one-way communication complexity of linear threshold functions by designing an $\mathbb F_2$-sketch of optimal size.
Furthermore, we show that (non-uniform) streaming algorithms that have to process random updates over $\mathbb F_2$ can be constructed as $\mathbb F_2$-sketches for the uniform distribution with only a minor loss. In contrast with the previous work of Li, Nguyen and Woodruff (STOC'14) who show an analogous result for linear sketches over integers in the adversarial setting our result doesn't require the stream length to be triply exponential in $n$ and holds for streams of length $\tilde O(n)$ constructed through uniformly random updates. Finally, we state a conjecture that asks whether optimal one-way communication protocols for XOR-functions can be constructed as $\mathbb F_2$-sketches with only a small loss.
△ Less
Submitted 11 November, 2016; v1 submitted 6 November, 2016;
originally announced November 2016.
-
Privacy for the Protected (Only)
Authors:
Michael Kearns,
Aaron Roth,
Zhiwei Steven Wu,
Grigory Yaroslavtsev
Abstract:
Motivated by tensions between data privacy for individual citizens, and societal priorities such as counterterrorism and the containment of infectious disease, we introduce a computational model that distinguishes between parties for whom privacy is explicitly protected, and those for whom it is not (the targeted subpopulation). The goal is the development of algorithms that can effectively identi…
▽ More
Motivated by tensions between data privacy for individual citizens, and societal priorities such as counterterrorism and the containment of infectious disease, we introduce a computational model that distinguishes between parties for whom privacy is explicitly protected, and those for whom it is not (the targeted subpopulation). The goal is the development of algorithms that can effectively identify and take action upon members of the targeted subpopulation in a way that minimally compromises the privacy of the protected, while simultaneously limiting the expense of distinguishing members of the two groups via costly mechanisms such as surveillance, background checks, or medical testing. Within this framework, we provide provably privacy-preserving algorithms for targeted search in social networks. These algorithms are natural variants of common graph search methods, and ensure privacy for the protected by the careful injection of noise in the prioritization of potential targets. We validate the utility of our algorithms with extensive computational experiments on two large-scale social network datasets.
△ Less
Submitted 31 May, 2015;
originally announced June 2015.
-
Tight Bounds for Linear Sketches of Approximate Matchings
Authors:
Sepehr Assadi,
Sanjeev Khanna,
Yang Li,
Grigory Yaroslavtsev
Abstract:
We resolve the space complexity of linear sketches for approximating the maximum matching problem in dynamic graph streams where the stream may include both edge insertion and deletion. Specifically, we show that for any $ε> 0$, there exists a one-pass streaming algorithm, which only maintains a linear sketch of size $\tilde{O}(n^{2-3ε})$ bits and recovers an $n^ε$-approximate maximum matching in…
▽ More
We resolve the space complexity of linear sketches for approximating the maximum matching problem in dynamic graph streams where the stream may include both edge insertion and deletion. Specifically, we show that for any $ε> 0$, there exists a one-pass streaming algorithm, which only maintains a linear sketch of size $\tilde{O}(n^{2-3ε})$ bits and recovers an $n^ε$-approximate maximum matching in dynamic graph streams, where $n$ is the number of vertices in the graph. In contrast to the extensively studied insertion-only model, to the best of our knowledge, no non-trivial single-pass streaming algorithms were previously known for approximating the maximum matching problem on general dynamic graph streams.
Furthermore, we show that our upper bound is essentially tight. Namely, any linear sketch for approximating the maximum matching to within a factor of $O(n^ε)$ has to be of size $n^{2-3ε-o(1)}$ bits. We establish this lower bound by analyzing the corresponding simultaneous number-in-hand communication model, with a combinatorial construction based on Ruzsa-Szemerédi graphs.
△ Less
Submitted 6 May, 2015;
originally announced May 2015.
-
Near Optimal LP Rounding Algorithm for Correlation Clustering on Complete and Complete k-partite Graphs
Authors:
Shuchi Chawla,
Konstantin Makarychev,
Tselil Schramm,
Grigory Yaroslavtsev
Abstract:
We give new rounding schemes for the standard linear programming relaxation of the correlation clustering problem, achieving approximation factors almost matching the integrality gaps:
- For complete graphs our appoximation is $2.06 - \varepsilon$ for a fixed constant $\varepsilon$, which almost matches the previously known integrality gap of $2$.
- For complete $k$-partite graphs our approxim…
▽ More
We give new rounding schemes for the standard linear programming relaxation of the correlation clustering problem, achieving approximation factors almost matching the integrality gaps:
- For complete graphs our appoximation is $2.06 - \varepsilon$ for a fixed constant $\varepsilon$, which almost matches the previously known integrality gap of $2$.
- For complete $k$-partite graphs our approximation is $3$. We also show a matching integrality gap.
- For complete graphs with edge weights satisfying triangle inequalities and probability constraints, our approximation is $1.5$, and we show an integrality gap of $1.2$.
Our results improve a long line of work on approximation algorithms for correlation clustering in complete graphs, previously culminating in a ratio of $2.5$ for the complete case by Ailon, Charikar and Newman (JACM'08). In the weighted complete case satisfying triangle inequalities and probability constraints, the same authors give a $2$-approximation; for the bipartite case, Ailon, Avigdor-Elgrabli, Liberty and van Zuylen give a $4$-approximation (SICOMP'12).
△ Less
Submitted 23 June, 2015; v1 submitted 1 December, 2014;
originally announced December 2014.
-
Going for Speed: Sublinear Algorithms for Dense r-CSPs
Authors:
Grigory Yaroslavtsev
Abstract:
We give new sublinear and parallel algorithms for the extensively studied problem of approximating n-variable r-CSPs (constraint satisfaction problems with constraints of arity r up to an additive error. The running time of our algorithms is O(n/ε^2) + 2^O(1/ε^2) for Boolean r-CSPs and O(k^4 n / ε^2) + 2^O(log k / ε^2) for r-CSPs with constraints on variables over an alphabet of size k. For any co…
▽ More
We give new sublinear and parallel algorithms for the extensively studied problem of approximating n-variable r-CSPs (constraint satisfaction problems with constraints of arity r up to an additive error. The running time of our algorithms is O(n/ε^2) + 2^O(1/ε^2) for Boolean r-CSPs and O(k^4 n / ε^2) + 2^O(log k / ε^2) for r-CSPs with constraints on variables over an alphabet of size k. For any constant k this gives optimal dependence on n in the running time unconditionally, while the exponent in the dependence on 1/εis polynomially close to the lower bound under the exponential-time hypothesis, which is 2^Ω(ε^(-1/2)).
For Max-Cut this gives an exponential improvement in dependence on 1/εcompared to the sublinear algorithms of Goldreich, Goldwasser and Ron (JACM'98) and a linear speedup in n compared to the algorithms of Mathieu and Schudy (SODA'08). For the maximization version of k-Correlation Clustering problem our running time is O(k^4 n / ε^2) + k^O(1/ε^2), improving the previously best n k^{O(1/ε^3 log k/ε) by Guruswami and Giotis (SODA'06).
△ Less
Submitted 29 July, 2014;
originally announced July 2014.
-
Online Algorithms for Machine Minimization
Authors:
Nikhil Devanur,
Konstantin Makarychev,
Debmalya Panigrahi,
Grigory Yaroslavtsev
Abstract:
In this paper, we consider the online version of the machine minimization problem (introduced by Chuzhoy et al., FOCS 2004), where the goal is to schedule a set of jobs with release times, deadlines, and processing lengths on a minimum number of identical machines. Since the online problem has strong lower bounds if all the job parameters are arbitrary, we focus on jobs with uniform length. Our ma…
▽ More
In this paper, we consider the online version of the machine minimization problem (introduced by Chuzhoy et al., FOCS 2004), where the goal is to schedule a set of jobs with release times, deadlines, and processing lengths on a minimum number of identical machines. Since the online problem has strong lower bounds if all the job parameters are arbitrary, we focus on jobs with uniform length. Our main result is a complete resolution of the deterministic complexity of this problem by showing that a competitive ratio of $e$ is achievable and optimal, thereby improving upon existing lower and upper bounds of 2.09 and 5.2 respectively. We also give a constant-competitive online algorithm for the case of uniform deadlines (but arbitrary job lengths); to the best of our knowledge, no such algorithm was known previously. Finally, we consider the complimentary problem of throughput maximization where the goal is to maximize the sum of weights of scheduled jobs on a fixed set of identical machines (introduced by Bar-Noy et al. STOC 1999). We give a randomized online algorithm for this problem with a competitive ratio of e/e-1; previous results achieved this bound only for the case of a single machine or in the limit of an infinite number of machines.
△ Less
Submitted 4 March, 2014; v1 submitted 3 March, 2014;
originally announced March 2014.
-
Parallel Algorithms for Geometric Graph Problems
Authors:
Alexandr Andoni,
Aleksandar Nikolov,
Krzysztof Onak,
Grigory Yaroslavtsev
Abstract:
We give algorithms for geometric graph problems in the modern parallel models inspired by MapReduce. For example, for the Minimum Spanning Tree (MST) problem over a set of points in the two-dimensional space, our algorithm computes a $(1+ε)$-approximate MST. Our algorithms work in a constant number of rounds of communication, while using total space and communication proportional to the size of th…
▽ More
We give algorithms for geometric graph problems in the modern parallel models inspired by MapReduce. For example, for the Minimum Spanning Tree (MST) problem over a set of points in the two-dimensional space, our algorithm computes a $(1+ε)$-approximate MST. Our algorithms work in a constant number of rounds of communication, while using total space and communication proportional to the size of the data (linear space and near linear time algorithms). In contrast, for general graphs, achieving the same result for MST (or even connectivity) remains a challenging open problem, despite drawing significant attention in recent years.
We develop a general algorithmic framework that, besides MST, also applies to Earth-Mover Distance (EMD) and the transportation cost problem. Our algorithmic framework has implications beyond the MapReduce model. For example it yields a new algorithm for computing EMD cost in the plane in near-linear time, $n^{1+o_ε(1)}$. We note that while recently Sharathkumar and Agarwal developed a near-linear time algorithm for $(1+ε)$-approximating EMD, our algorithm is fundamentally different, and, for example, also solves the transportation (cost) problem, raised as an open question in their work. Furthermore, our algorithm immediately gives a $(1+ε)$-approximation algorithm with $n^δ$ space in the streaming-with-sorting model with $1/δ^{O(1)}$ passes. As such, it is tempting to conjecture that the parallel models may also constitute a concrete playground in the quest for efficient algorithms for EMD (and other similar problems) in the vanilla streaming model, a well-known open problem.
△ Less
Submitted 4 January, 2014; v1 submitted 30 December, 2013;
originally announced January 2014.
-
The Round Complexity of Small Set Intersection
Authors:
David P. Woodruff,
Grigory Yaroslavtsev
Abstract:
The set disjointness problem is one of the most fundamental and well-studied problems in communication complexity. In this problem Alice and Bob hold sets $S, T \subseteq [n]$, respectively, and the goal is to decide if $S \cap T = \emptyset$. Reductions from set disjointness are a canonical way of proving lower bounds in data stream algorithms, data structures, and distributed computation. In the…
▽ More
The set disjointness problem is one of the most fundamental and well-studied problems in communication complexity. In this problem Alice and Bob hold sets $S, T \subseteq [n]$, respectively, and the goal is to decide if $S \cap T = \emptyset$. Reductions from set disjointness are a canonical way of proving lower bounds in data stream algorithms, data structures, and distributed computation. In these applications, often the set sizes $|S|$ and $|T|$ are bounded by a value $k$ which is much smaller than $n$. This is referred to as small set disjointness. A major restriction in the above applications is the number of rounds that the protocol can make, which, e.g., translates to the number of passes in streaming applications. A fundamental question is thus in understanding the round complexity of the small set disjointness problem. For an essentially equivalent problem, called OR-Equality, Brody et. al showed that with $r$ rounds of communication, the randomized communication complexity is $Ω(k \ilog^r k)$, where$\ilog^r k$ denotes the $r$-th iterated logarithm function. Unfortunately their result requires the error probability of the protocol to be $1/k^{Θ(1)}$. Since naïve amplification of the success probability of a protocol from constant to $1-1/k^{Θ(1)}$ blows up the communication by a $Θ(\log k)$ factor, this destroys their improvements over the well-known lower bound of $Ω(k)$ which holds for any number of rounds. They pose it as an open question to achieve the same $Ω(k \ilog^r k)$ lower bound for protocols with constant error probability. We answer this open question by showing that the $r$-round randomized communication complexity of ${\sf OREQ}_{n,k}$, and thus also of small set disjointness, with {\it constant error probability} is $Ω(k \ilog^r k)$, asymptotically matching known upper bounds for ${\sf OREQ}_{n,k}$ and small set disjointness.
△ Less
Submitted 9 April, 2013; v1 submitted 5 April, 2013;
originally announced April 2013.
-
Learning pseudo-Boolean k-DNF and Submodular Functions
Authors:
Sofya Raskhodnikova,
Grigory Yaroslavtsev
Abstract:
We prove that any submodular function f: {0,1}^n -> {0,1,...,k} can be represented as a pseudo-Boolean 2k-DNF formula. Pseudo-Boolean DNFs are a natural generalization of DNF representation for functions with integer range. Each term in such a formula has an associated integral constant. We show that an analog of Hastad's switching lemma holds for pseudo-Boolean k-DNFs if all constants associated…
▽ More
We prove that any submodular function f: {0,1}^n -> {0,1,...,k} can be represented as a pseudo-Boolean 2k-DNF formula. Pseudo-Boolean DNFs are a natural generalization of DNF representation for functions with integer range. Each term in such a formula has an associated integral constant. We show that an analog of Hastad's switching lemma holds for pseudo-Boolean k-DNFs if all constants associated with the terms of the formula are bounded.
This allows us to generalize Mansour's PAC-learning algorithm for k-DNFs to pseudo-Boolean k-DNFs, and hence gives a PAC-learning algorithm with membership queries under the uniform distribution for submodular functions of the form f:{0,1}^n -> {0,1,...,k}. Our algorithm runs in time polynomial in n, k^{O(k \log k / ε)}, 1/εand log(1/δ) and works even in the agnostic setting. The line of previous work on learning submodular functions [Balcan, Harvey (STOC '11), Gupta, Hardt, Roth, Ullman (STOC '11), Cheraghchi, Klivans, Kothari, Lee (SODA '12)] implies only n^{O(k)} query complexity for learning submodular functions in this setting, for fixed epsilon and delta.
Our learning algorithm implies a property tester for submodularity of functions f:{0,1}^n -> {0, ..., k} with query complexity polynomial in n for k=O((\log n/ \loglog n)^{1/2}) and constant proximity parameter ε.
△ Less
Submitted 10 August, 2012;
originally announced August 2012.
-
Accurate and Efficient Private Release of Datacubes and Contingency Tables
Authors:
Graham Cormode,
Cecilia M. Procopiuc,
Divesh Srivastava,
Grigory Yaroslavtsev
Abstract:
A central problem in releasing aggregate information about sensitive data is to do so accurately while providing a privacy guarantee on the output. Recent work focuses on the class of linear queries, which include basic counting queries, data cubes, and contingency tables. The goal is to maximize the utility of their output, while giving a rigorous privacy guarantee. Most results follow a common t…
▽ More
A central problem in releasing aggregate information about sensitive data is to do so accurately while providing a privacy guarantee on the output. Recent work focuses on the class of linear queries, which include basic counting queries, data cubes, and contingency tables. The goal is to maximize the utility of their output, while giving a rigorous privacy guarantee. Most results follow a common template: pick a "strategy" set of linear queries to apply to the data, then use the noisy answers to these queries to reconstruct the queries of interest. This entails either picking a strategy set that is hoped to be good for the queries, or performing a costly search over the space of all possible strategies.
In this paper, we propose a new approach that balances accuracy and efficiency: we show how to improve the accuracy of a given query set by answering some strategy queries more accurately than others. This leads to an efficient optimal noise allocation for many popular strategies, including wavelets, hierarchies, Fourier coefficients and more. For the important case of marginal queries we show that this strictly improves on previous methods, both analytically and empirically. Our results also extend to ensuring that the returned query answers are consistent with an (unknown) data set at minimal extra cost in terms of time and noise.
△ Less
Submitted 25 July, 2012;
originally announced July 2012.
-
Steiner Transitive-Closure Spanners of d-Dimensional Posets
Authors:
Piotr Berman,
Arnab Bhattacharyya,
Elena Grigorescu,
Sofya Raskhodnikova,
David Woodruff,
Grigory Yaroslavtsev
Abstract:
Given a directed graph G and an integer k >= 1, a k-transitive-closure-spanner (k-TCspanner) of G is a directed graph H that has (1) the same transitive-closure as G and (2) diameter at most k. In some applications, the shortcut paths added to the graph in order to obtain small diameter can use Steiner vertices, that is, vertices not in the original graph G. The resulting spanner is called a Stein…
▽ More
Given a directed graph G and an integer k >= 1, a k-transitive-closure-spanner (k-TCspanner) of G is a directed graph H that has (1) the same transitive-closure as G and (2) diameter at most k. In some applications, the shortcut paths added to the graph in order to obtain small diameter can use Steiner vertices, that is, vertices not in the original graph G. The resulting spanner is called a Steiner transitive-closure spanner (Steiner TC-spanner).
Motivated by applications to property reconstruction and access control hierarchies, we concentrate on Steiner TC-spanners of directed acyclic graphs or, equivalently, partially ordered sets. In these applications, the goal is to find a sparsest Steiner k-TC-spanner of a poset G for a given k and G. The focus of this paper is the relationship between the dimension of a poset and the size of its sparsest Steiner TCspanner. The dimension of a poset G is the smallest d such that G can be embedded into a d-dimensional directed hypergrid via an order-preserving embedding.
We present a nearly tight lower bound on the size of Steiner 2-TC-spanners of d-dimensional directed hypergrids. It implies better lower bounds on the complexity of local reconstructors of monotone functions and functions with low Lipschitz constant. The proof of the lower bound constructs a dual solution to a linear programming relaxation of the Steiner 2-TC-spanner problem. We also show that one can efficiently construct a Steiner 2-TC-spanner, of size matching the lower bound, for any low-dimensional poset. Finally, we present a lower bound on the size of Steiner k-TC-spanners of d-dimensional posets that shows that the best-known construction, due to De Santis et al., cannot be improved significantly.
△ Less
Submitted 28 November, 2010;
originally announced November 2010.