-
Transformers learn variable-order Markov chains in-context
Authors:
Ruida Zhou,
Chao Tian,
Suhas Diggavi
Abstract:
Large language models have demonstrated impressive in-context learning (ICL) capability. However, it is still unclear how the underlying transformers accomplish it, especially in more complex scenarios. Toward this goal, several recent works studied how transformers learn fixed-order Markov chains (FOMC) in context, yet natural languages are more suitably modeled by variable-order Markov chains (V…
▽ More
Large language models have demonstrated impressive in-context learning (ICL) capability. However, it is still unclear how the underlying transformers accomplish it, especially in more complex scenarios. Toward this goal, several recent works studied how transformers learn fixed-order Markov chains (FOMC) in context, yet natural languages are more suitably modeled by variable-order Markov chains (VOMC), i.e., context trees (CTs). In this work, we study the ICL of VOMC by viewing language modeling as a form of data compression and focus on small alphabets and low-order VOMCs. This perspective allows us to leverage mature compression algorithms, such as context-tree weighting (CTW) and prediction by partial matching (PPM) algorithms as baselines, the former of which is Bayesian optimal for a class of CTW priors. We empirically observe a few phenomena: 1) Transformers can indeed learn to compress VOMC in-context, while PPM suffers significantly; 2) The performance of transformers is not very sensitive to the number of layers, and even a two-layer transformer can learn in-context quite well; and 3) Transformers trained and tested on non-CTW priors can significantly outperform the CTW algorithm. To explain these phenomena, we analyze the attention map of the transformers and extract two mechanisms, on which we provide two transformer constructions: 1) A construction with $D+2$ layers that can mimic the CTW algorithm accurately for CTs of maximum order $D$, 2) A 2-layer transformer that utilizes the feed-forward network for probability blending. One distinction from the FOMC setting is that a counting mechanism appears to play an important role. We implement these synthetic transformer layers and show that such hybrid transformers can match the ICL performance of transformers, and more interestingly, some of them can perform even better despite the much-reduced parameter sets.
△ Less
Submitted 7 October, 2024;
originally announced October 2024.
-
Reframing Data Value for Large Language Models Through the Lens of Plausibility
Authors:
Mohamad Rida Rammal,
Ruida Zhou,
Suhas Diggavi
Abstract:
Data valuation seeks to answer the important question, "How much is this data worth?" Existing data valuation methods have largely focused on discriminative models, primarily examining data value through the lens of its utility in training. However, with the push for ever-larger language models, relying on valuation methods that require training becomes increasingly expensive and dependent on spec…
▽ More
Data valuation seeks to answer the important question, "How much is this data worth?" Existing data valuation methods have largely focused on discriminative models, primarily examining data value through the lens of its utility in training. However, with the push for ever-larger language models, relying on valuation methods that require training becomes increasingly expensive and dependent on specific techniques. We propose an alternative perspective on the data value problem for language models, centering around the plausibility of the data. We posit that data holds lesser value if it can be plausibly generated by the model itself. Starting from some intuitive criteria that align with our notions of valuable data, we develop a novel value function that is computationally tractable and derived from first principles with provable properties. We conduct a theoretical analysis of our value function and evaluate it across multiple scenarios and datasets.
△ Less
Submitted 15 October, 2024; v1 submitted 30 August, 2024;
originally announced September 2024.
-
On the Efficiency and Robustness of Vibration-based Foundation Models for IoT Sensing: A Case Study
Authors:
Tomoyoshi Kimura,
Jinyang Li,
Tianshi Wang,
Denizhan Kara,
Yizhuo Chen,
Yigong Hu,
Ruijie Wang,
Maggie Wigness,
Shengzhong Liu,
Mani Srivastava,
Suhas Diggavi,
Tarek Abdelzaher
Abstract:
This paper demonstrates the potential of vibration-based Foundation Models (FMs), pre-trained with unlabeled sensing data, to improve the robustness of run-time inference in (a class of) IoT applications. A case study is presented featuring a vehicle classification application using acoustic and seismic sensing. The work is motivated by the success of foundation models in the areas of natural lang…
▽ More
This paper demonstrates the potential of vibration-based Foundation Models (FMs), pre-trained with unlabeled sensing data, to improve the robustness of run-time inference in (a class of) IoT applications. A case study is presented featuring a vehicle classification application using acoustic and seismic sensing. The work is motivated by the success of foundation models in the areas of natural language processing and computer vision, leading to generalizations of the FM concept to other domains as well, where significant amounts of unlabeled data exist that can be used for self-supervised pre-training. One such domain is IoT applications. Foundation models for selected sensing modalities in the IoT domain can be pre-trained in an environment-agnostic fashion using available unlabeled sensor data and then fine-tuned to the deployment at hand using a small amount of labeled data. The paper shows that the pre-training/fine-tuning approach improves the robustness of downstream inference and facilitates adaptation to different environmental conditions. More specifically, we present a case study in a real-world setting to evaluate a simple (vibration-based) FM-like model, called FOCAL, demonstrating its superior robustness and adaptation, compared to conventional supervised deep neural networks (DNNs). We also demonstrate its superior convergence over supervised solutions. Our findings highlight the advantages of vibration-based FMs (and FM-inspired selfsupervised models in general) in terms of inference robustness, runtime efficiency, and model adaptation (via fine-tuning) in resource-limited IoT settings.
△ Less
Submitted 3 April, 2024;
originally announced April 2024.
-
Hierarchical Bayes Approach to Personalized Federated Unsupervised Learning
Authors:
Kaan Ozkara,
Bruce Huang,
Ruida Zhou,
Suhas Diggavi
Abstract:
Statistical heterogeneity of clients' local data is an important characteristic in federated learning, motivating personalized algorithms tailored to the local data statistics. Though there has been a plethora of algorithms proposed for personalized supervised learning, discovering the structure of local data through personalized unsupervised learning is less explored. We initiate a systematic stu…
▽ More
Statistical heterogeneity of clients' local data is an important characteristic in federated learning, motivating personalized algorithms tailored to the local data statistics. Though there has been a plethora of algorithms proposed for personalized supervised learning, discovering the structure of local data through personalized unsupervised learning is less explored. We initiate a systematic study of such personalized unsupervised learning by developing algorithms based on optimization criteria inspired by a hierarchical Bayesian statistical framework. We develop adaptive algorithms that discover the balance between using limited local data and collaborative information. We do this in the context of two unsupervised learning tasks: personalized dimensionality reduction and personalized diffusion models. We develop convergence analyses for our adaptive algorithms which illustrate the dependence on problem parameters (e.g., heterogeneity, local sample size). We also develop a theoretical framework for personalized diffusion models, which shows the benefits of collaboration even under heterogeneity. We finally evaluate our proposed algorithms using synthetic and real data, demonstrating the effective sample amplification for personalized tasks, induced through collaboration, despite data heterogeneity.
△ Less
Submitted 25 February, 2024; v1 submitted 19 February, 2024;
originally announced February 2024.
-
FOCAL: Contrastive Learning for Multimodal Time-Series Sensing Signals in Factorized Orthogonal Latent Space
Authors:
Shengzhong Liu,
Tomoyoshi Kimura,
Dongxin Liu,
Ruijie Wang,
Jinyang Li,
Suhas Diggavi,
Mani Srivastava,
Tarek Abdelzaher
Abstract:
This paper proposes a novel contrastive learning framework, called FOCAL, for extracting comprehensive features from multimodal time-series sensing signals through self-supervised training. Existing multimodal contrastive frameworks mostly rely on the shared information between sensory modalities, but do not explicitly consider the exclusive modality information that could be critical to understan…
▽ More
This paper proposes a novel contrastive learning framework, called FOCAL, for extracting comprehensive features from multimodal time-series sensing signals through self-supervised training. Existing multimodal contrastive frameworks mostly rely on the shared information between sensory modalities, but do not explicitly consider the exclusive modality information that could be critical to understanding the underlying sensing physics. Besides, contrastive frameworks for time series have not handled the temporal information locality appropriately. FOCAL solves these challenges by making the following contributions: First, given multimodal time series, it encodes each modality into a factorized latent space consisting of shared features and private features that are orthogonal to each other. The shared space emphasizes feature patterns consistent across sensory modalities through a modal-matching objective. In contrast, the private space extracts modality-exclusive information through a transformation-invariant objective. Second, we propose a temporal structural constraint for modality features, such that the average distance between temporally neighboring samples is no larger than that of temporally distant samples. Extensive evaluations are performed on four multimodal sensing datasets with two backbone encoders and two classifiers to demonstrate the superiority of FOCAL. It consistently outperforms the state-of-the-art baselines in downstream tasks with a clear margin, under different ratios of available labels. The code and self-collected dataset are available at https://github.com/tomoyoshki/focal.
△ Less
Submitted 30 October, 2023;
originally announced October 2023.
-
Representation Transfer Learning via Multiple Pre-trained models for Linear Regression
Authors:
Navjot Singh,
Suhas Diggavi
Abstract:
In this paper, we consider the problem of learning a linear regression model on a data domain of interest (target) given few samples. To aid learning, we are provided with a set of pre-trained regression models that are trained on potentially different data domains (sources). Assuming a representation structure for the data generating linear models at the sources and the target domains, we propose…
▽ More
In this paper, we consider the problem of learning a linear regression model on a data domain of interest (target) given few samples. To aid learning, we are provided with a set of pre-trained regression models that are trained on potentially different data domains (sources). Assuming a representation structure for the data generating linear models at the sources and the target domains, we propose a representation transfer based learning method for constructing the target model. The proposed scheme is comprised of two phases: (i) utilizing the different source representations to construct a representation that is adapted to the target data, and (ii) using the obtained model as an initialization to a fine-tuning procedure that re-trains the entire (over-parameterized) regression model on the target data. For each phase of the training method, we provide excess risk bounds for the learned model compared to the true data generating target model. The derived bounds show a gain in sample complexity for our proposed method compared to the baseline method of not leveraging source representations when achieving the same excess risk, therefore, theoretically demonstrating the effectiveness of transfer learning for linear regression.
△ Less
Submitted 24 June, 2023; v1 submitted 25 May, 2023;
originally announced May 2023.
-
Common Information Dimension
Authors:
Osama Hanna,
Xinlin Li,
Suhas Diggavi,
Christina Fragouli
Abstract:
The exact common information between a set of random variables $X_1,...,X_n$ is defined as the minimum entropy of a shared random variable that allows for the exact distributive simulation of $X_1,...,X_n$. It has been established that, in certain instances, infinite entropy is required to achieve distributive simulation, suggesting that continuous random variables may be needed in such scenarios.…
▽ More
The exact common information between a set of random variables $X_1,...,X_n$ is defined as the minimum entropy of a shared random variable that allows for the exact distributive simulation of $X_1,...,X_n$. It has been established that, in certain instances, infinite entropy is required to achieve distributive simulation, suggesting that continuous random variables may be needed in such scenarios. However, to date, there is no established metric to characterize such cases. In this paper, we propose the concept of Common Information Dimension (CID) with respect to a given class of functions $\mathcal{F}$, defined as the minimum dimension of a random variable $W$ required to distributively simulate a set of random variables $X_1,...,X_n$, such that $W$ can be expressed as a function of $X_1,\cdots,X_n$ using a member of $\mathcal{F}$. Our main contributions include the computation of the common information dimension for jointly Gaussian random vectors in a closed form, with $\mathcal{F}$ being the linear functions class.
△ Less
Submitted 7 July, 2024; v1 submitted 10 May, 2023;
originally announced May 2023.
-
Multi-Message Shuffled Privacy in Federated Learning
Authors:
Antonious M. Girgis,
Suhas Diggavi
Abstract:
We study differentially private distributed optimization under communication constraints. A server using SGD for optimization aggregates the client-side local gradients for model updates using distributed mean estimation (DME). We develop a communication-efficient private DME, using the recently developed multi-message shuffled (MMS) privacy framework. We analyze our proposed DME scheme to show th…
▽ More
We study differentially private distributed optimization under communication constraints. A server using SGD for optimization aggregates the client-side local gradients for model updates using distributed mean estimation (DME). We develop a communication-efficient private DME, using the recently developed multi-message shuffled (MMS) privacy framework. We analyze our proposed DME scheme to show that it achieves the order-optimal privacy-communication-performance tradeoff resolving an open question in [1], whether the shuffled models can improve the tradeoff obtained in Secure Aggregation. This also resolves an open question on the optimal trade-off for private vector sum in the MMS model. We achieve it through a novel privacy mechanism that non-uniformly allocates privacy at different resolutions of the local gradient vectors. These results are directly applied to give guarantees on private distributed learning algorithms using this for private gradient aggregation iteratively. We also numerically evaluate the private DME algorithms.
△ Less
Submitted 22 February, 2023;
originally announced February 2023.
-
HQAlign: Aligning nanopore reads for SV detection using current-level modeling
Authors:
Dhaivat Joshi,
Suhas Diggavi,
Mark J. P. Chaisson,
Sreeram Kannan
Abstract:
Motivation: Detection of structural variants (SV) from the alignment of sample DNA reads to the reference genome is an important problem in understanding human diseases. Long reads that can span repeat regions, along with an accurate alignment of these long reads play an important role in identifying novel SVs. Long read sequencers such as nanopore sequencing can address this problem by providing…
▽ More
Motivation: Detection of structural variants (SV) from the alignment of sample DNA reads to the reference genome is an important problem in understanding human diseases. Long reads that can span repeat regions, along with an accurate alignment of these long reads play an important role in identifying novel SVs. Long read sequencers such as nanopore sequencing can address this problem by providing very long reads but with high error rates, making accurate alignment challenging. Many errors induced by nanopore sequencing have a bias because of the physics of the sequencing process and proper utilization of these error characteristics can play an important role in designing a robust aligner for SV detection problems. In this paper, we design and evaluate HQAlign, an aligner for SV detection using nanopore sequenced reads. The key ideas of HQAlign include (i) using basecalled nanopore reads along with the nanopore physics to improve alignments for SVs (ii) incorporating SV specific changes to the alignment pipeline (iii) adapting these into existing state-of-the-art long read aligner pipeline, minimap2 (v2.24), for efficient alignments.
Results: We show that HQAlign captures about 4%-6% complementary SVs across different datasets which are missed by minimap2 alignments while having a standalone performance at par with minimap2 for real nanopore reads data. For the common SV calls between HQAlign and minimap2, HQAlign improves the start and the end breakpoint accuracy for about 10%-50% of SVs across different datasets. Moreover, HQAlign improves the alignment rate to 89.35% from minimap2 85.64% for nanopore reads alignment to recent telomere-to-telomere CHM13 assembly, and it improves to 86.65% from 83.48% for nanopore reads alignment to GRCh37 human genome.
△ Less
Submitted 10 January, 2023;
originally announced January 2023.
-
Differentially Private Stochastic Linear Bandits: (Almost) for Free
Authors:
Osama A. Hanna,
Antonious M. Girgis,
Christina Fragouli,
Suhas Diggavi
Abstract:
In this paper, we propose differentially private algorithms for the problem of stochastic linear bandits in the central, local and shuffled models. In the central model, we achieve almost the same regret as the optimal non-private algorithms, which means we get privacy for free. In particular, we achieve a regret of $\tilde{O}(\sqrt{T}+\frac{1}ε)$ matching the known lower bound for private linear…
▽ More
In this paper, we propose differentially private algorithms for the problem of stochastic linear bandits in the central, local and shuffled models. In the central model, we achieve almost the same regret as the optimal non-private algorithms, which means we get privacy for free. In particular, we achieve a regret of $\tilde{O}(\sqrt{T}+\frac{1}ε)$ matching the known lower bound for private linear bandits, while the best previously known algorithm achieves $\tilde{O}(\frac{1}ε\sqrt{T})$. In the local case, we achieve a regret of $\tilde{O}(\frac{1}ε{\sqrt{T}})$ which matches the non-private regret for constant $ε$, but suffers a regret penalty when $ε$ is small. In the shuffled model, we also achieve regret of $\tilde{O}(\sqrt{T}+\frac{1}ε)$ %for small $ε$ as in the central case, while the best previously known algorithm suffers a regret of $\tilde{O}(\frac{1}ε{T^{3/5}})$. Our numerical evaluation validates our theoretical results.
△ Less
Submitted 7 July, 2022;
originally announced July 2022.
-
A Generative Framework for Personalized Learning and Estimation: Theory, Algorithms, and Privacy
Authors:
Kaan Ozkara,
Antonious M. Girgis,
Deepesh Data,
Suhas Diggavi
Abstract:
A distinguishing characteristic of federated learning is that the (local) client data could have statistical heterogeneity. This heterogeneity has motivated the design of personalized learning, where individual (personalized) models are trained, through collaboration. There have been various personalization methods proposed in literature, with seemingly very different forms and methods ranging fro…
▽ More
A distinguishing characteristic of federated learning is that the (local) client data could have statistical heterogeneity. This heterogeneity has motivated the design of personalized learning, where individual (personalized) models are trained, through collaboration. There have been various personalization methods proposed in literature, with seemingly very different forms and methods ranging from use of a single global model for local regularization and model interpolation, to use of multiple global models for personalized clustering, etc. In this work, we begin with a generative framework that could potentially unify several different algorithms as well as suggest new algorithms. We apply our generative framework to personalized estimation, and connect it to the classical empirical Bayes' methodology. We develop private personalized estimation under this framework. We then use our generative framework for learning, which unifies several known personalized FL algorithms and also suggests new ones; we propose and study a new algorithm AdaPeD based on a Knowledge Distillation, which numerically outperforms several known algorithms. We also develop privacy for personalized learning methods with guarantees for user-level privacy and composition. We numerically evaluate the performance as well as the privacy for both the estimation and learning problems, demonstrating the advantages of our proposed methods.
△ Less
Submitted 4 July, 2022;
originally announced July 2022.
-
On Leave-One-Out Conditional Mutual Information For Generalization
Authors:
Mohamad Rida Rammal,
Alessandro Achille,
Aditya Golatkar,
Suhas Diggavi,
Stefano Soatto
Abstract:
We derive information theoretic generalization bounds for supervised learning algorithms based on a new measure of leave-one-out conditional mutual information (loo-CMI). Contrary to other CMI bounds, which are black-box bounds that do not exploit the structure of the problem and may be hard to evaluate in practice, our loo-CMI bounds can be computed easily and can be interpreted in connection to…
▽ More
We derive information theoretic generalization bounds for supervised learning algorithms based on a new measure of leave-one-out conditional mutual information (loo-CMI). Contrary to other CMI bounds, which are black-box bounds that do not exploit the structure of the problem and may be hard to evaluate in practice, our loo-CMI bounds can be computed easily and can be interpreted in connection to other notions such as classical leave-one-out cross-validation, stability of the optimization algorithm, and the geometry of the loss-landscape. It applies both to the output of training algorithms as well as their predictions. We empirically validate the quality of the bound by evaluating its predicted generalization gap in scenarios for deep learning. In particular, our bounds are non-vacuous on large-scale image-classification tasks.
△ Less
Submitted 1 July, 2022;
originally announced July 2022.
-
Improving Group Testing via Gradient Descent
Authors:
Sundara Rajan Srinivasavaradhan,
Pavlos Nikolopoulos,
Christina Fragouli,
Suhas Diggavi
Abstract:
We study the problem of group testing with non-identical, independent priors. So far, the pooling strategies that have been proposed in the literature take the following approach: a hand-crafted test design along with a decoding strategy is proposed, and guarantees are provided on how many tests are sufficient in order to identify all infections in a population. In this paper, we take a different,…
▽ More
We study the problem of group testing with non-identical, independent priors. So far, the pooling strategies that have been proposed in the literature take the following approach: a hand-crafted test design along with a decoding strategy is proposed, and guarantees are provided on how many tests are sufficient in order to identify all infections in a population. In this paper, we take a different, yet perhaps more practical, approach: we fix the decoder and the number of tests, and we ask, given these, what is the best test design one could use? We explore this question for the Definite Non-Defectives (DND) decoder. We formulate a (non-convex) optimization problem, where the objective function is the expected number of errors for a particular design. We find approximate solutions via gradient descent, which we further optimize with informed initialization. We illustrate through simulations that our method can achieve significant performance improvement over traditional approaches.
△ Less
Submitted 28 January, 2022;
originally announced January 2022.
-
Decentralized Multi-Task Stochastic Optimization With Compressed Communications
Authors:
Navjot Singh,
Xuanyu Cao,
Suhas Diggavi,
Tamer Basar
Abstract:
We consider a multi-agent network where each node has a stochastic (local) cost function that depends on the decision variable of that node and a random variable, and further the decision variables of neighboring nodes are pairwise constrained. There is an aggregate objective function for the network, composed additively of the expected values of the local cost functions at the nodes, and the over…
▽ More
We consider a multi-agent network where each node has a stochastic (local) cost function that depends on the decision variable of that node and a random variable, and further the decision variables of neighboring nodes are pairwise constrained. There is an aggregate objective function for the network, composed additively of the expected values of the local cost functions at the nodes, and the overall goal of the network is to obtain the minimizing solution to this aggregate objective function subject to all the pairwise constraints. This is to be achieved at the node level using decentralized information and local computation, with exchanges of only compressed information allowed by neighboring nodes. The paper develops algorithms and obtains performance bounds for two different models of local information availability at the nodes: (i) sample feedback, where each node has direct access to samples of the local random variable to evaluate its local cost, and (ii) bandit feedback, where samples of the random variables are not available, but only the values of the local cost functions at two random points close to the decision are available to each node. For both models, with compressed communication between neighbors, we have developed decentralized saddle-point algorithms that deliver performances no different (in order sense) from those without communication compression; specifically, we show that deviation from the global minimum value and violations of the constraints are upper-bounded by $\mathcal{O}(T^{-\frac{1}{2}})$ and $\mathcal{O}(T^{-\frac{1}{4}})$, respectively, where $T$ is the number of iterations. Numerical examples provided in the paper corroborate these bounds and demonstrate the communication efficiency of the proposed method.
△ Less
Submitted 23 December, 2021;
originally announced December 2021.
-
Coded Estimation: Design of Backscatter Array Codes for 3D Orientation Estimation
Authors:
Mohamad Rida Rammal,
Suhas Diggavi,
Ashutosh Sabharwal
Abstract:
We consider the problem of estimating the orientation of a 3D object with the assistance of configurable backscatter tags. We explore the idea of designing tag response codes to improve the accuracy of orientation estimation. To minimize the difference between the true and estimated orientation, we propose two code design criteria. We also derive a lower bound on the worst-case error using Le Cam'…
▽ More
We consider the problem of estimating the orientation of a 3D object with the assistance of configurable backscatter tags. We explore the idea of designing tag response codes to improve the accuracy of orientation estimation. To minimize the difference between the true and estimated orientation, we propose two code design criteria. We also derive a lower bound on the worst-case error using Le Cam's method and provide simulation results for multiple scenarios including line-of-sight only and multipath, comparing the theoretical bounds to those achieved by the designs.
△ Less
Submitted 1 December, 2021;
originally announced December 2021.
-
QuPeD: Quantized Personalization via Distillation with Applications to Federated Learning
Authors:
Kaan Ozkara,
Navjot Singh,
Deepesh Data,
Suhas Diggavi
Abstract:
Traditionally, federated learning (FL) aims to train a single global model while collaboratively using multiple clients and a server. Two natural challenges that FL algorithms face are heterogeneity in data across clients and collaboration of clients with {\em diverse resources}. In this work, we introduce a \textit{quantized} and \textit{personalized} FL algorithm QuPeD that facilitates collectiv…
▽ More
Traditionally, federated learning (FL) aims to train a single global model while collaboratively using multiple clients and a server. Two natural challenges that FL algorithms face are heterogeneity in data across clients and collaboration of clients with {\em diverse resources}. In this work, we introduce a \textit{quantized} and \textit{personalized} FL algorithm QuPeD that facilitates collective (personalized model compression) training via \textit{knowledge distillation} (KD) among clients who have access to heterogeneous data and resources. For personalization, we allow clients to learn \textit{compressed personalized models} with different quantization parameters and model dimensions/structures. Towards this, first we propose an algorithm for learning quantized models through a relaxed optimization problem, where quantization values are also optimized over. When each client participating in the (federated) learning process has different requirements for the compressed model (both in model dimension and precision), we formulate a compressed personalization framework by introducing knowledge distillation loss for local client objectives collaborating through a global model. We develop an alternating proximal gradient update for solving this compressed personalization problem, and analyze its convergence properties. Numerically, we validate that QuPeD outperforms competing personalized FL methods, FedAvg, and local training of clients in various heterogeneous settings.
△ Less
Submitted 5 July, 2022; v1 submitted 29 July, 2021;
originally announced July 2021.
-
Renyi Differential Privacy of the Subsampled Shuffle Model in Distributed Learning
Authors:
Antonious M. Girgis,
Deepesh Data,
Suhas Diggavi
Abstract:
We study privacy in a distributed learning framework, where clients collaboratively build a learning model iteratively through interactions with a server from whom we need privacy. Motivated by stochastic optimization and the federated learning (FL) paradigm, we focus on the case where a small fraction of data samples are randomly sub-sampled in each round to participate in the learning process, w…
▽ More
We study privacy in a distributed learning framework, where clients collaboratively build a learning model iteratively through interactions with a server from whom we need privacy. Motivated by stochastic optimization and the federated learning (FL) paradigm, we focus on the case where a small fraction of data samples are randomly sub-sampled in each round to participate in the learning process, which also enables privacy amplification. To obtain even stronger local privacy guarantees, we study this in the shuffle privacy model, where each client randomizes its response using a local differentially private (LDP) mechanism and the server only receives a random permutation (shuffle) of the clients' responses without their association to each client. The principal result of this paper is a privacy-optimization performance trade-off for discrete randomization mechanisms in this sub-sampled shuffle privacy model. This is enabled through a new theoretical technique to analyze the Renyi Differential Privacy (RDP) of the sub-sampled shuffle model. We numerically demonstrate that, for important regimes, with composition our bound yields significant improvement in privacy guarantee over the state-of-the-art approximate Differential Privacy (DP) guarantee (with strong composition) for sub-sampled shuffled models. We also demonstrate numerically significant improvement in privacy-learning performance operating point using real data sets.
△ Less
Submitted 19 July, 2021;
originally announced July 2021.
-
A Field Guide to Federated Optimization
Authors:
Jianyu Wang,
Zachary Charles,
Zheng Xu,
Gauri Joshi,
H. Brendan McMahan,
Blaise Aguera y Arcas,
Maruan Al-Shedivat,
Galen Andrew,
Salman Avestimehr,
Katharine Daly,
Deepesh Data,
Suhas Diggavi,
Hubert Eichner,
Advait Gadhikar,
Zachary Garrett,
Antonious M. Girgis,
Filip Hanzely,
Andrew Hard,
Chaoyang He,
Samuel Horvath,
Zhouyuan Huo,
Alex Ingerman,
Martin Jaggi,
Tara Javidi,
Peter Kairouz
, et al. (28 additional authors not shown)
Abstract:
Federated learning and analytics are a distributed approach for collaboratively learning models (or statistics) from decentralized data, motivated by and designed for privacy protection. The distributed learning process can be formulated as solving federated optimization problems, which emphasize communication efficiency, data heterogeneity, compatibility with privacy and system requirements, and…
▽ More
Federated learning and analytics are a distributed approach for collaboratively learning models (or statistics) from decentralized data, motivated by and designed for privacy protection. The distributed learning process can be formulated as solving federated optimization problems, which emphasize communication efficiency, data heterogeneity, compatibility with privacy and system requirements, and other constraints that are not primary considerations in other problem settings. This paper provides recommendations and guidelines on formulating, designing, evaluating and analyzing federated optimization algorithms through concrete examples and practical implementation, with a focus on conducting effective simulations to infer real-world performance. The goal of this work is not to survey the current literature, but to inspire researchers and practitioners to design federated learning algorithms that can be used in various practical applications.
△ Less
Submitted 14 July, 2021;
originally announced July 2021.
-
Dynamic group testing to control and monitor disease progression in a population
Authors:
Sundara Rajan Srinivasavaradhan,
Pavlos Nikolopoulos,
Christina Fragouli,
Suhas Diggavi
Abstract:
In the context of a pandemic like COVID-19, and until most people are vaccinated, proactive testing and interventions have been proved to be the only means to contain the disease spread. Recent academic work has offered significant evidence in this regard, but a critical question is still open: Can we accurately identify all new infections that happen every day, without this being forbiddingly exp…
▽ More
In the context of a pandemic like COVID-19, and until most people are vaccinated, proactive testing and interventions have been proved to be the only means to contain the disease spread. Recent academic work has offered significant evidence in this regard, but a critical question is still open: Can we accurately identify all new infections that happen every day, without this being forbiddingly expensive, i.e., using only a fraction of the tests needed to test everyone everyday (complete testing)? Group testing offers a powerful toolset for minimizing the number of tests, but it does not account for the time dynamics behind the infections. Moreover, it typically assumes that people are infected independently, while infections are governed by community spread. Epidemiology, on the other hand, does explore time dynamics and community correlations through the well-established continuous-time SIR stochastic network model, but the standard model does not incorporate discrete-time testing and interventions. In this paper, we introduce a "discrete-time SIR stochastic block model" that also allows for group testing and interventions on a daily basis. Our model can be regarded as a discrete version of the continuous-time SIR stochastic network model over a specific type of weighted graph that captures the underlying community structure. We analyze that model w.r.t. the minimum number of group tests needed everyday to identify all infections with vanishing error probability. We find that one can leverage the knowledge of the community and the model to inform nonadaptive group testing algorithms that are order-optimal, and therefore achieve the same performance as complete testing using a much smaller number of tests.
△ Less
Submitted 20 June, 2021;
originally announced June 2021.
-
On the Renyi Differential Privacy of the Shuffle Model
Authors:
Antonious M. Girgis,
Deepesh Data,
Suhas Diggavi,
Ananda Theertha Suresh,
Peter Kairouz
Abstract:
The central question studied in this paper is Renyi Differential Privacy (RDP) guarantees for general discrete local mechanisms in the shuffle privacy model. In the shuffle model, each of the $n$ clients randomizes its response using a local differentially private (LDP) mechanism and the untrusted server only receives a random permutation (shuffle) of the client responses without association to ea…
▽ More
The central question studied in this paper is Renyi Differential Privacy (RDP) guarantees for general discrete local mechanisms in the shuffle privacy model. In the shuffle model, each of the $n$ clients randomizes its response using a local differentially private (LDP) mechanism and the untrusted server only receives a random permutation (shuffle) of the client responses without association to each client. The principal result in this paper is the first non-trivial RDP guarantee for general discrete local randomization mechanisms in the shuffled privacy model, and we develop new analysis techniques for deriving our results which could be of independent interest. In applications, such an RDP guarantee is most useful when we use it for composing several private interactions. We numerically demonstrate that, for important regimes, with composition our bound yields an improvement in privacy guarantee by a factor of $8\times$ over the state-of-the-art approximate Differential Privacy (DP) guarantee (with standard composition) for shuffled models. Moreover, combining with Poisson subsampling, our result leads to at least $10\times$ improvement over subsampled approximate DP with standard composition.
△ Less
Submitted 11 May, 2021;
originally announced May 2021.
-
QuPeL: Quantized Personalization with Applications to Federated Learning
Authors:
Kaan Ozkara,
Navjot Singh,
Deepesh Data,
Suhas Diggavi
Abstract:
Traditionally, federated learning (FL) aims to train a single global model while collaboratively using multiple clients and a server. Two natural challenges that FL algorithms face are heterogeneity in data across clients and collaboration of clients with {\em diverse resources}. In this work, we introduce a \textit{quantized} and \textit{personalized} FL algorithm QuPeL that facilitates collectiv…
▽ More
Traditionally, federated learning (FL) aims to train a single global model while collaboratively using multiple clients and a server. Two natural challenges that FL algorithms face are heterogeneity in data across clients and collaboration of clients with {\em diverse resources}. In this work, we introduce a \textit{quantized} and \textit{personalized} FL algorithm QuPeL that facilitates collective training with heterogeneous clients while respecting resource diversity. For personalization, we allow clients to learn \textit{compressed personalized models} with different quantization parameters depending on their resources. Towards this, first we propose an algorithm for learning quantized models through a relaxed optimization problem, where quantization values are also optimized over. When each client participating in the (federated) learning process has different requirements of the quantized model (both in value and precision), we formulate a quantized personalization framework by introducing a penalty term for local client objectives against a globally trained model to encourage collaboration. We develop an alternating proximal gradient update for solving this quantized personalization problem, and we analyze its convergence properties. Numerically, we show that optimizing over the quantization levels increases the performance and we validate that QuPeL outperforms both FedAvg and local training of clients in a heterogeneous setting.
△ Less
Submitted 23 February, 2021;
originally announced February 2021.
-
Quantizing data for distributed learning
Authors:
Osama A. Hanna,
Yahya H. Ezzeldin,
Christina Fragouli,
Suhas Diggavi
Abstract:
We consider machine learning applications that train a model by leveraging data distributed over a trusted network, where communication constraints can create a performance bottleneck. A number of recent approaches propose to overcome this bottleneck through compression of gradient updates. However, as models become larger, so does the size of the gradient updates. In this paper, we propose an alt…
▽ More
We consider machine learning applications that train a model by leveraging data distributed over a trusted network, where communication constraints can create a performance bottleneck. A number of recent approaches propose to overcome this bottleneck through compression of gradient updates. However, as models become larger, so does the size of the gradient updates. In this paper, we propose an alternate approach to learn from distributed data that quantizes data instead of gradients, and can support learning over applications where the size of gradient updates is prohibitive. Our approach leverages the dependency of the computed gradient on data samples, which lie in a much smaller space in order to perform the quantization in the smaller dimension data space. At the cost of an extra gradient computation, the gradient estimate can be refined by conveying the difference between the gradient at the quantized data point and the original gradient using a small number of bits. Lastly, in order to save communication, our approach adds a layer that decides whether to transmit a quantized data sample or not based on its importance for learning. We analyze the convergence of the proposed approach for smooth convex and non-convex objective functions and show that we can achieve order optimal convergence rates with communication that mostly depends on the data rather than the model (gradient) dimension. We use our proposed algorithm to train ResNet models on the CIFAR-10 and ImageNet datasets, and show that we can achieve an order of magnitude savings over gradient compression methods. These communication savings come at the cost of increasing computation at the learning agent, and thus our approach is beneficial in scenarios where communication load is the main problem.
△ Less
Submitted 8 September, 2021; v1 submitted 14 December, 2020;
originally announced December 2020.
-
Group testing for overlapping communities
Authors:
Pavlos Nikolopoulos,
Sundara Rajan Srinivasavaradhan,
Tao Guo,
Christina Fragouli,
Suhas Diggavi
Abstract:
In this paper, we propose algorithms that leverage a known community structure to make group testing more efficient. We consider a population organized in connected communities: each individual participates in one or more communities, and the infection probability of each individual depends on the communities (s)he participates in. Use cases include students who participate in several classes, and…
▽ More
In this paper, we propose algorithms that leverage a known community structure to make group testing more efficient. We consider a population organized in connected communities: each individual participates in one or more communities, and the infection probability of each individual depends on the communities (s)he participates in. Use cases include students who participate in several classes, and workers who share common spaces. Group testing reduces the number of tests needed to identify the infected individuals by pooling diagnostic samples and testing them together. We show that making testing algorithms aware of the community structure, can significantly reduce the number of tests needed both for adaptive and non-adaptive group testing.
△ Less
Submitted 16 March, 2021; v1 submitted 4 December, 2020;
originally announced December 2020.
-
Shuffled Model of Federated Learning: Privacy, Communication and Accuracy Trade-offs
Authors:
Antonious M. Girgis,
Deepesh Data,
Suhas Diggavi,
Peter Kairouz,
Ananda Theertha Suresh
Abstract:
We consider a distributed empirical risk minimization (ERM) optimization problem with communication efficiency and privacy requirements, motivated by the federated learning (FL) framework. Unique challenges to the traditional ERM problem in the context of FL include (i) need to provide privacy guarantees on clients' data, (ii) compress the communication between clients and the server, since client…
▽ More
We consider a distributed empirical risk minimization (ERM) optimization problem with communication efficiency and privacy requirements, motivated by the federated learning (FL) framework. Unique challenges to the traditional ERM problem in the context of FL include (i) need to provide privacy guarantees on clients' data, (ii) compress the communication between clients and the server, since clients might have low-bandwidth links, (iii) work with a dynamic client population at each round of communication between the server and the clients, as a small fraction of clients are sampled at each round. To address these challenges we develop (optimal) communication-efficient schemes for private mean estimation for several $\ell_p$ spaces, enabling efficient gradient aggregation for each iteration of the optimization solution of the ERM. We also provide lower and upper bounds for mean estimation with privacy and communication constraints for arbitrary $\ell_p$ spaces. To get the overall communication, privacy, and optimization performance operation point, we combine this with privacy amplification opportunities inherent to this setup. Our solution takes advantage of the inherent privacy amplification provided by client sampling and data sampling at each client (through Stochastic Gradient Descent) as well as the recently developed privacy framework using anonymization, which effectively presents to the server responses that are randomly shuffled with respect to the clients. Putting these together, we demonstrate that one can get the same privacy, optimization-performance operating point developed in recent methods that use full-precision communication, but at a much lower communication cost, i.e., effectively getting communication efficiency for "free".
△ Less
Submitted 23 September, 2020; v1 submitted 17 August, 2020;
originally announced August 2020.
-
Community aware group testing
Authors:
Pavlos Nikolopoulos,
Tao Guo,
Sundara Rajan Srinivasavaradhan,
Christina Fragouli,
Suhas Diggavi
Abstract:
In this paper, we propose algorithms that leverage a known community structure to make group testing more efficient. We consider a population organized in disjoint communities: each individual participates in a community, and its infection probability depends on the community (s)he participates in. Use cases include families, students who participate in several classes, and workers who share commo…
▽ More
In this paper, we propose algorithms that leverage a known community structure to make group testing more efficient. We consider a population organized in disjoint communities: each individual participates in a community, and its infection probability depends on the community (s)he participates in. Use cases include families, students who participate in several classes, and workers who share common spaces. Group testing reduces the number of tests needed to identify the infected individuals by pooling diagnostic samples and testing them together. We show that if we design the testing strategy taking into account the community structure, we can significantly reduce the number of tests needed for adaptive and non-adaptive group testing, and can improve the reliability in cases where tests are noisy.
△ Less
Submitted 16 March, 2021; v1 submitted 16 July, 2020;
originally announced July 2020.
-
Distortion based Light-weight Security for Cyber-Physical Systems
Authors:
Gaurav Kumar Agarwal,
Mohammed Karmoose,
Suhas Diggavi,
Christina Fragouli,
Paulo Tabuada
Abstract:
In Cyber-Physical Systems (CPS), inference based on communicated data is of critical significance as it can be used to manipulate or damage the control operations by adversaries. This calls for efficient mechanisms for secure transmission of data since control systems are becoming increasingly distributed over larger geographical areas. Distortion based security, recently proposed as one candidate…
▽ More
In Cyber-Physical Systems (CPS), inference based on communicated data is of critical significance as it can be used to manipulate or damage the control operations by adversaries. This calls for efficient mechanisms for secure transmission of data since control systems are becoming increasingly distributed over larger geographical areas. Distortion based security, recently proposed as one candidate for secure transmissions in CPS, is not only more appropriate for these applications but also quite frugal in terms of prior requirements on shared keys. In this paper, we propose distortion-based metrics to protect CPS communication and show that it is possible to confuse adversaries with just a few bits of pre-shared keys. In particular, we will show that a linear dynamical system can communicate its state in a manner that prevents an eavesdropper from accurately learning the state.
△ Less
Submitted 25 June, 2020;
originally announced June 2020.
-
Byzantine-Resilient High-Dimensional Federated Learning
Authors:
Deepesh Data,
Suhas Diggavi
Abstract:
We study stochastic gradient descent (SGD) with local iterations in the presence of malicious/Byzantine clients, motivated by the federated learning. The clients, instead of communicating with the central server in every iteration, maintain their local models, which they update by taking several SGD iterations based on their own datasets and then communicate the net update with the server, thereby…
▽ More
We study stochastic gradient descent (SGD) with local iterations in the presence of malicious/Byzantine clients, motivated by the federated learning. The clients, instead of communicating with the central server in every iteration, maintain their local models, which they update by taking several SGD iterations based on their own datasets and then communicate the net update with the server, thereby achieving communication-efficiency. Furthermore, only a subset of clients communicate with the server, and this subset may be different at different synchronization times. The Byzantine clients may collaborate and send arbitrary vectors to the server to disrupt the learning process. To combat the adversary, we employ an efficient high-dimensional robust mean estimation algorithm from Steinhardt et al.~\cite[ITCS 2018]{Resilience_SCV18} at the server to filter-out corrupt vectors; and to analyze the outlier-filtering procedure, we develop a novel matrix concentration result that may be of independent interest.
We provide convergence analyses for strongly-convex and non-convex smooth objectives in the heterogeneous data setting, where different clients may have different local datasets, and we do not make any probabilistic assumptions on data generation. We believe that ours is the first Byzantine-resilient algorithm and analysis with local iterations. We derive our convergence results under minimal assumptions of bounded variance for SGD and bounded gradient dissimilarity (which captures heterogeneity among local datasets). We also extend our results to the case when clients compute full-batch gradients.
△ Less
Submitted 16 August, 2020; v1 submitted 21 June, 2020;
originally announced June 2020.
-
Coded Caching for Heterogeneous Wireless Networks
Authors:
Jad Hachem,
Nikhil Karamchandani,
Suhas Diggavi,
Sharayu Moharir
Abstract:
This chapter provides an overview of coded caching in the context of heterogeneous wireless networks. We begin by briefly describing the key idea behind coded caching and then discuss in detail the impact of various aspects such as non-uniform content popularity, multiple cache access, and interference.
This chapter provides an overview of coded caching in the context of heterogeneous wireless networks. We begin by briefly describing the key idea behind coded caching and then discuss in detail the impact of various aspects such as non-uniform content popularity, multiple cache access, and interference.
△ Less
Submitted 1 June, 2020;
originally announced June 2020.
-
Algorithms for reconstruction over single and multiple deletion channels
Authors:
Sundara Rajan Srinivasavaradhan,
Michelle Du,
Suhas Diggavi,
Christina Fragouli
Abstract:
Recent advances in DNA sequencing technology and DNA storage systems have rekindled the interest in deletion channels. Multiple recent works have looked at variants of sequence reconstruction over a single and over multiple deletion channels, a notoriously difficult problem due to its highly combinatorial nature. Although works in theoretical computer science have provided algorithms which guarant…
▽ More
Recent advances in DNA sequencing technology and DNA storage systems have rekindled the interest in deletion channels. Multiple recent works have looked at variants of sequence reconstruction over a single and over multiple deletion channels, a notoriously difficult problem due to its highly combinatorial nature. Although works in theoretical computer science have provided algorithms which guarantee perfect reconstruction with multiple independent observations from the deletion channel, they are only applicable in the large blocklength regime and more restrictively, when the number of observations is also large. Indeed, with only a few observations, perfect reconstruction of the input sequence may not even be possible in most cases. In such situations, maximum likelihood (ML) and maximum aposteriori (MAP) estimates for the deletion channels are natural questions that arise and these have remained open to the best of our knowledge. In this work, we take steps to answer the two aforementioned questions. Specifically: 1. We show that solving for the ML estimate over the single deletion channel (which can be cast as a discrete optimization problem) is equivalent to solving its relaxation, a continuous optimization problem; 2. We exactly compute the symbolwise posterior distributions (under some assumptions on the priors) for both the single as well as multiple deletion channels. As part of our contributions, we also introduce tools to visualize and analyze error events, which we believe could be useful in other related problems concerning deletion channels.
△ Less
Submitted 29 May, 2020;
originally announced May 2020.
-
Successive Refinement of Privacy
Authors:
Antonious M. Girgis,
Deepesh Data,
Kamalika Chaudhuri,
Christina Fragouli,
Suhas Diggavi
Abstract:
This work examines a novel question: how much randomness is needed to achieve local differential privacy (LDP)? A motivating scenario is providing {\em multiple levels of privacy} to multiple analysts, either for distribution or for heavy-hitter estimation, using the \emph{same} (randomized) output. We call this setting \emph{successive refinement of privacy}, as it provides hierarchical access to…
▽ More
This work examines a novel question: how much randomness is needed to achieve local differential privacy (LDP)? A motivating scenario is providing {\em multiple levels of privacy} to multiple analysts, either for distribution or for heavy-hitter estimation, using the \emph{same} (randomized) output. We call this setting \emph{successive refinement of privacy}, as it provides hierarchical access to the raw data with different privacy levels. For example, the same randomized output could enable one analyst to reconstruct the input, while another can only estimate the distribution subject to LDP requirements. This extends the classical Shannon (wiretap) security setting to local differential privacy. We provide (order-wise) tight characterizations of privacy-utility-randomness trade-offs in several cases for distribution estimation, including the standard LDP setting under a randomness constraint. We also provide a non-trivial privacy mechanism for multi-level privacy. Furthermore, we show that we cannot reuse random keys over time while preserving privacy of each user.
△ Less
Submitted 24 May, 2020;
originally announced May 2020.
-
Byzantine-Resilient SGD in High Dimensions on Heterogeneous Data
Authors:
Deepesh Data,
Suhas Diggavi
Abstract:
We study distributed stochastic gradient descent (SGD) in the master-worker architecture under Byzantine attacks. We consider the heterogeneous data model, where different workers may have different local datasets, and we do not make any probabilistic assumptions on data generation. At the core of our algorithm, we use the polynomial-time outlier-filtering procedure for robust mean estimation prop…
▽ More
We study distributed stochastic gradient descent (SGD) in the master-worker architecture under Byzantine attacks. We consider the heterogeneous data model, where different workers may have different local datasets, and we do not make any probabilistic assumptions on data generation. At the core of our algorithm, we use the polynomial-time outlier-filtering procedure for robust mean estimation proposed by Steinhardt et al. (ITCS 2018) to filter-out corrupt gradients. In order to be able to apply their filtering procedure in our {\em heterogeneous} data setting where workers compute {\em stochastic} gradients, we derive a new matrix concentration result, which may be of independent interest.
We provide convergence analyses for smooth strongly-convex and non-convex objectives. We derive our results under the bounded variance assumption on local stochastic gradients and a {\em deterministic} condition on datasets, namely, gradient dissimilarity; and for both these quantities, we provide concrete bounds in the statistical heterogeneous data model. We give a trade-off between the mini-batch size for stochastic gradients and the approximation error. Our algorithm can tolerate up to $\frac{1}{4}$ fraction Byzantine workers. It can find approximate optimal parameters in the strongly-convex setting exponentially fast and reach to an approximate stationary point in the non-convex setting with a linear speed, thus, matching the convergence rates of vanilla SGD in the Byzantine-free setting.
We also propose and analyze a Byzantine-resilient SGD algorithm with gradient compression, where workers send $k$ random coordinates of their gradients. Under mild conditions, we show a $\frac{d}{k}$-factor saving in communication bits as well as decoding complexity over our compression-free algorithm without affecting its convergence rate (order-wise) and the approximation error.
△ Less
Submitted 16 May, 2020;
originally announced May 2020.
-
SQuARM-SGD: Communication-Efficient Momentum SGD for Decentralized Optimization
Authors:
Navjot Singh,
Deepesh Data,
Jemin George,
Suhas Diggavi
Abstract:
In this paper, we propose and analyze SQuARM-SGD, a communication-efficient algorithm for decentralized training of large-scale machine learning models over a network. In SQuARM-SGD, each node performs a fixed number of local SGD steps using Nesterov's momentum and then sends sparsified and quantized updates to its neighbors regulated by a locally computable triggering criterion. We provide conver…
▽ More
In this paper, we propose and analyze SQuARM-SGD, a communication-efficient algorithm for decentralized training of large-scale machine learning models over a network. In SQuARM-SGD, each node performs a fixed number of local SGD steps using Nesterov's momentum and then sends sparsified and quantized updates to its neighbors regulated by a locally computable triggering criterion. We provide convergence guarantees of our algorithm for general (non-convex) and convex smooth objectives, which, to the best of our knowledge, is the first theoretical analysis for compressed decentralized SGD with momentum updates. We show that the convergence rate of SQuARM-SGD matches that of vanilla SGD. We empirically show that including momentum updates in SQuARM-SGD can lead to better test performance than the current state-of-the-art which does not consider momentum updates.
△ Less
Submitted 11 October, 2021; v1 submitted 12 May, 2020;
originally announced May 2020.
-
On Distributed Quantization for Classification
Authors:
Osama A. Hanna,
Yahya H. Ezzeldin,
Tara Sadjadpour,
Christina Fragouli,
Suhas Diggavi
Abstract:
We consider the problem of distributed feature quantization, where the goal is to enable a pretrained classifier at a central node to carry out its classification on features that are gathered from distributed nodes through communication constrained channels. We propose the design of distributed quantization schemes specifically tailored to the classification task: unlike quantization schemes that…
▽ More
We consider the problem of distributed feature quantization, where the goal is to enable a pretrained classifier at a central node to carry out its classification on features that are gathered from distributed nodes through communication constrained channels. We propose the design of distributed quantization schemes specifically tailored to the classification task: unlike quantization schemes that help the central node reconstruct the original signal as accurately as possible, our focus is not reconstruction accuracy, but instead correct classification. Our work does not make any apriori distributional assumptions on the data, but instead uses training data for the quantizer design. Our main contributions include: we prove NP-hardness of finding optimal quantizers in the general case; we design an optimal scheme for a special case; we propose quantization algorithms, that leverage discrete neural representations and training data, and can be designed in polynomial-time for any number of features, any number of classes, and arbitrary division of features across the distributed nodes. We find that tailoring the quantizers to the classification task can offer significant savings: as compared to alternatives, we can achieve more than a factor of two reduction in terms of the number of bits communicated, for the same classification accuracy.
△ Less
Submitted 1 November, 2019;
originally announced November 2019.
-
SPARQ-SGD: Event-Triggered and Compressed Communication in Decentralized Stochastic Optimization
Authors:
Navjot Singh,
Deepesh Data,
Jemin George,
Suhas Diggavi
Abstract:
In this paper, we propose and analyze SPARQ-SGD, which is an event-triggered and compressed algorithm for decentralized training of large-scale machine learning models. Each node can locally compute a condition (event) which triggers a communication where quantized and sparsified local model parameters are sent. In SPARQ-SGD each node takes at least a fixed number ($H$) of local gradient steps and…
▽ More
In this paper, we propose and analyze SPARQ-SGD, which is an event-triggered and compressed algorithm for decentralized training of large-scale machine learning models. Each node can locally compute a condition (event) which triggers a communication where quantized and sparsified local model parameters are sent. In SPARQ-SGD each node takes at least a fixed number ($H$) of local gradient steps and then checks if the model parameters have significantly changed compared to its last update; it communicates further compressed model parameters only when there is a significant change, as specified by a (design) criterion. We prove that the SPARQ-SGD converges as $O(\frac{1}{nT})$ and $O(\frac{1}{\sqrt{nT}})$ in the strongly-convex and non-convex settings, respectively, demonstrating that such aggressive compression, including event-triggered communication, model sparsification and quantization does not affect the overall convergence rate as compared to uncompressed decentralized training; thereby theoretically yielding communication efficiency for "free". We evaluate SPARQ-SGD over real datasets to demonstrate significant amount of savings in communication over the state-of-the-art.
△ Less
Submitted 24 February, 2020; v1 submitted 31 October, 2019;
originally announced October 2019.
-
Data Encoding for Byzantine-Resilient Distributed Optimization
Authors:
Deepesh Data,
Linqi Song,
Suhas Diggavi
Abstract:
We study distributed optimization in the presence of Byzantine adversaries, where both data and computation are distributed among $m$ worker machines, $t$ of which may be corrupt. The compromised nodes may collaboratively and arbitrarily deviate from their pre-specified programs, and a designated (master) node iteratively computes the model/parameter vector for generalized linear models. In this w…
▽ More
We study distributed optimization in the presence of Byzantine adversaries, where both data and computation are distributed among $m$ worker machines, $t$ of which may be corrupt. The compromised nodes may collaboratively and arbitrarily deviate from their pre-specified programs, and a designated (master) node iteratively computes the model/parameter vector for generalized linear models. In this work, we primarily focus on two iterative algorithms: Proximal Gradient Descent (PGD) and Coordinate Descent (CD). Gradient descent (GD) is a special case of these algorithms. PGD is typically used in the data-parallel setting, where data is partitioned across different samples, whereas, CD is used in the model-parallelism setting, where data is partitioned across the parameter space.
In this paper, we propose a method based on data encoding and error correction over real numbers to combat adversarial attacks. We can tolerate up to $t\leq \lfloor\frac{m-1}{2}\rfloor$ corrupt worker nodes, which is information-theoretically optimal. We give deterministic guarantees, and our method does not assume any probability distribution on the data. We develop a {\em sparse} encoding scheme which enables computationally efficient data encoding and decoding. We demonstrate a trade-off between the corruption threshold and the resource requirements (storage, computational, and communication complexity). As an example, for $t\leq\frac{m}{3}$, our scheme incurs only a {\em constant} overhead on these resources, over that required by the plain distributed PGD/CD algorithms which provide no adversarial protection. To the best of our knowledge, ours is the first paper that makes CD secure against adversarial attacks.
Our encoding scheme extends efficiently to the data streaming model and for stochastic gradient descent (SGD). We also give experimental results to show the efficacy of our proposed schemes.
△ Less
Submitted 4 November, 2020; v1 submitted 4 July, 2019;
originally announced July 2019.
-
Qsparse-local-SGD: Distributed SGD with Quantization, Sparsification, and Local Computations
Authors:
Debraj Basu,
Deepesh Data,
Can Karakus,
Suhas Diggavi
Abstract:
Communication bottleneck has been identified as a significant issue in distributed optimization of large-scale learning models. Recently, several approaches to mitigate this problem have been proposed, including different forms of gradient compression or computing local models and mixing them iteratively. In this paper, we propose \emph{Qsparse-local-SGD} algorithm, which combines aggressive spars…
▽ More
Communication bottleneck has been identified as a significant issue in distributed optimization of large-scale learning models. Recently, several approaches to mitigate this problem have been proposed, including different forms of gradient compression or computing local models and mixing them iteratively. In this paper, we propose \emph{Qsparse-local-SGD} algorithm, which combines aggressive sparsification with quantization and local computation along with error compensation, by keeping track of the difference between the true and compressed gradients. We propose both synchronous and asynchronous implementations of \emph{Qsparse-local-SGD}. We analyze convergence for \emph{Qsparse-local-SGD} in the \emph{distributed} setting for smooth non-convex and convex objective functions. We demonstrate that \emph{Qsparse-local-SGD} converges at the same rate as vanilla distributed SGD for many important classes of sparsifiers and quantizers. We use \emph{Qsparse-local-SGD} to train ResNet-50 on ImageNet and show that it results in significant savings over the state-of-the-art, in the number of bits transmitted to reach target accuracy.
△ Less
Submitted 2 November, 2019; v1 submitted 5 June, 2019;
originally announced June 2019.
-
Securing State Estimation Under Sensor and Actuator Attacks: Theory and Design
Authors:
Mehrdad Showkatbakhsh,
Yasser Shoukry,
Suhas Diggavi,
Paulo Tabuada
Abstract:
This paper discusses the problem of estimating the state of a linear time-invariant system when some of its sensors and actuators are compromised by an adversarial agent. In the model considered in this paper, the malicious agent attacks an input (output) by manipulating its value arbitrarily, i.e., we impose no constraints (statistical or otherwise) on how control commands (sensor measurements) a…
▽ More
This paper discusses the problem of estimating the state of a linear time-invariant system when some of its sensors and actuators are compromised by an adversarial agent. In the model considered in this paper, the malicious agent attacks an input (output) by manipulating its value arbitrarily, i.e., we impose no constraints (statistical or otherwise) on how control commands (sensor measurements) are changed by the adversary. In the first part of this paper, we introduce the notion of sparse strong observability and we show that is a necessary and sufficient condition for correctly reconstructing the state despite the considered attacks. In the second half of this work, we propose an estimator to harness the complexity of this intrinsically combinatorial problem, by leveraging satisfiability modulo theory solving. Numerical simulations demonstrate the effectiveness and scalability of our estimator.
△ Less
Submitted 3 April, 2019;
originally announced April 2019.
-
Differentially Private Consensus-Based Distributed Optimization
Authors:
Mehrdad Showkatbakhsh,
Can Karakus,
Suhas Diggavi
Abstract:
Data privacy is an important concern in learning, when datasets contain sensitive information about individuals. This paper considers consensus-based distributed optimization under data privacy constraints. Consensus-based optimization consists of a set of computational nodes arranged in a graph, each having a local objective that depends on their local data, where in every step nodes take a linea…
▽ More
Data privacy is an important concern in learning, when datasets contain sensitive information about individuals. This paper considers consensus-based distributed optimization under data privacy constraints. Consensus-based optimization consists of a set of computational nodes arranged in a graph, each having a local objective that depends on their local data, where in every step nodes take a linear combination of their neighbors' messages, as well as taking a new gradient step. Since the algorithm requires exchanging messages that depend on local data, private information gets leaked at every step. Taking $(ε, δ)$-differential privacy (DP) as our criterion, we consider the strategy where the nodes add random noise to their messages before broadcasting it, and show that the method achieves convergence with a bounded mean-squared error, while satisfying $(ε, δ)$-DP. By relaxing the more stringent $ε$-DP requirement in previous work, we strengthen a known convergence result in the literature. We conclude the paper with numerical results demonstrating the effectiveness of our methods for mean estimation.
△ Less
Submitted 18 March, 2019;
originally announced March 2019.
-
Privacy-Utility Trade-off of Linear Regression under Random Projections and Additive Noise
Authors:
Mehrdad Showkatbakhsh,
Can Karakus,
Suhas Diggavi
Abstract:
Data privacy is an important concern in machine learning, and is fundamentally at odds with the task of training useful learning models, which typically require the acquisition of large amounts of private user data. One possible way of fulfilling the machine learning task while preserving user privacy is to train the model on a transformed, noisy version of the data, which does not reveal the data…
▽ More
Data privacy is an important concern in machine learning, and is fundamentally at odds with the task of training useful learning models, which typically require the acquisition of large amounts of private user data. One possible way of fulfilling the machine learning task while preserving user privacy is to train the model on a transformed, noisy version of the data, which does not reveal the data itself directly to the training procedure. In this work, we analyze the privacy-utility trade-off of two such schemes for the problem of linear regression: additive noise, and random projections. In contrast to previous work, we consider a recently proposed notion of differential privacy that is based on conditional mutual information (MI-DP), which is stronger than the conventional $(ε, δ)$-differential privacy, and use relative objective error as the utility metric. We find that projecting the data to a lower-dimensional subspace before adding noise attains a better trade-off in general. We also make a connection between privacy problem and (non-coherent) SIMO, which has been extensively studied in wireless communication, and use tools from there for the analysis. We present numerical results demonstrating the performance of the schemes.
△ Less
Submitted 12 February, 2019;
originally announced February 2019.
-
On the Generalized Degrees of Freedom of Noncoherent Interference Channel
Authors:
Joyson Sebastian,
Suhas Diggavi
Abstract:
We study the generalized degrees of freedom (gDoF) of the block-fading noncoherent 2-user interference channel (IC) with a coherence time of T symbol durations and symmetric fading statistics. We demonstrate that a natural training-based scheme for the noncoherent IC, is suboptimal in several regimes. We study and analyze several alternate schemes: the first is a new noncoherent scheme using rate-…
▽ More
We study the generalized degrees of freedom (gDoF) of the block-fading noncoherent 2-user interference channel (IC) with a coherence time of T symbol durations and symmetric fading statistics. We demonstrate that a natural training-based scheme for the noncoherent IC, is suboptimal in several regimes. We study and analyze several alternate schemes: the first is a new noncoherent scheme using rate-splitting. We also consider a scheme that treats interference-as-noise (TIN) and a time division multiplexing (TDM) scheme. We show that a standard training-based scheme for the noncoherent IC is outperformed by one of these schemes in several regimes: our results demonstrate that in the very weak interference regime, the TIN scheme is the best; in the strong interference regime, the TDM scheme and the noncoherent rate-splitting scheme give better performance; in other cases either of the TIN, TDM or noncoherent rate-splitting scheme could be preferred. We also study the noncoherent IC with feedback and propose another noncoherent rate-splitting scheme. Again for the feedback case, our results demonstrate that a natural training-based scheme can be outperformed by other schemes.
△ Less
Submitted 10 May, 2021; v1 submitted 9 December, 2018;
originally announced December 2018.
-
Distorting an Adversary's View in Cyber-Physical Systems
Authors:
Gaurav Kumar Agarwal,
Mohammed Karmoose,
Suhas Diggavi,
Christina Fragouli,
Paulo Tabuada
Abstract:
In Cyber-Physical Systems (CPSs), inference based on communicated data is of critical significance as it can be used to manipulate or damage the control operations by adversaries. This calls for efficient mechanisms for secure transmission of data since control systems are becoming increasingly distributed over larger geographical areas. Distortion based security, recently proposed as one candidat…
▽ More
In Cyber-Physical Systems (CPSs), inference based on communicated data is of critical significance as it can be used to manipulate or damage the control operations by adversaries. This calls for efficient mechanisms for secure transmission of data since control systems are becoming increasingly distributed over larger geographical areas. Distortion based security, recently proposed as one candidate for CPSs security, is not only more appropriate for these applications but also quite frugal in terms of prior requirements on shared keys. In this paper, we propose distortion-based metrics to protect CPSs communication and show that it is possible to confuse adversaries with just a few bits of pre-shared keys.
△ Less
Submitted 12 September, 2018;
originally announced September 2018.
-
Implementation and Analysis of Stable PUFs Using Gate Oxide Breakdown
Authors:
Wei-Che Wang,
Yair Yona,
Yizhang Wu,
Suhas Diggavi,
Puneet Gupta
Abstract:
We implement and analyze highly stable PUFs using two random gate oxide breakdown mechanisms: plasma induced breakdown and voltage stressed breakdown. These gate oxide breakdown PUFs can be easily implemented in commercial silicon processes, and they are highly stable. We fabricated bit generation units for the stable PUFs on 99 testchips with 65nm CMOS bulk technology. Measurement results show th…
▽ More
We implement and analyze highly stable PUFs using two random gate oxide breakdown mechanisms: plasma induced breakdown and voltage stressed breakdown. These gate oxide breakdown PUFs can be easily implemented in commercial silicon processes, and they are highly stable. We fabricated bit generation units for the stable PUFs on 99 testchips with 65nm CMOS bulk technology. Measurement results show that the plasma induced breakdown can generate complete stable responses. For the voltage stressed breakdown, the responses are with 0.12\% error probability at a worst case corner, which can be effectively accommodated by taking the majority vote from multiple measurements. Both PUFs show significant area reduction compared to SRAM PUF. We compare methods for evaluating the security level of PUFs such as min-entropy, mutual information and guesswork as well as inter- and intra-FHD, and the popular NIST test suite. We show that guesswork can be viewed as a generalization of min-entropy and mutual information. In addition, we analyze our testchip data and show through various statistical distance measures that the bits are independent. Finally, we propose guesswork as a new statistical measure for the level of statistical independence that also has an operational meaning in terms of security.
△ Less
Submitted 4 August, 2018;
originally announced August 2018.
-
Energy-Efficiency Gains of Caching for Interference Channels
Authors:
Jad Hachem,
Urs Niesen,
Suhas Diggavi
Abstract:
This paper initiates the study of energy-efficiency gains provided by caching. We focus on the cache-aided Gaussian interference channel in the low-SNR regime. We propose a strategy that creates content overlaps at the transmitter caches to allow for co-operation between the transmitters. This co-operation yields a beamforming gain, which has to be traded off against a multicasting gain. We evalua…
▽ More
This paper initiates the study of energy-efficiency gains provided by caching. We focus on the cache-aided Gaussian interference channel in the low-SNR regime. We propose a strategy that creates content overlaps at the transmitter caches to allow for co-operation between the transmitters. This co-operation yields a beamforming gain, which has to be traded off against a multicasting gain. We evaluate the performance of this strategy and show its approximate optimality in both the single-receiver case and the single-transmitter case.
△ Less
Submitted 1 August, 2018;
originally announced August 2018.
-
Using mm-Waves for Secret Key Establishment
Authors:
Mohammed Karmoose,
Christina Fragouli,
Suhas Diggavi,
Rafael Misoczki,
Lily L. Yang,
Zhenliang Zhang
Abstract:
The fact that Millimeter Wave (mmWave) communication needs to be directional is usually perceived as a challenge; in this paper we argue that it enables efficient secret key sharing that are unconditionally secure from passive eavesdroppers, by building on packet erasures. We showcase the potential of our approach in two setups: mmWave-based WiFi networks and vehicle platooning. We show that in th…
▽ More
The fact that Millimeter Wave (mmWave) communication needs to be directional is usually perceived as a challenge; in this paper we argue that it enables efficient secret key sharing that are unconditionally secure from passive eavesdroppers, by building on packet erasures. We showcase the potential of our approach in two setups: mmWave-based WiFi networks and vehicle platooning. We show that in the first case, we can establish a few hundred secret bits with minimal changes to standard communication protocol; while in both cases, with the right choice of parameters, we can potentially establish keys in the order of tenths of Mbps. These first results are based on some simplifying assumptions, yet we believe they give incentives to further explore such techniques.
△ Less
Submitted 1 May, 2019; v1 submitted 21 March, 2018;
originally announced March 2018.
-
Redundancy Techniques for Straggler Mitigation in Distributed Optimization and Learning
Authors:
Can Karakus,
Yifan Sun,
Suhas Diggavi,
Wotao Yin
Abstract:
Performance of distributed optimization and learning systems is bottlenecked by "straggler" nodes and slow communication links, which significantly delay computation. We propose a distributed optimization framework where the dataset is "encoded" to have an over-complete representation with built-in redundancy, and the straggling nodes in the system are dynamically left out of the computation at ev…
▽ More
Performance of distributed optimization and learning systems is bottlenecked by "straggler" nodes and slow communication links, which significantly delay computation. We propose a distributed optimization framework where the dataset is "encoded" to have an over-complete representation with built-in redundancy, and the straggling nodes in the system are dynamically left out of the computation at every iteration, whose loss is compensated by the embedded redundancy. We show that oblivious application of several popular optimization algorithms on encoded data, including gradient descent, L-BFGS, proximal gradient under data parallelism, and coordinate descent under model parallelism, converge to either approximate or exact solutions of the original problem when stragglers are treated as erasures. These convergence results are deterministic, i.e., they establish sample path convergence for arbitrary sequences of delay patterns or distributions on the nodes, and are independent of the tail behavior of the delay distribution. We demonstrate that equiangular tight frames have desirable properties as encoding matrices, and propose efficient mechanisms for encoding large-scale data. We implement the proposed technique on Amazon EC2 clusters, and demonstrate its performance over several learning problems, including matrix factorization, LASSO, ridge regression and logistic regression, and compare the proposed method with uncoded, asynchronous, and data replication strategies.
△ Less
Submitted 14 March, 2018;
originally announced March 2018.
-
Generalized Degrees of Freedom of Noncoherent Diamond Networks
Authors:
Joyson Sebastian,
Suhas Diggavi
Abstract:
We study the generalized degrees of freedom (gDoF) of the block-fading noncoherent diamond (parallel relay) wireless network with asymmetric distributions of link strengths, and a coherence time of T symbol duration. We first derive an outer bound for this channel and then derive the optimal signaling structure for this outer bound. Using the optimal signaling structure we solve the outer bound op…
▽ More
We study the generalized degrees of freedom (gDoF) of the block-fading noncoherent diamond (parallel relay) wireless network with asymmetric distributions of link strengths, and a coherence time of T symbol duration. We first derive an outer bound for this channel and then derive the optimal signaling structure for this outer bound. Using the optimal signaling structure we solve the outer bound optimization problem in terms of its gDoF. Using insights from our outer bound signaling solution, we devise an achievability strategy based on a novel scheme that we call train-scale quantize-map-forward (TS-QMF). This uses training in the links from the source to the relays, scaling and quantizing at the relays combined with nontraining-based schemes. We show the optimality of this scheme with respect to the outer bound in terms of the gDoF. In noncoherent point-to-point multiple-input-multiple-output (MIMO) channels, where the fading channel is unknown to transmitter and receiver, an important tradeoff between communication and channel learning was revealed by Zheng and Tse, by demonstrating that not all the available antennas might be used, as it is suboptimal to learn all their channel parameters. Our results in this paper for the diamond network demonstrates that in certain regimes the optimal scheme uses a subnetwork, demonstrating a tradeoff between channel learning and communications. In some regimes, it is gDoF optimal to do relay selection, i.e, use a part of the network. In the other regimes, even when it is essential to use the entire network, it is suboptimal to learn the channel states for all the links in the network, i.e, traditional training-based schemes are suboptimal in these regimes.
△ Less
Submitted 19 February, 2020; v1 submitted 7 February, 2018;
originally announced February 2018.
-
Straggler Mitigation in Distributed Optimization Through Data Encoding
Authors:
Can Karakus,
Yifan Sun,
Suhas Diggavi,
Wotao Yin
Abstract:
Slow running or straggler tasks can significantly reduce computation speed in distributed computation. Recently, coding-theory-inspired approaches have been applied to mitigate the effect of straggling, through embedding redundancy in certain linear computational steps of the optimization algorithm, thus completing the computation without waiting for the stragglers. In this paper, we propose an al…
▽ More
Slow running or straggler tasks can significantly reduce computation speed in distributed computation. Recently, coding-theory-inspired approaches have been applied to mitigate the effect of straggling, through embedding redundancy in certain linear computational steps of the optimization algorithm, thus completing the computation without waiting for the stragglers. In this paper, we propose an alternate approach where we embed the redundancy directly in the data itself, and allow the computation to proceed completely oblivious to encoding. We propose several encoding schemes, and demonstrate that popular batch algorithms, such as gradient descent and L-BFGS, applied in a coding-oblivious manner, deterministically achieve sample path linear convergence to an approximate solution of the original problem, using an arbitrarily varying subset of the nodes at each iteration. Moreover, this approximation can be controlled by the amount of redundancy and the number of nodes used in each iteration. We provide experimental results demonstrating the advantage of the approach over uncoded and data replication strategies.
△ Less
Submitted 22 January, 2018; v1 submitted 14 November, 2017;
originally announced November 2017.
-
Approximate Capacity of Fast Fading Interference Channels with No Instantaneous CSIT
Authors:
Joyson Sebastian,
Can Karakus,
Suhas Diggavi
Abstract:
We develop a characterization of fading models, which assigns a number called logarithmic Jensen's gap to a given fading model. We show that as a consequence of a finite logarithmic Jensen's gap, approximate capacity region can be obtained for fast fading interference channels (FF-IC) for several scenarios. We illustrate three instances where a constant capacity gap can be obtained as a function o…
▽ More
We develop a characterization of fading models, which assigns a number called logarithmic Jensen's gap to a given fading model. We show that as a consequence of a finite logarithmic Jensen's gap, approximate capacity region can be obtained for fast fading interference channels (FF-IC) for several scenarios. We illustrate three instances where a constant capacity gap can be obtained as a function of the logarithmic Jensen's gap. Firstly for an FF-IC with neither feedback nor instantaneous channel state information at transmitter (CSIT), if the fading distribution has finite logarithmic Jensen's gap, we show that a rate-splitting scheme based on average interference-to-noise ratio (inr) can achieve its approximate capacity. Secondly we show that a similar scheme can achieve the approximate capacity of FF-IC with feedback and delayed CSIT, if the fading distribution has finite logarithmic Jensen's gap. Thirdly, when this condition holds, we show that point-to-point codes can achieve approximate capacity for a class of FF-IC with feedback. We prove that the logarithmic Jensen's gap is finite for common fading models, including Rayleigh and Nakagami fading, thereby obtaining the approximate capacity region of FF-IC with these fading models. For Rayleigh fading the capacity gap is obtained as 1.83 bits per channel use for non-feedback case and 2.83 bits per channel use for feedback case. Our analysis also yields approximate capacity results for fading 2-tap ISI channel and fading interference multiple access channel as corollaries.
△ Less
Submitted 3 June, 2018; v1 submitted 12 June, 2017;
originally announced June 2017.
-
Models and information-theoretic bounds for nanopore sequencing
Authors:
Wei Mao,
Suhas Diggavi,
Sreeram Kannan
Abstract:
Nanopore sequencing is an emerging new technology for sequencing DNA, which can read long fragments of DNA (~50,000 bases) in contrast to most current short-read sequencing technologies which can only read hundreds of bases. While nanopore sequencers can acquire long reads, the high error rates (20%-30%) pose a technical challenge. In a nanopore sequencer, a DNA is migrated through a nanopore and…
▽ More
Nanopore sequencing is an emerging new technology for sequencing DNA, which can read long fragments of DNA (~50,000 bases) in contrast to most current short-read sequencing technologies which can only read hundreds of bases. While nanopore sequencers can acquire long reads, the high error rates (20%-30%) pose a technical challenge. In a nanopore sequencer, a DNA is migrated through a nanopore and current variations are measured. The DNA sequence is inferred from this observed current pattern using an algorithm called a base-caller. In this paper, we propose a mathematical model for the "channel" from the input DNA sequence to the observed current, and calculate bounds on the information extraction capacity of the nanopore sequencer. This model incorporates impairments like (non-linear) inter-symbol interference, deletions, as well as random response. These information bounds have two-fold application: (1) The decoding rate with a uniform input distribution can be used to calculate the average size of the plausible list of DNA sequences given an observed current trace. This bound can be used to benchmark existing base-calling algorithms, as well as serving a performance objective to design better nanopores. (2) When the nanopore sequencer is used as a reader in a DNA storage system, the storage capacity is quantified by our bounds.
△ Less
Submitted 17 February, 2018; v1 submitted 31 May, 2017;
originally announced May 2017.
-
Generalized Degrees Freedom of Noncoherent MIMO Channels with Asymmetric Link Strengths
Authors:
Joyson Sebastian,
Suhas. N. Diggavi
Abstract:
We study the generalized degrees of freedom (gDoF) of block-fading noncoherent multiple input multiple output (MIMO) channels with asymmetric distributions of link strengths and a coherence time of T symbol durations. We derive the optimal signaling structure for communication for the asymmetric MIMO channel, which is distinct from that for the MIMO channel with independent and identically distrib…
▽ More
We study the generalized degrees of freedom (gDoF) of block-fading noncoherent multiple input multiple output (MIMO) channels with asymmetric distributions of link strengths and a coherence time of T symbol durations. We derive the optimal signaling structure for communication for the asymmetric MIMO channel, which is distinct from that for the MIMO channel with independent and identically distributed (i.i.d.) links. We extend the existing results for the single input multiple output (SIMO) channel with i.i.d. links to the asymmetric case, proving that selecting the statistically best antenna is gDoF-optimal. Using the gDoF result for the SIMO channel, we prove that for T=1, the gDoF is zero for MIMO channels with arbitrary link strengths., extending the result for MIMO with i.i.d. links We show that selecting the statistically best antenna is gDoF-optimal for the multiple input single output (MISO) channel. We also derive the gDoF for the 2X2 MIMO channel with different exponents in the direct and cross links. In this setting, we show that it is always necessary to use both the antennas to achieve the gDoF, in contrast to the results for the 2X2 MIMO channel with i.i.d. links. We show that having weaker crosslinks, gives gDoF gain compared to the case with i.i.d. links. For the noncoherent MIMO channel with i.i.d. links, the traditional method of training each transmit antenna independently is degrees of freedom (DoF) optimal, whereas we observe that for the asymmetric 2X2 MIMO channel, the traditional training is not gDoF-optimal. We extend this observation to a larger MX M MIMO channel by demonstrating a strategy that can achieve larger gDoF than a traditional training-based method.
△ Less
Submitted 18 November, 2019; v1 submitted 20 May, 2017;
originally announced May 2017.