Skip to main content

Showing 1–50 of 74 results for author: Thrampoulidis, C

  1. arXiv:2410.10024  [pdf, other

    cs.LG cs.IT stat.ML

    Sharper Guarantees for Learning Neural Network Classifiers with Gradient Methods

    Authors: Hossein Taheri, Christos Thrampoulidis, Arya Mazumdar

    Abstract: In this paper, we study the data-dependent convergence and generalization behavior of gradient methods for neural networks with smooth activation. Our first result is a novel bound on the excess risk of deep networks trained by the logistic loss, via an alogirthmic stability analysis. Compared to previous works, our results improve upon the shortcomings of the well-established Rademacher complexit… ▽ More

    Submitted 13 October, 2024; originally announced October 2024.

  2. arXiv:2410.09344  [pdf, other

    cs.LG cs.AI cs.CL

    DARE the Extreme: Revisiting Delta-Parameter Pruning For Fine-Tuned Models

    Authors: Wenlong Deng, Yize Zhao, Vala Vakilian, Minghui Chen, Xiaoxiao Li, Christos Thrampoulidis

    Abstract: Storing open-source fine-tuned models separately introduces redundancy and increases response times in applications utilizing multiple models. Delta-parameter pruning (DPP), particularly the random drop and rescale (DARE) method proposed by Yu et al., addresses this by pruning the majority of delta parameters--the differences between fine-tuned and pre-trained model weights--while typically mainta… ▽ More

    Submitted 11 October, 2024; originally announced October 2024.

  3. arXiv:2408.15417  [pdf, other

    cs.CL cs.LG

    Implicit Geometry of Next-token Prediction: From Language Sparsity Patterns to Model Representations

    Authors: Yize Zhao, Tina Behnia, Vala Vakilian, Christos Thrampoulidis

    Abstract: Next-token prediction (NTP) over large text corpora has become the go-to paradigm to train large language models. Yet, it remains unclear how NTP influences the mapping of linguistic patterns to geometric properties of the resulting model representations. We frame training of large language models as soft-label classification over sparse probabilistic label vectors, coupled with an analytical appr… ▽ More

    Submitted 27 August, 2024; originally announced August 2024.

    Comments: Accepted at COLM 2024

  4. arXiv:2405.13718  [pdf, other

    cs.LG math.OC

    Next-token prediction capacity: general upper bounds and a lower bound for transformers

    Authors: Liam Madden, Curtis Fox, Christos Thrampoulidis

    Abstract: Given a sequence of tokens, such as words, the task of next-token prediction is to predict the next-token conditional probability distribution. Decoder-only transformers have become effective models for this task, but their properties are still not fully understood. In particular, the largest number of distinct context sequences that a decoder-only transformer can interpolate next-token distributi… ▽ More

    Submitted 16 September, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

    Comments: V2: added a section axiomatizing the probabilistic assumptions of next-token prediction

    MSC Class: 15A03; 26B35

  5. arXiv:2402.18884  [pdf, ps, other

    cs.LG stat.ML

    Supervised Contrastive Representation Learning: Landscape Analysis with Unconstrained Features

    Authors: Tina Behnia, Christos Thrampoulidis

    Abstract: Recent findings reveal that over-parameterized deep neural networks, trained beyond zero training-error, exhibit a distinctive structural pattern at the final layer, termed as Neural-collapse (NC). These results indicate that the final hidden-layer outputs in such networks display minimal within-class variations over the training set. While existing research extensively investigates this phenomeno… ▽ More

    Submitted 29 February, 2024; originally announced February 2024.

    Comments: 10 pages

  6. arXiv:2402.18551  [pdf, other

    cs.LG cs.CL stat.ML

    Implicit Bias of Next-Token Prediction

    Authors: Christos Thrampoulidis

    Abstract: Next-token prediction (NTP), the go-to training paradigm in training large language models, involves predicting the next token in a sequence. Departing from traditional one-hot classification, in NTP, multiple tokens with varying frequencies follow each given context. This work frames NTP training as cross-entropy minimization over distinct contexts, each associated with a sparse empirical probabi… ▽ More

    Submitted 28 February, 2024; originally announced February 2024.

  7. arXiv:2402.14208  [pdf, other

    cs.CL cs.AI cs.CY cs.LG

    LLM-Assisted Content Conditional Debiasing for Fair Text Embedding

    Authors: Wenlong Deng, Blair Chen, Beidi Zhao, Chiyu Zhang, Xiaoxiao Li, Christos Thrampoulidis

    Abstract: Mitigating biases in machine learning models has become an increasing concern in Natural Language Processing (NLP), particularly in developing fair text embeddings, which are crucial yet challenging for real-world applications like search engines. In response, this paper proposes a novel method for learning fair text embeddings. First, we define a novel content-conditional equal distance (CCED) fa… ▽ More

    Submitted 24 June, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

  8. arXiv:2402.05738  [pdf, other

    cs.LG math.OC stat.ML

    Implicit Bias and Fast Convergence Rates for Self-attention

    Authors: Bhavya Vasudeva, Puneesh Deora, Christos Thrampoulidis

    Abstract: Self-attention, the core mechanism of transformers, distinguishes them from traditional neural networks and drives their outstanding performance. Towards developing the fundamental optimization principles of self-attention, we investigate the implicit bias of gradient descent (GD) in training a self-attention layer with fixed linear decoder in binary classification. Drawing inspiration from the st… ▽ More

    Submitted 8 February, 2024; originally announced February 2024.

    Comments: 41 pages, 7 figures

  9. arXiv:2401.14343  [pdf, other

    cs.LG cs.CY stat.ML

    Class-attribute Priors: Adapting Optimization to Heterogeneity and Fairness Objective

    Authors: Xuechen Zhang, Mingchen Li, Jiasi Chen, Christos Thrampoulidis, Samet Oymak

    Abstract: Modern classification problems exhibit heterogeneities across individual classes: Each class may have unique attributes, such as sample size, label quality, or predictability (easy vs difficult), and variable importance at test-time. Without care, these heterogeneities impede the learning process, most notably, when optimizing fairness objectives. Confirming this, under a gaussian mixture setting,… ▽ More

    Submitted 25 January, 2024; originally announced January 2024.

    Comments: 15 pages, 8 figures

  10. arXiv:2310.18285  [pdf, other

    cs.LG cs.CV

    Unlocking the Potential of Prompt-Tuning in Bridging Generalized and Personalized Federated Learning

    Authors: Wenlong Deng, Christos Thrampoulidis, Xiaoxiao Li

    Abstract: Vision Transformers (ViT) and Visual Prompt Tuning (VPT) achieve state-of-the-art performance with improved efficiency in various computer vision tasks. This suggests a promising paradigm shift of adapting pre-trained ViT models to Federated Learning (FL) settings. However, the challenge of data heterogeneity among FL clients presents a significant hurdle in effectively deploying ViT models. Exist… ▽ More

    Submitted 24 February, 2024; v1 submitted 27 October, 2023; originally announced October 2023.

  11. arXiv:2310.12680  [pdf, other

    cs.LG math.OC stat.ML

    On the Optimization and Generalization of Multi-head Attention

    Authors: Puneesh Deora, Rouzbeh Ghaderi, Hossein Taheri, Christos Thrampoulidis

    Abstract: The training and generalization dynamics of the Transformer's core mechanism, namely the Attention mechanism, remain under-explored. Besides, existing analyses primarily focus on single-head attention. Inspired by the demonstrated benefits of overparameterization when training fully-connected networks, we investigate the potential optimization and generalization advantages of using multiple attent… ▽ More

    Submitted 12 October, 2024; v1 submitted 19 October, 2023; originally announced October 2023.

    Comments: Corrected miscalculation of Hessian upper bound in Proposition 5

  12. arXiv:2310.00893  [pdf, other

    cs.LG

    Engineering the Neural Collapse Geometry of Supervised-Contrastive Loss

    Authors: Jaidev Gill, Vala Vakilian, Christos Thrampoulidis

    Abstract: Supervised-contrastive loss (SCL) is an alternative to cross-entropy (CE) for classification tasks that makes use of similarities in the embedding space to allow for richer representations. In this work, we propose methods to engineer the geometry of these learnt feature embeddings by modifying the contrastive loss. In pursuit of adjusting the geometry we explore the impact of prototypes, fixed em… ▽ More

    Submitted 2 October, 2023; originally announced October 2023.

    Comments: 5 pages, 3 figures

  13. arXiv:2308.16898  [pdf, other

    cs.LG cs.AI cs.CL math.OC

    Transformers as Support Vector Machines

    Authors: Davoud Ataee Tarzanagh, Yingcong Li, Christos Thrampoulidis, Samet Oymak

    Abstract: Since its inception in "Attention Is All You Need", transformer architecture has led to revolutionary advancements in NLP. The attention layer within the transformer admits a sequence of input tokens $X$ and makes them interact through pairwise similarities computed as softmax$(XQK^\top X^\top)$, where $(K,Q)$ are the trainable key-query parameters. In this work, we establish a formal equivalence… ▽ More

    Submitted 22 February, 2024; v1 submitted 31 August, 2023; originally announced August 2023.

    Comments: The proof of global convergence for gradient descent in the equal score setting has been fixed, referring to Theorem 2 of [TLZO23], and the experimental results have been extended

  14. arXiv:2308.02001  [pdf, other

    cs.LG math.OC

    Memory capacity of two layer neural networks with smooth activations

    Authors: Liam Madden, Christos Thrampoulidis

    Abstract: Determining the memory capacity of two layer neural networks with $m$ hidden neurons and input dimension $d$ (i.e., $md+2m$ total trainable parameters), which refers to the largest size of general data the network can memorize, is a fundamental machine learning question. For activations that are real analytic at a point and, if restricting to a polynomial there, have sufficiently high degree, we e… ▽ More

    Submitted 1 May, 2024; v1 submitted 3 August, 2023; originally announced August 2023.

    Comments: V3: the result was generalized to activations which are real analytic at a point by including a bias vector. The presentation and rigor were also improved

    MSC Class: 15A03; 26B10

    Journal ref: SIAM Journal on Mathematics of Data Science, 6(3):679-702, 2024

  15. arXiv:2306.07960  [pdf, other

    cs.LG stat.ML

    Symmetric Neural-Collapse Representations with Supervised Contrastive Loss: The Impact of ReLU and Batching

    Authors: Ganesh Ramachandra Kini, Vala Vakilian, Tina Behnia, Jaidev Gill, Christos Thrampoulidis

    Abstract: Supervised contrastive loss (SCL) is a competitive and often superior alternative to the cross-entropy loss for classification. While prior studies have demonstrated that both losses yield symmetric training representations under balanced data, this symmetry breaks under class imbalances. This paper presents an intriguing discovery: the introduction of a ReLU activation at the final layer effectiv… ▽ More

    Submitted 18 October, 2023; v1 submitted 13 June, 2023; originally announced June 2023.

    Comments: change of title and additional experimental results

  16. arXiv:2306.03435  [pdf, other

    cs.LG cs.CL stat.ML

    On the Role of Attention in Prompt-tuning

    Authors: Samet Oymak, Ankit Singh Rawat, Mahdi Soltanolkotabi, Christos Thrampoulidis

    Abstract: Prompt-tuning is an emerging strategy to adapt large language models (LLM) to downstream tasks by learning a (soft-)prompt parameter from data. Despite its success in LLMs, there is limited theoretical understanding of the power of prompt-tuning and the role of the attention mechanism in prompting. In this work, we explore prompt-tuning for one-layer attention architectures and study contextual mi… ▽ More

    Submitted 6 June, 2023; originally announced June 2023.

    Comments: Published at ICML 2023

  17. arXiv:2306.02010  [pdf, other

    cs.LG

    Memorization Capacity of Multi-Head Attention in Transformers

    Authors: Sadegh Mahdavi, Renjie Liao, Christos Thrampoulidis

    Abstract: Transformers have become the go-to architecture for language and vision tasks, yet their theoretical properties, especially memorization capacity, remain elusive. This paper investigates the memorization abilities of multi-head attention mechanisms, examining how many example sequences they can memorize, as a function of the number of heads and sequence length. Motivated by experimental findings o… ▽ More

    Submitted 2 March, 2024; v1 submitted 3 June, 2023; originally announced June 2023.

    Comments: ICLR 2024 (Spotlight)

  18. arXiv:2305.18666  [pdf, other

    cs.LG math.OC

    BiSLS/SPS: Auto-tune Step Sizes for Stable Bi-level Optimization

    Authors: Chen Fan, Gaspard Choné-Ducasse, Mark Schmidt, Christos Thrampoulidis

    Abstract: The popularity of bi-level optimization (BO) in deep learning has spurred a growing interest in studying gradient-based BO algorithms. However, existing algorithms involve two coupled learning rates that can be affected by approximation errors when computing hypergradients, making careful fine-tuning necessary to ensure fast convergence. To alleviate this issue, we investigate the use of recently… ▽ More

    Submitted 2 November, 2023; v1 submitted 29 May, 2023; originally announced May 2023.

  19. arXiv:2305.13471  [pdf, other

    cs.LG

    Fast Convergence in Learning Two-Layer Neural Networks with Separable Data

    Authors: Hossein Taheri, Christos Thrampoulidis

    Abstract: Normalized gradient descent has shown substantial success in speeding up the convergence of exponentially-tailed loss functions (which includes exponential and logistic losses) on linear classifiers with separable data. In this paper, we go beyond linear models by studying normalized GD on two-layer neural nets. We prove for exponentially-tailed losses that using normalized GD leads to linear rate… ▽ More

    Submitted 26 June, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

  20. arXiv:2304.00459  [pdf, other

    cs.LG math.OC

    Fast Convergence of Random Reshuffling under Over-Parameterization and the Polyak-Łojasiewicz Condition

    Authors: Chen Fan, Christos Thrampoulidis, Mark Schmidt

    Abstract: Modern machine learning models are often over-parameterized and as a result they can interpolate the training data. Under such a scenario, we study the convergence properties of a sampling-without-replacement variant of stochastic gradient descent (SGD) known as random reshuffling (RR). Unlike SGD that samples data with replacement at every iteration, RR chooses a random permutation of data at the… ▽ More

    Submitted 2 April, 2023; originally announced April 2023.

  21. arXiv:2303.07608  [pdf, other

    cs.LG

    On the Implicit Geometry of Cross-Entropy Parameterizations for Label-Imbalanced Data

    Authors: Tina Behnia, Ganesh Ramachandra Kini, Vala Vakilian, Christos Thrampoulidis

    Abstract: Various logit-adjusted parameterizations of the cross-entropy (CE) loss have been proposed as alternatives to weighted CE for training large models on label-imbalanced data far beyond the zero train error regime. The driving force behind those designs has been the theory of implicit bias, which for linear(ized) models, explains why they successfully induce bias on the optimization path towards sol… ▽ More

    Submitted 13 March, 2023; originally announced March 2023.

    Comments: Short version of this accepted at AISTATS 2023

  22. arXiv:2302.09235  [pdf, ps, other

    stat.ML cs.LG

    Generalization and Stability of Interpolating Neural Networks with Minimal Width

    Authors: Hossein Taheri, Christos Thrampoulidis

    Abstract: We investigate the generalization and optimization properties of shallow neural-network classifiers trained by gradient descent in the interpolating regime. Specifically, in a realizable scenario where model weights can achieve arbitrarily small training error $ε$ and their distance from initialization is $g(ε)$, we demonstrate that gradient descent with $n$ training data achieves training error… ▽ More

    Submitted 27 March, 2023; v1 submitted 18 February, 2023; originally announced February 2023.

    Comments: With significant changes: Stating results without homogeneity assumption, Discussing results under NTK-separability in Section 4

  23. arXiv:2211.00692  [pdf, other

    cs.LG

    Towards Better Out-of-Distribution Generalization of Neural Algorithmic Reasoning Tasks

    Authors: Sadegh Mahdavi, Kevin Swersky, Thomas Kipf, Milad Hashemi, Christos Thrampoulidis, Renjie Liao

    Abstract: In this paper, we study the OOD generalization of neural algorithmic reasoning tasks, where the goal is to learn an algorithm (e.g., sorting, breadth-first search, and depth-first search) from input-output pairs using deep neural networks. First, we argue that OOD generalization in this setting is significantly different than common OOD settings. For example, some phenomena in OOD generalization o… ▽ More

    Submitted 18 March, 2023; v1 submitted 1 November, 2022; originally announced November 2022.

    Comments: Transactions on Machine Learning Research (TMLR), 2023

  24. arXiv:2209.07116  [pdf, other

    cs.LG cs.DC cs.MA eess.SP

    On Generalization of Decentralized Learning with Separable Data

    Authors: Hossein Taheri, Christos Thrampoulidis

    Abstract: Decentralized learning offers privacy and communication efficiency when data are naturally distributed among agents communicating over an underlying graph. Motivated by overparameterized learning settings, in which models are trained to zero training loss, we study algorithmic and generalization properties of decentralized learning with gradient descent on separable data. Specifically, for decentr… ▽ More

    Submitted 27 March, 2023; v1 submitted 15 September, 2022; originally announced September 2022.

    Comments: Minor changes: fixing typos, few more references. Title changed to the title of conference version

  25. arXiv:2208.05512  [pdf, other

    cs.LG stat.ML

    Imbalance Trouble: Revisiting Neural-Collapse Geometry

    Authors: Christos Thrampoulidis, Ganesh R. Kini, Vala Vakilian, Tina Behnia

    Abstract: Neural Collapse refers to the remarkable structural properties characterizing the geometry of class embeddings and classifier weights, found by deep nets when trained beyond zero training error. However, this characterization only holds for balanced data. Here we thus ask whether it can be made invariant to class imbalances. Towards this end, we adopt the unconstrained-features model (UFM), a rece… ▽ More

    Submitted 10 August, 2022; originally announced August 2022.

    Comments: 55 pages, 21 figures

  26. arXiv:2206.12739  [pdf, other

    cs.LG

    On how to avoid exacerbating spurious correlations when models are overparameterized

    Authors: Tina Behnia, Ke Wang, Christos Thrampoulidis

    Abstract: Overparameterized models fail to generalize well in the presence of data imbalance even when combined with traditional techniques for mitigating imbalances. This paper focuses on imbalanced classification datasets, in which a small subset of the population -- a minority -- may contain features that correlate spuriously with the class label. For a parametric family of cross-entropy loss modificatio… ▽ More

    Submitted 25 June, 2022; originally announced June 2022.

    Comments: Short version published at ISIT 2022

  27. arXiv:2205.12808  [pdf, other

    cs.LG

    Mirror Descent Maximizes Generalized Margin and Can Be Implemented Efficiently

    Authors: Haoyuan Sun, Kwangjun Ahn, Christos Thrampoulidis, Navid Azizan

    Abstract: Driven by the empirical success and wide use of deep neural networks, understanding the generalization performance of overparameterized models has become an increasingly popular question. To this end, there has been substantial effort to characterize the implicit bias of the optimization algorithms used, such as gradient descent (GD), and the structural properties of their preferred solutions. Thi… ▽ More

    Submitted 29 September, 2022; v1 submitted 25 May, 2022; originally announced May 2022.

    Journal ref: Advances in Neural Information Processing Systems 35 (NeurIPS 2022)

  28. arXiv:2205.06326  [pdf, other

    cs.LG

    Multi-Environment Meta-Learning in Stochastic Linear Bandits

    Authors: Ahmadreza Moradipari, Mohammad Ghavamzadeh, Taha Rajabzadeh, Christos Thrampoulidis, Mahnoosh Alizadeh

    Abstract: In this work we investigate meta-learning (or learning-to-learn) approaches in multi-task linear stochastic bandit problems that can originate from multiple environments. Inspired by the work of [1] on meta-learning in a sequence of linear bandit problems whose parameters are sampled from a single distribution (i.e., a single environment), here we consider the feasibility of meta-learning when tas… ▽ More

    Submitted 12 May, 2022; originally announced May 2022.

    Journal ref: IEEE International Symposium on Information Theory (ISIT), 2022

  29. arXiv:2205.02215  [pdf, other

    cs.LG math.OC

    FedNest: Federated Bilevel, Minimax, and Compositional Optimization

    Authors: Davoud Ataee Tarzanagh, Mingchen Li, Christos Thrampoulidis, Samet Oymak

    Abstract: Standard federated optimization methods successfully apply to stochastic problems with single-level structure. However, many contemporary ML problems -- including adversarial robustness, hyperparameter tuning, and actor-critic -- fall under nested bilevel programming that subsumes minimax and compositional optimization. In this work, we propose \fedblo: A federated alternating stochastic gradient… ▽ More

    Submitted 13 September, 2022; v1 submitted 4 May, 2022; originally announced May 2022.

    Comments: ICML 2022 (accepted as a long presentation), 34 pages, 6 figures

    Journal ref: Proceedings of the 39th International Conference on Machine Learning, PMLR 162:21146-21179, 2022

  30. arXiv:2201.01212  [pdf, other

    cs.LG

    AutoBalance: Optimized Loss Functions for Imbalanced Data

    Authors: Mingchen Li, Xuechen Zhang, Christos Thrampoulidis, Jiasi Chen, Samet Oymak

    Abstract: Imbalanced datasets are commonplace in modern machine learning problems. The presence of under-represented classes or groups with sensitive attributes results in concerns about generalization and fairness. Such concerns are further exacerbated by the fact that large capacity deep nets can perfectly fit the training data and appear to achieve perfect accuracy and fairness during training, but perfo… ▽ More

    Submitted 4 January, 2022; originally announced January 2022.

  31. arXiv:2109.09859  [pdf, other

    math.OC math.PR math.ST stat.ML

    Sharp global convergence guarantees for iterative nonconvex optimization: A Gaussian process perspective

    Authors: Kabir Aladin Chandrasekher, Ashwin Pananjady, Christos Thrampoulidis

    Abstract: We consider a general class of regression models with normally distributed covariates, and the associated nonconvex problem of fitting these models from data. We develop a general recipe for analyzing the convergence of iterative algorithms for this task from a random initialization. In particular, provided each iteration can be written as the solution to a convex optimization problem satisfying s… ▽ More

    Submitted 20 September, 2021; originally announced September 2021.

  32. arXiv:2106.10865  [pdf, other

    stat.ML cs.IT cs.LG

    Benign Overfitting in Multiclass Classification: All Roads Lead to Interpolation

    Authors: Ke Wang, Vidya Muthukumar, Christos Thrampoulidis

    Abstract: The literature on "benign overfitting" in overparameterized models has been mostly restricted to regression or binary classification; however, modern machine learning operates in the multiclass setting. Motivated by this discrepancy, we study benign overfitting in multiclass linear classification. Specifically, we consider the following training algorithms on separable data: (i) empirical risk min… ▽ More

    Submitted 11 July, 2023; v1 submitted 21 June, 2021; originally announced June 2021.

    Comments: Corrected subtle issue in the proofs of Lemmas 4 an 5. Relaxed Assumptions 1 and 2 and added error bound for ETF geometry

  33. arXiv:2106.06239  [pdf, ps, other

    cs.LG stat.ML

    Safe Reinforcement Learning with Linear Function Approximation

    Authors: Sanae Amani, Christos Thrampoulidis, Lin F. Yang

    Abstract: Safety in reinforcement learning has become increasingly important in recent years. Yet, existing solutions either fail to strictly avoid choosing unsafe actions, which may lead to catastrophic results in safety-critical systems, or fail to provide regret guarantees for settings where safety constraints need to be learned. In this paper, we address both problems by first modeling safety as an unkn… ▽ More

    Submitted 11 June, 2021; originally announced June 2021.

  34. arXiv:2103.11489  [pdf, ps, other

    cs.LG stat.ML

    UCB-based Algorithms for Multinomial Logistic Regression Bandits

    Authors: Sanae Amani, Christos Thrampoulidis

    Abstract: Out of the rich family of generalized linear bandits, perhaps the most well studied ones are logisitc bandits that are used in problems with binary rewards: for instance, when the learner/agent tries to maximize the profit over a user that can select one of two possible outcomes (e.g., `click' vs `no-click'). Despite remarkable recent progress and improved algorithms for logistic bandits, existing… ▽ More

    Submitted 21 March, 2021; originally announced March 2021.

    Comments: 27 pages, 5 figures

  35. arXiv:2103.01550  [pdf, other

    cs.LG stat.ML

    Label-Imbalanced and Group-Sensitive Classification under Overparameterization

    Authors: Ganesh Ramachandra Kini, Orestis Paraskevas, Samet Oymak, Christos Thrampoulidis

    Abstract: The goal in label-imbalanced and group-sensitive classification is to optimize relevant metrics such as balanced error and equal opportunity. Classical methods, such as weighted cross-entropy, fail when training deep nets to the terminal phase of training (TPT), that is training beyond zero training error. This observation has motivated recent flurry of activity in developing heuristic alternative… ▽ More

    Submitted 8 November, 2021; v1 submitted 2 March, 2021; originally announced March 2021.

  36. arXiv:2012.08749  [pdf, other

    cs.LG stat.ML

    Provable Benefits of Overparameterization in Model Compression: From Double Descent to Pruning Neural Networks

    Authors: Xiangyu Chang, Yingcong Li, Samet Oymak, Christos Thrampoulidis

    Abstract: Deep networks are typically trained with many more parameters than the size of the training dataset. Recent empirical evidence indicates that the practice of overparameterization not only benefits training large models, but also assists - perhaps counterintuitively - building lightweight models. Specifically, it suggests that overparameterization benefits model pruning / sparsification. This paper… ▽ More

    Submitted 16 December, 2020; originally announced December 2020.

    Comments: to appear at AAAI 2021

  37. arXiv:2012.00314  [pdf, ps, other

    cs.LG stat.ML

    Decentralized Multi-Agent Linear Bandits with Safety Constraints

    Authors: Sanae Amani, Christos Thrampoulidis

    Abstract: We study decentralized stochastic linear bandits, where a network of $N$ agents acts cooperatively to efficiently solve a linear bandit-optimization problem over a $d$-dimensional space. For this problem, we propose DLUCB: a fully decentralized algorithm that minimizes the cumulative regret over the entire network. At each round of the algorithm each agent chooses its actions following an upper co… ▽ More

    Submitted 1 December, 2020; originally announced December 2020.

  38. arXiv:2011.09148  [pdf, other

    stat.ML cs.LG math.ST

    Binary Classification of Gaussian Mixtures: Abundance of Support Vectors, Benign Overfitting and Regularization

    Authors: Ke Wang, Christos Thrampoulidis

    Abstract: Deep neural networks generalize well despite being exceedingly overparameterized and being trained without explicit regularization. This curious phenomenon has inspired extensive research activity in establishing its statistical principles: Under what conditions is it observed? How do these depend on the data and on the training algorithm? When does regularization benefit generalization? While suc… ▽ More

    Submitted 15 September, 2021; v1 submitted 18 November, 2020; originally announced November 2020.

    Comments: New results: Extensions to model with label noise

  39. arXiv:2011.07729  [pdf, other

    cs.LG math.ST stat.ML

    Theoretical Insights Into Multiclass Classification: A High-dimensional Asymptotic View

    Authors: Christos Thrampoulidis, Samet Oymak, Mahdi Soltanolkotabi

    Abstract: Contemporary machine learning applications often involve classification tasks with many classes. Despite their extensive use, a precise understanding of the statistical properties and behavior of classification algorithms is still missing, especially in modern regimes where the number of classes is rather large. In this paper, we take a step in this direction by providing the first asymptotically… ▽ More

    Submitted 16 November, 2020; originally announced November 2020.

    Comments: To Appear at NeurIPS 2020. 62 pages, 7 figures

  40. arXiv:2010.13275  [pdf, other

    stat.ML cs.IT cs.LG eess.SP

    Asymptotic Behavior of Adversarial Training in Binary Classification

    Authors: Hossein Taheri, Ramtin Pedarsani, Christos Thrampoulidis

    Abstract: It has been consistently reported that many machine learning models are susceptible to adversarial attacks i.e., small additive adversarial perturbations applied to data points can cause misclassification. Adversarial training using empirical risk minimization is considered to be the state-of-the-art method for defense against adversarial attacks. Despite being successful in practice, several prob… ▽ More

    Submitted 13 July, 2021; v1 submitted 25 October, 2020; originally announced October 2020.

    Comments: V3: additional theoretical results, extensions to correlated features

  41. arXiv:2010.00081  [pdf, other

    cs.LG cs.AI eess.SY stat.ML

    Stage-wise Conservative Linear Bandits

    Authors: Ahmadreza Moradipari, Christos Thrampoulidis, Mahnoosh Alizadeh

    Abstract: We study stage-wise conservative linear stochastic bandits: an instance of bandit optimization, which accounts for (unknown) safety constraints that appear in applications such as online advertising and medical trials. At each stage, the learner must choose actions that not only maximize cumulative reward across the entire time horizon but further satisfy a linear baseline constraint that takes th… ▽ More

    Submitted 30 September, 2020; originally announced October 2020.

    Comments: 28 pages, 5 figures

    Journal ref: Thirty-fourth Conference on Neural Information Processing Systems, NeurIPS 2020

  42. arXiv:2008.03326  [pdf, other

    stat.ML cs.IT cs.LG math.ST

    Optimal Combination of Linear and Spectral Estimators for Generalized Linear Models

    Authors: Marco Mondelli, Christos Thrampoulidis, Ramji Venkataramanan

    Abstract: We study the problem of recovering an unknown signal $\boldsymbol x$ given measurements obtained from a generalized linear model with a Gaussian sensing matrix. Two popular solutions are based on a linear estimator $\hat{\boldsymbol x}^{\rm L}$ and a spectral estimator $\hat{\boldsymbol x}^{\rm s}$. The former is a data-dependent linear combination of the columns of the measurement matrix, and its… ▽ More

    Submitted 25 June, 2021; v1 submitted 7 August, 2020; originally announced August 2020.

    Comments: 49 pages, 6 figures

  43. arXiv:2006.10903  [pdf, other

    cs.LG stat.ML

    Exploring Weight Importance and Hessian Bias in Model Pruning

    Authors: Mingchen Li, Yahya Sattar, Christos Thrampoulidis, Samet Oymak

    Abstract: Model pruning is an essential procedure for building compact and computationally-efficient machine learning models. A key feature of a good pruning algorithm is that it accurately quantifies the relative importance of the model weights. While model pruning has a rich history, we still don't have a full grasp of the pruning mechanics even for relatively simple problems involving linear models or sh… ▽ More

    Submitted 18 June, 2020; originally announced June 2020.

    Comments: 28 pages

  44. arXiv:2006.08917  [pdf, other

    stat.ML cs.IT cs.LG eess.SP

    Fundamental Limits of Ridge-Regularized Empirical Risk Minimization in High Dimensions

    Authors: Hossein Taheri, Ramtin Pedarsani, Christos Thrampoulidis

    Abstract: Empirical Risk Minimization (ERM) algorithms are widely used in a variety of estimation and prediction tasks in signal-processing and machine learning applications. Despite their popularity, a theory that explains their statistical properties in modern regimes where both the number of measurements and the number of unknown parameters is large is only recently emerging. In this paper, we characteri… ▽ More

    Submitted 5 July, 2020; v1 submitted 16 June, 2020; originally announced June 2020.

  45. arXiv:2005.01936  [pdf, ps, other

    cs.LG stat.ML

    Regret Bounds for Safe Gaussian Process Bandit Optimization

    Authors: Sanae Amani, Mahnoosh Alizadeh, Christos Thrampoulidis

    Abstract: Many applications require a learner to make sequential decisions given uncertainty regarding both the system's payoff function and safety constraints. In safety-critical systems, it is paramount that the learner's actions do not violate the safety constraints at any stage of the learning process. In this paper, we study a stochastic bandit optimization problem where the unknown payoff and constrai… ▽ More

    Submitted 4 May, 2020; originally announced May 2020.

    Comments: 22 pages, 17 figures

  46. arXiv:2002.07284  [pdf, other

    math.ST cs.IT eess.SP stat.ML

    Sharp Asymptotics and Optimal Performance for Inference in Binary Models

    Authors: Hossein Taheri, Ramtin Pedarsani, Christos Thrampoulidis

    Abstract: We study convex empirical risk minimization for high-dimensional inference in binary models. Our first result sharply predicts the statistical performance of such estimators in the linear asymptotic regime under isotropic Gaussian features. Importantly, the predictions hold for a wide class of convex loss functions, which we exploit in order to prove a bound on the best achievable performance amon… ▽ More

    Submitted 26 February, 2020; v1 submitted 17 February, 2020; originally announced February 2020.

  47. arXiv:2001.11572  [pdf, ps, other

    stat.ML cs.IT cs.LG eess.SP

    Analytic Study of Double Descent in Binary Classification: The Impact of Loss

    Authors: Ganesh Kini, Christos Thrampoulidis

    Abstract: Extensive empirical evidence reveals that, for a wide range of different learning methods and datasets, the risk curve exhibits a double-descent (DD) trend as a function of the model size. In a recent paper [Zeyu,Kammoun,Thrampoulidis,2019] the authors studied binary linear classification models and showed that the test error of gradient descent (GD) with logistic loss undergoes a DD. In this pape… ▽ More

    Submitted 30 January, 2020; originally announced January 2020.

  48. arXiv:1911.05822  [pdf, ps, other

    stat.ML cs.LG eess.SP

    A Model of Double Descent for High-dimensional Binary Linear Classification

    Authors: Zeyu Deng, Abla Kammoun, Christos Thrampoulidis

    Abstract: We consider a model for logistic regression where only a subset of features of size $p$ is used for training a linear classifier over $n$ training samples. The classifier is obtained by running gradient descent (GD) on logistic loss. For this model, we investigate the dependence of the classification error on the overparameterization ratio $κ=p/n$. First, building on known deterministic results on… ▽ More

    Submitted 10 May, 2020; v1 submitted 13 November, 2019; originally announced November 2019.

    Comments: Short version submitted to ICASSP 2020; Updates in 2nd version: revised proofs, typos fixed, extended discussions and numerical illustrations

  49. arXiv:1911.02156  [pdf, other

    cs.LG stat.ML

    Safe Linear Thompson Sampling with Side Information

    Authors: Ahmadreza Moradipari, Sanae Amani, Mahnoosh Alizadeh, Christos Thrampoulidis

    Abstract: The design and performance analysis of bandit algorithms in the presence of stage-wise safety or reliability constraints has recently garnered significant interest. In this work, we consider the linear stochastic bandit problem under additional \textit{linear safety constraints} that need to be satisfied at each round. We provide a new safe algorithm based on linear Thompson Sampling (TS) for this… ▽ More

    Submitted 29 February, 2020; v1 submitted 5 November, 2019; originally announced November 2019.

    Comments: Comparing with safe versions of linear UCB algorithms, Providing more intuition for proof sketch

  50. arXiv:1908.05814  [pdf, other

    cs.LG stat.ML

    Linear Stochastic Bandits Under Safety Constraints

    Authors: Sanae Amani, Mahnoosh Alizadeh, Christos Thrampoulidis

    Abstract: Bandit algorithms have various application in safety-critical systems, where it is important to respect the system constraints that rely on the bandit's unknown parameters at every round. In this paper, we formulate a linear stochastic multi-armed bandit problem with safety constraints that depend (linearly) on an unknown parameter vector. As such, the learner is unable to identify all safe action… ▽ More

    Submitted 15 August, 2019; originally announced August 2019.

    Comments: 23 pages, 7 figures