Showing 1–44 of 44 results for author: Dziugaite, G K

  1. arXiv:2410.12949  [pdf, other]

    cs.LG cs.CL

    Mechanistic Unlearning: Robust Knowledge Unlearning and Editing via Mechanistic Localization

    Authors: Phillip Guo, Aaquib Syed, Abhay Sheshadri, Aidan Ewart, Gintare Karolina Dziugaite

    Abstract: Methods for knowledge editing and unlearning in large language models seek to edit or remove undesirable knowledge or capabilities without compromising general language modeling performance. This work investigates how mechanistic interpretability -- which, in part, aims to identify model components (circuits) associated to specific interpretable mechanisms that make up a model capability -- can im…

    Submitted 16 October, 2024; originally announced October 2024.

    Comments: 20 pages, 19 figures, 7 tables

  2. arXiv:2410.12766  [pdf, other]

    cs.LG

    The Non-Local Model Merging Problem: Permutation Symmetries and Variance Collapse

    Authors: Ekansh Sharma, Daniel M. Roy, Gintare Karolina Dziugaite

    Abstract: Model merging aims to efficiently combine the weights of multiple expert models, each trained on a specific task, into a single multi-task model, with strong performance across all tasks. When applied to all but the last layer of weights, existing methods -- such as Task Arithmetic, TIES-merging, and TALL mask merging -- work well to combine expert models obtained by fine-tuning a common foundatio…

    Submitted 16 October, 2024; originally announced October 2024.

  3. arXiv:2406.09073  [pdf, other]

    cs.LG

    Are we making progress in unlearning? Findings from the first NeurIPS unlearning competition

    Authors: Eleni Triantafillou, Peter Kairouz, Fabian Pedregosa, Jamie Hayes, Meghdad Kurmanji, Kairan Zhao, Vincent Dumoulin, Julio Jacques Junior, Ioannis Mitliagkas, Jun Wan, Lisheng Sun Hosoya, Sergio Escalera, Gintare Karolina Dziugaite, Peter Triantafillou, Isabelle Guyon

    Abstract: We present the findings of the first NeurIPS competition on unlearning, which sought to stimulate the development of novel algorithms and initiate discussions on formal and robust evaluation methodologies. The competition was highly successful: nearly 1,200 teams from across the world participated, and a wealth of novel, imaginative solutions with different characteristics were contributed. In thi…

    Submitted 13 June, 2024; originally announced June 2024.

  4. arXiv:2405.10425  [pdf, other]

    cs.LG

    Data Selection for Transfer Unlearning

    Authors: Nazanin Mohammadi Sepahvand, Vincent Dumoulin, Eleni Triantafillou, Gintare Karolina Dziugaite

    Abstract: As deep learning models are becoming larger and data-hungrier, there are growing ethical, legal and technical concerns over use of data: in practice, agreements on data use may change over time, rendering previously-used training data impermissible for training purposes. These issues have driven increased attention to machine unlearning: removing "the influence of" a subset of training data from a…

    Submitted 16 May, 2024; originally announced May 2024.

  5. arXiv:2405.09037  [pdf, other]

    cs.LG cs.AI cs.DC

    Unmasking Efficiency: Learning Salient Sparse Models in Non-IID Federated Learning

    Authors: Riyasat Ohib, Bishal Thapaliya, Gintare Karolina Dziugaite, Jingyu Liu, Vince Calhoun, Sergey Plis

    Abstract: In this work, we propose Salient Sparse Federated Learning (SSFL), a streamlined approach for sparse federated learning with efficient communication. SSFL identifies a sparse subnetwork prior to training, leveraging parameter saliency scores computed separately on local client data in non-IID scenarios, and then aggregated, to determine a global mask. Only the sparse model weights are communicated…

    Submitted 14 May, 2024; originally announced May 2024.

  6. arXiv:2404.06498  [pdf, other]

    cs.LG stat.ML

    Simultaneous linear connectivity of neural networks modulo permutation

    Authors: Ekansh Sharma, Devin Kwok, Tom Denton, Daniel M. Roy, David Rolnick, Gintare Karolina Dziugaite

    Abstract: Neural networks typically exhibit permutation symmetries which contribute to the non-convexity of the networks' loss landscapes, since linearly interpolating between two permuted versions of a trained network tends to encounter a high loss barrier. Recent work has argued that permutation symmetries are the only sources of non-convexity, meaning there are essentially no such barriers between traine…

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: 11 pages, 6 figures

  7. arXiv:2404.05545  [pdf, other]

    cs.LG cs.AI cs.CL stat.ME

    Evaluating Interventional Reasoning Capabilities of Large Language Models

    Authors: Tejas Kasetty, Divyat Mahajan, Gintare Karolina Dziugaite, Alexandre Drouin, Dhanya Sridhar

    Abstract: Numerous decision-making tasks require estimating causal effects under interventions on different parts of a system. As practitioners consider using large language models (LLMs) to automate decisions, studying their causal reasoning capabilities becomes crucial. A recent line of work evaluates LLMs' ability to retrieve commonsense causal facts, but these evaluations do not sufficiently assess how L…

    Submitted 8 April, 2024; originally announced April 2024.

    Comments: 17 pages

  8. arXiv:2402.09327  [pdf, other]

    cs.LG

    Information Complexity of Stochastic Convex Optimization: Applications to Generalization and Memorization

    Authors: Idan Attias, Gintare Karolina Dziugaite, Mahdi Haghifam, Roi Livni, Daniel M. Roy

    Abstract: In this work, we investigate the interplay between memorization and learning in the context of stochastic convex optimization (SCO). We define memorization via the information a learning algorithm reveals about its training data points. We then quantify this information using the framework of conditional mutual information (CMI) proposed by Steinke and Zakynthinou (2020). Our main result is…

    Submitted 18 July, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

    Comments: 41 Pages, To appear in ICML 2024

  9. arXiv:2402.08609  [pdf, other]

    cs.LG cs.AI

    Mixtures of Experts Unlock Parameter Scaling for Deep RL

    Authors: Johan Obando-Ceron, Ghada Sokar, Timon Willi, Clare Lyle, Jesse Farebrother, Jakob Foerster, Gintare Karolina Dziugaite, Doina Precup, Pablo Samuel Castro

    Abstract: The recent rapid progress in (self) supervised learning models is in large part predicted by empirical scaling laws: a model's performance scales proportionally to its size. Analogous scaling laws remain elusive for reinforcement learning domains, however, where increasing the parameter count of a model often hurts its final performance. In this paper, we demonstrate that incorporating Mixture-of-…

    Submitted 26 June, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

  10. arXiv:2401.01867  [pdf, other]

    cs.LG

    Dataset Difficulty and the Role of Inductive Bias

    Authors: Devin Kwok, Nikhil Anand, Jonathan Frankle, Gintare Karolina Dziugaite, David Rolnick

    Abstract: Motivated by the goals of dataset pruning and defect identification, a growing body of methods have been developed to score individual examples within a dataset. These methods, which we call "example difficulty scores", are typically used to rank or categorize examples, but the consistency of rankings between different training runs, scoring methods, and model architectures is generally unknown. T…

    Submitted 3 January, 2024; originally announced January 2024.

    Comments: 10 pages, 6 figures

  11. arXiv:2311.10291  [pdf, other]

    cs.LG

    Leveraging Function Space Aggregation for Federated Learning at Scale

    Authors: Nikita Dhawan, Nicole Mitchell, Zachary Charles, Zachary Garrett, Gintare Karolina Dziugaite

    Abstract: The federated learning paradigm has motivated the development of methods for aggregating multiple client updates into a global server model, without sharing client data. Many federated learning algorithms, including the canonical Federated Averaging (FedAvg), take a direct (possibly weighted) average of the client parameter updates, motivated by results in distributed optimization. In this work, w…

    Submitted 16 February, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

    Comments: 23 pages, 10 figures. Transactions on Machine Learning Research, 2024

  12. arXiv:2310.04680  [pdf, other]

    cs.CL cs.AI cs.LG

    The Cost of Down-Scaling Language Models: Fact Recall Deteriorates before In-Context Learning

    Authors: Tian Jin, Nolan Clement, Xin Dong, Vaishnavh Nagarajan, Michael Carbin, Jonathan Ragan-Kelley, Gintare Karolina Dziugaite

    Abstract: How does scaling the number of parameters in large language models (LLMs) affect their core capabilities? We study two natural scaling techniques -- weight pruning and simply training a smaller or larger model, which we refer to as dense scaling -- and their effects on two core capabilities of LLMs: (a) recalling facts presented during pre-training and (b) processing information presented in-conte…

    Submitted 6 October, 2023; originally announced October 2023.

  13. arXiv:2305.18761  [pdf, other]

    cs.LG cs.CV

    Identifying Spurious Biases Early in Training through the Lens of Simplicity Bias

    Authors: Yu Yang, Eric Gan, Gintare Karolina Dziugaite, Baharan Mirzasoleiman

    Abstract: Neural networks trained with (stochastic) gradient descent have an inductive bias towards learning simpler solutions. This makes them highly prone to learning spurious correlations in the training data, that may not hold at test time. In this work, we provide the first theoretical analysis of the effect of simplicity bias on learning spurious correlations. Notably, we show that examples with spuri…

    Submitted 6 March, 2024; v1 submitted 30 May, 2023; originally announced May 2023.

    Comments: 26 pages, 10 figures

    Journal ref: Proceedings of the 27th International Conference on Artificial Intelligence and Statistics (AISTATS) 2024, Valencia, Spain. PMLR: Volume 238

  14. arXiv:2304.14082  [pdf, other]

    cs.LG cs.SE

    JaxPruner: A concise library for sparsity research

    Authors: Joo Hyung Lee, Wonpyo Park, Nicole Mitchell, Jonathan Pilault, Johan Obando-Ceron, Han-Byul Kim, Namhoon Lee, Elias Frantar, Yun Long, Amir Yazdanbakhsh, Shivani Agrawal, Suvinay Subramanian, Xin Wang, Sheng-Chun Kao, Xingyao Zhang, Trevor Gale, Aart Bik, Woohyun Han, Milen Ferev, Zhonglin Han, Hong-Seok Kim, Yann Dauphin, Gintare Karolina Dziugaite, Pablo Samuel Castro, Utku Evci

    Abstract: This paper introduces JaxPruner, an open-source JAX-based pruning and sparse training library for machine learning research. JaxPruner aims to accelerate research on sparse neural networks by providing concise implementations of popular pruning and sparse training algorithms with minimal memory and latency overhead. Algorithms implemented in JaxPruner use a common API and work seamlessly with the…

    Submitted 18 December, 2023; v1 submitted 27 April, 2023; originally announced April 2023.

    Comments: Jaxpruner is hosted at http://github.com/google-research/jaxpruner

  15. arXiv:2212.13556  [pdf, other]

    cs.LG stat.ML

    Limitations of Information-Theoretic Generalization Bounds for Gradient Descent Methods in Stochastic Convex Optimization

    Authors: Mahdi Haghifam, Borja Rodríguez-Gálvez, Ragnar Thobaben, Mikael Skoglund, Daniel M. Roy, Gintare Karolina Dziugaite

    Abstract: To date, no "information-theoretic" frameworks for reasoning about generalization error have been shown to establish minimax rates for gradient descent in the setting of stochastic convex optimization. In this work, we consider the prospect of establishing such rates via several existing information-theoretic frameworks: input-output mutual information bounds, conditional mutual information bounds…

    Submitted 13 July, 2023; v1 submitted 27 December, 2022; originally announced December 2022.

    Comments: 49 pages, 2 figures. This version corrects a mistake in the proof of Theorem 17. Proc. International Conference on Algorithmic Learning Theory (ALT), 2023

  16. arXiv:2212.00291  [pdf, other]

    cs.LG

    The Effect of Data Dimensionality on Neural Network Prunability

    Authors: Zachary Ankner, Alex Renda, Gintare Karolina Dziugaite, Jonathan Frankle, Tian Jin

    Abstract: Practitioners prune neural networks for efficiency gains and generalization improvements, but few scrutinize the factors determining the prunability of a neural network: the maximum fraction of weights that pruning can remove without compromising the model's test accuracy. In this work, we study the properties of input data that may contribute to the prunability of a neural network. For high dimens…

    Submitted 1 December, 2022; originally announced December 2022.

  17. arXiv:2210.13738  [pdf, other]

    cs.LG stat.ML

    Pruning's Effect on Generalization Through the Lens of Training and Regularization

    Authors: Tian Jin, Michael Carbin, Daniel M. Roy, Jonathan Frankle, Gintare Karolina Dziugaite

    Abstract: Practitioners frequently observe that pruning improves model generalization. A long-standing hypothesis based on bias-variance trade-off attributes this generalization improvement to model size reduction. However, recent studies on over-parameterization characterize a new model size regime, in which larger models achieve better generalization. Pruning models in this over-parameterized regime leads…

    Submitted 24 October, 2022; originally announced October 2022.

    Comments: 49 pages, 20 figures

    Journal ref: Advances in Neural Information Processing Systems 2022

  18. arXiv:2210.03044  [pdf, other]

    cs.LG cs.AI stat.ML

    Unmasking the Lottery Ticket Hypothesis: What's Encoded in a Winning Ticket's Mask?

    Authors: Mansheej Paul, Feng Chen, Brett W. Larsen, Jonathan Frankle, Surya Ganguli, Gintare Karolina Dziugaite

    Abstract: Modern deep learning involves training costly, highly overparameterized networks, thus motivating the search for sparser networks that can still be trained to the same accuracy as the full network (i.e. matching). Iterative magnitude pruning (IMP) is a state of the art algorithm that can find such highly sparse matching subnetworks, known as winning tickets. IMP operates by iterative cycles of tra…

    Submitted 6 October, 2022; originally announced October 2022.

    Comments: The first three authors contributed equally

  19. arXiv:2206.14800  [pdf, other]

    cs.LG

    Understanding Generalization via Leave-One-Out Conditional Mutual Information

    Authors: Mahdi Haghifam, Shay Moran, Daniel M. Roy, Gintare Karolina Dziugaite

    Abstract: We study the mutual information between (certain summaries of) the output of a learning algorithm and its $n$ training data, conditional on a supersample of $n+1$ i.i.d. data from which the training data is chosen at random without replacement. These leave-one-out variants of the conditional mutual information (CMI) of an algorithm (Steinke and Zakynthinou, 2020) are also seen to control the mean…

    Submitted 29 June, 2022; originally announced June 2022.

    Comments: 18 pages

  20. arXiv:2206.01278  [pdf, other]

    cs.LG cs.AI stat.ML

    Lottery Tickets on a Data Diet: Finding Initializations with Sparse Trainable Networks

    Authors: Mansheej Paul, Brett W. Larsen, Surya Ganguli, Jonathan Frankle, Gintare Karolina Dziugaite

    Abstract: A striking observation about iterative magnitude pruning (IMP; Frankle et al. 2020) is that -- after just a few hundred steps of dense training -- the method can find a sparse sub-network that can be trained to the same accuracy as the dense network. However, the same does not hold at step 0, i.e. random initialization. In this work, we seek to understand how this ear…

    Submitted 2 June, 2022; originally announced June 2022.

    Comments: The first two authors contributed equally

  21. arXiv:2111.05275  [pdf, other]

    cs.IT cs.LG stat.ML

    Towards a Unified Information-Theoretic Framework for Generalization

    Authors: Mahdi Haghifam, Gintare Karolina Dziugaite, Shay Moran, Daniel M. Roy

    Abstract: In this work, we investigate the expressiveness of the "conditional mutual information" (CMI) framework of Steinke and Zakynthinou (2020) and the prospect of using it to provide a unified framework for proving generalization bounds in the realizable setting. We first demonstrate that one can use this framework to express non-trivial (but sub-optimal) bounds for any learning algorithm that outputs…

    Submitted 17 November, 2021; v1 submitted 9 November, 2021; originally announced November 2021.

    Comments: 22 Pages, NeurIPS 2021, This submission subsumes [arXiv:2011.02970] ("On the Information Complexity of Proper Learners for VC Classes in the Realizable Case")

  22. arXiv:2110.11804  [pdf, other]

    stat.ML cs.LG

    Probabilistic fine-tuning of pruning masks and PAC-Bayes self-bounded learning

    Authors: Soufiane Hayou, Bobby He, Gintare Karolina Dziugaite

    Abstract: We study an approach to learning pruning masks by optimizing the expected loss of stochastic pruning masks, i.e., masks which zero out each weight independently with some weight-specific probability. We analyze the training dynamics of the induced stochastic predictor in the setting of linear regression, and observe a data-adaptive L1 regularization term, in contrast to the data-adaptive L2 regular…

    Submitted 22 October, 2021; originally announced October 2021.

    Comments: 34 pages, 10 figures

  23. arXiv:2107.07075  [pdf, other]

    cs.LG

    Deep Learning on a Data Diet: Finding Important Examples Early in Training

    Authors: Mansheej Paul, Surya Ganguli, Gintare Karolina Dziugaite

    Abstract: Recent success in deep learning has partially been driven by training increasingly overparametrized networks on ever larger datasets. It is therefore natural to ask: how much of the data is superfluous, which examples are important for generalization, and how do we find them? In this work, we make the striking observation that, in standard vision datasets, simple scores averaged over several weigh…

    Submitted 28 March, 2023; v1 submitted 14 July, 2021; originally announced July 2021.

    Comments: 21 pages, 18 figures

    Journal ref: Advances in Neural Information Processing Systems 34 (NeurIPS 2021)

  24. arXiv:2102.00931  [pdf, other]

    cs.LG stat.ML

    Information-Theoretic Generalization Bounds for Stochastic Gradient Descent

    Authors: Gergely Neu, Gintare Karolina Dziugaite, Mahdi Haghifam, Daniel M. Roy

    Abstract: We study the generalization properties of the popular stochastic optimization method known as stochastic gradient descent (SGD) for optimizing general non-convex loss functions. Our main contribution is providing upper bounds on the generalization error that depend on local statistics of the stochastic gradients evaluated along the path of iterates calculated by SGD. The key factors our bounds dep…

    Submitted 15 August, 2021; v1 submitted 1 February, 2021; originally announced February 2021.

    Comments: COLT 2021

  25. arXiv:2012.07976  [pdf, other]

    cs.LG stat.ML

    NeurIPS 2020 Competition: Predicting Generalization in Deep Learning

    Authors: Yiding Jiang, Pierre Foret, Scott Yak, Daniel M. Roy, Hossein Mobahi, Gintare Karolina Dziugaite, Samy Bengio, Suriya Gunasekar, Isabelle Guyon, Behnam Neyshabur

    Abstract: Understanding generalization in deep learning is arguably one of the most important questions in deep learning. Deep learning has been successfully adopted to a large number of problems ranging from pattern recognition to complex decision making, but many recent researchers have raised many concerns about deep learning, among which the most important is generalization. Despite numerous attempts, c…

    Submitted 14 December, 2020; originally announced December 2020.

    Comments: 20 pages, 2 figures. Accepted for NeurIPS 2020 Competitions Track. Lead organizer: Yiding Jiang

  26. arXiv:2011.02970  [pdf, other]

    cs.LG cs.IT

    On the Information Complexity of Proper Learners for VC Classes in the Realizable Case

    Authors: Mahdi Haghifam, Gintare Karolina Dziugaite, Shay Moran, Daniel M. Roy

    Abstract: We provide a negative resolution to a conjecture of Steinke and Zakynthinou (2020a), by showing that their bound on the conditional mutual information (CMI) of proper learners of Vapnik--Chervonenkis (VC) classes cannot be improved from $d \log n +2$ to $O(d)$, where $n$ is the number of i.i.d. training examples. In fact, we exhibit VC classes for which the CMI of any proper learner cannot be boun…

    Submitted 5 November, 2020; originally announced November 2020.

    Comments: 5 Pages

  27. arXiv:2010.15110  [pdf, other]

    cs.LG stat.ML

    Deep learning versus kernel learning: an empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel

    Authors: Stanislav Fort, Gintare Karolina Dziugaite, Mansheej Paul, Sepideh Kharaghani, Daniel M. Roy, Surya Ganguli

    Abstract: In suitably initialized wide networks, small learning rates transform deep neural networks (DNNs) into neural tangent kernel (NTK) machines, whose training dynamics is well-approximated by a linear weight expansion of the network at initialization. Standard training, however, diverges from its linearization in ways that are poorly understood. We study the relationship between the training dynamics…

    Submitted 28 October, 2020; originally announced October 2020.

    Comments: 19 pages, 19 figures, In Advances in Neural Information Processing Systems 34 (NeurIPS 2020)

  28. arXiv:2010.13764  [pdf, other]

    cs.LG stat.ML

    Enforcing Interpretability and its Statistical Impacts: Trade-offs between Accuracy and Interpretability

    Authors: Gintare Karolina Dziugaite, Shai Ben-David, Daniel M. Roy

    Abstract: To date, there has been no formal study of the statistical cost of interpretability in machine learning. As such, the discourse around potential trade-offs is often informal and misconceptions abound. In this work, we aim to initiate a formal study of these trade-offs. A seemingly insurmountable roadblock is the lack of any agreed upon definition of interpretability. Instead, we propose a shift in…

    Submitted 28 October, 2020; v1 submitted 26 October, 2020; originally announced October 2020.

    Comments: 12 pages; minor edits

  29. arXiv:2010.11924  [pdf, other]

    cs.LG stat.ML

    In Search of Robust Measures of Generalization

    Authors: Gintare Karolina Dziugaite, Alexandre Drouin, Brady Neal, Nitarshan Rajkumar, Ethan Caballero, Linbo Wang, Ioannis Mitliagkas, Daniel M. Roy

    Abstract: One of the principal scientific challenges in deep learning is explaining generalization, i.e., why the particular way the community now trains networks to achieve small training error also leads to small error on held-out data from the same population. It is widely appreciated that some worst-case theories -- such as those based on the VC dimension of the class of predictors induced by modern neu…

    Submitted 20 January, 2021; v1 submitted 22 October, 2020; originally announced October 2020.

    Comments: 27 pages, 11 figures, 34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, Canada

  30. arXiv:2009.08576  [pdf, other]

    cs.LG cs.CV cs.NE stat.ML

    Pruning Neural Networks at Initialization: Why are We Missing the Mark?

    Authors: Jonathan Frankle, Gintare Karolina Dziugaite, Daniel M. Roy, Michael Carbin

    Abstract: Recent work has explored the possibility of pruning neural networks at initialization. We assess proposals for doing so: SNIP (Lee et al., 2019), GraSP (Wang et al., 2020), SynFlow (Tanaka et al., 2020), and magnitude pruning. Although these methods surpass the trivial baseline of random pruning, they remain below the accuracy of magnitude pruning after training, and we endeavor to understand why.…

    Submitted 21 March, 2021; v1 submitted 17 September, 2020; originally announced September 2020.

    Comments: Published in ICLR 2021

  31. arXiv:2006.10929  [pdf, other]

    cs.LG stat.ML

    On the role of data in PAC-Bayes bounds

    Authors: Gintare Karolina Dziugaite, Kyle Hsu, Waseem Gharbieh, Gabriel Arpino, Daniel M. Roy

    Abstract: The dominant term in PAC-Bayes bounds is often the Kullback--Leibler divergence between the posterior and prior. For so-called linear PAC-Bayes risk bounds based on the empirical risk of a fixed posterior kernel, it is possible to minimize the expected value of the bound by choosing the prior to be the expected posterior, which we call the oracle prior on the account that it is distribution depend…

    Submitted 26 October, 2020; v1 submitted 18 June, 2020; originally announced June 2020.

    Comments: 28 pages, 8 figures

  32. arXiv:2004.12983  [pdf, other]

    stat.ML cs.IT cs.LG

    Sharpened Generalization Bounds based on Conditional Mutual Information and an Application to Noisy, Iterative Algorithms

    Authors: Mahdi Haghifam, Jeffrey Negrea, Ashish Khisti, Daniel M. Roy, Gintare Karolina Dziugaite

    Abstract: The information-theoretic framework of Russo and J. Zou (2016) and Xu and Raginsky (2017) provides bounds on the generalization error of a learning algorithm in terms of the mutual information between the algorithm's output and the training sample. In this work, we study the proposal, by Steinke and Zakynthinou (2020), to reason about the generalization error of a learning algorithm by introducing…

    Submitted 23 October, 2020; v1 submitted 27 April, 2020; originally announced April 2020.

    Comments: 23 Pages, 3 Figures. To appear in Advances in Neural Information Processing Systems (34), 2020

  33. arXiv:2003.11630  [pdf, other]

    cs.LG stat.ML

    RelatIF: Identifying Explanatory Training Examples via Relative Influence

    Authors: Elnaz Barshan, Marc-Etienne Brunet, Gintare Karolina Dziugaite

    Abstract: In this work, we focus on the use of influence functions to identify relevant training examples that one might hope "explain" the predictions of a machine learning model. One shortcoming of influence functions is that the training examples deemed most "influential" are often outliers or mislabelled, making them poor choices for explanation. In order to address this shortcoming, we separate the rol…

    Submitted 25 March, 2020; originally announced March 2020.

  34. arXiv:1912.05671  [pdf, other]

    cs.LG cs.NE stat.ML

    Linear Mode Connectivity and the Lottery Ticket Hypothesis

    Authors: Jonathan Frankle, Gintare Karolina Dziugaite, Daniel M. Roy, Michael Carbin

    Abstract: We study whether a neural network optimizes to the same, linearly connected minimum under different samples of SGD noise (e.g., random data order and augmentation). We find that standard vision models become stable to SGD noise in this way early in training. From then on, the outcome of optimization is determined to a linearly connected region. We use this technique to study iterative magnitude pr…

    Submitted 18 July, 2020; v1 submitted 11 December, 2019; originally announced December 2019.

    Comments: Published in ICML 2020. This submission subsumes arXiv:1903.01611 ("Stabilizing the Lottery Ticket Hypothesis" and "The Lottery Ticket Hypothesis at Scale")

  35. arXiv:1912.04265  [pdf, other]

    cs.LG stat.ML

    In Defense of Uniform Convergence: Generalization via derandomization with an application to interpolating predictors

    Authors: Jeffrey Negrea, Gintare Karolina Dziugaite, Daniel M. Roy

    Abstract: We propose to study the generalization error of a learned predictor $\hat h$ in terms of that of a surrogate (potentially randomized) predictor that is coupled to $\hat h$ and designed to trade empirical risk for control of generalization error. In the case where $\hat h$ interpolates the data, it is interesting to consider theoretical surrogate classifiers that are partially derandomized or reran…

    Submitted 10 September, 2021; v1 submitted 9 December, 2019; originally announced December 2019.

    Comments: 14 pages before references and appendices. 23 pages total. Includes a correction to Lemma 5.3 and Theorem 5.4, and their proofs

  36. arXiv:1911.02151  [pdf, other]

    stat.ML cs.IT cs.LG

    Information-Theoretic Generalization Bounds for SGLD via Data-Dependent Estimates

    Authors: Jeffrey Negrea, Mahdi Haghifam, Gintare Karolina Dziugaite, Ashish Khisti, Daniel M. Roy

    Abstract: In this work, we improve upon the stepwise analysis of noisy iterative learning algorithms initiated by Pensia, Jog, and Loh (2018) and recently extended by Bu, Zou, and Veeravalli (2019). Our main contributions are significantly improved mutual information bounds for Stochastic Gradient Langevin Dynamics via data-dependent estimates. Our approach is based on the variational characterization of mu…

    Submitted 25 January, 2020; v1 submitted 5 November, 2019; originally announced November 2019.

    Comments: 23 pages, 1 figure. To appear in Advances in Neural Information Processing Systems (33), 2019

  37. arXiv:1906.04282  [pdf, other]

    cs.LG stat.ML

    Stochastic Neural Network with Kronecker Flow

    Authors: Chin-Wei Huang, Ahmed Touati, Pascal Vincent, Gintare Karolina Dziugaite, Alexandre Lacoste, Aaron Courville

    Abstract: Recent advances in variational inference enable the modelling of highly structured joint distributions, but are limited in their capacity to scale to the high-dimensional setting of stochastic neural networks. This limitation motivates a need for scalable parameterizations of the noise generation process, in a manner that adequately captures the dependencies among the various parameters. In this w…

    Submitted 13 February, 2020; v1 submitted 10 June, 2019; originally announced June 2019.

    Comments: Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS) 2020

  38. arXiv:1903.01611  [pdf, other]

    cs.LG cs.CV stat.ML

    Stabilizing the Lottery Ticket Hypothesis

    Authors: Jonathan Frankle, Gintare Karolina Dziugaite, Daniel M. Roy, Michael Carbin

    Abstract: Pruning is a well-established technique for removing unnecessary structure from neural networks after training to improve the performance of inference. Several recent results have explored the possibility of pruning at initialization time to provide similar benefits during training. In particular, the "lottery ticket hypothesis" conjectures that typical neural networks contain small subnetworks th…

    Submitted 20 July, 2020; v1 submitted 4 March, 2019; originally announced March 2019.

    Comments: This article has been subsumed by "Linear Mode Connectivity and the Lottery Ticket Hypothesis" (arXiv:1912.05671, ICML 2020). Please read/cite that article instead

  39. arXiv:1802.09583  [pdf, other]

    cs.LG stat.ML

    Data-dependent PAC-Bayes priors via differential privacy

    Authors: Gintare Karolina Dziugaite, Daniel M. Roy

    Abstract: The Probably Approximately Correct (PAC) Bayes framework (McAllester, 1999) can incorporate knowledge about the learning algorithm and (data) distribution through the use of distribution-dependent priors, yielding tighter generalization bounds on data-dependent posteriors. Using this flexibility, however, is difficult, especially when the data distribution is presumed to be unknown. We show how an…

    Submitted 19 April, 2019; v1 submitted 26 February, 2018; originally announced February 2018.

    Comments: 18 pages, 2 figures; equivalent to camera ready, but includes supplementary materials; subsumes and extends some results first reported in arXiv:1712.09376

    Journal ref: Advances in Neural Information Processing Systems, 31 (2018), pp. 8430-8441

  40. arXiv:1712.09376  [pdf, other]

    stat.ML cs.LG

    Entropy-SGD optimizes the prior of a PAC-Bayes bound: Generalization properties of Entropy-SGD and data-dependent priors

    Authors: Gintare Karolina Dziugaite, Daniel M. Roy

    Abstract: We show that Entropy-SGD (Chaudhari et al., 2017), when viewed as a learning algorithm, optimizes a PAC-Bayes bound on the risk of a Gibbs (posterior) classifier, i.e., a randomized classifier obtained by a risk-sensitive perturbation of the weights of a learned classifier. Entropy-SGD works by optimizing the bound's prior, violating the hypothesis of the PAC-Bayes theorem that the prior is chosen…

    Submitted 19 April, 2019; v1 submitted 26 December, 2017; originally announced December 2017.

    Comments: 18 pages, 6 figures; combines ICML camera ready with supplementary materials

    Journal ref: Proceedings of the 35th International Conference on Machine Learning, PMLR 80:1377-1386, 2018

  41. arXiv:1703.11008  [pdf, other]

    cs.LG

    Computing Nonvacuous Generalization Bounds for Deep (Stochastic) Neural Networks with Many More Parameters than Training Data

    Authors: Gintare Karolina Dziugaite, Daniel M. Roy

    Abstract: One of the defining properties of deep learning is that models are chosen to have many more parameters than available training data. In light of this capacity for overfitting, it is remarkable that simple algorithms like SGD reliably return solutions with low test error. One roadblock to explaining these phenomena in terms of implicit regularization, structural properties of the solution, and/or e…

    Submitted 18 October, 2017; v1 submitted 31 March, 2017; originally announced March 2017.

    Comments: 14 pages, 1 table, 2 figures. Corresponds with UAI camera ready and supplement. Includes additional references and related experiments

    Journal ref: Proceedings of the Thirty-Third Conference on Uncertainty in Artificial Intelligence, UAI 2017, August 11--15, 2017, Sydney, NSW, Australia

  42. arXiv:1608.00853  [pdf, other]

    cs.CV cs.LG

    A study of the effect of JPG compression on adversarial images

    Authors: Gintare Karolina Dziugaite, Zoubin Ghahramani, Daniel M. Roy

    Abstract: Neural network image classifiers are known to be vulnerable to adversarial images, i.e., natural images which have been modified by an adversarial perturbation specifically designed to be imperceptible to humans yet fool the classifier. Not only can adversarial images be generated easily, but these images will often be adversarial for networks trained on disjoint subsets of data or with different…

    Submitted 2 August, 2016; originally announced August 2016.

    Comments: 8 pages, 4 figures

  43. arXiv:1511.06443  [pdf, other]

    cs.LG stat.ML

    Neural Network Matrix Factorization

    Authors: Gintare Karolina Dziugaite, Daniel M. Roy

    Abstract: Data often comes in the form of an array or matrix. Matrix factorization techniques attempt to recover missing or corrupted entries by assuming that the matrix can be written as the product of two low-rank matrices. In other words, matrix factorization approximates the entries of the matrix by a simple, fixed function---namely, the inner product---acting on the latent feature vectors for the corre…

    Submitted 14 December, 2015; v1 submitted 19 November, 2015; originally announced November 2015.

    Comments: Minor modifications to notation. Added additional experiments and discussion. 7 pages, 2 tables

  44. arXiv:1505.03906  [pdf, other]

    stat.ML cs.LG

    Training generative neural networks via Maximum Mean Discrepancy optimization

    Authors: Gintare Karolina Dziugaite, Daniel M. Roy, Zoubin Ghahramani

    Abstract: We consider training a deep neural network to generate samples from an unknown distribution given i.i.d. data. We frame learning as an optimization minimizing a two-sample test statistic---informally speaking, a good generator network produces samples that cause a two-sample test to fail to reject the null hypothesis. As our two-sample test statistic, we use an unbiased estimate of the maximum mea…

    Submitted 14 May, 2015; originally announced May 2015.

    Comments: 10 pages, to appear in Uncertainty in Artificial Intelligence (UAI) 2015