-
MYCROFT: Towards Effective and Efficient External Data Augmentation
Authors:
Zain Sarwar,
Van Tran,
Arjun Nitin Bhagoji,
Nick Feamster,
Ben Y. Zhao,
Supriyo Chakraborty
Abstract:
Machine learning (ML) models often require large amounts of data to perform well. When the available data is limited, model trainers may need to acquire more data from external sources. Often, useful data is held by private entities who are hesitant to share their data due to propriety and privacy concerns. This makes it challenging and expensive for model trainers to acquire the data they need to…
▽ More
Machine learning (ML) models often require large amounts of data to perform well. When the available data is limited, model trainers may need to acquire more data from external sources. Often, useful data is held by private entities who are hesitant to share their data due to propriety and privacy concerns. This makes it challenging and expensive for model trainers to acquire the data they need to improve model performance. To address this challenge, we propose Mycroft, a data-efficient method that enables model trainers to evaluate the relative utility of different data sources while working with a constrained data-sharing budget. By leveraging feature space distances and gradient matching, Mycroft identifies small but informative data subsets from each owner, allowing model trainers to maximize performance with minimal data exposure. Experimental results across four tasks in two domains show that Mycroft converges rapidly to the performance of the full-information baseline, where all data is shared. Moreover, Mycroft is robust to noise and can effectively rank data owners by utility. Mycroft can pave the way for democratized training of high performance ML models.
△ Less
Submitted 10 October, 2024;
originally announced October 2024.
-
Feasibility of State Space Models for Network Traffic Generation
Authors:
Andrew Chu,
Xi Jiang,
Shinan Liu,
Arjun Bhagoji,
Francesco Bronzino,
Paul Schmitt,
Nick Feamster
Abstract:
Many problems in computer networking rely on parsing collections of network traces (e.g., traffic prioritization, intrusion detection). Unfortunately, the availability and utility of these collections is limited due to privacy concerns, data staleness, and low representativeness. While methods for generating data to augment collections exist, they often fall short in replicating the quality of rea…
▽ More
Many problems in computer networking rely on parsing collections of network traces (e.g., traffic prioritization, intrusion detection). Unfortunately, the availability and utility of these collections is limited due to privacy concerns, data staleness, and low representativeness. While methods for generating data to augment collections exist, they often fall short in replicating the quality of real-world traffic In this paper, we i) survey the evolution of traffic simulators/generators and ii) propose the use of state-space models, specifically Mamba, for packet-level, synthetic network trace generation by modeling it as an unsupervised sequence generation problem. Early evaluation shows that state-space models can generate synthetic network traffic with higher statistical similarity to real traffic than the state-of-the-art. Our approach thus has the potential to reliably generate realistic, informative synthetic network traces for downstream tasks.
△ Less
Submitted 4 June, 2024;
originally announced June 2024.
-
"Community Guidelines Make this the Best Party on the Internet": An In-Depth Study of Online Platforms' Content Moderation Policies
Authors:
Brennan Schaffner,
Arjun Nitin Bhagoji,
Siyuan Cheng,
Jacqueline Mei,
Jay L. Shen,
Grace Wang,
Marshini Chetty,
Nick Feamster,
Genevieve Lakier,
Chenhao Tan
Abstract:
Moderating user-generated content on online platforms is crucial for balancing user safety and freedom of speech. Particularly in the United States, platforms are not subject to legal constraints prescribing permissible content. Each platform has thus developed bespoke content moderation policies, but there is little work towards a comparative understanding of these policies across platforms and t…
▽ More
Moderating user-generated content on online platforms is crucial for balancing user safety and freedom of speech. Particularly in the United States, platforms are not subject to legal constraints prescribing permissible content. Each platform has thus developed bespoke content moderation policies, but there is little work towards a comparative understanding of these policies across platforms and topics. This paper presents the first systematic study of these policies from the 43 largest online platforms hosting user-generated content, focusing on policies around copyright infringement, harmful speech, and misleading content. We build a custom web-scraper to obtain policy text and develop a unified annotation scheme to analyze the text for the presence of critical components. We find significant structural and compositional variation in policies across topics and platforms, with some variation attributable to disparate legal groundings. We lay the groundwork for future studies of ever-evolving content moderation policies and their impact on users.
△ Less
Submitted 8 May, 2024;
originally announced May 2024.
-
Towards Scalable and Robust Model Versioning
Authors:
Wenxin Ding,
Arjun Nitin Bhagoji,
Ben Y. Zhao,
Haitao Zheng
Abstract:
As the deployment of deep learning models continues to expand across industries, the threat of malicious incursions aimed at gaining access to these deployed models is on the rise. Should an attacker gain access to a deployed model, whether through server breaches, insider attacks, or model inversion techniques, they can then construct white-box adversarial attacks to manipulate the model's classi…
▽ More
As the deployment of deep learning models continues to expand across industries, the threat of malicious incursions aimed at gaining access to these deployed models is on the rise. Should an attacker gain access to a deployed model, whether through server breaches, insider attacks, or model inversion techniques, they can then construct white-box adversarial attacks to manipulate the model's classification outcomes, thereby posing significant risks to organizations that rely on these models for critical tasks. Model owners need mechanisms to protect themselves against such losses without the necessity of acquiring fresh training data - a process that typically demands substantial investments in time and capital.
In this paper, we explore the feasibility of generating multiple versions of a model that possess different attack properties, without acquiring new training data or changing model architecture. The model owner can deploy one version at a time and replace a leaked version immediately with a new version. The newly deployed model version can resist adversarial attacks generated leveraging white-box access to one or all previously leaked versions. We show theoretically that this can be accomplished by incorporating parameterized hidden distributions into the model training data, forcing the model to learn task-irrelevant features uniquely defined by the chosen data. Additionally, optimal choices of hidden distributions can produce a sequence of model versions capable of resisting compound transferability attacks over time. Leveraging our analytical insights, we design and implement a practical model versioning method for DNN classifiers, which leads to significant robustness improvements over existing methods. We believe our work presents a promising direction for safeguarding DNN services beyond their initial deployment.
△ Less
Submitted 10 March, 2024; v1 submitted 17 January, 2024;
originally announced January 2024.
-
NetDiffusion: Network Data Augmentation Through Protocol-Constrained Traffic Generation
Authors:
Xi Jiang,
Shinan Liu,
Aaron Gember-Jacobson,
Arjun Nitin Bhagoji,
Paul Schmitt,
Francesco Bronzino,
Nick Feamster
Abstract:
Datasets of labeled network traces are essential for a multitude of machine learning (ML) tasks in networking, yet their availability is hindered by privacy and maintenance concerns, such as data staleness. To overcome this limitation, synthetic network traces can often augment existing datasets. Unfortunately, current synthetic trace generation methods, which typically produce only aggregated flo…
▽ More
Datasets of labeled network traces are essential for a multitude of machine learning (ML) tasks in networking, yet their availability is hindered by privacy and maintenance concerns, such as data staleness. To overcome this limitation, synthetic network traces can often augment existing datasets. Unfortunately, current synthetic trace generation methods, which typically produce only aggregated flow statistics or a few selected packet attributes, do not always suffice, especially when model training relies on having features that are only available from packet traces. This shortfall manifests in both insufficient statistical resemblance to real traces and suboptimal performance on ML tasks when employed for data augmentation. In this paper, we apply diffusion models to generate high-resolution synthetic network traffic traces. We present NetDiffusion, a tool that uses a finely-tuned, controlled variant of a Stable Diffusion model to generate synthetic network traffic that is high fidelity and conforms to protocol specifications. Our evaluation demonstrates that packet captures generated from NetDiffusion can achieve higher statistical similarity to real data and improved ML model performance than current state-of-the-art approaches (e.g., GAN-based approaches). Furthermore, our synthetic traces are compatible with common network analysis tools and support a myriad of network tasks, suggesting that NetDiffusion can serve a broader spectrum of network analysis and testing tasks, extending beyond ML-centric applications.
△ Less
Submitted 12 October, 2023;
originally announced October 2023.
-
Characterizing the Optimal 0-1 Loss for Multi-class Classification with a Test-time Attacker
Authors:
Sihui Dai,
Wenxin Ding,
Arjun Nitin Bhagoji,
Daniel Cullina,
Ben Y. Zhao,
Haitao Zheng,
Prateek Mittal
Abstract:
Finding classifiers robust to adversarial examples is critical for their safe deployment. Determining the robustness of the best possible classifier under a given threat model for a given data distribution and comparing it to that achieved by state-of-the-art training methods is thus an important diagnostic tool. In this paper, we find achievable information-theoretic lower bounds on loss in the p…
▽ More
Finding classifiers robust to adversarial examples is critical for their safe deployment. Determining the robustness of the best possible classifier under a given threat model for a given data distribution and comparing it to that achieved by state-of-the-art training methods is thus an important diagnostic tool. In this paper, we find achievable information-theoretic lower bounds on loss in the presence of a test-time attacker for multi-class classifiers on any discrete dataset. We provide a general framework for finding the optimal 0-1 loss that revolves around the construction of a conflict hypergraph from the data and adversarial constraints. We further define other variants of the attacker-classifier game that determine the range of the optimal loss more efficiently than the full-fledged hypergraph construction. Our evaluation shows, for the first time, an analysis of the gap to optimal robustness for classifiers in the multi-class setting on benchmark datasets.
△ Less
Submitted 6 December, 2023; v1 submitted 21 February, 2023;
originally announced February 2023.
-
Augmenting Rule-based DNS Censorship Detection at Scale with Machine Learning
Authors:
Jacob Brown,
Xi Jiang,
Van Tran,
Arjun Nitin Bhagoji,
Nguyen Phong Hoang,
Nick Feamster,
Prateek Mittal,
Vinod Yegneswaran
Abstract:
The proliferation of global censorship has led to the development of a plethora of measurement platforms to monitor and expose it. Censorship of the domain name system (DNS) is a key mechanism used across different countries. It is currently detected by applying heuristics to samples of DNS queries and responses (probes) for specific destinations. These heuristics, however, are both platform-speci…
▽ More
The proliferation of global censorship has led to the development of a plethora of measurement platforms to monitor and expose it. Censorship of the domain name system (DNS) is a key mechanism used across different countries. It is currently detected by applying heuristics to samples of DNS queries and responses (probes) for specific destinations. These heuristics, however, are both platform-specific and have been found to be brittle when censors change their blocking behavior, necessitating a more reliable automated process for detecting censorship.
In this paper, we explore how machine learning (ML) models can (1) help streamline the detection process, (2) improve the potential of using large-scale datasets for censorship detection, and (3) discover new censorship instances and blocking signatures missed by existing heuristic methods. Our study shows that supervised models, trained using expert-derived labels on instances of known anomalies and possible censorship, can learn the detection heuristics employed by different measurement platforms. More crucially, we find that unsupervised models, trained solely on uncensored instances, can identify new instances and variations of censorship missed by existing heuristics. Moreover, both methods demonstrate the capability to uncover a substantial number of new DNS blocking signatures, i.e., injected fake IP addresses overlooked by existing heuristics. These results are underpinned by an important methodological finding: comparing the outputs of models trained using the same probes but with labels arising from independent processes allows us to more reliably detect cases of censorship in the absence of ground-truth labels of censorship.
△ Less
Submitted 15 June, 2023; v1 submitted 3 February, 2023;
originally announced February 2023.
-
Natural Backdoor Datasets
Authors:
Emily Wenger,
Roma Bhattacharjee,
Arjun Nitin Bhagoji,
Josephine Passananti,
Emilio Andere,
Haitao Zheng,
Ben Y. Zhao
Abstract:
Extensive literature on backdoor poison attacks has studied attacks and defenses for backdoors using "digital trigger patterns." In contrast, "physical backdoors" use physical objects as triggers, have only recently been identified, and are qualitatively different enough to resist all defenses targeting digital trigger backdoors. Research on physical backdoors is limited by access to large dataset…
▽ More
Extensive literature on backdoor poison attacks has studied attacks and defenses for backdoors using "digital trigger patterns." In contrast, "physical backdoors" use physical objects as triggers, have only recently been identified, and are qualitatively different enough to resist all defenses targeting digital trigger backdoors. Research on physical backdoors is limited by access to large datasets containing real images of physical objects co-located with targets of classification. Building these datasets is time- and labor-intensive. This works seeks to address the challenge of accessibility for research on physical backdoor attacks. We hypothesize that there may be naturally occurring physically co-located objects already present in popular datasets such as ImageNet. Once identified, a careful relabeling of these data can transform them into training samples for physical backdoor attacks. We propose a method to scalably identify these subsets of potential triggers in existing datasets, along with the specific classes they can poison. We call these naturally occurring trigger-class subsets natural backdoor datasets. Our techniques successfully identify natural backdoors in widely-available datasets, and produce models behaviorally equivalent to those trained on manually curated datasets. We release our code to allow the research community to create their own datasets for research on physical backdoor attacks.
△ Less
Submitted 21 June, 2022;
originally announced June 2022.
-
Understanding Robust Learning through the Lens of Representation Similarities
Authors:
Christian Cianfarani,
Arjun Nitin Bhagoji,
Vikash Sehwag,
Ben Y. Zhao,
Prateek Mittal,
Haitao Zheng
Abstract:
Representation learning, i.e. the generation of representations useful for downstream applications, is a task of fundamental importance that underlies much of the success of deep neural networks (DNNs). Recently, robustness to adversarial examples has emerged as a desirable property for DNNs, spurring the development of robust training methods that account for adversarial examples. In this paper,…
▽ More
Representation learning, i.e. the generation of representations useful for downstream applications, is a task of fundamental importance that underlies much of the success of deep neural networks (DNNs). Recently, robustness to adversarial examples has emerged as a desirable property for DNNs, spurring the development of robust training methods that account for adversarial examples. In this paper, we aim to understand how the properties of representations learned by robust training differ from those obtained from standard, non-robust training. This is critical to diagnosing numerous salient pitfalls in robust networks, such as, degradation of performance on benign inputs, poor generalization of robustness, and increase in over-fitting. We utilize a powerful set of tools known as representation similarity metrics, across three vision datasets, to obtain layer-wise comparisons between robust and non-robust DNNs with different training procedures, architectural parameters and adversarial constraints. Our experiments highlight hitherto unseen properties of robust representations that we posit underlie the behavioral differences of robust networks. We discover a lack of specialization in robust networks' representations along with a disappearance of `block structure'. We also find overfitting during robust training largely impacts deeper layers. These, along with other findings, suggest ways forward for the design and training of better robust networks.
△ Less
Submitted 15 September, 2022; v1 submitted 20 June, 2022;
originally announced June 2022.
-
On the Permanence of Backdoors in Evolving Models
Authors:
Huiying Li,
Arjun Nitin Bhagoji,
Yuxin Chen,
Haitao Zheng,
Ben Y. Zhao
Abstract:
Existing research on training-time attacks for deep neural networks (DNNs), such as backdoors, largely assume that models are static once trained, and hidden backdoors trained into models remain active indefinitely. In practice, models are rarely static but evolve continuously to address distribution drifts in the underlying data. This paper explores the behavior of backdoor attacks in time-varyin…
▽ More
Existing research on training-time attacks for deep neural networks (DNNs), such as backdoors, largely assume that models are static once trained, and hidden backdoors trained into models remain active indefinitely. In practice, models are rarely static but evolve continuously to address distribution drifts in the underlying data. This paper explores the behavior of backdoor attacks in time-varying models, whose model weights are continually updated via fine-tuning to adapt to data drifts. Our theoretical analysis shows how fine-tuning with fresh data progressively "erases" the injected backdoors, and our empirical study illustrates how quickly a time-varying model "forgets" backdoors under a variety of training and attack settings. We also show that novel fine-tuning strategies using smart learning rates can significantly accelerate backdoor forgetting. Finally, we discuss the need for new backdoor defenses that target time-varying models specifically.
△ Less
Submitted 8 February, 2023; v1 submitted 7 June, 2022;
originally announced June 2022.
-
SparseFed: Mitigating Model Poisoning Attacks in Federated Learning with Sparsification
Authors:
Ashwinee Panda,
Saeed Mahloujifar,
Arjun N. Bhagoji,
Supriyo Chakraborty,
Prateek Mittal
Abstract:
Federated learning is inherently vulnerable to model poisoning attacks because its decentralized nature allows attackers to participate with compromised devices. In model poisoning attacks, the attacker reduces the model's performance on targeted sub-tasks (e.g. classifying planes as birds) by uploading "poisoned" updates. In this report we introduce \algoname{}, a novel defense that uses global t…
▽ More
Federated learning is inherently vulnerable to model poisoning attacks because its decentralized nature allows attackers to participate with compromised devices. In model poisoning attacks, the attacker reduces the model's performance on targeted sub-tasks (e.g. classifying planes as birds) by uploading "poisoned" updates. In this report we introduce \algoname{}, a novel defense that uses global top-k update sparsification and device-level gradient clipping to mitigate model poisoning attacks. We propose a theoretical framework for analyzing the robustness of defenses against poisoning attacks, and provide robustness and convergence analysis of our algorithm. To validate its empirical efficacy we conduct an open-source evaluation at scale across multiple benchmark datasets for computer vision and federated learning.
△ Less
Submitted 12 December, 2021;
originally announced December 2021.
-
Poison Forensics: Traceback of Data Poisoning Attacks in Neural Networks
Authors:
Shawn Shan,
Arjun Nitin Bhagoji,
Haitao Zheng,
Ben Y. Zhao
Abstract:
In adversarial machine learning, new defenses against attacks on deep learning systems are routinely broken soon after their release by more powerful attacks. In this context, forensic tools can offer a valuable complement to existing defenses, by tracing back a successful attack to its root cause, and offering a path forward for mitigation to prevent similar attacks in the future.
In this paper…
▽ More
In adversarial machine learning, new defenses against attacks on deep learning systems are routinely broken soon after their release by more powerful attacks. In this context, forensic tools can offer a valuable complement to existing defenses, by tracing back a successful attack to its root cause, and offering a path forward for mitigation to prevent similar attacks in the future.
In this paper, we describe our efforts in developing a forensic traceback tool for poison attacks on deep neural networks. We propose a novel iterative clustering and pruning solution that trims "innocent" training samples, until all that remains is the set of poisoned data responsible for the attack. Our method clusters training samples based on their impact on model parameters, then uses an efficient data unlearning method to prune innocent clusters. We empirically demonstrate the efficacy of our system on three types of dirty-label (backdoor) poison attacks and three types of clean-label poison attacks, across domains of computer vision and malware classification. Our system achieves over 98.4% precision and 96.8% recall across all attacks. We also show that our system is robust against four anti-forensics measures specifically designed to attack it.
△ Less
Submitted 15 June, 2022; v1 submitted 13 October, 2021;
originally announced October 2021.
-
LEAF: Navigating Concept Drift in Cellular Networks
Authors:
Shinan Liu,
Francesco Bronzino,
Paul Schmitt,
Arjun Nitin Bhagoji,
Nick Feamster,
Hector Garcia Crespo,
Timothy Coyle,
Brian Ward
Abstract:
Operational networks commonly rely on machine learning models for many tasks, including detecting anomalies, inferring application performance, and forecasting demand. Yet, model accuracy can degrade due to concept drift, whereby the relationship between the features and the target to be predicted changes. Mitigating concept drift is an essential part of operationalizing machine learning models in…
▽ More
Operational networks commonly rely on machine learning models for many tasks, including detecting anomalies, inferring application performance, and forecasting demand. Yet, model accuracy can degrade due to concept drift, whereby the relationship between the features and the target to be predicted changes. Mitigating concept drift is an essential part of operationalizing machine learning models in general, but is of particular importance in networking's highly dynamic deployment environments. In this paper, we first characterize concept drift in a large cellular network for a major metropolitan area in the United States. We find that concept drift occurs across many important key performance indicators (KPIs), independently of the model, training set size, and time interval -- thus necessitating practical approaches to detect, explain, and mitigate it. We then show that frequent model retraining with newly available data is not sufficient to mitigate concept drift, and can even degrade model accuracy further. Finally, we develop a new methodology for concept drift mitigation, Local Error Approximation of Features (LEAF). LEAF works by detecting drift; explaining the features and time intervals that contribute the most to drift; and mitigates it using forgetting and over-sampling. We evaluate LEAF against industry-standard mitigation approaches (notably, periodic retraining) with more than four years of cellular KPI data. Our initial tests with a major cellular provider in the US show that LEAF consistently outperforms periodic and triggered retraining on complex, real-world data while reducing costly retraining operations.
△ Less
Submitted 2 February, 2023; v1 submitted 7 September, 2021;
originally announced September 2021.
-
Lower Bounds on Cross-Entropy Loss in the Presence of Test-time Adversaries
Authors:
Arjun Nitin Bhagoji,
Daniel Cullina,
Vikash Sehwag,
Prateek Mittal
Abstract:
Understanding the fundamental limits of robust supervised learning has emerged as a problem of immense interest, from both practical and theoretical standpoints. In particular, it is critical to determine classifier-agnostic bounds on the training loss to establish when learning is possible. In this paper, we determine optimal lower bounds on the cross-entropy loss in the presence of test-time adv…
▽ More
Understanding the fundamental limits of robust supervised learning has emerged as a problem of immense interest, from both practical and theoretical standpoints. In particular, it is critical to determine classifier-agnostic bounds on the training loss to establish when learning is possible. In this paper, we determine optimal lower bounds on the cross-entropy loss in the presence of test-time adversaries, along with the corresponding optimal classification outputs. Our formulation of the bound as a solution to an optimization problem is general enough to encompass any loss function depending on soft classifier outputs. We also propose and provide a proof of correctness for a bespoke algorithm to compute this lower bound efficiently, allowing us to determine lower bounds for multiple practical datasets of interest. We use our lower bounds as a diagnostic tool to determine the effectiveness of current robust training methods and find a gap from optimality at larger budgets. Finally, we investigate the possibility of using of optimal classification outputs as soft labels to empirically improve robust training.
△ Less
Submitted 4 June, 2021; v1 submitted 16 April, 2021;
originally announced April 2021.
-
A Real-time Defense against Website Fingerprinting Attacks
Authors:
Shawn Shan,
Arjun Nitin Bhagoji,
Haitao Zheng,
Ben Y. Zhao
Abstract:
Anonymity systems like Tor are vulnerable to Website Fingerprinting (WF) attacks, where a local passive eavesdropper infers the victim's activity. Current WF attacks based on deep learning classifiers have successfully overcome numerous proposed defenses. While recent defenses leveraging adversarial examples offer promise, these adversarial examples can only be computed after the network session h…
▽ More
Anonymity systems like Tor are vulnerable to Website Fingerprinting (WF) attacks, where a local passive eavesdropper infers the victim's activity. Current WF attacks based on deep learning classifiers have successfully overcome numerous proposed defenses. While recent defenses leveraging adversarial examples offer promise, these adversarial examples can only be computed after the network session has concluded, thus offer users little protection in practical settings.
We propose Dolos, a system that modifies user network traffic in real time to successfully evade WF attacks. Dolos injects dummy packets into traffic traces by computing input-agnostic adversarial patches that disrupt deep learning classifiers used in WF attacks. Patches are then applied to alter and protect user traffic in real time. Importantly, these patches are parameterized by a user-side secret, ensuring that attackers cannot use adversarial training to defeat Dolos. We experimentally demonstrate that Dolos provides 94+% protection against state-of-the-art WF attacks under a variety of settings. Against prior defenses, Dolos outperforms in terms of higher protection performance and lower information leakage and bandwidth overhead. Finally, we show that Dolos is robust against a variety of adaptive countermeasures to detect or disrupt the defense.
△ Less
Submitted 8 February, 2021;
originally announced February 2021.
-
A Critical Evaluation of Open-World Machine Learning
Authors:
Liwei Song,
Vikash Sehwag,
Arjun Nitin Bhagoji,
Prateek Mittal
Abstract:
Open-world machine learning (ML) combines closed-world models trained on in-distribution data with out-of-distribution (OOD) detectors, which aim to detect and reject OOD inputs. Previous works on open-world ML systems usually fail to test their reliability under diverse, and possibly adversarial conditions. Therefore, in this paper, we seek to understand how resilient are state-of-the-art open-wo…
▽ More
Open-world machine learning (ML) combines closed-world models trained on in-distribution data with out-of-distribution (OOD) detectors, which aim to detect and reject OOD inputs. Previous works on open-world ML systems usually fail to test their reliability under diverse, and possibly adversarial conditions. Therefore, in this paper, we seek to understand how resilient are state-of-the-art open-world ML systems to changes in system components? With our evaluation across 6 OOD detectors, we find that the choice of in-distribution data, model architecture and OOD data have a strong impact on OOD detection performance, inducing false positive rates in excess of $70\%$. We further show that OOD inputs with 22 unintentional corruptions or adversarial perturbations render open-world ML systems unusable with false positive rates of up to $100\%$. To increase the resilience of open-world ML, we combine robust classifiers with OOD detection techniques and uncover a new trade-off between OOD detection and robustness.
△ Less
Submitted 8 July, 2020;
originally announced July 2020.
-
Backdoor Attacks Against Deep Learning Systems in the Physical World
Authors:
Emily Wenger,
Josephine Passananti,
Arjun Bhagoji,
Yuanshun Yao,
Haitao Zheng,
Ben Y. Zhao
Abstract:
Backdoor attacks embed hidden malicious behaviors into deep learning models, which only activate and cause misclassifications on model inputs containing a specific trigger. Existing works on backdoor attacks and defenses, however, mostly focus on digital attacks that use digitally generated patterns as triggers. A critical question remains unanswered: can backdoor attacks succeed using physical ob…
▽ More
Backdoor attacks embed hidden malicious behaviors into deep learning models, which only activate and cause misclassifications on model inputs containing a specific trigger. Existing works on backdoor attacks and defenses, however, mostly focus on digital attacks that use digitally generated patterns as triggers. A critical question remains unanswered: can backdoor attacks succeed using physical objects as triggers, thus making them a credible threat against deep learning systems in the real world? We conduct a detailed empirical study to explore this question for facial recognition, a critical deep learning task. Using seven physical objects as triggers, we collect a custom dataset of 3205 images of ten volunteers and use it to study the feasibility of physical backdoor attacks under a variety of real-world conditions. Our study reveals two key findings. First, physical backdoor attacks can be highly successful if they are carefully configured to overcome the constraints imposed by physical objects. In particular, the placement of successful triggers is largely constrained by the target model's dependence on key facial features. Second, four of today's state-of-the-art defenses against (digital) backdoors are ineffective against physical backdoors, because the use of physical objects breaks core assumptions used to construct these defenses. Our study confirms that (physical) backdoor attacks are not a hypothetical phenomenon but rather pose a serious real-world threat to critical classification tasks. We need new and more robust defenses against backdoors in the physical world.
△ Less
Submitted 7 September, 2021; v1 submitted 25 June, 2020;
originally announced June 2020.
-
PatchGuard: A Provably Robust Defense against Adversarial Patches via Small Receptive Fields and Masking
Authors:
Chong Xiang,
Arjun Nitin Bhagoji,
Vikash Sehwag,
Prateek Mittal
Abstract:
Localized adversarial patches aim to induce misclassification in machine learning models by arbitrarily modifying pixels within a restricted region of an image. Such attacks can be realized in the physical world by attaching the adversarial patch to the object to be misclassified, and defending against such attacks is an unsolved/open problem. In this paper, we propose a general defense framework…
▽ More
Localized adversarial patches aim to induce misclassification in machine learning models by arbitrarily modifying pixels within a restricted region of an image. Such attacks can be realized in the physical world by attaching the adversarial patch to the object to be misclassified, and defending against such attacks is an unsolved/open problem. In this paper, we propose a general defense framework called PatchGuard that can achieve high provable robustness while maintaining high clean accuracy against localized adversarial patches. The cornerstone of PatchGuard involves the use of CNNs with small receptive fields to impose a bound on the number of features corrupted by an adversarial patch. Given a bounded number of corrupted features, the problem of designing an adversarial patch defense reduces to that of designing a secure feature aggregation mechanism. Towards this end, we present our robust masking defense that robustly detects and masks corrupted features to recover the correct prediction. Notably, we can prove the robustness of our defense against any adversary within our threat model. Our extensive evaluation on ImageNet, ImageNette (a 10-class subset of ImageNet), and CIFAR-10 datasets demonstrates that our defense achieves state-of-the-art performance in terms of both provable robust accuracy and clean accuracy.
△ Less
Submitted 31 March, 2021; v1 submitted 16 May, 2020;
originally announced May 2020.
-
Advances and Open Problems in Federated Learning
Authors:
Peter Kairouz,
H. Brendan McMahan,
Brendan Avent,
Aurélien Bellet,
Mehdi Bennis,
Arjun Nitin Bhagoji,
Kallista Bonawitz,
Zachary Charles,
Graham Cormode,
Rachel Cummings,
Rafael G. L. D'Oliveira,
Hubert Eichner,
Salim El Rouayheb,
David Evans,
Josh Gardner,
Zachary Garrett,
Adrià Gascón,
Badih Ghazi,
Phillip B. Gibbons,
Marco Gruteser,
Zaid Harchaoui,
Chaoyang He,
Lie He,
Zhouyuan Huo,
Ben Hutchinson
, et al. (34 additional authors not shown)
Abstract:
Federated learning (FL) is a machine learning setting where many clients (e.g. mobile devices or whole organizations) collaboratively train a model under the orchestration of a central server (e.g. service provider), while keeping the training data decentralized. FL embodies the principles of focused data collection and minimization, and can mitigate many of the systemic privacy risks and costs re…
▽ More
Federated learning (FL) is a machine learning setting where many clients (e.g. mobile devices or whole organizations) collaboratively train a model under the orchestration of a central server (e.g. service provider), while keeping the training data decentralized. FL embodies the principles of focused data collection and minimization, and can mitigate many of the systemic privacy risks and costs resulting from traditional, centralized machine learning and data science approaches. Motivated by the explosive growth in FL research, this paper discusses recent advances and presents an extensive collection of open problems and challenges.
△ Less
Submitted 8 March, 2021; v1 submitted 10 December, 2019;
originally announced December 2019.
-
Lower Bounds on Adversarial Robustness from Optimal Transport
Authors:
Arjun Nitin Bhagoji,
Daniel Cullina,
Prateek Mittal
Abstract:
While progress has been made in understanding the robustness of machine learning classifiers to test-time adversaries (evasion attacks), fundamental questions remain unresolved. In this paper, we use optimal transport to characterize the minimum possible loss in an adversarial classification scenario. In this setting, an adversary receives a random labeled example from one of two classes, perturbs…
▽ More
While progress has been made in understanding the robustness of machine learning classifiers to test-time adversaries (evasion attacks), fundamental questions remain unresolved. In this paper, we use optimal transport to characterize the minimum possible loss in an adversarial classification scenario. In this setting, an adversary receives a random labeled example from one of two classes, perturbs the example subject to a neighborhood constraint, and presents the modified example to the classifier. We define an appropriate cost function such that the minimum transportation cost between the distributions of the two classes determines the minimum $0-1$ loss for any classifier. When the classifier comes from a restricted hypothesis class, the optimal transportation cost provides a lower bound. We apply our framework to the case of Gaussian data with norm-bounded adversaries and explicitly show matching bounds for the classification and transport problems as well as the optimality of linear classifiers. We also characterize the sample complexity of learning in this setting, deriving and extending previously known results as a special case. Finally, we use our framework to study the gap between the optimal classification performance possible and that currently achieved by state-of-the-art robustly trained neural networks for datasets of interest, namely, MNIST, Fashion MNIST and CIFAR-10.
△ Less
Submitted 30 October, 2019; v1 submitted 26 September, 2019;
originally announced September 2019.
-
Better the Devil you Know: An Analysis of Evasion Attacks using Out-of-Distribution Adversarial Examples
Authors:
Vikash Sehwag,
Arjun Nitin Bhagoji,
Liwei Song,
Chawin Sitawarin,
Daniel Cullina,
Mung Chiang,
Prateek Mittal
Abstract:
A large body of recent work has investigated the phenomenon of evasion attacks using adversarial examples for deep learning systems, where the addition of norm-bounded perturbations to the test inputs leads to incorrect output classification. Previous work has investigated this phenomenon in closed-world systems where training and test inputs follow a pre-specified distribution. However, real-worl…
▽ More
A large body of recent work has investigated the phenomenon of evasion attacks using adversarial examples for deep learning systems, where the addition of norm-bounded perturbations to the test inputs leads to incorrect output classification. Previous work has investigated this phenomenon in closed-world systems where training and test inputs follow a pre-specified distribution. However, real-world implementations of deep learning applications, such as autonomous driving and content classification are likely to operate in the open-world environment. In this paper, we demonstrate the success of open-world evasion attacks, where adversarial examples are generated from out-of-distribution inputs (OOD adversarial examples). In our study, we use 11 state-of-the-art neural network models trained on 3 image datasets of varying complexity. We first demonstrate that state-of-the-art detectors for out-of-distribution data are not robust against OOD adversarial examples. We then consider 5 known defenses for adversarial examples, including state-of-the-art robust training methods, and show that against these defenses, OOD adversarial examples can achieve up to 4$\times$ higher target success rates compared to adversarial examples generated from in-distribution data. We also take a quantitative look at how open-world evasion attacks may affect real-world systems. Finally, we present the first steps towards a robust open-world machine learning system.
△ Less
Submitted 5 May, 2019;
originally announced May 2019.
-
Analyzing Federated Learning through an Adversarial Lens
Authors:
Arjun Nitin Bhagoji,
Supriyo Chakraborty,
Prateek Mittal,
Seraphin Calo
Abstract:
Federated learning distributes model training among a multitude of agents, who, guided by privacy concerns, perform training using their local data but share only model parameter updates, for iterative aggregation at the server. In this work, we explore the threat of model poisoning attacks on federated learning initiated by a single, non-colluding malicious agent where the adversarial objective i…
▽ More
Federated learning distributes model training among a multitude of agents, who, guided by privacy concerns, perform training using their local data but share only model parameter updates, for iterative aggregation at the server. In this work, we explore the threat of model poisoning attacks on federated learning initiated by a single, non-colluding malicious agent where the adversarial objective is to cause the model to misclassify a set of chosen inputs with high confidence. We explore a number of strategies to carry out this attack, starting with simple boosting of the malicious agent's update to overcome the effects of other agents' updates. To increase attack stealth, we propose an alternating minimization strategy, which alternately optimizes for the training loss and the adversarial objective. We follow up by using parameter estimation for the benign agents' updates to improve on attack success. Finally, we use a suite of interpretability techniques to generate visual explanations of model decisions for both benign and malicious models and show that the explanations are nearly visually indistinguishable. Our results indicate that even a highly constrained adversary can carry out model poisoning attacks while simultaneously maintaining stealth, thus highlighting the vulnerability of the federated learning setting and the need to develop effective defense strategies.
△ Less
Submitted 24 November, 2019; v1 submitted 29 November, 2018;
originally announced November 2018.
-
PAC-learning in the presence of evasion adversaries
Authors:
Daniel Cullina,
Arjun Nitin Bhagoji,
Prateek Mittal
Abstract:
The existence of evasion attacks during the test phase of machine learning algorithms represents a significant challenge to both their deployment and understanding. These attacks can be carried out by adding imperceptible perturbations to inputs to generate adversarial examples and finding effective defenses and detectors has proven to be difficult. In this paper, we step away from the attack-defe…
▽ More
The existence of evasion attacks during the test phase of machine learning algorithms represents a significant challenge to both their deployment and understanding. These attacks can be carried out by adding imperceptible perturbations to inputs to generate adversarial examples and finding effective defenses and detectors has proven to be difficult. In this paper, we step away from the attack-defense arms race and seek to understand the limits of what can be learned in the presence of an evasion adversary. In particular, we extend the Probably Approximately Correct (PAC)-learning framework to account for the presence of an adversary. We first define corrupted hypothesis classes which arise from standard binary hypothesis classes in the presence of an evasion adversary and derive the Vapnik-Chervonenkis (VC)-dimension for these, denoted as the adversarial VC-dimension. We then show that sample complexity upper bounds from the Fundamental Theorem of Statistical learning can be extended to the case of evasion adversaries, where the sample complexity is controlled by the adversarial VC-dimension. We then explicitly derive the adversarial VC-dimension for halfspace classifiers in the presence of a sample-wise norm-constrained adversary of the type commonly studied for evasion attacks and show that it is the same as the standard VC-dimension, closing an open question. Finally, we prove that the adversarial VC-dimension can be either larger or smaller than the standard VC-dimension depending on the hypothesis class and adversary, making it an interesting object of study in its own right.
△ Less
Submitted 6 June, 2018; v1 submitted 4 June, 2018;
originally announced June 2018.
-
On the Local Equivalence of 2D Color Codes and Surface Codes with Applications
Authors:
Arun B. Aloshious,
Arjun Nitin Bhagoji,
Pradeep Kiran Sarvepalli
Abstract:
In recent years, there have been many studies on local stabilizer codes. Under the assumption of translation and scale invariance Yoshida classified such codes. His result implies that translation invariant 2D color codes are equivalent to copies of toric codes. Independently, Bombin, Duclos-Cianci, and Poulin showed that a local translation invariant 2D topological stabilizer code is locally equi…
▽ More
In recent years, there have been many studies on local stabilizer codes. Under the assumption of translation and scale invariance Yoshida classified such codes. His result implies that translation invariant 2D color codes are equivalent to copies of toric codes. Independently, Bombin, Duclos-Cianci, and Poulin showed that a local translation invariant 2D topological stabilizer code is locally equivalent to a finite number of copies of Kitaev's toric code. In this paper we focus on 2D topological color codes and relax the assumption of translation invariance. Using a linear algebraic framework we show that any 2D color code (without boundaries) is locally equivalent to two copies of a related surface code. This surface code is induced by color code. We study the application of this equivalence to the decoding of 2D color codes over the bit flip channel as well as the quantum erasure channel. We report the performance of the color code on the square octagonal lattice over the quantum erasure channel. Further, we provide explicit circuits that perform the transformation between 2D color codes and surface codes. Our circuits do not require any additional ancilla qubits.
△ Less
Submitted 3 April, 2018;
originally announced April 2018.
-
DARTS: Deceiving Autonomous Cars with Toxic Signs
Authors:
Chawin Sitawarin,
Arjun Nitin Bhagoji,
Arsalan Mosenia,
Mung Chiang,
Prateek Mittal
Abstract:
Sign recognition is an integral part of autonomous cars. Any misclassification of traffic signs can potentially lead to a multitude of disastrous consequences, ranging from a life-threatening accident to even a large-scale interruption of transportation services relying on autonomous cars. In this paper, we propose and examine security attacks against sign recognition systems for Deceiving Autonom…
▽ More
Sign recognition is an integral part of autonomous cars. Any misclassification of traffic signs can potentially lead to a multitude of disastrous consequences, ranging from a life-threatening accident to even a large-scale interruption of transportation services relying on autonomous cars. In this paper, we propose and examine security attacks against sign recognition systems for Deceiving Autonomous caRs with Toxic Signs (we call the proposed attacks DARTS). In particular, we introduce two novel methods to create these toxic signs. First, we propose Out-of-Distribution attacks, which expand the scope of adversarial examples by enabling the adversary to generate these starting from an arbitrary point in the image space compared to prior attacks which are restricted to existing training/test data (In-Distribution). Second, we present the Lenticular Printing attack, which relies on an optical phenomenon to deceive the traffic sign recognition system. We extensively evaluate the effectiveness of the proposed attacks in both virtual and real-world settings and consider both white-box and black-box threat models. Our results demonstrate that the proposed attacks are successful under both settings and threat models. We further show that Out-of-Distribution attacks can outperform In-Distribution attacks on classifiers defended using the adversarial training defense, exposing a new attack vector for these defenses.
△ Less
Submitted 31 May, 2018; v1 submitted 18 February, 2018;
originally announced February 2018.
-
Rogue Signs: Deceiving Traffic Sign Recognition with Malicious Ads and Logos
Authors:
Chawin Sitawarin,
Arjun Nitin Bhagoji,
Arsalan Mosenia,
Prateek Mittal,
Mung Chiang
Abstract:
We propose a new real-world attack against the computer vision based systems of autonomous vehicles (AVs). Our novel Sign Embedding attack exploits the concept of adversarial examples to modify innocuous signs and advertisements in the environment such that they are classified as the adversary's desired traffic sign with high confidence. Our attack greatly expands the scope of the threat posed to…
▽ More
We propose a new real-world attack against the computer vision based systems of autonomous vehicles (AVs). Our novel Sign Embedding attack exploits the concept of adversarial examples to modify innocuous signs and advertisements in the environment such that they are classified as the adversary's desired traffic sign with high confidence. Our attack greatly expands the scope of the threat posed to AVs since adversaries are no longer restricted to just modifying existing traffic signs as in previous work. Our attack pipeline generates adversarial samples which are robust to the environmental conditions and noisy image transformations present in the physical world. We ensure this by including a variety of possible image transformations in the optimization problem used to generate adversarial samples. We verify the robustness of the adversarial samples by printing them out and carrying out drive-by tests simulating the conditions under which image capture would occur in a real-world scenario. We experimented with physical attack samples for different distances, lighting conditions and camera angles. In addition, extensive evaluations were carried out in the virtual setting for a variety of image transformations. The adversarial samples generated using our method have adversarial success rates in excess of 95% in the physical as well as virtual settings.
△ Less
Submitted 26 March, 2018; v1 submitted 8 January, 2018;
originally announced January 2018.
-
Exploring the Space of Black-box Attacks on Deep Neural Networks
Authors:
Arjun Nitin Bhagoji,
Warren He,
Bo Li,
Dawn Song
Abstract:
Existing black-box attacks on deep neural networks (DNNs) so far have largely focused on transferability, where an adversarial instance generated for a locally trained model can "transfer" to attack other learning models. In this paper, we propose novel Gradient Estimation black-box attacks for adversaries with query access to the target model's class probabilities, which do not rely on transferab…
▽ More
Existing black-box attacks on deep neural networks (DNNs) so far have largely focused on transferability, where an adversarial instance generated for a locally trained model can "transfer" to attack other learning models. In this paper, we propose novel Gradient Estimation black-box attacks for adversaries with query access to the target model's class probabilities, which do not rely on transferability. We also propose strategies to decouple the number of queries required to generate each adversarial sample from the dimensionality of the input. An iterative variant of our attack achieves close to 100% adversarial success rates for both targeted and untargeted attacks on DNNs. We carry out extensive experiments for a thorough comparative evaluation of black-box attacks and show that the proposed Gradient Estimation attacks outperform all transferability based black-box attacks we tested on both MNIST and CIFAR-10 datasets, achieving adversarial success rates similar to well known, state-of-the-art white-box attacks. We also apply the Gradient Estimation attacks successfully against a real-world Content Moderation classifier hosted by Clarifai. Furthermore, we evaluate black-box attacks against state-of-the-art defenses. We show that the Gradient Estimation attacks are very effective even against these defenses.
△ Less
Submitted 26 December, 2017;
originally announced December 2017.
-
Enhancing Robustness of Machine Learning Systems via Data Transformations
Authors:
Arjun Nitin Bhagoji,
Daniel Cullina,
Chawin Sitawarin,
Prateek Mittal
Abstract:
We propose the use of data transformations as a defense against evasion attacks on ML classifiers. We present and investigate strategies for incorporating a variety of data transformations including dimensionality reduction via Principal Component Analysis and data `anti-whitening' to enhance the resilience of machine learning, targeting both the classification and the training phase. We empirical…
▽ More
We propose the use of data transformations as a defense against evasion attacks on ML classifiers. We present and investigate strategies for incorporating a variety of data transformations including dimensionality reduction via Principal Component Analysis and data `anti-whitening' to enhance the resilience of machine learning, targeting both the classification and the training phase. We empirically evaluate and demonstrate the feasibility of linear transformations of data as a defense mechanism against evasion attacks using multiple real-world datasets. Our key findings are that the defense is (i) effective against the best known evasion attacks from the literature, resulting in a two-fold increase in the resources required by a white-box adversary with knowledge of the defense for a successful attack, (ii) applicable across a range of ML classifiers, including Support Vector Machines and Deep Neural Networks, and (iii) generalizable to multiple application domains, including image classification and human activity classification.
△ Less
Submitted 29 November, 2017; v1 submitted 9 April, 2017;
originally announced April 2017.
-
Equivalence of 2D color codes (without translational symmetry) to surface codes
Authors:
Arjun Bhagoji,
Pradeep Sarvepalli
Abstract:
In a recent work, Bombin, Duclos-Cianci, and Poulin showed that every local translationally invariant 2D topological stabilizer code is locally equivalent to a finite number of copies of Kitaev's toric code. For 2D color codes, Delfosse relaxed the constraint on translation invariance and mapped a 2D color code onto three surface codes. In this paper, we propose an alternate map based on linear al…
▽ More
In a recent work, Bombin, Duclos-Cianci, and Poulin showed that every local translationally invariant 2D topological stabilizer code is locally equivalent to a finite number of copies of Kitaev's toric code. For 2D color codes, Delfosse relaxed the constraint on translation invariance and mapped a 2D color code onto three surface codes. In this paper, we propose an alternate map based on linear algebra. We show that any 2D color code can be mapped onto exactly two copies of a related surface code. The surface code in our map is induced by the color code and easily derived from the color code. Furthermore, our map does not require any ancilla qubits for the surface codes.
△ Less
Submitted 27 April, 2015; v1 submitted 10 March, 2015;
originally announced March 2015.
-
A Nano-satellite Mission to Study Charged Particle Precipitation from the Van Allen Radiation Belts caused due to Seismo-Electromagnetic Emissions
Authors:
Nithin Sivadas,
Akshay Gulati,
Deepti Kannapan,
Ananth Saran Yalamarthy,
Ankit Dhiman,
Arjun Bhagoji,
Athreya Shankar,
Nitin Prasad,
Harishankar Ramachandran,
R. David Koilpillai
Abstract:
In the past decade, several attempts have been made to study the effects of seismo-electromagnetic emissions - an earthquake precursor, on the ionosphere and the radiation belts. The IIT Madras nano-satellite (IITMSAT) mission is designed to make sensitive measurements of charged particle fluxes in a Low Earth Orbit to study the nature of charged particle precipitation from the Van Allen radiation…
▽ More
In the past decade, several attempts have been made to study the effects of seismo-electromagnetic emissions - an earthquake precursor, on the ionosphere and the radiation belts. The IIT Madras nano-satellite (IITMSAT) mission is designed to make sensitive measurements of charged particle fluxes in a Low Earth Orbit to study the nature of charged particle precipitation from the Van Allen radiation belts caused due to such emissions. With the Space-based Proton Electron Energy Detector on-board a single nano-satellite, the mission will attempt to gather statistically significant data to verify possible correlations with seismo-electromagnetic emissions before major earthquakes.
△ Less
Submitted 21 November, 2014;
originally announced November 2014.