
Auto-GDA: Automatic Domain Adaptation for Efficient Grounding Verification in Retrieval Augmented Generation

Authors: Tobias Leemann, Periklis Petridis, Giuseppe Vietri, Dionysis Manousakas, Aaron Roth, Sergul Aydore

Abstract: While retrieval augmented generation (RAG) has been shown to enhance factuality of large language model (LLM) outputs, LLMs still suffer from hallucination, generating incorrect or irrelevant information. One common detection strategy involves prompting the LLM again to assess whether its response is grounded in the retrieved evidence, but this approach is costly. Alternatively, lightweight natural language inference (NLI) models for efficient grounding verification can be used at inference time. While existing pre-trained NLI models offer potential solutions, their performance remains subpar compared to larger models on realistic RAG inputs. RAG inputs are more complex than most datasets used for training NLI models and have characteristics specific to the underlying knowledge base, requiring adaptation of the NLI models to a specific target domain. Additionally, the lack of labeled instances in the target domain makes supervised domain adaptation, e.g., through fine-tuning, infeasible. To address these challenges, we introduce Automatic Generative Domain Adaptation (Auto-GDA). Our framework enables unsupervised domain adaptation through synthetic data generation. Unlike previous methods that rely on handcrafted filtering and augmentation strategies, Auto-GDA employs an iterative process to continuously improve the quality of generated samples using weak labels from less efficient teacher models and discrete optimization to select the most promising augmented samples. Experimental results demonstrate the effectiveness of our approach, with models fine-tuned on synthetic data using Auto-GDA often surpassing the performance of the teacher model and reaching the performance level of LLMs at 10% of their computational cost.

Submitted 4 October, 2024; originally announced October 2024.

arXiv:2409.20070 [pdf, other]

Deciphering the Interface Laws of Turing Mixtures and Foams

Authors: Henrik Weyer, Tobias A. Roth, Erwin Frey

Abstract: For cellular functions like division and polarization, protein pattern formation driven by NTPase cycles is a central spatial control strategy. Because these systems operate far from equilibrium, no general theory links microscopic reaction networks and parameters to the pattern type and dynamics. We discover a generic mechanism giving rise to an effective interfacial tension organizing the macroscopic structure of non-equilibrium steady-state patterns. Namely, maintaining protein-density interfaces by cyclic protein attachment and detachment produces curvature-dependent protein redistribution which straightens the interface. We develop a non-equilibrium Neumann angle law and Plateau vertex conditions for interface junctions and mesh patterns, thus introducing the concepts of "Turing mixtures" and "Turing foams". In contrast to liquid foams and mixtures, these non-equilibrium patterns can select an intrinsic wavelength by interrupting an equilibrium-like coarsening process. Data from in vitro experiments with the E. coli Min protein system verify the vertex conditions and support the wavelength dynamics. Our study uncovers interface laws with correspondence to thermodynamic relations that arise from distinct physical processes in active systems. It allows the design of specific pattern morphologies with potential applications as spatial control strategies in synthetic cells.

Submitted 30 September, 2024; originally announced September 2024.

Comments: 11 pages main text, 5 pages Methods, 64 pages Supplementary Information; 12 figures

arXiv:2409.14513 [pdf, other]

Order of Magnitude Speedups for LLM Membership Inference

Authors: Rongting Zhang, Martin Bertran, Aaron Roth

Abstract: Large Language Models (LLMs) have the promise to revolutionize computing broadly, but their complexity and extensive training data also expose significant privacy vulnerabilities. One of the simplest privacy risks associated with LLMs is their susceptibility to membership inference attacks (MIAs), wherein an adversary aims to determine whether a specific data point was part of the model's training set. Although this is a known risk, state-of-the-art methodologies for MIAs rely on training multiple computationally costly shadow models, making risk evaluation prohibitive for large models. Here we adapt a recent line of work which uses quantile regression to mount membership inference attacks; we extend this work by proposing a low-cost MIA that leverages an ensemble of small quantile regression models to determine whether a document belongs to the model's training set. We demonstrate the effectiveness of this approach on fine-tuned LLMs of varying families (OPT, Pythia, Llama) and across multiple datasets. Across all scenarios we obtain comparable or improved accuracy compared to state-of-the-art shadow model approaches, with as little as 6% of their computation budget. We demonstrate increased effectiveness across multi-epoch trained target models, and robustness to architecture misspecification; that is, we can mount an effective attack against a model using a different tokenizer and architecture, without requiring knowledge of the target model.

Submitted 24 September, 2024; v1 submitted 22 September, 2024; originally announced September 2024.

arXiv:2409.11504 [pdf, other]

Preventing Representational Rank Collapse in MPNNs by Splitting the Computational Graph

Authors: Andreas Roth, Franka Bause, Nils M. Kriege, Thomas Liebig

Abstract: The ability of message-passing neural networks (MPNNs) to fit complex functions over graphs is limited because each iteration of message-passing over a simple graph makes representations more similar, a phenomenon known as rank collapse, with over-smoothing as a special case. Most approaches to mitigate over-smoothing extend common message-passing schemes, e.g., the graph convolutional network, by utilizing residual connections, gating mechanisms, normalization, or regularization techniques. Our work instead proposes to directly tackle the cause of this issue by modifying the message-passing scheme and exchanging different types of messages using multi-relational graphs. We identify the necessary and sufficient condition to ensure linearly independent node representations. As one instantiation, we show that operating on multiple directed acyclic graphs always satisfies our condition and propose to obtain these by defining a strict partial ordering of the nodes. We conduct comprehensive experiments that confirm the benefits of operating on multi-relational graphs to achieve more informative node representations.

Submitted 17 September, 2024; originally announced September 2024.

arXiv:2409.07437 [pdf, other]

A Suite for Acoustic Language Model Evaluation

Authors: Gallil Maimon, Amit Roth, Yossi Adi

Abstract: Speech language models have recently demonstrated great potential as universal speech processing systems. Such models have the ability to model the rich acoustic information existing in audio signals, beyond spoken content, such as emotion, background noise, etc. Despite this, evaluation benchmarks that assess awareness of a wide range of acoustic aspects are lacking. To help bridge this gap, we introduce SALMon, a novel evaluation suite encompassing background noise, emotion, speaker identity and room impulse response. The proposed benchmarks evaluate both the consistency of the inspected element and how well it matches the spoken text. We follow a modelling-based approach, measuring whether a model gives correct samples higher scores than incorrect ones. This approach makes the benchmark fast to compute even for large models. We evaluated several speech language models on SALMon, thus highlighting the strengths and weaknesses of each evaluated method. Code and data are publicly available at https://pages.cs.huji.ac.il/adiyoss-lab/salmon/ .

Submitted 11 September, 2024; originally announced September 2024.

arXiv:2409.05608 [pdf, other]

The Value of Ambiguous Commitments in Multi-Follower Games

Authors: Natalie Collina, Rabanus Derr, Aaron Roth

Abstract: We study games in which a leader makes a single commitment, and then multiple followers (each with a different utility function) respond. In particular, we study ambiguous commitment strategies in these games, in which the leader may commit to a set of mixed strategies, and ambiguity-averse followers respond to maximize their worst-case utility over the set of leader strategies. Special cases of this setting have previously been studied when there is a single follower: in these cases, it is known that the leader can increase her utility by making an ambiguous commitment if the follower is restricted to playing a pure strategy, but that no gain can be had from ambiguity if the follower may mix. We confirm that this result continues to hold in the setting of general Stackelberg games. We then develop a theory of ambiguous commitment in games with multiple followers. We begin by considering the case where the leader must make the same commitment against each follower. We establish that, unlike the case of a single follower, ambiguous commitment can improve the leader's utility by an unboundedly large factor, even when followers are permitted to respond with mixed strategies. We go on to show an advantage for the leader coupling the same commitment across all followers, even when she has the ability to make a separate commitment to each follower. In particular, there exist general-sum games in which the leader can enjoy an unboundedly large advantage by coupling her ambiguous commitment across multiple followers rather than committing against each individually. In zero-sum games we show there can be no such coupling advantage. Finally, we give a polynomial-time algorithm for computing the optimal leader commitment strategy in the special case in which the leader has 2 actions (and k followers may have m actions), and prove that in the general case, the problem is NP-hard.

Submitted 9 September, 2024; originally announced September 2024.

arXiv:2409.03956 [pdf, other]

Algorithmic Collusion Without Threats

Authors: Eshwar Ram Arunachaleswaran, Natalie Collina, Sampath Kannan, Aaron Roth, Juba Ziani

Abstract: There has been substantial recent concern that pricing algorithms might learn to "collude." Supra-competitive prices can emerge as a Nash equilibrium of repeated pricing games, in which sellers play strategies which threaten to punish their competitors who refuse to support high prices, and these strategies can be automatically learned. In fact, a standard economic intuition is that supra-competitive prices emerge from either the use of threats, or a failure of one party to optimize their payoff. Is this intuition correct? Would preventing threats in algorithmic decision-making prevent supra-competitive prices when sellers are optimizing for their own revenue? No. We show that supra-competitive prices can emerge even when both players are using algorithms which do not encode threats, and which optimize for their own revenue. We study sequential pricing games in which a first mover deploys an algorithm and then a second mover optimizes within the resulting environment. We show that if the first mover deploys any algorithm with a no-regret guarantee, and then the second mover even approximately optimizes within this now static environment, monopoly-like prices arise. The result holds for any no-regret learning algorithm deployed by the first mover and for any pricing policy of the second mover that earns a profit at least as high as random pricing would, and hence the result applies even when the second mover is optimizing only within a space of non-responsive pricing distributions which are incapable of encoding threats. In fact, there exists a set of strategies, neither of which explicitly encodes threats, that form a Nash equilibrium of the simultaneous pricing game in algorithm space and lead to near-monopoly prices. This suggests that the definition of "algorithmic collusion" may need to be expanded to include strategies without explicitly encoded threats.

Submitted 5 September, 2024; originally announced September 2024.

arXiv:2408.13430 [pdf, other]

Analysis of the ICML 2023 Ranking Data: Can Authors' Opinions of Their Own Papers Assist Peer Review in Machine Learning?

Authors: Buxin Su, Jiayao Zhang, Natalie Collina, Yuling Yan, Didong Li, Kyunghyun Cho, Jianqing Fan, Aaron Roth, Weijie J. Su

Abstract: We conducted an experiment during the review process of the 2023 International Conference on Machine Learning (ICML) that requested authors with multiple submissions to rank their own papers based on perceived quality. We received 1,342 rankings, each from a distinct author, pertaining to 2,592 submissions. In this paper, we present an empirical analysis of how author-provided rankings could be leveraged to improve peer review processes at machine learning conferences. We focus on the Isotonic Mechanism, which calibrates raw review scores using author-provided rankings. Our analysis demonstrates that the ranking-calibrated scores outperform raw scores in estimating the ground truth "expected review scores" in both squared and absolute error metrics. Moreover, we propose several cautious, low-risk approaches to using the Isotonic Mechanism and author-provided rankings in peer review processes, including assisting senior area chairs' oversight of area chairs' recommendations, supporting the selection of paper awards, and guiding the recruitment of emergency reviewers. We conclude the paper by addressing the study's limitations and proposing future research directions.

Submitted 23 August, 2024; originally announced August 2024.

Comments: See more details about the experiment at https://openrank.cc/
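
As an illustration of the calibration idea named in the abstract above, the following is a minimal sketch only, assuming the calibration step is a least-squares projection of the raw scores onto the order the author claims (isotonic regression, solved here with the pool-adjacent-violators procedure). The function name `isotonic_calibrate` and the input layout are hypothetical and not taken from the paper.

```python
def isotonic_calibrate(raw_scores, ranking):
    """Project raw review scores onto an author-provided ranking.

    raw_scores: list of floats, one per paper (hypothetical layout).
    ranking: paper indices ordered from best to worst, as claimed by the author.
    Returns scores that are non-increasing along `ranking` and closest to
    raw_scores in squared error.
    """
    # Work worst-to-best so the fitted sequence must be non-decreasing.
    order = list(reversed(ranking))
    ys = [raw_scores[i] for i in order]

    # Pool Adjacent Violators: each block stores [sum, count].
    blocks = []
    for y in ys:
        blocks.append([y, 1])
        # Merge while the previous block's mean exceeds the newest block's mean.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c

    # Expand block means back to one value per paper.
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)

    calibrated = [0.0] * len(raw_scores)
    for idx, value in zip(order, fitted):
        calibrated[idx] = value
    return calibrated


# Paper 0 is ranked best, yet paper 1 received the higher raw score,
# so the two are pooled to their mean.
print(isotonic_calibrate([6.0, 7.0, 3.0], [0, 1, 2]))  # [6.5, 6.5, 3.0]
```

Scores that already agree with the ranking are returned unchanged; only adjacent violations are averaged.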

arXiv:2407.12206 [pdf, other]

A Language Modeling Approach to Diacritic-Free Hebrew TTS

Authors: Amit Roth, Arnon Turetzky, Yossi Adi

Abstract: We tackle the task of text-to-speech (TTS) in Hebrew. Traditional Hebrew contains diacritics, which dictate the way individuals should pronounce given words; however, modern Hebrew rarely uses them. The lack of diacritics in modern Hebrew means that readers are expected to infer the correct pronunciation and understand which phonemes to use based on the context. This poses a fundamental challenge for TTS systems, which must accurately map text to speech. In this work, we propose to adopt a diacritic-free language modeling approach for the task of Hebrew TTS. The model operates on discrete speech representations and is conditioned on a word-piece tokenizer. We optimize the proposed method using in-the-wild weakly supervised data and compare it to several diacritic-based TTS systems. Results suggest the proposed method is superior to the evaluated baselines considering both content preservation and naturalness of the generated speech. Samples can be found under the following link: pages.cs.huji.ac.il/adiyoss-lab/HebTTS/

Submitted 16 July, 2024; originally announced July 2024.

Comments: Accepted at Interspeech24

arXiv:2407.11876 [pdf, other]

Simplifying the Theory on Over-Smoothing

Authors: Andreas Roth

Abstract: Graph convolutions have gained popularity due to their ability to efficiently operate on data with an irregular geometric structure. However, graph convolutions cause over-smoothing, which refers to representations becoming more similar with increased depth. Moreover, many different definitions and intuitions currently coexist, leading to research efforts focusing on incompatible directions. This paper attempts to align these directions by showing that over-smoothing is merely a special case of power iteration. This greatly simplifies the existing theory on over-smoothing, making it more accessible. Based on the theory, we provide a novel comprehensive definition of rank collapse as a generalized form of over-smoothing and introduce the rank-one distance as a corresponding metric. Our empirical evaluation of 14 commonly used methods shows that more models than were previously known suffer from this issue.

Submitted 2 September, 2024; v1 submitted 16 July, 2024; originally announced July 2024.
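
To make the power-iteration view in the abstract above concrete, here is a small sketch under stated assumptions: the propagation operator is plain mean aggregation over each node and its neighbours (a row-stochastic matrix), with no learned weights or nonlinearity. This is a toy setting chosen for illustration, not the paper's experimental setup, and the function name `propagate` is hypothetical.

```python
def propagate(adj, x, steps):
    """Apply mean aggregation (with self-loops) `steps` times.

    adj: adjacency matrix as nested lists of 0/1 (assumed undirected, connected).
    x: one scalar feature per node.
    Repeatedly applying the same linear operator is unnormalised power
    iteration; for this row-stochastic operator the dominant eigenvector is
    the constant vector, so all node features converge to one shared value.
    """
    n = len(adj)
    for _ in range(steps):
        new_x = []
        for i in range(n):
            neigh = [i] + [j for j in range(n) if adj[i][j]]
            new_x.append(sum(x[j] for j in neigh) / len(neigh))
        x = new_x
    return x


# Path graph 0 - 1 - 2 with a feature present only on node 0.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
features = propagate(adj, [1.0, 0.0, 0.0], steps=50)
spread = max(features) - min(features)
print(spread < 1e-9)  # True: the node features have become indistinguishable
```

After enough steps the initial differences between nodes are lost, which is the behaviour the abstract calls over-smoothing.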

arXiv:2407.10339 [pdf, other]

Supernova Pointing Capabilities of DUNE

Authors: DUNE Collaboration, A. Abed Abud, B. Abi, R. Acciarri, M. A. Acero, M. R. Adames, G. Adamov, M. Adamowski, D. Adams, M. Adinolfi, C. Adriano, A. Aduszkiewicz, J. Aguilar, B. Aimard, F. Akbar, K. Allison, S. Alonso Monsalve, M. Alrashed, A. Alton, R. Alvarez, T. Alves, H. Amar, P. Amedo, J. Anderson, D. A. Andrade , et al. (1340 additional authors not shown)

Abstract: The determination of the direction of a stellar core collapse via its neutrino emission is crucial for the identification of the progenitor for a multimessenger follow-up. A highly effective method of reconstructing supernova directions within the Deep Underground Neutrino Experiment (DUNE) is introduced. The supernova neutrino pointing resolution is studied by simulating and reconstructing electron-neutrino charged-current absorption on $^{40}$Ar and elastic scattering of neutrinos on electrons. Procedures to reconstruct individual interactions, including a newly developed technique called "brems flipping", as well as the burst direction from an ensemble of interactions are described. Performance of the burst direction reconstruction is evaluated for supernovae happening at a distance of 10 kpc for a specific supernova burst flux model. The pointing resolution is found to be 3.4 degrees at 68% coverage for a perfect interaction-channel classification and a fiducial mass of 40 kton, and 6.6 degrees for a 10 kton fiducial mass respectively. Assuming a 4% rate of charged-current interactions being misidentified as elastic scattering, DUNE's burst pointing resolution is found to be 4.3 degrees (8.7 degrees) at 68% coverage.

Submitted 14 July, 2024; originally announced July 2024.

Comments: 25 pages, 16 figures

Report number: FERMILAB-PUB-24-0319-LBNF

arXiv:2407.07566 [pdf, other]

HebDB: a Weakly Supervised Dataset for Hebrew Speech Processing

Authors: Arnon Turetzky, Or Tal, Yael Segal-Feldman, Yehoshua Dissen, Ella Zeldes, Amit Roth, Eyal Cohen, Yosi Shrem, Bronya R. Chernyak, Olga Seleznova, Joseph Keshet, Yossi Adi

Abstract: We present HebDB, a weakly supervised dataset for spoken language processing in the Hebrew language. HebDB offers roughly 2500 hours of natural and spontaneous speech recordings in the Hebrew language, consisting of a large variety of speakers and topics. We provide raw recordings together with a pre-processed, weakly supervised, and filtered version. The goal of HebDB is to further enhance research and development of spoken language processing tools for the Hebrew language. Hence, we additionally provide two baseline systems for Automatic Speech Recognition (ASR): (i) a self-supervised model; and (ii) a fully supervised model. We present the performance of these two methods optimized on HebDB and compare them to current multi-lingual ASR alternatives. Results suggest the proposed method reaches better results than the evaluated baselines considering similar model sizes. Dataset, code, and models are publicly available at https://pages.cs.huji.ac.il/adiyoss-lab/HebDB/.

Submitted 10 July, 2024; originally announced July 2024.

Comments: Accepted at Interspeech2024

arXiv:2405.20272 [pdf, other]

Reconstruction Attacks on Machine Unlearning: Simple Models are Vulnerable

Authors: Martin Bertran, Shuai Tang, Michael Kearns, Jamie Morgenstern, Aaron Roth, Zhiwei Steven Wu

Abstract: Machine unlearning is motivated by a desire for data autonomy: a person can request to have their data's influence removed from deployed models, and those models should be updated as if they were retrained without the person's data. We show that, counter-intuitively, these updates expose individuals to high-accuracy reconstruction attacks which allow the attacker to recover their data in its entirety, even when the original models are so simple that privacy risk might not otherwise have been a concern. We show how to mount a near-perfect attack on the deleted data point from linear regression models. We then generalize our attack to other loss functions and architectures, and empirically demonstrate the effectiveness of our attacks across a wide range of datasets (capturing both tabular and image data). Our work highlights that privacy risk is significant even for extremely simple model classes when individuals can request deletion of their data from the model.

Submitted 30 May, 2024; originally announced May 2024.

arXiv:2405.16752 [pdf, other]

Model Ensembling for Constrained Optimization

Authors: Ira Globus-Harris, Varun Gupta, Michael Kearns, Aaron Roth

Abstract: There is a long history in machine learning of model ensembling, beginning with boosting and bagging and continuing to the present day. Much of this history has focused on combining models for classification and regression, but recently there is interest in more complex settings such as ensembling policies in reinforcement learning. Strong connections have also emerged between ensembling and multicalibration techniques. In this work, we further investigate these themes by considering a setting in which we wish to ensemble models for multidimensional output predictions that are in turn used for downstream optimization. More precisely, we imagine we are given a number of models mapping a state space to multidimensional real-valued predictions. These predictions form the coefficients of a linear objective that we would like to optimize under specified constraints. The fundamental question we address is how to improve and combine such models in a way that outperforms the best of them in the downstream optimization problem. We apply multicalibration techniques that lead to two provably efficient and convergent algorithms. The first of these (the white box approach) requires being given models that map states to output predictions, while the second (the black box approach) requires only policies (mappings from states to solutions to the optimization problem). For both, we provide convergence and utility guarantees. We conclude by investigating the performance and behavior of the two algorithms in a controlled experimental setting.

Submitted 26 May, 2024; originally announced May 2024.

arXiv:2405.16739 [pdf, other]

Oracle-Efficient Reinforcement Learning for Max Value Ensembles

Authors: Marcel Hussing, Michael Kearns, Aaron Roth, Sikata Bela Sengupta, Jessica Sorrell

Abstract: Reinforcement learning (RL) in large or infinite state spaces is notoriously challenging, both theoretically (where worst-case sample and computational complexities must scale with state space cardinality) and experimentally (where function approximation and policy gradient techniques often scale poorly and suffer from instability and high variance). One line of research attempting to address these difficulties makes the natural assumption that we are given a collection of heuristic base or constituent policies upon which we would like to improve in a scalable manner. In this work we aim to compete with the max-following policy, which at each state follows the action of whichever constituent policy has the highest value. The max-following policy is always at least as good as the best constituent policy, and may be considerably better. Our main result is an efficient algorithm that learns to compete with the max-following policy, given only access to the constituent policies (but not their value functions). In contrast to prior work in similar settings, our theoretical results require only the minimal assumption of an ERM oracle for value function approximation for the constituent policies (and not the global optimal policy or the max-following policy itself) on samplable distributions. We illustrate our algorithm's experimental effectiveness and behavior on several robotic simulation testbeds.

Submitted 26 May, 2024; originally announced May 2024.
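
The definition of the max-following policy in the abstract above can be sketched as follows. This illustrates the definition only: it assumes value estimates for each constituent policy are available as plain functions, whereas the abstract states that the paper's algorithm is given the constituent policies but not their value functions. All names (`max_following_action`, `policies`, `value_estimates`) are hypothetical.

```python
def max_following_action(state, policies, value_estimates):
    """Return the action of whichever constituent policy has the highest value.

    policies: list of callables mapping a state to an action.
    value_estimates: list of callables mapping a state to an (assumed)
        estimate of that policy's value at the state; same order as policies.
    Ties are broken in favour of the earliest policy in the list.
    """
    best = max(range(len(policies)), key=lambda k: value_estimates[k](state))
    return policies[best](state)


# Toy example on a 1-D state: one policy is better for negative states,
# the other for positive states.
policies = [lambda s: "left", lambda s: "right"]
value_estimates = [lambda s: -s, lambda s: s]

print(max_following_action(2.0, policies, value_estimates))   # right
print(max_following_action(-1.0, policies, value_estimates))  # left
```

Because the choice is made per state, the combined policy can switch between constituents along a trajectory, which is why it may outperform every individual constituent.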

arXiv:2405.02225 [pdf, other]

Fair Risk Control: A Generalized Framework for Calibrating Multi-group Fairness Risks

Authors: Lujing Zhang, Aaron Roth, Linjun Zhang

Abstract: This paper introduces a framework for post-processing machine learning models so that their predictions satisfy multi-group fairness guarantees. Based on the celebrated notion of multicalibration, we introduce $(\mathbf{s},\mathcal{G},\alpha)$-GMC (Generalized Multi-Dimensional Multicalibration) for multi-dimensional mappings $\mathbf{s}$, constraint set $\mathcal{G}$, and a pre-specified threshold level $\alpha$. We propose associated algorithms to achieve this notion in general settings. This framework is then applied to diverse scenarios encompassing different fairness concerns, including false negative rate control in image segmentation, prediction set conditional uncertainty quantification in hierarchical classification, and de-biased text generation in language models. We conduct numerical studies on several datasets and tasks.

Submitted 3 May, 2024; originally announced May 2024.

Comments: 28 pages, 8 figures, accepted by ICML2024

arXiv:2404.09626 [pdf, other]

doi 10.1093/mnras/stae984

Hot Jupiter Diversity and the Onset of TiO/VO Revealed by a Large Grid of Non-Grey Global Circulation Models

Authors: Alexander Roth, Vivien Parmentier, Mark Hammond

Abstract: The population of hot Jupiters is extremely diverse, with large variations in their irradiation, period, gravity and chemical composition. To understand the intrinsic planet diversity through the observed population level trends, we explore the a-priori scatter in the population created by the different responses of atmospheric circulation to planetary parameters. We use the SPARC/MITgcm 3D global circulation model to simulate 345 planets spanning a wide range of instellation, metallicity, gravity and rotation periods typical for hot Jupiters, while differentiating between models with and without TiO/VO in their atmosphere. We show that the combined effect of the planetary parameters leads to a large diversity in the ability of atmospheres to transport heat from day-side to night-side at a given equilibrium temperature. We further show that the hot-spot offset is a non-monotonic function of planetary rotation period and explain our findings by a competition between the rotational and divergent parts of the circulation. As a consequence, hot-spot offset and phase curve amplitude are not necessarily correlated. Finally, we compare the observables from our grid to the population of Spitzer and Hubble observations of hot Jupiters. We find that the sudden jump in brightness temperature observed in the Spitzer secondary eclipse measurements can be naturally explained by the cold-trapping of TiO/VO at approximately 1800K. The grid of modelled spectra, phase curves and thermal structures are made available to the community, together with a python code for visualization of the grid properties, at https://doi.org/10.5281/zenodo.10785321 and http://sim3d.oca.eu/.

Submitted 15 April, 2024; originally announced April 2024.

Comments: 28 pages, 25 figures, accepted in MNRAS

arXiv:2404.04689 [pdf, other]

Multicalibration for Confidence Scoring in LLMs

Authors: Gianluca Detommaso, Martin Bertran, Riccardo Fogliato, Aaron Roth

Abstract: This paper proposes the use of "multicalibration" to yield interpretable and reliable confidence scores for outputs generated by large language models (LLMs). Multicalibration asks for calibration not just marginally, but simultaneously across various intersecting groupings of the data. We show how to form groupings for prompt/completion pairs that are correlated with the probability of correctness via two techniques: clustering within an embedding space, and "self-annotation" - querying the LLM by asking it various yes-or-no questions about the prompt. We also develop novel variants of multicalibration algorithms that offer performance improvements by reducing their tendency to overfit. Through systematic benchmarking across various question answering datasets and LLMs, we show how our techniques can yield confidence scores that provide substantial improvements in fine-grained measures of both calibration and accuracy compared to existing methods.

Submitted 6 April, 2024; originally announced April 2024.

arXiv:2402.17108 [pdf, ps, other]

Repeated Contracting with Multiple Non-Myopic Agents: Policy Regret and Limited Liability

Authors: Natalie Collina, Varun Gupta, Aaron Roth

Abstract: We study a repeated contracting setting in which a Principal adaptively chooses amongst $k$ Agents at each of $T$ rounds. The Agents are non-myopic, and so a mechanism for the Principal induces a $T$-round extensive form game amongst the Agents. We give several results aimed at understanding an under-explored aspect of contract theory -- the game induced when choosing an Agent to contract with. First, we show that this game admits a pure-strategy \emph{non-responsive} equilibrium amongst the Agents -- informally an equilibrium in which the Agent's actions depend on the history of realized states of nature, but not on the history of each other's actions, and so avoids the complexities of collusion and threats. Next, we show that if the Principal selects Agents using a \emph{monotone} bandit algorithm, then for any concave contract, in any such equilibrium, the Principal obtains no regret to contracting with the best Agent in hindsight -- not just given their realized actions, but also to the counterfactual world in which they had offered a guaranteed $T$-round contract to the best Agent in hindsight, which would have induced a different sequence of actions. Finally, we show that if the Principal selects Agents using a monotone bandit algorithm which guarantees no swap-regret, then the Principal can additionally offer only limited liability contracts (in which the Agent never needs to pay the Principal) while getting no-regret to the counterfactual world in which she offered a linear contract to the best Agent in hindsight -- despite the fact that linear contracts are not limited liability. We instantiate this theorem by demonstrating the existence of a monotone no swap-regret bandit algorithm, which to our knowledge has not previously appeared in the literature.

Submitted 26 February, 2024; originally announced February 2024.

arXiv:2402.11410 [pdf, ps, other]

An Elementary Predictor Obtaining $2\sqrt{T}+1$ Distance to Calibration

Authors: Eshwar Ram Arunachaleswaran, Natalie Collina, Aaron Roth, Mirah Shi

Abstract: Blasiok et al. [2023] proposed distance to calibration as a natural measure of calibration error that unlike expected calibration error (ECE) is continuous. Recently, Qiao and Zheng [2024] gave a non-constructive argument establishing the existence of an online predictor that can obtain $O(\sqrt{T})$ distance to calibration in the adversarial setting, which is known to be impossible for ECE. They leave as an open problem finding an explicit, efficient algorithm. We resolve this problem and give an extremely simple, efficient, deterministic algorithm that obtains distance to calibration error at most $2\sqrt{T}+1$.

Submitted 7 October, 2024; v1 submitted 17 February, 2024; originally announced February 2024.

arXiv:2402.10795 [pdf, other]

Diversified Ensembling: An Experiment in Crowdsourced Machine Learning

Authors: Ira Globus-Harris, Declan Harrison, Michael Kearns, Pietro Perona, Aaron Roth

Abstract: Crowdsourced machine learning on competition platforms such as Kaggle is a popular and often effective method for generating accurate models. Typically, teams vie for the most accurate model, as measured by overall error on a holdout set, and it is common towards the end of such competitions for teams at the top of the leaderboard to ensemble or average their models outside the platform mechanism to get the final, best global model. In arXiv:2201.10408, the authors developed an alternative crowdsourcing framework in the context of fair machine learning, in order to integrate community feedback into models when subgroup unfairness is present and identifiable. There, unlike in classical crowdsourced ML, participants deliberately specialize their efforts by working on subproblems, such as demographic subgroups in the service of fairness. Here, we take a broader perspective on this work: we note that within this framework, participants may both specialize in the service of fairness and simply to cater to their particular expertise (e.g., focusing on identifying bird species in an image classification task). Unlike traditional crowdsourcing, this allows for the diversification of participants' efforts and may provide a participation mechanism to a larger range of individuals (e.g. a machine learning novice who has insight into a specific fairness concern). We present the first medium-scale experimental evaluation of this framework, with 46 participating teams attempting to generate models to predict income from American Community Survey data. We provide an empirical analysis of teams' approaches, and discuss the novel system architecture we developed. From here, we give concrete guidance for how best to deploy such a framework.

Submitted 16 February, 2024; originally announced February 2024.

arXiv:2402.08753 [pdf, ps, other]

Forecasting for Swap Regret for All Downstream Agents

Authors: Aaron Roth, Mirah Shi

Abstract: We study the problem of making predictions so that downstream agents who best respond to them will be guaranteed diminishing swap regret, no matter what their utility functions are. It has been known since Foster and Vohra (1997) that agents who best-respond to calibrated forecasts have no swap regret. Unfortunately, the best known algorithms for guaranteeing calibrated forecasts in sequential adversarial environments do so at rates that degrade exponentially with the dimension of the prediction space. In this work, we show that by making predictions that are not calibrated, but are unbiased subject to a carefully selected collection of events, we can guarantee arbitrary downstream agents diminishing swap regret at rates that substantially improve over the rates that result from calibrated forecasts -- while maintaining the appealing property that our forecasts give guarantees for any downstream agent, without our forecasting algorithm needing to know their utility function. We give separate results in the ``low'' (1 or 2) dimensional setting and the ``high'' ($> 2$) dimensional setting. In the low dimensional setting, we show how to make predictions such that all agents who best respond to our predictions have diminishing swap regret -- in 1 dimension, at the optimal $O(\sqrt{T})$ rate. In the high dimensional setting we show how to make forecasts that guarantee regret scaling at a rate of $O(T^{2/3})$ (crucially, a dimension independent exponent), under the assumption that downstream agents smoothly best respond. Our results stand in contrast to rates that derive from agents who best respond to calibrated forecasts, which have an exponential dependence on the dimension of the prediction space.

Submitted 15 June, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

arXiv:2312.06589 [pdf, other]

Power sector impacts of a simultaneous European heat pump rollout

Authors: Alexander Roth

Abstract: The decarbonization of buildings requires the phase-out of fossil fuel heating systems. Heat pumps are considered a crucial technology to supply a substantial part of heating energy for buildings. Yet, their introduction is not without challenges, as heat pumps generate additional electricity demand as well as peak loads. To better understand these challenges, an ambitious simultaneous heat pump rollout in several central European countries with an hourly-resolved capacity expansion model of the power sector is studied. I assess the structure of hours and periods of peak heat demands and their concurrence with hours and periods of peak residual load. In a 2030 scenario, I find that meeting 25% of total heat demand in buildings with heat pumps would be covered best with additional wind power generation capacities. I also identify the important role of small thermal energy storage that could reduce the need for additional firm generation capacity. Due to the co-occurrence of heat demand, interconnection between countries does not substantially reduce the additional generation capacities needed for heat pump deployment. Based on six different weather years, my analysis cautions against relying on results based on a single weather year.

Submitted 11 December, 2023; originally announced December 2023.

arXiv:2312.05140 [pdf, other]

Membership Inference Attacks on Diffusion Models via Quantile Regression

Authors: Shuai Tang, Zhiwei Steven Wu, Sergul Aydore, Michael Kearns, Aaron Roth

Abstract: Recently, diffusion models have become popular tools for image synthesis because of their high-quality outputs. However, like other large-scale models, they may leak private information about their training data. Here, we demonstrate a privacy vulnerability of diffusion models through a \emph{membership inference (MI) attack}, which aims to identify whether a target example belongs to the training set when given the trained diffusion model. Our proposed MI attack learns quantile regression models that predict (a quantile of) the distribution of reconstruction loss on examples not used in training. This allows us to define a granular hypothesis test for determining the membership of a point in the training set, based on thresholding the reconstruction loss of that point using a custom threshold tailored to the example. We also provide a simple bootstrap technique that takes a majority membership prediction over ``a bag of weak attackers'' which improves the accuracy over individual quantile regression models. We show that our attack outperforms the prior state-of-the-art attack while being substantially less computationally expensive -- prior attacks required training multiple ``shadow models'' with the same architecture as the model under attack, whereas our attack requires training only much smaller models.

Submitted 8 December, 2023; originally announced December 2023.

arXiv:2311.07754 [pdf, other]

Efficient Prior-Free Mechanisms for No-Regret Agents

Authors: Natalie Collina, Aaron Roth, Han Shao

Abstract: We study a repeated Principal Agent problem between a long lived Principal and Agent pair in a prior free setting. In our setting, the sequence of realized states of nature may be adversarially chosen, the Agent is non-myopic, and the Principal aims for a strong form of policy regret. Following Camara, Hartline, and Johnson, we model the Agent's long-run behavior with behavioral assumptions that relax the common prior assumption (for example, that the Agent has no swap regret). Within this framework, we revisit the mechanism proposed by Camara et al., which informally uses calibrated forecasts of the unknown states of nature in place of a common prior. We give two main improvements. First, we give a mechanism that has an exponentially improved dependence (in terms of both running time and regret bounds) on the number of distinct states of nature. To do this, we show that our mechanism does not require truly calibrated forecasts, but rather forecasts that are unbiased subject to only a polynomially sized collection of events -- which can be produced with polynomial overhead. Second, in several important special cases -- including the focal linear contracting setting -- we show how to remove strong ``Alignment'' assumptions (which informally require that near-ties are always broken in favor of the Principal) by specifically deploying ``stable'' policies that do not have any near ties that are payoff relevant to the Principal. Taken together, our new mechanism makes the compelling framework proposed by Camara et al. much more powerful, now able to be realized over polynomially sized state spaces, and while requiring only mild assumptions on Agent behavior.

Submitted 13 November, 2023; originally announced November 2023.

arXiv:2310.17651 [pdf, other]

High-Dimensional Prediction for Sequential Decision Making

Authors: Georgy Noarov, Ramya Ramalingam, Aaron Roth, Stephan Xie

Abstract: We study the problem of making predictions of an adversarially chosen high-dimensional state that are unbiased subject to an arbitrary collection of conditioning events, with the goal of tailoring these events to downstream decision makers. We give efficient algorithms for solving this problem, as well as a number of applications that stem from choosing an appropriate set of conditioning events. For example, we can efficiently make predictions targeted at polynomially many decision makers, giving each of them optimal swap regret if they best-respond to our predictions. We generalize this to online combinatorial optimization, where the decision makers have a very large action space, to give the first algorithms offering polynomially many decision makers no regret on polynomially many subsequences that may depend on their actions and the context. We apply these results to get efficient no-subsequence-regret algorithms in extensive-form games (EFGs), yielding a new family of regret guarantees for EFGs that generalizes some existing EFG regret notions, e.g. regret to informed causal deviations, and is generally incomparable to other known such notions. Next, we develop a novel transparent alternative to conformal prediction for building valid online adversarial multiclass prediction sets. We produce class scores that downstream algorithms can use for producing valid-coverage prediction sets, as if these scores were the true conditional class probabilities. We show this implies strong conditional validity guarantees including set-size-conditional and multigroup-fair coverage for polynomially many downstream prediction sets. Moreover, our class scores can be guaranteed to have improved $L_2$ loss, cross-entropy loss, and generally any Bregman loss, compared to any collection of benchmark models, yielding a high-dimensional real-valued version of omniprediction.

Submitted 27 October, 2023; v1 submitted 26 October, 2023; originally announced October 2023.

Comments: Added references, Arxiv abstract edited

arXiv:2310.05693 [pdf, other]

doi 10.1093/mnras/stae932

CONGRuENTS (COsmic-ray, Neutrino, Gamma-ray and Radio Non-Thermal Spectra). II. Population-level correlations between galactic infrared, radio, and γ-ray emission

Authors: Matt A. Roth, Mark R. Krumholz, Roland M. Crocker, Todd A. Thompson

Abstract: Galaxies obey a number of empirical correlations between their radio, γ-ray, and infrared emission, but the physical origins of these correlations remain uncertain. Here we use the CONGRuENTS model for broadband non-thermal emission from star-forming galaxies, which self-consistently calculates energy-dependent transport and non-thermal emission from cosmic ray hadrons and leptons, to predict radio and γ-ray emission for a synthetic galaxy population with properties drawn from a large deep-field survey. We show that our synthetic galaxies reproduce observed relations such as the FIR-radio correlation, the FIR-γ correlation, and the distribution of radio spectral indices, and we use the model to explain the physical origins of these relations. Our results show that the FIR-radio correlation arises because the amount of cosmic ray electron power ultimately radiated as synchrotron emission varies only weakly with galaxy star formation rate as a result of the constraints imposed on gas properties by hydrostatic balance and turbulent dynamo action; the same physics dictates the extent of proton calorimetry in different galaxies, and thus sets the FIR-γ-ray correlation. We further show that galactic radio spectral indices result primarily from competition between thermal free-free emission and energy-dependent loss of cosmic ray electrons to bremsstrahlung and escape into galactic halos, with shaping of the spectrum by inverse Compton, synchrotron, and ionisation processes typically playing a sub-dominant role. In addition to explaining existing observations, we use our analysis to predict a heretofore unseen correlation between the curvature of galaxies' radio spectra and their pion-driven γ-ray emission, a prediction that will be testable with upcoming facilities.

Submitted 15 June, 2024; v1 submitted 9 October, 2023; originally announced October 2023.

Comments: 17 pages, 14 figures

Journal ref: MNRAS, Volume 530, Issue 2, May 2024, Pages 1849-1865

arXiv:2310.04652 [pdf, other]

Oracle Efficient Algorithms for Groupwise Regret

Authors: Krishna Acharya, Eshwar Ram Arunachaleswaran, Sampath Kannan, Aaron Roth, Juba Ziani

Abstract: We study the problem of online prediction, in which at each time step $t$, an individual $x_t$ arrives, whose label we must predict. Each individual is associated with various groups, defined based on their features such as age, sex, race etc., which may intersect. Our goal is to make predictions that have regret guarantees not just overall but also simultaneously on each sub-sequence comprised of the members of any single group. Previous work such as [Blum & Lykouris] and [Lee et al] provide attractive regret guarantees for these problems; however, these are computationally intractable on large model classes. We show that a simple modification of the sleeping experts technique of [Blum & Lykouris] yields an efficient reduction to the well-understood problem of obtaining diminishing external regret absent group considerations. Our approach gives similar regret guarantees compared to [Blum & Lykouris]; however, we run in time linear in the number of groups, and are oracle-efficient in the hypothesis class. This in particular implies that our algorithm is efficient whenever the number of groups is polynomially bounded and the external-regret problem can be solved efficiently, an improvement on [Blum & Lykouris]'s stronger condition that the model class must be small. Our approach can handle online linear regression and online combinatorial optimization problems like online shortest paths. Beyond providing theoretical regret bounds, we evaluate this algorithm with an extensive set of experiments on synthetic data and on two real data sets -- Medical costs and the Adult income dataset, both instantiated with intersecting groups defined in terms of race, sex, and other demographic characteristics. We find that uniformly across groups, our algorithm gives substantial error improvements compared to running a standard online linear regression algorithm with no groupwise regret guarantees.

Submitted 6 October, 2023; originally announced October 2023.

arXiv:2310.00946 [pdf, other]

Distilling Influences to Mitigate Prediction Churn in Graph Neural Networks

Authors: Andreas Roth, Thomas Liebig

Abstract: Models with similar performances exhibit significant disagreement in the predictions of individual samples, referred to as prediction churn. Our work explores this phenomenon in graph neural networks by investigating differences between models differing only in their initializations in their utilized features for predictions. We propose a novel metric called Influence Difference (ID) to quantify the variation in reasons used by nodes across models by comparing their influence distribution. Additionally, we consider the differences between nodes with a stable and an unstable prediction, positing that both equally utilize different reasons and thus provide a meaningful gradient signal to closely match two models even when the predictions for nodes are similar. Based on our analysis, we propose to minimize this ID in Knowledge Distillation, a domain where a new model should closely match an established one. As an efficient approximation, we introduce DropDistillation (DD) that matches the output for a graph perturbed by edge deletions. Our empirical evaluation of six benchmark datasets for node classification validates the differences in utilized features. DD outperforms previous methods regarding prediction stability and overall performance in all considered Knowledge Distillation experiments.

Submitted 2 October, 2023; originally announced October 2023.

Comments: Accepted at ACML 2023

arXiv:2309.06000 [pdf, other]

Gait Design of a Novel Arboreal Concertina Locomotion for Snake-like Robots

Authors: Shuoqi Chen, Aaron Roth

Abstract: In this paper, we propose a novel strategy for a snake robot to move straight up a cylindrical surface. Prior works on pole-climbing for a snake robot mainly utilized a rolling helix gait, and although proven to be efficient, it does not resemble movements made by a natural snake. We take inspiration from nature and seek to imitate the Arboreal Concertina Locomotion (ACL) from real-life serpents. In order to represent the 3D curves that make up the key motion patterns of ACL, we establish a set of parametric equations that identify periodic functions, which produce a sequence of backbone curves. We then build up the gait equation using the curvature integration method, and finally, we propose a simple motion estimation strategy using virtual chassis and non-slip model assumptions. We present experimental results using a 20-DOF snake robot traversing outside of a straight pipe.

Submitted 12 September, 2023; originally announced September 2023.

Comments: 4 pages, 3 figures

arXiv:2308.16800 [pdf, other]

Rank Collapse Causes Over-Smoothing and Over-Correlation in Graph Neural Networks

Authors: Andreas Roth, Thomas Liebig

Abstract: Our study reveals new theoretical insights into over-smoothing and feature over-correlation in graph neural networks. Specifically, we demonstrate that with increased depth, node representations become dominated by a low-dimensional subspace that depends on the aggregation function but not on the feature transformations. For all aggregation functions, the rank of the node representations collapses, resulting in over-smoothing for particular aggregation functions. Our study emphasizes the importance for future research to focus on rank collapse rather than over-smoothing. Guided by our theory, we propose a sum of Kronecker products as a beneficial property that provably prevents over-smoothing, over-correlation, and rank collapse. We empirically demonstrate the shortcomings of existing models in fitting target functions of node classification tasks.

Submitted 17 September, 2024; v1 submitted 31 August, 2023; originally announced August 2023.

Comments: LoG 2023

arXiv:2308.16516 [pdf, other]

Curvature-based Pooling within Graph Neural Networks

Authors: Cedric Sanders, Andreas Roth, Thomas Liebig

Abstract: Over-squashing and over-smoothing are two critical issues that limit the capabilities of graph neural networks (GNNs). While over-smoothing eliminates the differences between nodes, making them indistinguishable, over-squashing refers to the inability of GNNs to propagate information over long distances, as exponentially many node states are squashed into fixed-size representations. Both phenomena share similar causes, as both are largely induced by the graph topology. To mitigate these problems in graph classification tasks, we propose CurvPool, a novel pooling method. CurvPool exploits the notion of curvature of a graph to adaptively identify structures responsible for both over-smoothing and over-squashing. By clustering nodes based on the Balanced Forman curvature, CurvPool constructs a graph with a more suitable structure, allowing deeper models and the combination of distant information. We compare it to other state-of-the-art pooling approaches and establish its competitiveness in terms of classification accuracy, computational complexity, and flexibility. CurvPool outperforms several comparable methods across all considered tasks. The most consistent results are achieved by pooling densely connected clusters using the sum aggregation, as this allows additional information about the size of each pool.

Submitted 31 August, 2023; originally announced August 2023.

Comments: ECMLPKDD 2023 - Workshop on Mining and Learning with Graphs

arXiv:2307.12918 [pdf, other]

Power sector benefits of flexible heat pumps

Authors: Alexander Roth, Carlos Gaete-Morales, Dana Kirchem, Wolf-Peter Schill

Abstract: Heat pumps play a major role in decreasing fossil fuel use in heating. They increase electricity demand, but could also foster the system integration of variable renewable energy sources. We analyze three scenarios for expanding decentralized heat pumps in Germany by 2030, focusing on the role of buffer heat storage. Using an open-source power sector model, we assess costs, capacity investments, and emissions effects. We find that investments in solar photovoltaics can cost-effectively accompany the roll-out of heat pumps in case wind power expansion potentials are limited. Results further show that short-duration heat storage substantially reduces the need for firm capacity and battery storage. Larger heat storage sizes do not substantially change the results. Increasing the number of heat pumps from 1.7 to 10 million units could annually save around a quarter of Germany's overall natural gas consumption and around half of households' building-related CO2 emissions.

Submitted 15 October, 2024; v1 submitted 24 July, 2023; originally announced July 2023.

arXiv:2307.08999 [pdf, ps, other]

Oracle Efficient Online Multicalibration and Omniprediction

Authors: Sumegha Garg, Christopher Jung, Omer Reingold, Aaron Roth

Abstract: A recent line of work has shown a surprising connection between multicalibration, a multi-group fairness notion, and omniprediction, a learning paradigm that provides simultaneous loss minimization guarantees for a large family of loss functions. Prior work studies omniprediction in the batch setting. We initiate the study of omniprediction in the online adversarial setting. Although there exist algorithms for obtaining notions of multicalibration in the online adversarial setting, unlike batch algorithms, they work only for small finite classes of benchmark functions $F$, because they require enumerating every function $f \in F$ at every round. In contrast, omniprediction is most interesting for learning theoretic hypothesis classes $F$, which are generally continuously large. We develop a new online multicalibration algorithm that is well defined for infinite benchmark classes $F$, and is oracle efficient (i.e. for any class $F$, the algorithm has the form of an efficient reduction to a no-regret learning algorithm for $F$). The result is the first efficient online omnipredictor -- an oracle efficient prediction algorithm that can be used to simultaneously obtain no regret guarantees to all Lipschitz convex loss functions. For the class $F$ of linear functions, we show how to make our algorithm efficient in the worst case.
Also, we show upper and lower bounds on the extent to which our rates can be improved: our oracle efficient algorithm actually promises a stronger guarantee called swap-omniprediction, and we prove a lower bound showing that obtaining $O(\sqrt{T})$ bounds for swap-omniprediction is impossible in the online setting. On the other hand, we give a (non-oracle efficient) algorithm which can obtain the optimal $O(\sqrt{T})$ omniprediction bounds without going through multicalibration, giving an information theoretic separation between these two solution concepts.

Submitted 18 July, 2023; originally announced July 2023.

arXiv:2307.03694 [pdf, other]

Scalable Membership Inference Attacks via Quantile Regression

Authors: Martin Bertran, Shuai Tang, Michael Kearns, Jamie Morgenstern, Aaron Roth, Zhiwei Steven Wu

Abstract: Membership inference attacks are designed to determine, using black box access to trained models, whether a particular example was used in training or not. Membership inference can be formalized as a hypothesis testing problem. The most effective existing attacks estimate the distribution of some test statistic (usually the model's confidence on the true label) on points that were (and were not) used in training by training many shadow models -- i.e. models of the same architecture as the model being attacked, trained on a random subsample of data. While effective, these attacks are extremely computationally expensive, especially when the model under attack is large. We introduce a new class of attacks based on performing quantile regression on the distribution of confidence scores induced by the model under attack on points that are not used in training. We show that our method is competitive with state-of-the-art shadow model attacks, while requiring substantially less compute because our attack requires training only a single model. Moreover, unlike shadow model attacks, our proposed attack does not require any knowledge of the architecture of the model under attack and is therefore truly "black-box". We show the efficacy of this approach in an extensive series of experiments on various datasets and model architectures.

Submitted 7 July, 2023; originally announced July 2023.
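
Illustrative note (not taken from the paper): the abstract above describes thresholding confidence scores at a quantile learned from points known not to be in the training set. The sketch below is a deliberately simplified, marginal version of that idea: it uses one global empirical quantile instead of a learned, per-example quantile regressor, and all function names (`pinball_loss`, `marginal_threshold`, `predict_member`) are hypothetical. It assumes NumPy is available.

```python
import numpy as np

def pinball_loss(y, q_pred, tau):
    """Average pinball (quantile) loss at level tau; quantile regression minimizes this."""
    diff = np.asarray(y, dtype=float) - q_pred
    return float(np.mean(np.maximum(tau * diff, (tau - 1.0) * diff)))

def marginal_threshold(nonmember_scores, alpha):
    """(1 - alpha) empirical quantile of confidence scores on known non-members.

    Simplifying assumption: one global threshold rather than a
    feature-dependent quantile predicted by a regression model.
    """
    return float(np.quantile(nonmember_scores, 1.0 - alpha))

def predict_member(score, threshold):
    """Flag a point as a suspected training member if its confidence exceeds the threshold."""
    return bool(score > threshold)

# Hypothetical confidence scores of the attacked model on held-out (non-member) points.
nonmember_scores = np.arange(1, 101) / 100.0
threshold = marginal_threshold(nonmember_scores, alpha=0.1)
flagged = [predict_member(s, threshold) for s in (0.95, 0.50)]
```

By construction, roughly an `alpha` fraction of non-member points exceed the threshold, so `alpha` plays the role of a target false-positive rate in this simplified setting.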

arXiv:2306.15083 [pdf, other]

doi 10.4230/LIPIcs.FORC.2024.4

Balanced Filtering via Disclosure-Controlled Proxies

Authors: Siqi Deng, Emily Diana, Michael Kearns, Aaron Roth

Abstract: We study the problem of collecting a cohort or set that is balanced with respect to sensitive groups when group membership is unavailable or prohibited from use at deployment time. Specifically, our deployment-time collection mechanism does not reveal significantly more about the group membership of any individual sample than can be ascertained from base rates alone. To do this, we study a learner that can use a small set of labeled data to train a proxy function that can later be used for this filtering or selection task. We then associate the range of the proxy function with sampling probabilities; given a new example, we classify it using our proxy function and then select it with probability corresponding to its proxy classification. Importantly, we require that the proxy classification does not reveal significantly more information about the sensitive group membership of any individual example compared to population base rates alone (i.e., the level of disclosure should be controlled) and show that we can find such a proxy in a sample- and oracle-efficient manner. Finally, we experimentally evaluate our algorithm and analyze its generalization properties.

Submitted 17 June, 2024; v1 submitted 26 June, 2023; originally announced June 2023.

Journal ref: 5th Symposium on Foundations of Responsible Computing (FORC 2024)
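
Illustrative note (not taken from the paper): the selection step described in the abstract above, classifying each new example with a proxy and keeping it with a probability tied to the proxy's output, can be sketched as follows. The proxy, the acceptance probabilities, and the name `filter_stream` are all hypothetical placeholders; how the proxy is trained and how disclosure is controlled are not shown.

```python
import random

def filter_stream(examples, proxy, accept_prob, rng):
    """Keep each example with the probability assigned to its proxy class.

    examples    : iterable of items to filter
    proxy       : callable mapping an item to a label in accept_prob (assumed given)
    accept_prob : dict mapping each proxy label to a probability in [0, 1]
    rng         : random.Random instance, passed in for reproducibility
    """
    selected = []
    for x in examples:
        label = proxy(x)
        # rng.random() is uniform on [0, 1), so this accepts with probability accept_prob[label].
        if rng.random() < accept_prob[label]:
            selected.append(x)
    return selected

# Toy usage with a hypothetical proxy that labels integers by parity.
toy_proxy = lambda x: x % 2
kept = filter_stream(range(10), toy_proxy, {0: 1.0, 1: 0.0}, random.Random(0))
```

With acceptance probabilities of 1.0 and 0.0 the filter is deterministic, which makes the toy case easy to check; in practice the probabilities would be chosen so that the selected cohort is balanced across groups.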

arXiv:2305.16887 [pdf, other]

doi 10.1093/mnras/stad1547

Awesome SOSS: Atmospheric Characterisation of WASP-96 b using the JWST Early Release Observations

Authors: Jake Taylor, Michael Radica, Luis Welbanks, Ryan J. MacDonald, Jasmina Blecic, Maria Zamyatina, Alexander Roth, Jacob L. Bean, Vivien Parmentier, Louis-Philippe Coulombe, Adina D. Feinstein, Néstor Espinoza, Björn Benneke, David Lafrenière, René Doyon, Eva-Maria Ahrer

Abstract: The newly operational JWST offers the potential to study the atmospheres of distant worlds with precision that has not been achieved before. One of the first exoplanets observed by JWST in the summer of 2022 was WASP-96 b, a hot-Saturn orbiting a G8 star. As part of the Early Release Observations program, one transit of WASP-96 b was observed with NIRISS/SOSS to capture its transmission spectrum from 0.6-2.85 microns. In this work, we utilise four retrieval frameworks to report precise and robust measurements of WASP-96 b's atmospheric composition. We constrain the logarithmic volume mixing ratios of multiple chemical species in its atmosphere, including: H$_2$O = $-3.59 ^{+ 0.35 }_{- 0.35 }$, CO$_2$ = $-4.38 ^{+ 0.47 }_{- 0.57 }$ and K = $-8.04 ^{+ 1.22 }_{- 1.71 }$. Notably, our results offer a first abundance constraint on potassium in WASP-96 b's atmosphere, and important inferences on carbon-bearing species such as CO$_2$ and CO. Our short wavelength NIRISS/SOSS data are best explained by the presence of an enhanced Rayleigh scattering slope, despite previous inferences of a clear atmosphere - although we find no evidence for a grey cloud deck. Finally, we explore the data resolution required to appropriately interpret observations using NIRISS/SOSS. We find that our inferences are robust against different binning schemes. That is, from low $R = 125$ to the native resolution of the instrument, the bulk atmospheric properties of the planet are consistent.
Our systematic analysis of these exquisite observations demonstrates the power of NIRISS/SOSS to detect and constrain multiple molecular and atomic species in the atmospheres of hot giant planets.

Submitted 26 May, 2023; originally announced May 2023.

Comments: 12 pages, 5 Figures. Accepted for publication in MNRAS. Companion paper to Radica et al., 2023

arXiv:2303.03451 [pdf, other]

Improved Differentially Private Regression via Gradient Boosting

Authors: Shuai Tang, Sergul Aydore, Michael Kearns, Saeyoung Rho, Aaron Roth, Yichen Wang, Yu-Xiang Wang, Zhiwei Steven Wu

Abstract: We revisit the problem of differentially private squared error linear regression. We observe that existing state-of-the-art methods are sensitive to the choice of hyperparameters -- including the "clipping threshold" that cannot be set optimally in a data-independent way. We give a new algorithm for private linear regression based on gradient boosting. We show that our method consistently improves over the previous state of the art when the clipping threshold is taken to be fixed without knowledge of the data, rather than optimized in a non-private way -- and that even when we optimize the hyperparameters of competitor algorithms non-privately, our algorithm is no worse and often better. In addition to a comprehensive set of experiments, we give theoretical insights to explain this behavior.

Submitted 20 May, 2023; v1 submitted 6 March, 2023; originally announced March 2023.

arXiv:2302.08507 [pdf, ps, other]

The Scope of Multicalibration: Characterizing Multicalibration via Property Elicitation

Authors: Georgy Noarov, Aaron Roth

Abstract: We make a connection between multicalibration and property elicitation and show that (under mild technical conditions) it is possible to produce a multicalibrated predictor for a continuous scalar distributional property $Γ$ if and only if $Γ$ is elicitable. On the negative side, we show that for non-elicitable continuous properties there exist simple data distributions on which even the true distributional predictor is not calibrated. On the positive side, for elicitable $Γ$, we give simple canonical algorithms for the batch and the online adversarial setting, that learn a $Γ$-multicalibrated predictor. This generalizes past work on multicalibrated means and quantiles, and in fact strengthens existing online quantile multicalibration results. To further counter-weigh our negative result, we show that if a property $Γ^1$ is not elicitable by itself, but is elicitable conditionally on another elicitable property $Γ^0$, then there is a canonical algorithm that jointly multicalibrates $Γ^1$ and $Γ^0$; this generalizes past work on mean-moment multicalibration. Finally, as applications of our theory, we provide novel algorithmic and impossibility results for fair (multicalibrated) risk assessment.

Submitted 16 February, 2023; originally announced February 2023.

arXiv:2301.13767 [pdf, other]

Multicalibration as Boosting for Regression

Authors: Ira Globus-Harris, Declan Harrison, Michael Kearns, Aaron Roth, Jessica Sorrell

Abstract: We study the connection between multicalibration and boosting for squared error regression. First we prove a useful characterization of multicalibration in terms of a "swap regret" like condition on squared error. Using this characterization, we give an exceedingly simple algorithm that can be analyzed both as a boosting algorithm for regression and as a multicalibration algorithm for a class H that makes use only of a standard squared error regression oracle for H. We give a weak learning assumption on H that ensures convergence to Bayes optimality without the need to make any realizability assumptions -- giving us an agnostic boosting algorithm for regression. We then show that our weak learning assumption on H is both necessary and sufficient for multicalibration with respect to H to imply Bayes optimality. We also show that if H satisfies our weak learning condition relative to another class C then multicalibration with respect to H implies multicalibration with respect to C. Finally we investigate the empirical performance of our algorithm experimentally using an open source implementation that we make available. Our code repository can be found at https://github.com/Declancharrison/Level-Set-Boosting.

Submitted 31 January, 2023; originally announced January 2023.

Comments: Code available here: https://github.com/Declancharrison/Level-Set-Boosting

arXiv:2212.09428 [pdf, other]

doi 10.1093/mnras/stad1524

CONGRuENTS (COsmic-ray, Neutrino, Gamma-ray and Radio Non-Thermal Spectra). I. A predictive model for galactic non-thermal emission

Authors: Matt A. Roth, Mark R. Krumholz, Roland M. Crocker, Todd A. Thompson

Abstract: The total luminosity and spectral shape of the non-thermal emission produced by cosmic rays depend on their interstellar environment, a dependence that gives rise to correlations between galaxies' bulk properties -- star formation rate, stellar mass, and others -- and their non-thermal spectra. Understanding the physical mechanisms of cosmic ray transport, loss, and emission is key to understanding these correlations. Here, in the first paper of the series, we present a new method to compute the non-thermal spectra of star-forming galaxies, and describe an open-source software package -- COsmic-ray, Neutrino, Gamma-ray and Radio Non-Thermal Spectra (CONGRuENTS) -- that implements it. As a crucial innovation, our method requires as input only a galaxy's effective radius, star formation rate, stellar mass, and redshift, all quantities that are readily available for large samples of galaxies and do not require expensive, spatially resolved gas measurements. From these inputs we derive individual, galaxy-by-galaxy models for the background gas and radiation field through which cosmic rays propagate, from which we compute steady state cosmic ray spectra for hadronic and leptonic particles in both the galactic disc and halo by solving the full kinetic equation. We invoke modern models for cosmic ray transport and include all significant emission and loss mechanisms. In this paper we describe the model and validate it against non-thermal emission measured in nearby star-forming galaxies that span four orders of magnitude in star formation rate.

Submitted 16 May, 2023; v1 submitted 19 December, 2022; originally announced December 2022.

Comments: 23 pages, 14 figures, 1 table, accepted for publication in MNRAS

arXiv:2211.16419 [pdf, other]

doi 10.1016/j.isci.2023.107074

Geographical balancing of wind power decreases storage needs in a 100% renewable European power sector

Authors: Alexander Roth, Wolf-Peter Schill

Abstract: To reduce greenhouse gas emissions, many countries plan to massively expand wind power and solar photovoltaic capacities. These variable renewable energy sources require additional flexibility in the power sector. Both geographical balancing enabled by interconnection and electricity storage can provide such flexibility. In a 100% renewable energy scenario of twelve central European countries, we investigate how geographical balancing between countries reduces the need for electricity storage. Our principal contribution is to separate and quantify the different factors at play. Applying a capacity expansion model and a factorization method, we disentangle the effect of interconnection on optimal storage capacities through distinct factors: differences in countries' solar PV and wind power availability patterns, load profiles, as well as hydropower and bioenergy capacity portfolios. Results show that interconnection reduces storage needs by around 30% in contrast to a scenario without interconnection. Differences in wind power profiles between countries explain around 80% of that effect.

Submitted 21 June, 2023; v1 submitted 29 November, 2022; originally announced November 2022.

arXiv:2211.11596 [pdf, other]

Forecasting Unobserved Node States with spatio-temporal Graph Neural Networks

Authors: Andreas Roth, Thomas Liebig

Abstract: Forecasting future states of sensors is key to solving tasks like weather prediction, route planning, and many others when dealing with networks of sensors. But complete spatial coverage of sensors is generally unavailable and would practically be infeasible due to limitations in budget and other resources during deployment and maintenance. Existing approaches using machine learning are limited to the spatial locations where data was observed, causing limitations to downstream tasks. Inspired by the recent surge of Graph Neural Networks for spatio-temporal data processing, we investigate whether these can also forecast the state of locations with no sensors available. For this purpose, we develop a framework, named Forecasting Unobserved Node States (FUNS), that allows forecasting the state at entirely unobserved locations based on spatio-temporal correlations and the graph inductive bias. FUNS serves as a blueprint for optimizing models only on observed data and demonstrates good generalization capabilities for predicting the state at entirely unobserved locations during the testing stage. Our framework can be combined with any spatio-temporal Graph Neural Network that exploits spatio-temporal correlations with surrounding observed locations by using the network's graph structure. Our employed model builds on a previous model by also allowing us to exploit prior knowledge about locations of interest, e.g. the road type.
Our empirical evaluation of both simulated and real-world datasets demonstrates that Graph Neural Networks are well-suited for this task.

Submitted 21 November, 2022; originally announced November 2022.

arXiv:2211.03128 [pdf, other]

doi 10.1073/pnas.2218605120

Confidence-Ranked Reconstruction of Census Microdata from Published Statistics

Authors: Travis Dick, Cynthia Dwork, Michael Kearns, Terrance Liu, Aaron Roth, Giuseppe Vietri, Zhiwei Steven Wu

Abstract: A reconstruction attack on a private dataset $D$ takes as input some publicly accessible information about the dataset and produces a list of candidate elements of $D$. We introduce a new class of data reconstruction attacks based on randomized methods for non-convex optimization. We empirically demonstrate that our attacks can not only reconstruct full rows of $D$ from aggregate query statistics $Q(D)\in \mathbb{R}^m$, but can do so in a way that reliably ranks reconstructed rows by their odds of appearing in the private data, providing a signature that could be used for prioritizing reconstructed rows for further actions such as identity theft or hate crime. We also design a sequence of baselines for evaluating reconstruction attacks. Our attacks significantly outperform those that are based only on access to a public distribution or population from which the private dataset $D$ was sampled, demonstrating that they are exploiting information in the aggregate statistics $Q(D)$, and not simply the overall structure of the distribution. In other words, the queries $Q(D)$ are permitting reconstruction of elements of this dataset, not the distribution from which $D$ was drawn. These findings are established both on 2010 U.S. decennial Census data and queries and Census-derived American Community Survey datasets.
Taken together, our methods and experiments illustrate the risks in releasing numerically precise aggregate statistics of a large dataset, and provide further motivation for the careful application of provably private techniques such as differential privacy.

Submitted 6 February, 2023; v1 submitted 6 November, 2022; originally announced November 2022.

arXiv:2209.15145 [pdf, other]

Batch Multivalid Conformal Prediction

Authors: Christopher Jung, Georgy Noarov, Ramya Ramalingam, Aaron Roth

Abstract: We develop fast distribution-free conformal prediction algorithms for obtaining multivalid coverage on exchangeable data in the batch setting. Multivalid coverage guarantees are stronger than marginal coverage guarantees in two ways: (1) They hold even conditional on group membership -- that is, the target coverage level $1-α$ holds conditionally on membership in each of an arbitrary (potentially intersecting) group in a finite collection $\mathcal{G}$ of regions in the feature space. (2) They hold even conditional on the value of the threshold used to produce the prediction set on a given example. In fact multivalid coverage guarantees hold even when conditioning on group membership and threshold value simultaneously. We give two algorithms: both take as input an arbitrary non-conformity score and an arbitrary collection of possibly intersecting groups $\mathcal{G}$, and then can equip arbitrary black-box predictors with prediction sets. Our first algorithm (BatchGCP) is a direct extension of quantile regression, needs to solve only a single convex minimization problem, and produces an estimator which has group-conditional guarantees for each group in $\mathcal{G}$. Our second algorithm (BatchMVP) is iterative, and gives the full guarantees of multivalid conformal prediction: prediction sets that are valid conditionally both on group membership and non-conformity threshold. We evaluate the performance of both of our algorithms in an extensive set of experiments.
Code to replicate all of our experiments can be found at https://github.com/ProgBelarus/BatchMultivalidConformal

Submitted 29 September, 2022; originally announced September 2022.

Comments: Code to replicate all of our experiments can be found at https://github.com/ProgBelarus/BatchMultivalidConformal

arXiv:2209.09079 [pdf, other]

MSVIPER: Improved Policy Distillation for Reinforcement-Learning-Based Robot Navigation

Authors: Aaron M. Roth, Jing Liang, Ram Sriram, Elham Tabassi, Dinesh Manocha

Abstract: We present Multiple Scenario Verifiable Reinforcement Learning via Policy Extraction (MSVIPER), a new method for policy distillation to decision trees for improved robot navigation. MSVIPER learns an "expert" policy using any Reinforcement Learning (RL) technique involving learning a state-action mapping and then uses imitation learning to learn a decision-tree policy from it. We demonstrate that MSVIPER results in efficient decision trees and can accurately mimic the behavior of the expert policy. Moreover, we present efficient policy distillation and tree-modification techniques that take advantage of the decision tree structure to allow improvements to a policy without retraining. We use our approach to improve the performance of RL-based robot navigation algorithms for indoor and outdoor scenes. We demonstrate the benefits in terms of reduced freezing and oscillation behaviors (by up to 95% reduction) for mobile robots navigating among dynamic obstacles and reduced vibrations and oscillation (by up to 17%) for outdoor robot navigation on complex, uneven terrains.

Submitted 19 September, 2022; originally announced September 2022.

Comments: 6 pages main paper, 2 pages of references, 5 page appendix (13 pages total) 5 tables, 9 algorithms, 4 figures

arXiv:2209.07400 [pdf, other]

Private Synthetic Data for Multitask Learning and Marginal Queries

Authors: Giuseppe Vietri, Cedric Archambeau, Sergul Aydore, William Brown, Michael Kearns, Aaron Roth, Ankit Siva, Shuai Tang, Zhiwei Steven Wu

Abstract: We provide a differentially private algorithm for producing synthetic data simultaneously useful for multiple tasks: marginal queries and multitask machine learning (ML). A key innovation in our algorithm is the ability to directly handle numerical features, in contrast to a number of related prior approaches which require numerical features to be first converted into high cardinality categorical features via a binning strategy. Higher binning granularity is required for better accuracy, but this negatively impacts scalability. Eliminating the need for binning allows us to produce synthetic data preserving large numbers of statistical queries such as marginals on numerical features, and class conditional linear threshold queries. Preserving the latter means that the fraction of points of each class label above a particular half-space is roughly the same in both the real and synthetic data. This is the property that is needed to train a linear classifier in a multitask setting. Our algorithm also allows us to produce high quality synthetic data for mixed marginal queries, that combine both categorical and numerical features. Our method consistently runs 2-5x faster than the best comparable techniques, and provides significant accuracy improvements in both marginal queries and linear prediction tasks for mixed-type datasets.

Submitted 15 September, 2022; originally announced September 2022.

Comments: The short version of this paper appears in the proceedings of NeurIPS-22

arXiv:2209.07375 [pdf, other]

Wealth Dynamics Over Generations: Analysis and Interventions

Authors: Krishna Acharya, Eshwar Ram Arunachaleswaran, Sampath Kannan, Aaron Roth, Juba Ziani

Abstract: We present a stylized model with feedback loops for the evolution of a population's wealth over generations. Individuals have both talent and wealth: talent is a random variable distributed identically for everyone, but wealth is a random variable that is dependent on the population one is born into. Individuals then apply to a downstream agent, which we treat as a university throughout the paper (but could also represent an employer) who makes a decision about whether to admit them or not. The university does not directly observe talent or wealth, but rather a signal (representing e.g. a standardized test) that is a convex combination of both. The university knows the distributions from which an individual's type and wealth are drawn, and makes its decisions based on the posterior distribution of the applicant's characteristics conditional on their population and signal. Each population's wealth distribution at the next round then depends on the fraction of that population that was admitted by the university at the previous round. We study wealth dynamics in this model, and give conditions under which the dynamics have a single attracting fixed point (which implies population wealth inequality is transitory), and conditions under which it can have multiple attracting fixed points (which implies that population wealth inequality can be persistent). In the case in which there are multiple attracting fixed points, we study interventions aimed at eliminating or mitigating inequality, including increasing the capacity of the university to admit more people, aligning the signal generated by individuals with the preferences of the university, and making direct monetary transfers to the less wealthy population.

Submitted 15 September, 2022; originally announced September 2022.

arXiv:2209.07312 [pdf, other]

Multicalibrated Regression for Downstream Fairness

Authors: Ira Globus-Harris, Varun Gupta, Christopher Jung, Michael Kearns, Jamie Morgenstern, Aaron Roth

Abstract: We show how to take a regression function $\hat{f}$ that is appropriately ``multicalibrated'' and efficiently post-process it into an approximately error minimizing classifier satisfying a large variety of fairness constraints. The post-processing requires no labeled data, and only a modest amount of unlabeled data and computation. The computational and sample complexity requirements of computing $\hat{f}$ are comparable to the requirements for solving a single fair learning task optimally, but it can in fact be used to solve many different downstream fairness-constrained learning problems efficiently. Our post-processing method easily handles intersecting groups, generalizing prior work on post-processing regression functions to satisfy fairness constraints that only applied to disjoint groups. Our work extends recent work showing that multicalibrated regression functions are ``omnipredictors'' (i.e. can be post-processed to optimally solve unconstrained ERM problems) to constrained optimization.

Submitted 15 September, 2022; originally announced September 2022.

arXiv:2209.01687 [pdf, ps, other]

Reconciling Individual Probability Forecasts

Authors: Aaron Roth, Alexander Tolbert, Scott Weinstein

Abstract: Individual probabilities refer to the probabilities of outcomes that are realized only once: the probability that it will rain tomorrow, the probability that Alice will die within the next 12 months, the probability that Bob will be arrested for a violent crime in the next 18 months, etc. Individual probabilities are fundamentally unknowable. Nevertheless, we show that two parties who agree on the data -- or on how to sample from a data distribution -- cannot agree to disagree on how to model individual probabilities. This is because any two models of individual probabilities that substantially disagree can together be used to empirically falsify and improve at least one of the two models. This can be efficiently iterated in a process of "reconciliation" that results in models that both parties agree are superior to the models they started with, and which themselves (almost) agree on the forecasts of individual probabilities (almost) everywhere. We conclude that although individual probabilities are unknowable, they are contestable via a computationally and data efficient process that must lead to agreement. Thus we cannot find ourselves in a situation in which we have two equally accurate and unimprovable models that disagree substantially in their predictions -- providing an answer to what is sometimes called the predictive or model multiplicity problem.

Submitted 6 May, 2023; v1 submitted 4 September, 2022; originally announced September 2022.

Comments: This is the full version of a paper that appears in the proceedings of FAccT 2023: The Sixth Annual ACM Conference on Fairness, Accountability, and Transparency, 2023
