Reverse-Engineering the Reader

Authors: Samuel Kiegeland, Ethan Gotlieb Wilcox, Afra Amini, David Robert Reich, Ryan Cotterell

Abstract: Numerous previous studies have sought to determine to what extent language models, pretrained on natural language text, can serve as useful models of human cognition. In this paper, we are interested in the opposite question: whether we can directly optimize a language model to be a useful cognitive model by aligning it to human psychometric data. To achieve this, we introduce a novel alignment technique in which we fine-tune a language model to implicitly optimize the parameters of a linear regressor that directly predicts humans' reading times of in-context linguistic units, e.g., phonemes, morphemes, or words, using surprisal estimates derived from the language model. Using words as a test case, we evaluate our technique across multiple model sizes and datasets and find that it improves language models' psychometric predictive power. However, we find an inverse relationship between psychometric power and a model's performance on downstream NLP tasks as well as its perplexity on held-out test data. While this latter trend has been observed before (Oh et al., 2022; Shain et al., 2024), we are the first to induce it by manipulating a model's alignment to psychometric data.

Submitted 16 October, 2024; originally announced October 2024.

arXiv:2410.12061 [pdf, other]

CrediRAG: Network-Augmented Credibility-Based Retrieval for Misinformation Detection in Reddit

Authors: Ashwin Ram, Yigit Ege Bayiz, Arash Amini, Mustafa Munir, Radu Marculescu

Abstract: Fake news threatens democracy and exacerbates the polarization and divisions in society; therefore, accurately detecting online misinformation is the foundation of addressing this issue. We present CrediRAG, the first fake news detection model that combines language models with access to a rich external political knowledge base with a dense social network to detect fake news across social media at scale. CrediRAG uses a news retriever to initially assign a misinformation score to each post based on the source credibility of similar news articles to the post title content. CrediRAG then improves the initial retrieval estimations through a novel weighted post-to-post network connected based on shared commenters and weighted by the average stance of all shared commenters across every pair of posts. We achieve an 11% increase in the F1-score in detecting misinformative posts over state-of-the-art methods. Extensive experiments conducted on curated real-world Reddit data of over 200,000 posts demonstrate the superior performance of CrediRAG over existing baselines. Thus, our approach offers a more accurate and scalable solution to combat the spread of fake news across social media platforms.

Submitted 15 October, 2024; originally announced October 2024.

arXiv:2410.01773 [pdf, other]

Towards deep learning sequence-structure co-generation for protein design

Authors: Chentong Wang, Sarah Alamdari, Carles Domingo-Enrich, Ava Amini, Kevin K. Yang

Abstract: Deep generative models that learn from the distribution of natural protein sequences and structures may enable the design of new proteins with valuable functions. While the majority of today's models focus on generating either sequences or structures, emerging co-generation methods promise more accurate and controllable protein design, ideally achieved by modeling both modalities simultaneously. Here we review recent advances in deep generative models for protein design, with a particular focus on sequence-structure co-generation methods. We describe the key methodological and evaluation principles underlying these methods, highlight recent advances from the literature, and discuss opportunities for continued development of sequence-structure co-generation approaches.

Submitted 2 October, 2024; originally announced October 2024.

arXiv:2409.20362 [pdf, other]

TwinArray Sort: An Ultrarapid Conditional Non-Comparison Based Sorting Algorithm

Authors: Amin Amini

Abstract: In computer science, sorting algorithms are crucial for data processing and machine learning. Large datasets and high efficiency requirements pose challenges for comparison-based algorithms like Quicksort and Merge sort, which achieve O(n log n) time complexity. Non-comparison-based algorithms like Spreadsort and Counting Sort have memory consumption issues and a relatively high computational demand, even if they can attain linear time complexity under certain circumstances. We present TwinArray Sort, a novel conditional non-comparison-based sorting algorithm that effectively uses array indices. When it comes to worst-case time and space complexities, TwinArray Sort achieves O(n+k). The approach remains efficient under all settings and works well with datasets with randomly sorted, reverse-sorted, or nearly sorted distributions. TwinArray Sort can handle duplicates and optimize memory efficiently thanks to its two auxiliary arrays for value storage and frequency counting, as well as a conditional distinct array verifier. TwinArray Sort consistently performs better than conventional algorithms, according to experimental assessments, particularly when sorting unique arrays under all data distribution scenarios. The approach is suitable for massive data processing and machine learning dataset management due to its creative use of dual auxiliary arrays and a conditional distinct array verification, which improves memory use and duplication handling. TwinArray Sort overcomes conventional sorting algorithmic constraints by combining cutting-edge methods with non-comparison-based sorting advantages. Its reliable performance in a range of data distributions makes it an adaptable and effective answer for contemporary computing requirements.

Submitted 30 September, 2024; originally announced September 2024.

arXiv:2408.16329 [pdf]

doi 10.1016/j.micrna.2024.207817

Estimation Enhancing in Optoelectronic Property: A Novel Approach Using Orbital Interaction Parameters and Tight-Binding

Authors: Ali Haji Ebrahim Zargar, Ali Amini, Ahmad Ayatollahi

Abstract: This paper advocates for an innovative approach designed for estimating optoelectronic properties of quantum structures utilizing Tight-Binding (TB) theory. Predicated on the comparative analysis between estimated and actual properties, the study strives to validate the efficacy of this proposed technique, focusing notably on the computation of bandgap energy. It is observed that preceding methodologies offered a restricted accuracy when predicting complex structures like super-lattices and quantum wells. To address this gap, we propose a methodology involving three distinct phases using orbital interaction parameters (OIPs) and the TB theory. The research employed Aluminium Arsenide (AlAs) and Gallium Arsenide (GaAs) as the primary bulk materials. Our novel approach introduces a computation framework that first focuses on bulk computation, subsequently expanding to super-lattice structures. The findings of this research demonstrate promising results regarding the accuracy of predicted optoelectronic properties, particularly the cut-off wavelength. This study paves the way for future research, potentially enhancing the precision of the proposed methodology and its application scope within the field of quantum optoelectronics.

Submitted 29 August, 2024; originally announced August 2024.

Comments: This paper is published in the journal of Micro and Nanostructures

Journal ref: Volume 189, May 2024, 207817

arXiv:2407.19567 [pdf, ps, other]

Sharp Bounds for Poly-GNNs and the Effect of Graph Noise

Authors: Luciano Vinas, Arash A. Amini

Abstract: We investigate the classification performance of graph neural networks with graph-polynomial features, poly-GNNs, on the problem of semi-supervised node classification. We analyze poly-GNNs under a general contextual stochastic block model (CSBM) by providing a sharp characterization of the rate of separation between classes in their output node representations. A question of interest is whether this rate depends on the depth of the network $k$, i.e., whether deeper networks can achieve a faster separation? We provide a negative answer to this question: for a sufficiently large graph, a depth $k > 1$ poly-GNN exhibits the same rate of separation as a depth $k=1$ counterpart. Our analysis highlights and quantifies the impact of ``graph noise'' in deep GNNs and shows how noise in the graph structure can dominate other sources of signal in the graph, negating any benefit further aggregation provides. Our analysis also reveals subtle differences between even and odd-layered GNNs in how the feature noise propagates.

Submitted 28 July, 2024; originally announced July 2024.

arXiv:2407.06057 [pdf, other]

Variational Best-of-N Alignment

Authors: Afra Amini, Tim Vieira, Ryan Cotterell

Abstract: Best-of-N (BoN) is a popular and effective algorithm for aligning language models to human preferences. The algorithm works as follows: at inference time, N samples are drawn from the language model, and the sample with the highest reward, as judged by a reward model, is returned as the output. Despite its effectiveness, BoN is computationally expensive; it reduces sampling throughput by a factor of N. To make BoN more efficient at inference time, one strategy is to fine-tune the language model to mimic what BoN does during inference. To achieve this, we derive the distribution induced by the BoN algorithm. We then propose to fine-tune the language model to minimize backward KL divergence to the BoN distribution. Our approach is analogous to mean-field variational inference and, thus, we term it variational BoN (vBoN). To the extent this fine-tuning is successful and we end up with a good approximation, we have reduced the inference cost by a factor of N. Our experiments on a controlled generation task suggest that while variational BoN is not as effective as BoN in aligning language models, it is close to BoN performance as vBoN appears more often on the Pareto frontier of reward and KL divergence compared to models trained with KL-constrained RL objective.

Submitted 8 July, 2024; originally announced July 2024.
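
The plain BoN procedure described in the abstract above (draw N samples, score each with a reward model, return the best) can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the authors' code and not vBoN itself: `best_of_n`, `sample`, `reward`, `toy_sample`, and `toy_reward` are hypothetical names, and the toy sampler and reward stand in for a real language model and reward model.

```python
import random
from typing import Callable, List


def best_of_n(sample: Callable[[], str],
              reward: Callable[[str], float],
              n: int) -> str:
    """Draw n candidates from `sample` and return the highest-reward one.

    `sample` and `reward` are hypothetical stand-ins for a language
    model's sampler and a reward model, respectively.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    # n sampler calls per returned output: this is the factor-of-N cost.
    candidates: List[str] = [sample() for _ in range(n)]
    # max() returns the first candidate with the highest reward.
    return max(candidates, key=reward)


# Toy stand-ins, assumed purely for illustration.
_rng = random.Random(0)


def toy_sample() -> str:
    """Pretend language model: returns one of a few fixed strings."""
    return _rng.choice(["ok", "good", "excellent"])


def toy_reward(text: str) -> float:
    """Pretend reward model: longer strings score higher."""
    return float(len(text))


if __name__ == "__main__":
    print(best_of_n(toy_sample, toy_reward, n=4))
```

Every returned output costs N sampler calls and N reward evaluations, which is the throughput reduction that motivates fine-tuning a model to approximate the BoN distribution directly.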

arXiv:2406.15149 [pdf, other]

Gaussian Splatting to Real World Flight Navigation Transfer with Liquid Networks

Authors: Alex Quach, Makram Chahine, Alexander Amini, Ramin Hasani, Daniela Rus

Abstract: Simulators are powerful tools for autonomous robot learning as they offer scalable data generation, flexible design, and optimization of trajectories. However, transferring behavior learned from simulation data into the real world proves to be difficult, usually mitigated with compute-heavy domain randomization methods or further model fine-tuning. We present a method to improve generalization and robustness to distribution shifts in sim-to-real visual quadrotor navigation tasks. To this end, we first build a simulator by integrating Gaussian Splatting with quadrotor flight dynamics, and then, train robust navigation policies using Liquid neural networks. In this way, we obtain a full-stack imitation learning protocol that combines advances in 3D Gaussian splatting radiance field rendering, crafty programming of expert demonstration training data, and the task understanding capabilities of Liquid networks. Through a series of quantitative flight tests, we demonstrate the robust transfer of navigation skills learned in a single simulation scene directly to the real world. We further show the ability to maintain performance beyond the training environment under drastic distribution and physical environment changes. Our learned Liquid policies, trained on single target manoeuvres curated from a photorealistic simulated indoor flight only, generalize to multi-step hikes onboard a real hardware platform outdoors.

Submitted 16 October, 2024; v1 submitted 21 June, 2024; originally announced June 2024.

MSC Class: 68T40; 68U20; 93C85 ACM Class: I.2.9; I.2.6

arXiv:2406.13121 [pdf, other]

Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?

Authors: Jinhyuk Lee, Anthony Chen, Zhuyun Dai, Dheeru Dua, Devendra Singh Sachan, Michael Boratko, Yi Luan, Sébastien M. R. Arnold, Vincent Perot, Siddharth Dalmia, Hexiang Hu, Xudong Lin, Panupong Pasupat, Aida Amini, Jeremy R. Cole, Sebastian Riedel, Iftekhar Naim, Ming-Wei Chang, Kelvin Guu

Abstract: Long-context language models (LCLMs) have the potential to revolutionize our approach to tasks traditionally reliant on external tools like retrieval systems or databases. Leveraging LCLMs' ability to natively ingest and process entire corpora of information offers numerous advantages. It enhances user-friendliness by eliminating the need for specialized knowledge of tools, provides robust end-to-end modeling that minimizes cascading errors in complex pipelines, and allows for the application of sophisticated prompting techniques across the entire system. To assess this paradigm shift, we introduce LOFT, a benchmark of real-world tasks requiring context up to millions of tokens designed to evaluate LCLMs' performance on in-context retrieval and reasoning. Our findings reveal LCLMs' surprising ability to rival state-of-the-art retrieval and RAG systems, despite never having been explicitly trained for these tasks. However, LCLMs still face challenges in areas like compositional reasoning that are required in SQL-like tasks. Notably, prompting strategies significantly influence performance, emphasizing the need for continued research as context lengths grow. Overall, LOFT provides a rigorous testing ground for LCLMs, showcasing their potential to supplant existing paradigms and tackle novel tasks as model capabilities scale.

Submitted 18 June, 2024; originally announced June 2024.

Comments: 29 pages. Dataset available at https://github.com/google-deepmind/loft

arXiv:2406.10686 [pdf, other]

Graph Neural Thompson Sampling

Authors: Shuang Wu, Arash A. Amini

Abstract: We consider an online decision-making problem with a reward function defined over graph-structured data. We formally formulate the problem as an instance of graph action bandit. We then propose \texttt{GNN-TS}, a Graph Neural Network (GNN) powered Thompson Sampling (TS) algorithm which employs a GNN approximator for estimating the mean reward function and the graph neural tangent features for uncertainty estimation. We prove that, under certain boundedness assumptions on the reward function, \texttt{GNN-TS} achieves a state-of-the-art regret bound which is (1) sub-linear of order $\tilde{\mathcal{O}}((\tilde{d} T)^{1/2})$ in the number of interaction rounds, $T$, and a notion of effective dimension $\tilde{d}$, and (2) independent of the number of graph nodes. Empirical results validate that our proposed \texttt{GNN-TS} exhibits competitive performance and scales well on graph action bandit problems.

Submitted 20 June, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

arXiv:2406.06349 [pdf, other]

ARMA Processes with Discrete-Continuous Excitation: Compressibility Beyond Sparsity

Authors: Mohammad-Amin Charusaie, Stefano Rini, Arash Amini

Abstract: Rényi Information Dimension (RID) plays a central role in quantifying the compressibility of random variables with singularities in their distribution, encompassing and extending beyond the class of sparse sources. The RID, from a high-level perspective, presents the average number of bits that is needed for coding the i.i.d. samples of a random variable with high precision. There are two main extensions of the RID for stochastic processes: information dimension rate (IDR) and block information dimension (BID). In addition, a more recent approach towards the compressibility of stochastic processes revolves around the concept of $ε$-achievable compression rates, which treat a random process as the limiting point of finite-dimensional random vectors and apply the compressed sensing tools on these random variables. While there is limited knowledge about the interplay of the BID, the IDR, and $ε$-achievable compression rates, the values of the IDR and BID themselves are known only for very specific types of processes, namely i.i.d. sequences (i.e., discrete-domain white noise) and moving-average (MA) processes. This paper investigates the IDR and BID of discrete-time Auto-Regressive Moving-Average (ARMA) processes in general, and their relations with $ε$-achievable compression rates when the excitation noise has a discrete-continuous measure. To elaborate, this paper shows that the RID and $ε$-achievable compression rates of this type of processes are equal to that of their excitation noise. In other words, the samples of such ARMA processes can be compressed as much as their sparse excitation noise, although the samples themselves are by no means sparse. The results of this paper can be used to evaluate the compressibility of various types of locally correlated data with finite- or infinite-memory as they are often modelled via ARMA processes.

Submitted 10 June, 2024; originally announced June 2024.

arXiv:2406.06014 [pdf, other]

Network two-sample test for block models

Authors: Chung Kyong Nguen, Oscar Hernan Madrid Padilla, Arash A. Amini

Abstract: We consider the two-sample testing problem for networks, where the goal is to determine whether two sets of networks originated from the same stochastic model. Assuming no vertex correspondence and allowing for different numbers of nodes, we address a fundamental network testing problem that goes beyond simple adjacency matrix comparisons. We adopt the stochastic block model (SBM) for network distributions, due to their interpretability and the potential to approximate more general models. The lack of meaningful node labels and vertex correspondence translates to a graph matching challenge when developing a test for SBMs. We introduce an efficient algorithm to match estimated network parameters, allowing us to properly combine and contrast information within and across samples, leading to a powerful test. We show that the matching algorithm and the overall test are consistent, under mild conditions on the sparsity of the networks and the sample sizes, and derive a chi-squared asymptotic null distribution for the test. Through a mixture of theoretical insights and empirical validations, including experiments with both synthetic and real-world data, this study advances robust statistical inference for complex network data.

Submitted 10 June, 2024; originally announced June 2024.

arXiv:2405.07890 [pdf, other]

Subspace-Informed Matrix Completion

Authors: Hamideh. Sadat Fazael Ardakani, Sajad Daei, Arash Amini, Mikael Skoglund, Gabor Fodor

Abstract: In this work, we consider the matrix completion problem, where the objective is to reconstruct a low-rank matrix from a few observed entries. A commonly employed approach involves nuclear norm minimization. For this method to succeed, the number of observed entries needs to scale at least proportional to both the rank of the ground-truth matrix and the coherence parameter. While the only prior information is oftentimes the low-rank nature of the ground-truth matrix, in various real-world scenarios, additional knowledge about the ground-truth low-rank matrix is available. For instance, in collaborative filtering, the Netflix problem, and dynamic channel estimation in wireless communications, we have partial or full knowledge about the signal subspace in advance. Specifically, we are aware of some subspaces that form multiple angles with the column and row spaces of the ground-truth matrix. Leveraging this valuable information has the potential to significantly reduce the required number of observations. To this end, we introduce a multi-weight nuclear norm optimization problem that concurrently promotes the low-rank property as well as the information about the available subspaces. The proposed weights are tailored to penalize each angle corresponding to each basis of the prior subspace independently. We further propose an optimal weight selection strategy by minimizing the coherence parameter of the ground-truth matrix, which is equivalent to minimizing the required number of observations. Simulation results validate the advantages of incorporating multiple weights in the completion procedure. Specifically, our proposed multi-weight optimization problem demonstrates a substantial reduction in the required number of observations compared to the state-of-the-art methods.

Submitted 24 June, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

Comments: arXiv admin note: text overlap with arXiv:2111.00235

arXiv:2404.01924 [pdf, other]

Toward Efficient Visual Gyroscopes: Spherical Moments, Harmonics Filtering, and Masking Techniques for Spherical Camera Applications

Authors: Yao Du, Carlos M. Mateo, Mirjana Maras, Tsun-Hsuan Wang, Marc Blanchon, Alexander Amini, Daniela Rus, Omar Tahri

Abstract: Unlike a traditional gyroscope, a visual gyroscope estimates camera rotation through images. The integration of omnidirectional cameras, offering a larger field of view compared to traditional RGB cameras, has proven to yield more accurate and robust results. However, challenges arise in situations that lack features, have substantial noise causing significant errors, and where certain features in the images lack sufficient strength, leading to less precise prediction results. Here, we address these challenges by introducing a novel visual gyroscope, which combines an Efficient Multi-Mask-Filter Rotation Estimator (EMMFRE) and a Learning based optimization (LbTO) to provide a more efficient and accurate rotation estimation from spherical images. Experimental results demonstrate superior performance of the proposed approach in terms of accuracy. The paper emphasizes the advantages of integrating machine learning to optimize analytical solutions, discusses limitations, and suggests directions for future research.

Submitted 23 September, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

Comments: Submitted to 2025 IEEE International Conference on Robotics and Automation (ICRA 2025)

arXiv:2404.01750 [pdf, other]

Exploring Latent Pathways: Enhancing the Interpretability of Autonomous Driving with a Variational Autoencoder

Authors: Anass Bairouk, Mirjana Maras, Simon Herlin, Alexander Amini, Marc Blanchon, Ramin Hasani, Patrick Chareyre, Daniela Rus

Abstract: Autonomous driving presents a complex challenge, which is usually addressed with artificial intelligence models that are end-to-end or modular in nature. Within the landscape of modular approaches, a bio-inspired neural circuit policy model has emerged as an innovative control module, offering a compact and inherently interpretable system to infer a steering wheel command from abstract visual features. Here, we take a leap forward by integrating a variational autoencoder with the neural circuit policy controller, forming a solution that directly generates steering commands from input camera images. By substituting the traditional convolutional neural network approach to feature extraction with a variational autoencoder, we enhance the system's interpretability, enabling a more transparent and understandable decision-making process. In addition to the architectural shift toward a variational autoencoder, this study introduces the automatic latent perturbation tool, a novel contribution designed to probe and elucidate the latent features within the variational autoencoder. The automatic latent perturbation tool automates the interpretability process, offering granular insights into how specific latent variables influence the overall model's behavior. Through a series of numerical experiments, we demonstrate the interpretative power of the variational autoencoder-neural circuit policy model and the utility of the automatic latent perturbation tool in making the inner workings of autonomous driving systems more transparent.

Submitted 2 April, 2024; originally announced April 2024.

Comments: Submitted to 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2024)

arXiv:2403.17240 [pdf, other]

The Role of $n$-gram Smoothing in the Age of Neural Networks

Authors: Luca Malagutti, Andrius Buinovskij, Anej Svete, Clara Meister, Afra Amini, Ryan Cotterell

Abstract: For nearly three decades, language models derived from the $n$-gram assumption held the state of the art on the task. The key to their success lay in the application of various smoothing techniques that served to combat overfitting. However, when neural language models toppled $n$-gram models as the best performers, $n$-gram smoothing techniques became less relevant. Indeed, it would hardly be an understatement to suggest that the line of inquiry into $n$-gram smoothing techniques became dormant. This paper re-opens the role classical $n$-gram smoothing techniques may play in the age of neural language models. First, we draw a formal equivalence between label smoothing, a popular regularization technique for neural language models, and add-$λ$ smoothing. Second, we derive a generalized framework for converting any $n$-gram smoothing technique into a regularizer compatible with neural language models. Our empirical results find that our novel regularizers are comparable to and, indeed, sometimes outperform label smoothing on language modeling and machine translation.

Submitted 30 April, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

Comments: NAACL 2024

arXiv:2403.12279 [pdf, other]

Scalable Networked Feature Selection with Randomized Algorithm for Robot Navigation

Authors: Vivek Pandey, Arash Amini, Guangyi Liu, Ufuk Topcu, Qiyu Sun, Kostas Daniilidis, Nader Motee

Abstract: We address the problem of sparse selection of visual features for localizing a team of robots navigating an unknown environment, where robots can exchange relative position measurements with neighbors. We select a set of the most informative features by anticipating their importance in robot localization by simulating trajectories of robots over a prediction horizon. Through theoretical proofs, we establish a crucial connection between the graph Laplacian and the importance of features. We show that strong network connectivity translates to uniformity in feature importance, which enables uniform random sampling of features and reduces the overall computational complexity. We leverage a scalable randomized algorithm for sparse sums of positive semidefinite matrices to efficiently select the set of the most informative features and significantly improve the probabilistic performance bounds. Finally, we support our findings with extensive simulations.

Submitted 18 March, 2024; originally announced March 2024.

arXiv:2403.10705 [pdf, other]

Susceptibility of Communities against Low-Credibility Content in Social News Websites

Authors: Yigit Ege Bayiz, Arash Amini, Radu Marculescu, Ufuk Topcu

Abstract: Social news websites, such as Reddit, have evolved into prominent platforms for sharing and discussing news. A key issue on social news websites is the formation of echo chambers, which often lead to the spread of highly biased or uncredible news. We develop a method to identify communities within a social news website that are prone to uncredible or highly biased news. We employ a user embedding pipeline that detects user communities based on their stances towards posts and news sources. We then project each community onto a credibility-bias space and analyze the distributional characteristics of each projected community to identify those that have a high risk of adopting beliefs with low credibility or high bias. This approach also enables the prediction of individual users' susceptibility to low credibility content, based on their community affiliation. Our experiments show that latent space clusters effectively indicate the credibility and bias levels of their users, with significant differences observed across clusters -- a $34\%$ difference in the users' susceptibility to low-credibility content and an $8.3\%$ difference in the users' susceptibility to high political bias.

Submitted 15 March, 2024; originally announced March 2024.

Comments: 11 pages, 2 figures, Under review in ICWSM 2024

arXiv:2402.10938 [pdf, other]

News Source Credibility Assessment: A Reddit Case Study

Authors: Arash Amini, Yigit Ege Bayiz, Ashwin Ram, Radu Marculescu, Ufuk Topcu

Abstract: In the era of social media platforms, identifying the credibility of online content is crucial to combat misinformation. As our main contribution, we present CREDiBERT (CREDibility assessment using Bi-directional Encoder Representations from Transformers), a source credibility assessment model fine-tuned for Reddit submissions focusing on political discourse. We adopt a semi-supervised training approach for CREDiBERT, leveraging Reddit's community-based structure. By encoding submission content using CREDiBERT and integrating it into a Siamese neural network, we significantly improve the binary classification of submission credibility, achieving a 9% increase in F1 score compared to existing methods. Additionally, we introduce a new version of the post-to-post network in Reddit that efficiently encodes user interactions to enhance the binary classification task by nearly 8% in F1 score. Finally, we employ CREDiBERT to evaluate the susceptibility of subreddits with respect to different topics.

Submitted 7 February, 2024; originally announced February 2024.

Comments: 12 pages; 3 figures

arXiv:2402.10571 [pdf, other]

Direct Preference Optimization with an Offset

Authors: Afra Amini, Tim Vieira, Ryan Cotterell

Abstract: Direct preference optimization (DPO) is a successful fine-tuning strategy for aligning large language models with human preferences without the need to train a reward model or employ reinforcement learning. DPO, as originally formulated, relies on binary preference data and fine-tunes a language model to increase the likelihood of a preferred response over a dispreferred response. However, not all preference pairs are equal. Sometimes, the preferred response is only slightly better than the dispreferred one. In other cases, the preference is much stronger. For instance, if a response contains harmful or toxic content, the annotator will have a strong preference against that response. In this paper, we propose a generalization of DPO, termed DPO with an offset (ODPO), that does not treat every preference pair equally during fine-tuning. Intuitively, ODPO requires the difference between the likelihood of the preferred and dispreferred response to be greater than an offset value. The offset is determined based on the extent to which one response is preferred over another. Our experiments on various tasks suggest that ODPO significantly outperforms DPO in aligning language models, especially when the number of preference pairs is limited.

Submitted 6 June, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

arXiv:2401.07742 [pdf, ps, other]

On pure subrings of sp-groups

Authors: A. Amini, B. Amini, E. Momtahan

Abstract: Let $G$ be a sp-group such that for every prime $p$, $G_p$ is elementary. Suppose that $\frac{G}{\oplus_{p\in \mathbb{P}} G_p}$ is torsion-free divisible. We show that $\operatorname{End}_{\mathbb{Z}}(G)$ is a sp-group and every subring $R$ of $\prod \operatorname{End}_{\mathbb{Z}}(G_p)$, containing $\oplus \operatorname{End}_{\mathbb{Z}}(G_p)$ is pure if and only if $R=\mathbb{M}_T=\{x\in \prod_{p\in \mathbb{P}}\operatorname{End}(G_p) \;|\; \exists k\in \mathbb{N} \;\text{such that}\;\; kx \in T \},$ where $T$ is a subring of $\prod_{p\in \mathbb{P}}\operatorname{End}(G_p)$. We observe that $\frac{\mathbb{M}_T}{\oplus_{p\in \mathbb{P}}\operatorname{End}(G_p)}$ is (ring) isomorphic with $T\otimes_{\mathbb{Z}} \mathbb{Q}$. Moreover, we conclude that a significant number of the examples around the topic can be easily obtained and described by choosing an appropriate subring $T$.

Submitted 15 January, 2024; originally announced January 2024.

arXiv:2401.05857

Secure Dynamic Event-triggered Consensus Under Asynchronous Denial of Service

Authors: Ali Azarbahram, Amir Amini

Abstract: This article proposes a secure implementation for consensus using a dynamic event-triggered (DET) communication scheme in high-order nonlinear multi-agent systems (MAS) under asynchronous (distributed) denial of service (DoS) attacks. By introducing a linear auxiliary trajectory of the system, the DET data transmission scheme among the neighboring agents is employed to reduce the communication for each agent. The asynchronous DoS attacks can block each communication channel among the cooperative agents independently in an unknown pattern. To guarantee state consensus of auxiliary MAS under DoS, a linear matrix inequality (LMI) based optimization approach is proposed which simultaneously designs all the unknown DET communication parameters as well as the state feedback control gain. In addition to asynchronous DoS attacks over the graph topology, the destructive effects of independent DoS attacks over the communication links between actual and auxiliary states are compensated as an additional layer of resiliency for the system. The output of each agent ultimately tracks the auxiliary state of the system and this results in the output consensus.

Submitted 26 February, 2024; v1 submitted 11 January, 2024; originally announced January 2024.

Comments: This work needs to be revised fundamentally with a greater emphasis on the nonlinear dynamics and the destructive effects of independent DoS attacks over the communication links between actual and auxiliary states

arXiv:2312.17710 [pdf, other]

Principled Gradient-based Markov Chain Monte Carlo for Text Generation

Authors: Li Du, Afra Amini, Lucas Torroba Hennigen, Xinyan Velocity Yu, Jason Eisner, Holden Lee, Ryan Cotterell

Abstract: Recent papers have demonstrated the possibility of energy-based text generation by adapting gradient-based sampling algorithms, a paradigm of MCMC algorithms that promises fast convergence. However, as we show in this paper, previous attempts at this approach to text generation all fail to sample correctly from the target language model distributions. To address this limitation, we consider the problem of designing text samplers that are faithful, meaning that they have the target text distribution as their limiting distribution. We propose several faithful gradient-based sampling algorithms to sample from the target energy-based text distribution correctly, and study their theoretical properties. Through experiments on various forms of text generation, we demonstrate that faithful samplers are able to generate more fluent text while adhering to the control objectives better.

Submitted 29 December, 2023; originally announced December 2023.

Comments: Preprint

arXiv:2312.16940 [pdf, other]

Joint Signal Recovery and Graph Learning from Incomplete Time-Series

Authors: Amirhossein Javaheri, Arash Amini, Farokh Marvasti, Daniel P. Palomar

Abstract: Learning a graph from data is the key to taking advantage of graph signal processing tools. Most of the conventional algorithms for graph learning require complete data statistics, which might not be available in some scenarios. In this work, we aim to learn a graph from incomplete time-series observations. From another viewpoint, we consider the problem of semi-blind recovery of time-varying graph signals where the underlying graph model is unknown. We propose an algorithm based on the method of block successive upperbound minimization (BSUM), for simultaneous inference of the signal and the graph from incomplete data. Simulation results on synthetic and real time-series demonstrate the performance of the proposed method for graph learning and signal recovery.

Submitted 28 December, 2023; originally announced December 2023.

arXiv:2312.11805 [pdf, other]

Gemini: A Family of Highly Capable Multimodal Models

Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee , et al. (1325 additional authors not shown)

Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.

Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

arXiv:2311.15451 [pdf, other]

Uncertainty-aware Language Modeling for Selective Question Answering

Authors: Qi Yang, Shreya Ravikumar, Fynn Schmitt-Ulms, Satvik Lolla, Ege Demir, Iaroslav Elistratov, Alex Lavaee, Sadhana Lolla, Elaheh Ahmadi, Daniela Rus, Alexander Amini, Alejandro Perez

Abstract: We present an automatic large language model (LLM) conversion approach that produces uncertainty-aware LLMs capable of estimating uncertainty with every prediction. Our approach is model- and data-agnostic, is computationally-efficient, and does not rely on external models or systems. We evaluate converted models on the selective question answering setting -- to answer as many questions as possible while maintaining a given accuracy, forgoing providing predictions when necessary. As part of our results, we test BERT and Llama 2 model variants on the SQuAD extractive QA task and the TruthfulQA generative QA task. We show that using the uncertainty estimates provided by our approach to selectively answer questions leads to significantly higher accuracy over directly using model probabilities.

Submitted 26 November, 2023; originally announced November 2023.

arXiv:2311.05756 [pdf, other]

Step and Smooth Decompositions as Topological Clustering

Authors: Luciano Vinas, Arash A. Amini

Abstract: We investigate a class of recovery problems for which observations are a noisy combination of continuous and step functions. These problems can be seen as non-injective instances of non-linear ICA with direct applications to image decontamination for magnetic resonance imaging. Alternately, the problem can be viewed as clustering in the presence of structured (smooth) contaminant. We show that a global topological property (graph connectivity) interacts with a local property (the degree of smoothness of the continuous component) to determine conditions under which the components are identifiable. Additionally, a practical estimation algorithm is provided for the case when the contaminant lies in a reproducing kernel Hilbert space of continuous functions. Algorithm effectiveness is demonstrated through a series of simulations and real-world studies.

Submitted 9 November, 2023; originally announced November 2023.

arXiv:2311.05003 [pdf, ps, other]

Harmonic Retrieval Using Weighted Lifted-Structure Low-Rank Matrix Completion

Authors: Mohammad Bokaei, Saeed Razavikia, Stefano Rini, Arash Amini, Hamid Behrouzi

Abstract: In this paper, we investigate the problem of recovering the frequency components of a mixture of $K$ complex sinusoids from a random subset of $N$ equally-spaced time-domain samples. Because of the random subset, the samples are effectively non-uniform. Besides, the frequency values of each of the $K$ complex sinusoids are assumed to vary continuously within a given range. For this problem, we propose a two-step strategy: (i) we first lift the incomplete set of uniform samples (unavailable samples are treated as missing data) into a structured matrix with missing entries, which is potentially low-rank; then (ii) we complete the matrix using a weighted nuclear minimization problem. We call the method weighted lifted-structured (WLi) low-rank matrix recovery. Our approach can be applied to a range of matrix structures such as Hankel and double-Hankel, among others, and provides improvement over the unweighted existing schemes such as EMaC and DEMaC. We provide theoretical guarantees for the proposed method, as well as numerical simulations in both noiseless and noisy settings. Both the theoretical and the numerical results confirm the superiority of the proposed approach.

Submitted 8 November, 2023; originally announced November 2023.

arXiv:2310.17642 [pdf, other]

Drive Anywhere: Generalizable End-to-end Autonomous Driving with Multi-modal Foundation Models

Authors: Tsun-Hsuan Wang, Alaa Maalouf, Wei Xiao, Yutong Ban, Alexander Amini, Guy Rosman, Sertac Karaman, Daniela Rus

Abstract: As autonomous driving technology matures, end-to-end methodologies have emerged as a leading strategy, promising seamless integration from perception to control via deep learning. However, existing systems grapple with challenges such as unexpected open set environments and the complexity of black-box models. At the same time, the evolution of deep learning introduces larger, multimodal foundational models, offering multi-modal visual and textual understanding. In this paper, we harness these multimodal foundation models to enhance the robustness and adaptability of autonomous driving systems, enabling out-of-distribution, end-to-end, multimodal, and more explainable autonomy. Specifically, we present an approach to apply end-to-end open-set (any environment/scene) autonomous driving that is capable of providing driving decisions from representations queryable by image and text. To do so, we introduce a method to extract nuanced spatial (pixel/patch-aligned) features from transformers to enable the encapsulation of both spatial and semantic features. Our approach (i) demonstrates unparalleled results in diverse tests while achieving significantly greater robustness in out-of-distribution situations, and (ii) allows the incorporation of latent space simulation (via text) for improved training (data augmentation via text) and policy debugging. We encourage the reader to check our explainer video at https://www.youtube.com/watch?v=4n-DJf8vXxo&feature=youtu.be and to view the code and demos on our project webpage at https://drive-anywhere.github.io/.

Submitted 26 October, 2023; originally announced October 2023.

Comments: Project webpage: https://drive-anywhere.github.io Explainer video: https://www.youtube.com/watch?v=4n-DJf8vXxo&feature=youtu.be

arXiv:2310.12021 [pdf, other]

Data-Driven Distributionally Robust Mitigation of Risk of Cascading Failures

Authors: Guangyi Liu, Arash Amini, Vivek Pandey, Nader Motee

Abstract: We introduce a novel data-driven method to mitigate the risk of cascading failures in delayed discrete-time Linear Time-Invariant (LTI) systems. Our approach involves formulating a distributionally robust finite-horizon optimal control problem, where the objective is to minimize a given performance function while satisfying a set of distributionally chances constraints on cascading failures, which accounts for the impact of a known sequence of failures that can be characterized using nested sets. The optimal control problem becomes challenging as the risk of cascading failures and input time-delay poses limitations on the set of feasible control inputs. However, by solving the convex formulation of the distributionally robust model predictive control (DRMPC) problem, the proposed approach is able to keep the system from cascading failures while maintaining the system's performance with delayed control input, which has important implications for designing and operating complex engineering systems, where cascading failures can severely affect system performance, safety, and reliability.

Submitted 18 October, 2023; originally announced October 2023.

arXiv:2310.11276 [pdf, other]

Video Super-Resolution Using a Grouped Residual in Residual Network

Authors: MohammadHossein Ashoori, Arash Amini

Abstract: Super-resolution (SR) is the technique of increasing the nominal resolution of image / video content accompanied with quality improvement. Video super-resolution (VSR) can be considered as the generalization of single image super-resolution (SISR). This generalization should be such that more detail is created in the output using adjacent input frames. In this paper, we propose a grouped residual in residual network (GRRN) for VSR. By adjusting the hyperparameters of the proposed structure, we train three networks with different numbers of parameters and compare their quantitative and qualitative results with the existing methods. Although based on some quantitative criteria, GRRN does not provide better results than the existing methods, in terms of the quality of the output image it has acceptable performance.

Submitted 17 October, 2023; originally announced October 2023.

arXiv:2310.05250 [pdf, other]

Simplifying GNN Performance with Low Rank Kernel Models

Authors: Luciano Vinas, Arash A. Amini

Abstract: We revisit recent spectral GNN approaches to semi-supervised node classification (SSNC). We posit that many of the current GNN architectures may be over-engineered. Instead, simpler, traditional methods from nonparametric estimation, applied in the spectral domain, could replace many deep-learning inspired GNN designs. These conventional techniques appear to be well suited for a variety of graph types reaching state-of-the-art performance on many of the common SSNC benchmarks. Additionally, we show that recent performance improvements in GNN approaches may be partially attributed to shifts in evaluation conventions. Lastly, an ablative study is conducted on the various hyperparameters associated with GNN spectral filtering techniques. Code available at: https://github.com/lucianoAvinas/lowrank-gnn-kernels

Submitted 8 October, 2023; originally announced October 2023.

arXiv:2310.02932 [pdf, other]

Assessing Large Language Models on Climate Information

Authors: Jannis Bulian, Mike S. Schäfer, Afra Amini, Heidi Lam, Massimiliano Ciaramita, Ben Gaiarin, Michelle Chen Hübscher, Christian Buck, Niels G. Mede, Markus Leippold, Nadine Strauß

Abstract: As Large Language Models (LLMs) rise in popularity, it is necessary to assess their capability in critically relevant domains. We present a comprehensive evaluation framework, grounded in science communication research, to assess LLM responses to questions about climate change. Our framework emphasizes both presentational and epistemological adequacy, offering a fine-grained analysis of LLM generations spanning 8 dimensions and 30 issues. Our evaluation task is a real-world example of a growing number of challenging problems where AI can complement and lift human performance. We introduce a novel protocol for scalable oversight that relies on AI Assistance and raters with relevant education. We evaluate several recent LLMs on a set of diverse climate questions. Our results point to a significant gap between surface and epistemological qualities of LLMs in the realm of climate communication.

Submitted 28 May, 2024; v1 submitted 4 October, 2023; originally announced October 2023.

Journal ref: Proceedings of the 41st International Conference on Machine Learning (ICML), 2024

arXiv:2309.04922 [pdf, other]

Quantification of Distributionally Robust Risk of Cascade of Failures in Platoon of Vehicles

Authors: Vivek Pandey, Guangyi Liu, Arash Amini, Nader Motee

Abstract: Achieving safety is a critical aspect of attaining autonomy in a platoon of autonomous vehicles. In this paper, we propose a distributionally robust risk framework to investigate cascading failures in platoons. To examine the impact of network connectivity and system dynamics on the emergence of cascading failures, we consider a time-delayed network model of the platoon of vehicles as a benchmark. To study the cascading effects among pairs of vehicles in the platoon, we use the measure of conditional distributionally robust functional. We extend the risk framework to quantify cascading failures by utilizing a bi-variate normal distribution. Our work establishes closed-form risk formulas that illustrate the effects of time-delay, noise statistics, underlying communication graph, and sets of soft failures. The insights gained from our research can be applied to design safe platoons that are robust to the risk of cascading failures. We validate our results through extensive simulations.

Submitted 9 September, 2023; originally announced September 2023.

arXiv:2308.00231 [pdf, other]

Capsa: A Unified Framework for Quantifying Risk in Deep Neural Networks

Authors: Sadhana Lolla, Iaroslav Elistratov, Alejandro Perez, Elaheh Ahmadi, Daniela Rus, Alexander Amini

Abstract: The modern pervasiveness of large-scale deep neural networks (NNs) is driven by their extraordinary performance on complex problems but is also plagued by their sudden, unexpected, and often catastrophic failures, particularly on challenging scenarios. Existing algorithms that provide risk-awareness to NNs are complex and ad-hoc. Specifically, these methods require significant engineering changes, are often developed only for particular settings, and are not easily composable. Here we present capsa, a framework for extending models with risk-awareness. Capsa provides a methodology for quantifying multiple forms of risk and composing different algorithms together to quantify different risk metrics in parallel. We validate capsa by implementing state-of-the-art uncertainty estimation algorithms within the capsa framework and benchmarking them on complex perception datasets. We demonstrate capsa's ability to easily compose aleatoric uncertainty, epistemic uncertainty, and bias estimation together in a single procedure, and show how this approach provides a comprehensive awareness of NN risk.

Submitted 31 July, 2023; originally announced August 2023.

Comments: Neural Information Processing Systems (NeurIPS) 2022. Workshop on Machine Learning for Autonomous Driving (ML4AD)

Journal ref: Neural Information Processing Systems (NeurIPS) 2022. Workshop on Machine Learning for Autonomous Driving (ML4AD)

arXiv:2307.13503 [pdf, other]

Continuous Time Evidential Distributions for Irregular Time Series

Authors: Taylor W. Killian, Haoran Zhang, Thomas Hartvigsen, Ava P. Amini

Abstract: Prevalent in many real-world settings such as healthcare, irregular time series are challenging to formulate predictions from. It is difficult to infer the value of a feature at any given time when observations are sporadic, as it could take on a range of values depending on when it was last observed. To characterize this uncertainty we present EDICT, a strategy that learns an evidential distribution over irregular time series in continuous time. This distribution enables well-calibrated and flexible inference of partially observed features at any time of interest, while expanding uncertainty temporally for sparse, irregular observations. We demonstrate that EDICT attains competitive performance on challenging time series classification tasks and enables uncertainty-guided inference when encountering noisy data.

Submitted 25 July, 2023; originally announced July 2023.

Comments: ICML 2023 Workshop on Interpretable Machine Learning in Healthcare. Code is available at https://github.com/twkillian/EDICT

arXiv:2307.11550 [pdf, other]

YOLOPose V2: Understanding and Improving Transformer-based 6D Pose Estimation

Authors: Arul Selvam Periyasamy, Arash Amini, Vladimir Tsaturyan, Sven Behnke

Abstract: 6D object pose estimation is a crucial prerequisite for autonomous robot manipulation applications. The state-of-the-art models for pose estimation are convolutional neural network (CNN)-based. Lately, Transformers, an architecture originally proposed for natural language processing, are achieving state-of-the-art results in many computer vision tasks as well. Equipped with the multi-head self-attention mechanism, Transformers enable simple single-stage end-to-end architectures for learning object detection and 6D object pose estimation jointly. In this work, we propose YOLOPose (short form for You Only Look Once Pose estimation), a Transformer-based multi-object 6D pose estimation method based on keypoint regression and an improved variant of the YOLOPose model. In contrast to the standard heatmaps for predicting keypoints in an image, we directly regress the keypoints. Additionally, we employ a learnable orientation estimation module to predict the orientation from the keypoints. Along with a separate translation estimation module, our model is end-to-end differentiable. Our method is suitable for real-time applications and achieves results comparable to state-of-the-art methods. We analyze the role of object queries in our architecture and reveal that the object queries specialize in detecting objects in specific image regions. Furthermore, we quantify the accuracy trade-off of using datasets of smaller sizes to train our model.

Submitted 21 July, 2023; originally announced July 2023.

Comments: Robotics and Autonomous Systems Journal, Elsevier, to appear 2023. arXiv admin note: substantial text overlap with arXiv:2205.02536

arXiv:2307.09210 [pdf, other]

Nested stochastic block model for simultaneously clustering networks and nodes

Authors: Nathaniel Josephs, Arash A. Amini, Marina Paez, Lizhen Lin

Abstract: We introduce the nested stochastic block model (NSBM) to cluster a collection of networks while simultaneously detecting communities within each network. NSBM has several appealing features including the ability to work on unlabeled networks with potentially different node sets, the flexibility to model heterogeneous communities, and the means to automatically select the number of classes for the networks and the number of communities within each network. This is accomplished via a Bayesian model, with a novel application of the nested Dirichlet process (NDP) as a prior to jointly model the between-network and within-network clusters. The dependency introduced by the network data creates nontrivial challenges for the NDP, especially in the development of efficient samplers. For posterior inference, we propose several Markov chain Monte Carlo algorithms including a standard Gibbs sampler, a collapsed Gibbs sampler, and two blocked Gibbs samplers that ultimately return two levels of clustering labels from both within and across the networks. Extensive simulation studies are carried out which demonstrate that the model provides very accurate estimates of both levels of the clustering structure. We also apply our model to two social network datasets that cannot be analyzed using any previous method in the literature due to the anonymity of the nodes and the varying number of nodes in each network.

Submitted 18 July, 2023; originally announced July 2023.

arXiv:2306.12146 [pdf, other]

Which Spurious Correlations Impact Reasoning in NLI Models? A Visual Interactive Diagnosis through Data-Constrained Counterfactuals

Authors: Robin Chan, Afra Amini, Mennatallah El-Assady

Abstract: We present a human-in-the-loop dashboard tailored to diagnosing potential spurious features that NLI models rely on for predictions. The dashboard enables users to generate diverse and challenging examples by drawing inspiration from GPT-3 suggestions. Additionally, users can receive feedback from a trained NLI model on how challenging the newly created example is and make refinements based on the feedback. Through our investigation, we discover several categories of spurious correlations that impact the reasoning of NLI models, which we group into three categories: Semantic Relevance, Logical Fallacies, and Bias. Based on our findings, we identify and describe various research opportunities, including diversifying training data and assessing NLI models' robustness by creating adversarial test suites.

Submitted 21 June, 2023; originally announced June 2023.

Comments: 7 pages, Accepted at ACL 2023: System Demonstrations

arXiv:2306.05477 [pdf, other]

Hexatagging: Projective Dependency Parsing as Tagging

Authors: Afra Amini, Tianyu Liu, Ryan Cotterell

Abstract: We introduce a novel dependency parser, the hexatagger, that constructs dependency trees by tagging the words in a sentence with elements from a finite set of possible tags. In contrast to many approaches to dependency parsing, our approach is fully parallelizable at training time, i.e., the structure-building actions needed to build a dependency parse can be predicted in parallel to each other. Additionally, exact decoding is linear in time and space complexity. Furthermore, we derive a probabilistic dependency parser that predicts hexatags using no more than a linear model with features from a pretrained language model, i.e., we forsake a bespoke architecture explicitly designed for the task. Despite the generality and simplicity of our approach, we achieve state-of-the-art performance of 96.4 LAS and 97.4 UAS on the Penn Treebank test set. Additionally, our parser's linear time complexity and parallelism significantly improve computational efficiency, with a roughly 10-times speed-up over previous state-of-the-art models during decoding.

Submitted 8 June, 2023; originally announced June 2023.

Comments: accepted at ACL 2023

arXiv:2306.03061 [pdf, other]

Structured Voronoi Sampling

Authors: Afra Amini, Li Du, Ryan Cotterell

Abstract: Gradient-based sampling algorithms have demonstrated their effectiveness in text generation, especially in the context of controlled text generation. However, there exists a lack of theoretically grounded and principled approaches for this task. In this paper, we take an important step toward building a principled approach for sampling from language models with gradient-based methods. We use discrete distributions given by language models to define densities and develop an algorithm based on Hamiltonian Monte Carlo to sample from them. We name our gradient-based technique Structured Voronoi Sampling (SVS). In an experimental setup where the reference distribution is known, we show that the empirical distribution of SVS samples is closer to the reference distribution compared to alternative sampling schemes. Furthermore, in a controlled generation task, SVS is able to generate fluent and diverse samples while following the control targets significantly better than other methods.

Submitted 6 June, 2024; v1 submitted 5 June, 2023; originally announced June 2023.

Comments: Accepted at NeurIPS 2023

arXiv:2305.15057 [pdf, other]

Linear-Time Modeling of Linguistic Structure: An Order-Theoretic Perspective

Authors: Tianyu Liu, Afra Amini, Mrinmaya Sachan, Ryan Cotterell

Abstract: Tasks that model the relation between pairs of tokens in a string are a vital part of understanding natural language. Such tasks, in general, require exhaustive pair-wise comparisons of tokens, thus having a quadratic runtime complexity in the length of the string. We show that these exhaustive comparisons can be avoided, and, moreover, the complexity of such tasks can be reduced to linear by casting the relation between tokens as a partial order over the string. Our method predicts real numbers for each token in a string in parallel and sorts the tokens accordingly, resulting in total orders of the tokens in the string. Each total order implies a set of arcs oriented from smaller to greater tokens, sorted by their predicted numbers. The intersection of total orders results in a partial order over the set of tokens in the string, which is then decoded into a directed graph representing the desired linguistic structure. Our experiments on dependency parsing and coreference resolution show that our method achieves state-of-the-art or comparable performance. Moreover, the linear complexity and parallelism of our method double the speed of graph-based coreference resolution models, and bring a 10-times speed-up over graph-based dependency parsers.

Submitted 12 December, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

Comments: EMNLP 2023, 23 pages
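
The order-theoretic abstract above describes sorting tokens by predicted real numbers and intersecting the resulting total orders into a partial order. The following is only a minimal sketch of that intersection step, not the authors' implementation: the function names and the score values are hypothetical, and the pairwise loop is quadratic, used here purely to make the definition concrete rather than to reproduce the linear-time decoding the paper reports.

```python
from itertools import combinations


def total_order(scores):
    """Return token indices sorted by their predicted real number."""
    return sorted(range(len(scores)), key=lambda i: scores[i])


def intersect_orders(score_lists):
    """Return arcs (i, j) such that token i precedes token j in every order.

    Naive O(n^2) illustration of the definition; an assumption of this
    sketch, not the paper's decoding procedure.
    """
    n = len(score_lists[0])
    arcs = set()
    for i, j in combinations(range(n), 2):
        if all(s[i] < s[j] for s in score_lists):
            arcs.add((i, j))
        elif all(s[j] < s[i] for s in score_lists):
            arcs.add((j, i))
        # Otherwise the orders disagree and the pair stays incomparable.
    return arcs


# Hypothetical scores for a 4-token string under two total orders.
scores_a = [0.1, 0.7, 0.4, 0.9]
scores_b = [0.2, 0.3, 0.8, 0.9]
partial_order = intersect_orders([scores_a, scores_b])
```

In this toy input, tokens 1 and 2 are ranked differently by the two orders, so no arc connects them in the resulting partial order.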

arXiv:2305.14171 [pdf, other]

In-Context Probing: Toward Building Robust Classifiers via Probing Large Language Models

Authors: Afra Amini, Massimiliano Ciaramita

Abstract: Large language models are able to learn new tasks in context, where they are provided with instructions and a few annotated examples. However, the effectiveness of in-context learning is dependent on the provided context, and the performance on a downstream task can vary considerably, depending on the instruction. Importantly, such dependency on the context can surface in unpredictable ways, e.g., a seemingly more informative instruction might lead to a worse performance. In this paper, we propose an alternative approach, which we term In-Context Probing (ICP). Similar to in-context learning, we contextualize the representation of the input with an instruction, but instead of decoding the output prediction, we probe the contextualized representation to predict the label. Through a series of experiments on a diverse set of classification tasks, we show that in-context probing is significantly more robust to changes in instructions. We further show that ICP performs competitively with or better than finetuning and can be particularly helpful to build classifiers on top of smaller models, with less than a hundred training examples.

Submitted 22 December, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

arXiv:2304.02733 [pdf, other]

Learning Stability Attention in Vision-based End-to-end Driving Policies

Authors: Tsun-Hsuan Wang, Wei Xiao, Makram Chahine, Alexander Amini, Ramin Hasani, Daniela Rus

Abstract: Modern end-to-end learning systems can learn to explicitly infer control from perception. However, it is difficult to guarantee stability and robustness for these systems since they are often exposed to unstructured, high-dimensional, and complex observation spaces (e.g., autonomous driving from a stream of pixel inputs). We propose to leverage control Lyapunov functions (CLFs) to equip end-to-end vision-based policies with stability properties and introduce stability attention in CLFs (att-CLFs) to tackle environmental changes and improve learning flexibility. We also present an uncertainty propagation technique that is tightly integrated into att-CLFs. We demonstrate the effectiveness of att-CLFs via comparison with classical CLFs, model predictive control, and vanilla end-to-end learning in a photo-realistic simulator and on a real full-scale autonomous vehicle.

Submitted 5 April, 2023; originally announced April 2023.

Comments: First two authors contributed equally; L4DC 2023

arXiv:2302.02956 [pdf, other]

RoboCup 2022 AdultSize Winner NimbRo: Upgraded Perception, Capture Steps Gait and Phase-based In-walk Kicks

Authors: Dmytro Pavlichenko, Grzegorz Ficht, Arash Amini, Mojtaba Hosseini, Raphael Memmesheimer, Angel Villar-Corrales, Stefan M. Schulz, Marcell Missura, Maren Bennewitz, Sven Behnke

Abstract: Beating the human world champions by 2050 is an ambitious goal of the Humanoid League that provides a strong incentive for RoboCup teams to further improve and develop their systems. In this paper, we present upgrades of our system which enabled our team NimbRo to win the Soccer Tournament, the Drop-in Games, and the Technical Challenges in the Humanoid AdultSize League of RoboCup 2022. Strong performance in these competitions resulted in the Best Humanoid award in the Humanoid League. The mentioned upgrades include: hardware upgrade of the vision module, balanced walking with Capture Steps, and the introduction of phase-based in-walk kicks.

Submitted 7 February, 2023; v1 submitted 6 February, 2023; originally announced February 2023.

Journal ref: In: RoboCup 2022: Robot World Cup XXV. LNCS 13561, Springer, May 2023

arXiv:2302.01428 [pdf, other]

Understanding Reconstruction Attacks with the Neural Tangent Kernel and Dataset Distillation

Authors: Noel Loo, Ramin Hasani, Mathias Lechner, Alexander Amini, Daniela Rus

Abstract: Modern deep learning requires large volumes of data, which could contain sensitive or private information that cannot be leaked. Recent work has shown that, for homogeneous neural networks, a large portion of this training data could be reconstructed with only access to the trained network parameters. While the attack was shown to work empirically, there exists little formal understanding of its effective regime and of which datapoints are susceptible to reconstruction. In this work, we first build a stronger version of the dataset reconstruction attack and show how it can provably recover the entire training set in the infinite width regime. We then empirically study the characteristics of this attack on two-layer networks and reveal that its success heavily depends on deviations from the frozen infinite-width Neural Tangent Kernel limit. Next, we study the nature of easily-reconstructed images. We show, both theoretically and empirically, that reconstructed images tend to be "outliers" in the dataset, and that these reconstruction attacks can be used for dataset distillation, that is, we can retrain on reconstructed images and obtain high predictive accuracy.

Submitted 9 November, 2023; v1 submitted 2 February, 2023; originally announced February 2023.

arXiv:2301.06974 [pdf, other]

Towards Improving the Explainability of Text-based Information Retrieval with Knowledge Graphs

Authors: Boqi Chen, Kua Chen, Yujing Yang, Afshin Amini, Bharat Saxena, Cecilia Chávez-García, Majid Babaei, Amir Feizpour, Dániel Varró

Abstract: Thanks to recent advancements in machine learning, vector-based methods have been adopted in many modern information retrieval (IR) systems. While showing promising retrieval performance, these approaches typically fail to explain why a particular document is retrieved as a query result to address explainable information retrieval (XIR). Knowledge graphs record structured information about entities and inherently explainable relationships. Most existing XIR approaches focus exclusively on the retrieval model with little consideration of using existing knowledge graphs for providing an explanation. In this paper, we propose a general architecture to incorporate knowledge graphs for XIR in various steps of the retrieval process. Furthermore, we create two instances of the architecture for different types of explanation. We evaluate our approaches on well-known IR benchmarks using standard metrics and compare them with vector-based methods as baselines.

Submitted 17 January, 2023; originally announced January 2023.

Comments: 7 pages, The 1st Workshop on Trustworthy Learning on Graphs (TrustLOG)

arXiv:2211.11069 [pdf, other]

Learning Nonlinear Couplings in Network of Agents from a Single Sample Trajectory

Authors: Arash Amini, Qiyu Sun, Nader Motee

Abstract: We consider a class of stochastic dynamical networks whose governing dynamics can be modeled using a coupling function. It is shown that the dynamics of such networks can generate geometrically ergodic trajectories under some reasonable assumptions. We show that a general class of coupling functions can be learned using only one sample trajectory from the network. This is practically plausible as in numerous applications it is desired to run an experiment only once but for a longer period of time, rather than repeating the same experiment multiple times from different initial conditions. Building upon ideas from the concentration inequalities for geometrically ergodic Markov chains, we formulate several results about the convergence of the empirical estimator to the true coupling function. Our theoretical findings are supported by extensive simulation results.

Submitted 20 November, 2022; originally announced November 2022.

Comments: 15 pages, 5 figures

MSC Class: 93E35 (Primary) 93B70; 47H25 (Secondary)

arXiv:2211.07344 [pdf, other]

On Parsing as Tagging

Authors: Afra Amini, Ryan Cotterell

Abstract: There have been many proposals to reduce constituency parsing to tagging in the literature. To better understand what these approaches have in common, we cast several existing proposals into a unifying pipeline consisting of three steps: linearization, learning, and decoding. In particular, we show how to reduce tetratagging, a state-of-the-art constituency tagger, to shift-reduce parsing by performing a right-corner transformation on the grammar and making a specific independence assumption. Furthermore, we empirically evaluate our taxonomy of tagging pipelines with different choices of linearizers, learners, and decoders. Based on the results in English and a set of 8 typologically diverse languages, we conclude that the linearization of the derivation tree and its alignment with the input sequence is the most critical factor in achieving accurate taggers.

Submitted 20 November, 2022; v1 submitted 14 November, 2022; originally announced November 2022.

Comments: Will appear in Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

arXiv:2210.12067 [pdf, other]

Efficient Dataset Distillation Using Random Feature Approximation

Authors: Noel Loo, Ramin Hasani, Alexander Amini, Daniela Rus

Abstract: Dataset distillation compresses large datasets into smaller synthetic coresets which retain performance with the aim of reducing the storage and computational burden of processing the entire dataset. Today's best-performing algorithm, Kernel Inducing Points (KIP), which makes use of the correspondence between infinite-width neural networks and kernel-ridge regression, is prohibitively slow due to the exact computation of the neural tangent kernel matrix, scaling $O(|S|^2)$, with $|S|$ being the coreset size. To improve this, we propose a novel algorithm that uses a random feature approximation (RFA) of the Neural Network Gaussian Process (NNGP) kernel, which reduces the kernel matrix computation to $O(|S|)$. Our algorithm provides at least a 100-fold speedup over KIP and can run on a single GPU. Our new method, termed RFA Distillation (RFAD), performs competitively with KIP and other dataset condensation algorithms in accuracy over a range of large-scale datasets, both in kernel regression and finite-width network training. We demonstrate the effectiveness of our approach on tasks involving model interpretability and privacy preservation.

Submitted 21 October, 2022; originally announced October 2022.

Comments: Accepted to the Conference on the Advances in Neural Information Processing Systems (NeurIPS) 2022
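
To make the random-feature idea in the RFAD abstract above concrete, here is a small sketch under stated assumptions. It swaps in a different, simpler technique: random Fourier features for an RBF kernel, not the NNGP kernel approximation the paper uses. All names and values (`make_features`, the lengthscale, the inputs) are hypothetical. The point it illustrates is that each input is mapped once to an explicit feature vector, so a coreset of size $|S|$ needs $|S|$ feature computations instead of $|S|^2$ exact kernel evaluations.

```python
import math
import random


def make_features(dim, num_features, lengthscale, seed=0):
    """Build a random Fourier feature map for an RBF kernel (assumed kernel)."""
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 1.0 / lengthscale) for _ in range(dim)]
               for _ in range(num_features)]
    offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def phi(x):
        # One feature per random direction: scale * cos(w . x + b).
        return [scale * math.cos(sum(w_k * x_k for w_k, x_k in zip(w, x)) + b)
                for w, b in zip(weights, offsets)]

    return phi


def rbf_kernel(x, y, lengthscale):
    """Exact RBF kernel value, used here only as a reference."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * lengthscale ** 2))


phi = make_features(dim=3, num_features=4000, lengthscale=1.0)
x, y = [0.2, -0.1, 0.5], [0.0, 0.3, 0.4]
approx = sum(a * b for a, b in zip(phi(x), phi(y)))  # inner product of features
exact = rbf_kernel(x, y, 1.0)
```

The inner product of the two feature vectors estimates the kernel value, with an error that shrinks as the number of random features grows.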
