subscribe to arXiv mailings

Oogway: Designing, Implementing, and Testing an AUV for RoboSub 2023

Authors: Will Denton, Lilly Chiavetta, Michael Bryant, Vedarsh Shah, Rico Zhu, Ricky Weerts, Phillip Xue, Vincent Chen, Hung Le, Maxwell Lin, Austin Camacho, Drew Council, Ethan Horowitz, Jackie Ong, Morgan Chu, Alex Pool

Abstract: The Duke Robotics Club is proud to present our robot for the 2023 RoboSub Competition: Oogway. Oogway marks one of the largest design overhauls in club history. Beyond a revamped formfactor, some of Oogway's notable features include all-new computer vision software, advanced sonar integration, novel acoustics hardware processing, and upgraded stereoscopic cameras. Oogway was built on the principle… ▽ More The Duke Robotics Club is proud to present our robot for the 2023 RoboSub Competition: Oogway. Oogway marks one of the largest design overhauls in club history. Beyond a revamped formfactor, some of Oogway's notable features include all-new computer vision software, advanced sonar integration, novel acoustics hardware processing, and upgraded stereoscopic cameras. Oogway was built on the principle of independent, well-integrated, and reliable subsystems. Individual components and subsystems were tested and designed separately. Oogway's most advanced capabilities are a result of the tight integration between these subsystems. Such examples include sonar-assisted computer vision algorithms and robot-agnostic controls configured in part through the robot's 3D model. The success of constructing and testing Oogway in under 2 year's time can be attributed to 20+ contributing club members, supporters within Duke's Pratt School of Engineering, and outside sponsors. △ Less

Submitted 12 October, 2024; originally announced October 2024.

Comments: arXiv admin note: text overlap with arXiv:2410.09684

arXiv:2410.10736 [pdf, other]

Towards Calibrated Losses for Adversarial Robust Reject Option Classification

Authors: Vrund Shah, Tejas Chaudhari, Naresh Manwani

Abstract: Robustness towards adversarial attacks is a vital property for classifiers in several applications such as autonomous driving, medical diagnosis, etc. Also, in such scenarios, where the cost of misclassification is very high, knowing when to abstain from prediction becomes crucial. A natural question is which surrogates can be used to ensure learning in scenarios where the input points are adversa… ▽ More Robustness towards adversarial attacks is a vital property for classifiers in several applications such as autonomous driving, medical diagnosis, etc. Also, in such scenarios, where the cost of misclassification is very high, knowing when to abstain from prediction becomes crucial. A natural question is which surrogates can be used to ensure learning in scenarios where the input points are adversarially perturbed and the classifier can abstain from prediction? This paper aims to characterize and design surrogates calibrated in "Adversarial Robust Reject Option" setting. First, we propose an adversarial robust reject option loss $\ell_{d}^γ$ and analyze it for the hypothesis set of linear classifiers ($\mathcal{H}_{\textrm{lin}}$). Next, we provide a complete characterization result for any surrogate to be $(\ell_{d}^γ,\mathcal{H}_{\textrm{lin}})$- calibrated. To demonstrate the difficulty in designing surrogates to $\ell_{d}^γ$, we show negative calibration results for convex surrogates and quasi-concave conditional risk cases (these gave positive calibration in adversarial setting without reject option). We also empirically argue that Shifted Double Ramp Loss (DRL) and Shifted Double Sigmoid Loss (DSL) satisfy the calibration conditions. Finally, we demonstrate the robustness of shifted DRL and shifted DSL against adversarial perturbations on a synthetically generated dataset. △ Less

Submitted 14 October, 2024; originally announced October 2024.

Comments: Accepted at Asian Conference on Machine Learning (ACML) , 2024

arXiv:2410.09684 [pdf, other]

Technical Design Review of Duke Robotics Club's Oogway: An AUV for RoboSub 2024

Authors: Will Denton, Michael Bryant, Lilly Chiavetta, Vedarsh Shah, Rico Zhu, Philip Xue, Vincent Chen, Maxwell Lin, Hung Le, Austin Camacho, Raul Galvez, Nathan Yang, Nathanael Ren, Tyler Rose, Mathew Chu, Amir Ergashev, Saagar Arya, Kaelyn Pieter, Ethan Horowitz, Maanav Allampallam, Patrick Zheng, Mia Kaarls, June Wood

Abstract: The Duke Robotics Club is proud to present our robot for the 2024 RoboSub Competition: Oogway. Now in its second year, Oogway has been dramatically upgraded in both its capabilities and reliability. Oogway was built on the principle of independent, well-integrated, and reliable subsystems. Individual components and subsystems were tested and designed separately. Oogway's most advanced capabilities… ▽ More The Duke Robotics Club is proud to present our robot for the 2024 RoboSub Competition: Oogway. Now in its second year, Oogway has been dramatically upgraded in both its capabilities and reliability. Oogway was built on the principle of independent, well-integrated, and reliable subsystems. Individual components and subsystems were tested and designed separately. Oogway's most advanced capabilities are a result of the tight integration between these subsystems. Such examples include a re-envisioned controls system, an entirely new electrical stack, advanced sonar integration, additional cameras and system monitoring, a new marker dropper, and a watertight capsule mechanism. These additions enabled Oogway to prequalify for Robosub 2024. △ Less

Submitted 12 October, 2024; originally announced October 2024.

arXiv:2410.07836 [pdf, other]

Masked Generative Priors Improve World Models Sequence Modelling Capabilities

Authors: Cristian Meo, Mircea Lica, Zarif Ikram, Akihiro Nakano, Vedant Shah, Aniket Rajiv Didolkar, Dianbo Liu, Anirudh Goyal, Justin Dauwels

Abstract: Deep Reinforcement Learning (RL) has become the leading approach for creating artificial agents in complex environments. Model-based approaches, which are RL methods with world models that predict environment dynamics, are among the most promising directions for improving data efficiency, forming a critical step toward bridging the gap between research and real-world deployment. In particular, wor… ▽ More Deep Reinforcement Learning (RL) has become the leading approach for creating artificial agents in complex environments. Model-based approaches, which are RL methods with world models that predict environment dynamics, are among the most promising directions for improving data efficiency, forming a critical step toward bridging the gap between research and real-world deployment. In particular, world models enhance sample efficiency by learning in imagination, which involves training a generative sequence model of the environment in a self-supervised manner. Recently, Masked Generative Modelling has emerged as a more efficient and superior inductive bias for modelling and generating token sequences. Building on the Efficient Stochastic Transformer-based World Models (STORM) architecture, we replace the traditional MLP prior with a Masked Generative Prior (e.g., MaskGIT Prior) and introduce GIT-STORM. We evaluate our model on two downstream tasks: reinforcement learning and video prediction. GIT-STORM demonstrates substantial performance gains in RL tasks on the Atari 100k benchmark. Moreover, we apply Transformer-based World Models to continuous action environments for the first time, addressing a significant gap in prior research. To achieve this, we employ a state mixer function that integrates latent state representations with actions, enabling our model to handle continuous control tasks. We validate this approach through qualitative and quantitative analyses on the DeepMind Control Suite, showcasing the effectiveness of Transformer-based World Models in this new domain. Our results highlight the versatility and efficacy of the MaskGIT dynamics prior, paving the way for more accurate world models and effective RL policies. △ Less

Submitted 13 October, 2024; v1 submitted 10 October, 2024; originally announced October 2024.

arXiv:2409.10532 [pdf, other]

Slug Mobile: Test-Bench for RL Testing

Authors: Jonathan Wellington Morris, Vishrut Shah, Alex Besanceney, Daksh Shah, Leilani H. Gilpin

Abstract: Sim-to real gap in Reinforcement Learning is when a model trained in a simulator does not translate to the real world. This is a problem for Autonomous Vehicles (AVs) as vehicle dynamics can vary from simulation to reality, and also from vehicle to vehicle. Slug Mobile is a one tenth scale autonomous vehicle created to help address the sim-to-real gap for AVs by acting as a test-bench to develop m… ▽ More Sim-to real gap in Reinforcement Learning is when a model trained in a simulator does not translate to the real world. This is a problem for Autonomous Vehicles (AVs) as vehicle dynamics can vary from simulation to reality, and also from vehicle to vehicle. Slug Mobile is a one tenth scale autonomous vehicle created to help address the sim-to-real gap for AVs by acting as a test-bench to develop models that can easily scale from one vehicle to another. In addition to traditional sensors found in other one tenth scale AVs, we have also included a Dynamic Vision Sensor so we can train Spiking Neural Networks running on neuromorphic hardware. △ Less

Submitted 30 August, 2024; originally announced September 2024.

Comments: Submitted to BayLearn 2024

arXiv:2409.06899 [pdf, other]

Inefficient Alliance Formation in Coalitional Blotto Games

Authors: Vade Shah, Keith Paarporn, Jason R. Marden

Abstract: When multiple agents are engaged in a network of conflict, some can advance their competitive positions by forming alliances with each other. However, the costs associated with establishing an alliance may outweigh the potential benefits. This study investigates costly alliance formation in the framework of coalitional Blotto games, in which two players compete separately against a common adversar… ▽ More When multiple agents are engaged in a network of conflict, some can advance their competitive positions by forming alliances with each other. However, the costs associated with establishing an alliance may outweigh the potential benefits. This study investigates costly alliance formation in the framework of coalitional Blotto games, in which two players compete separately against a common adversary, and are able to collude by exchanging resources with one another. Previous work has shown that both players in the alliance can mutually benefit if one player unilaterally donates, or transfers, a portion of their budget to the other. In this letter, we consider a variation where the transfer of resources is inherently inefficient, meaning that the recipient of the transfer only receives a fraction of the donation. Our findings reveal that even in the presence of inefficiencies, mutually beneficial transfers are still possible. More formally, our main result provides necessary and sufficient conditions for the existence of such transfers, offering insights into the robustness of alliance formation in competitive environments with resource constraints. △ Less

Submitted 17 September, 2024; v1 submitted 10 September, 2024; originally announced September 2024.

arXiv:2409.04669 [pdf, other]

Learning Optimal Stable Matches in Decentralized Markets with Unknown Preferences

Authors: Vade Shah, Bryce L. Ferguson, Jason R. Marden

Abstract: Matching algorithms have demonstrated great success in several practical applications, but they often require centralized coordination and plentiful information. In many modern online marketplaces, agents must independently seek out and match with another using little to no information. For these kinds of settings, can we design decentralized, limited-information matching algorithms that preserve… ▽ More Matching algorithms have demonstrated great success in several practical applications, but they often require centralized coordination and plentiful information. In many modern online marketplaces, agents must independently seek out and match with another using little to no information. For these kinds of settings, can we design decentralized, limited-information matching algorithms that preserve the desirable properties of standard centralized techniques? In this work, we constructively answer this question in the affirmative. We model a two-sided matching market as a game consisting of two disjoint sets of agents, referred to as proposers and acceptors, each of whom seeks to match with their most preferable partner on the opposite side of the market. However, each proposer has no knowledge of their own preferences, so they must learn their preferences while forming matches in the market. We present a simple online learning rule that guarantees a strong notion of probabilistic convergence to the welfare-maximizing equilibrium of the game, referred to as the proposer-optimal stable match. To the best of our knowledge, this represents the first completely decoupled, communication-free algorithm that guarantees probabilistic convergence to an optimal stable match, irrespective of the structure of the matching market. △ Less

Submitted 16 October, 2024; v1 submitted 6 September, 2024; originally announced September 2024.

arXiv:2408.10397 [pdf, other]

Webcam-based Pupil Diameter Prediction Benefits from Upscaling

Authors: Vijul Shah, Brian B. Moser, Ko Watanabe, Andreas Dengel

Abstract: Capturing pupil diameter is essential for assessing psychological and physiological states such as stress levels and cognitive load. However, the low resolution of images in eye datasets often hampers precise measurement. This study evaluates the impact of various upscaling methods, ranging from bicubic interpolation to advanced super-resolution, on pupil diameter predictions. We compare several p… ▽ More Capturing pupil diameter is essential for assessing psychological and physiological states such as stress levels and cognitive load. However, the low resolution of images in eye datasets often hampers precise measurement. This study evaluates the impact of various upscaling methods, ranging from bicubic interpolation to advanced super-resolution, on pupil diameter predictions. We compare several pre-trained methods, including CodeFormer, GFPGAN, Real-ESRGAN, HAT, and SRResNet. Our findings suggest that pupil diameter prediction models trained on upscaled datasets are highly sensitive to the selected upscaling method and scale. Our results demonstrate that upscaling methods consistently enhance the accuracy of pupil diameter prediction models, highlighting the importance of upscaling in pupilometry. Overall, our work provides valuable insights for selecting upscaling techniques, paving the way for more accurate assessments in psychological and physiological research. △ Less

Submitted 19 August, 2024; originally announced August 2024.

arXiv:2407.21009 [pdf, other]

AI-Assisted Generation of Difficult Math Questions

Authors: Vedant Shah, Dingli Yu, Kaifeng Lyu, Simon Park, Jiatong Yu, Yinghui He, Nan Rosemary Ke, Michael Mozer, Yoshua Bengio, Sanjeev Arora, Anirudh Goyal

Abstract: Current LLM training positions mathematical reasoning as a core capability. With publicly available sources fully tapped, there is unmet demand for diverse and challenging math questions. Relying solely on human experts is both time-consuming and costly, while LLM-generated questions often lack the requisite diversity and difficulty. We present a design framework that combines the strengths of LLM… ▽ More Current LLM training positions mathematical reasoning as a core capability. With publicly available sources fully tapped, there is unmet demand for diverse and challenging math questions. Relying solely on human experts is both time-consuming and costly, while LLM-generated questions often lack the requisite diversity and difficulty. We present a design framework that combines the strengths of LLMs with a human-in-the-loop approach to generate a diverse array of challenging math questions. We leverage LLM metacognition skills [Didolkar et al., 2024] of a strong LLM to extract core "skills" from existing math datasets. These skills serve as the basis for generating novel and difficult questions by prompting the LLM with random pairs of core skills. The use of two different skills within each question makes finding such questions an "out of distribution" task for both LLMs and humans. Our pipeline employs LLMs to iteratively generate and refine questions and solutions through multiturn prompting. Human annotators then verify and further refine the questions, with their efficiency enhanced via further LLM interactions. Applying this pipeline on skills extracted from the MATH dataset [Hendrycks et al., 2021] resulted in MATH$^2$ - a dataset of higher-quality math questions, as evidenced by: (a) Lower performance of all models on MATH$^2$ than on MATH (b) Higher performance on MATH when using MATH$^2$ questions as in-context examples. Although focused on mathematics, our methodology seems applicable to other domains requiring structured reasoning, and potentially as a component of scalable oversight. Also of interest is a striking relationship observed between models' performance on the new dataset: the success rate on MATH$^2$ is the square on MATH, suggesting that successfully solving the question in MATH$^2$ requires a nontrivial combination of two distinct math skills. △ Less

Submitted 5 October, 2024; v1 submitted 30 July, 2024; originally announced July 2024.

arXiv:2407.11364 [pdf, ps, other]

Learning-augmented Maximum Independent Set

Authors: Vladimir Braverman, Prathamesh Dharangutte, Vihan Shah, Chen Wang

Abstract: We study the Maximum Independent Set (MIS) problem on general graphs within the framework of learning-augmented algorithms. The MIS problem is known to be NP-hard and is also NP-hard to approximate to within a factor of $n^{1-δ}$ for any $δ>0$. We show that we can break this barrier in the presence of an oracle obtained through predictions from a machine learning model that answers vertex membersh… ▽ More We study the Maximum Independent Set (MIS) problem on general graphs within the framework of learning-augmented algorithms. The MIS problem is known to be NP-hard and is also NP-hard to approximate to within a factor of $n^{1-δ}$ for any $δ>0$. We show that we can break this barrier in the presence of an oracle obtained through predictions from a machine learning model that answers vertex membership queries for a fixed MIS with probability $1/2+\varepsilon$. In the first setting we consider, the oracle can be queried once per vertex to know if a vertex belongs to a fixed MIS, and the oracle returns the correct answer with probability $1/2 + \varepsilon$. Under this setting, we show an algorithm that obtains an $\tilde{O}(\sqrtΔ/\varepsilon)$-approximation in $O(m)$ time where $Δ$ is the maximum degree of the graph. In the second setting, we allow multiple queries to the oracle for a vertex, each of which is correct with probability $1/2 + \varepsilon$. For this setting, we show an $O(1)$-approximation algorithm using $O(n/\varepsilon^2)$ total queries and $\tilde{O}(m)$ runtime. △ Less

Submitted 16 July, 2024; originally announced July 2024.

Comments: APPROX 2024

arXiv:2407.11204 [pdf, other]

EyeDentify: A Dataset for Pupil Diameter Estimation based on Webcam Images

Authors: Vijul Shah, Ko Watanabe, Brian B. Moser, Andreas Dengel

Abstract: In this work, we introduce EyeDentify, a dataset specifically designed for pupil diameter estimation based on webcam images. EyeDentify addresses the lack of available datasets for pupil diameter estimation, a crucial domain for understanding physiological and psychological states traditionally dominated by highly specialized sensor systems such as Tobii. Unlike these advanced sensor systems and a… ▽ More In this work, we introduce EyeDentify, a dataset specifically designed for pupil diameter estimation based on webcam images. EyeDentify addresses the lack of available datasets for pupil diameter estimation, a crucial domain for understanding physiological and psychological states traditionally dominated by highly specialized sensor systems such as Tobii. Unlike these advanced sensor systems and associated costs, webcam images are more commonly found in practice. Yet, deep learning models that can estimate pupil diameters using standard webcam data are scarce. By providing a dataset of cropped eye images alongside corresponding pupil diameter information, EyeDentify enables the development and refinement of models designed specifically for less-equipped environments, democratizing pupil diameter estimation by making it more accessible and broadly applicable, which in turn contributes to multiple domains of understanding human activity and supporting healthcare. Our dataset is available at https://vijulshah.github.io/eyedentify/. △ Less

Submitted 15 July, 2024; originally announced July 2024.

arXiv:2407.06245 [pdf]

ORAN-Bench-13K: An Open Source Benchmark for Assessing LLMs in Open Radio Access Networks

Authors: Pranshav Gajjar, Vijay K. Shah

Abstract: Large Language Models (LLMs) can revolutionize how we deploy and operate Open Radio Access Networks (O-RAN) by enhancing network analytics, anomaly detection, and code generation and significantly increasing the efficiency and reliability of a plethora of O-RAN tasks. In this paper, we present ORAN-Bench-13K, the first comprehensive benchmark designed to evaluate the performance of Large Language… ▽ More Large Language Models (LLMs) can revolutionize how we deploy and operate Open Radio Access Networks (O-RAN) by enhancing network analytics, anomaly detection, and code generation and significantly increasing the efficiency and reliability of a plethora of O-RAN tasks. In this paper, we present ORAN-Bench-13K, the first comprehensive benchmark designed to evaluate the performance of Large Language Models (LLMs) within the context of O-RAN. Our benchmark consists of 13,952 meticulously curated multiple-choice questions generated from 116 O-RAN specification documents. We leverage a novel three-stage LLM framework, and the questions are categorized into three distinct difficulties to cover a wide spectrum of ORAN-related knowledge. We thoroughly evaluate the performance of several state-of-the-art LLMs, including Gemini, Chat-GPT, and Mistral. Additionally, we propose ORANSight, a Retrieval-Augmented Generation (RAG)-based pipeline that demonstrates superior performance on ORAN-Bench-13K compared to other tested closed-source models. Our findings indicate that current popular LLM models are not proficient in O-RAN, highlighting the need for specialized models. We observed a noticeable performance improvement when incorporating the RAG-based ORANSight pipeline, with a Macro Accuracy of 0.784 and a Weighted Accuracy of 0.776, which was on average 21.55% and 22.59% better than the other tested LLMs. △ Less

Submitted 13 July, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

arXiv:2404.14355 [pdf, other]

Pre-Calc: Learning to Use the Calculator Improves Numeracy in Language Models

Authors: Vishruth Veerendranath, Vishwa Shah, Kshitish Ghate

Abstract: Quantitative and numerical comprehension in language is an important task in many fields like education and finance, but still remains a challenging task for language models. While tool and calculator usage has shown to be helpful to improve mathematical reasoning in large pretrained decoder-only language models, this remains unexplored for smaller language models with encoders. In this paper, we… ▽ More Quantitative and numerical comprehension in language is an important task in many fields like education and finance, but still remains a challenging task for language models. While tool and calculator usage has shown to be helpful to improve mathematical reasoning in large pretrained decoder-only language models, this remains unexplored for smaller language models with encoders. In this paper, we propose Pre-Calc, a simple pre-finetuning objective of learning to use the calculator for both encoder-only and encoder-decoder architectures, formulated as a discriminative and generative task respectively. We pre-train BERT and RoBERTa for discriminative calculator use and Flan-T5 for generative calculator use on the MAWPS, SVAMP, and AsDiv-A datasets, which improves performance on downstream tasks that require numerical understanding. Our code and data are available at https://github.com/calc-cmu/pre-calc. △ Less

Submitted 25 June, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

Comments: AI4Math workshop, ICML 2024

arXiv:2404.12464 [pdf, other]

NormAd: A Framework for Measuring the Cultural Adaptability of Large Language Models

Authors: Abhinav Rao, Akhila Yerukola, Vishwa Shah, Katharina Reinecke, Maarten Sap

Abstract: To be effectively and safely deployed to global user populations, large language models (LLMs) must adapt outputs to user values and culture, not just know about them. We introduce NormAd, an evaluation framework to assess LLMs' cultural adaptability, specifically measuring their ability to judge social acceptability across different levels of cultural norm specificity, from abstract values to exp… ▽ More To be effectively and safely deployed to global user populations, large language models (LLMs) must adapt outputs to user values and culture, not just know about them. We introduce NormAd, an evaluation framework to assess LLMs' cultural adaptability, specifically measuring their ability to judge social acceptability across different levels of cultural norm specificity, from abstract values to explicit social norms. As an instantiation of our framework, we create NormAd-Eti, a benchmark of 2.6k situational descriptions representing social-etiquette related cultural norms from 75 countries. Through comprehensive experiments on NormAd-Eti, we find that LLMs struggle to accurately judge social acceptability across these varying degrees of cultural contexts and show stronger adaptability to English-centric cultures over those from the Global South. Even in the simplest setting where the relevant social norms are provided, our best models' performance (<82%) lags behind humans (>95%). In settings with abstract values and country information, model performance drops substantially (<60%), while human accuracy remains high (>90%). Furthermore, we find that models are better at recognizing socially acceptable versus unacceptable situations. Our findings showcase the current pitfalls in socio-cultural reasoning of LLMs which hinder their adaptability for global audiences. △ Less

Submitted 19 October, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

Comments: Preprint. In Review

arXiv:2404.10094 [pdf, other]

Towards DNA-Encoded Library Generation with GFlowNets

Authors: Michał Koziarski, Mohammed Abukalam, Vedant Shah, Louis Vaillancourt, Doris Alexandra Schuetz, Moksh Jain, Almer van der Sloot, Mathieu Bourgey, Anne Marinier, Yoshua Bengio

Abstract: DNA-encoded libraries (DELs) are a powerful approach for rapidly screening large numbers of diverse compounds. One of the key challenges in using DELs is library design, which involves choosing the building blocks that will be combinatorially combined to produce the final library. In this paper we consider the task of protein-protein interaction (PPI) biased DEL design. To this end, we evaluate se… ▽ More DNA-encoded libraries (DELs) are a powerful approach for rapidly screening large numbers of diverse compounds. One of the key challenges in using DELs is library design, which involves choosing the building blocks that will be combinatorially combined to produce the final library. In this paper we consider the task of protein-protein interaction (PPI) biased DEL design. To this end, we evaluate several machine learning algorithms on the PPI modulation task and use them as a reward for the proposed GFlowNet-based generative approach. We additionally investigate the possibility of using structural information about building blocks to design a hierarchical action space for the GFlowNet. The observed results indicate that GFlowNets are a promising approach for generating diverse combinatorial library candidates. △ Less

Submitted 15 April, 2024; originally announced April 2024.

arXiv:2402.09710 [pdf]

Preserving Data Privacy for ML-driven Applications in Open Radio Access Networks

Authors: Pranshav Gajjar, Azuka Chiejina, Vijay K. Shah

Abstract: Deep learning offers a promising solution to improve spectrum access techniques by utilizing data-driven approaches to manage and share limited spectrum resources for emerging applications. For several of these applications, the sensitive wireless data (such as spectrograms) are stored in a shared database or multistakeholder cloud environment and are therefore prone to privacy leaks. This paper a… ▽ More Deep learning offers a promising solution to improve spectrum access techniques by utilizing data-driven approaches to manage and share limited spectrum resources for emerging applications. For several of these applications, the sensitive wireless data (such as spectrograms) are stored in a shared database or multistakeholder cloud environment and are therefore prone to privacy leaks. This paper aims to address such privacy concerns by examining the representative case study of shared database scenarios in 5G Open Radio Access Network (O-RAN) networks where we have a shared database within the near-real-time (near-RT) RAN intelligent controller. We focus on securing the data that can be used by machine learning (ML) models for spectrum sharing and interference mitigation applications without compromising the model and network performances. The underlying idea is to leverage a (i) Shuffling-based learnable encryption technique to encrypt the data, following which, (ii) employ a custom Vision transformer (ViT) as the trained ML model that is capable of performing accurate inferences on such encrypted data. The paper offers a thorough analysis and comparisons with analogous convolutional neural networks (CNN) as well as deeper architectures (such as ResNet-50) as baselines. Our experiments showcase that the proposed approach significantly outperforms the baseline CNN with an improvement of 24.5% and 23.9% for the percent accuracy and F1-Score respectively when operated on encrypted data. Though deeper ResNet-50 architecture is obtained as a slightly more accurate model, with an increase of 4.4%, the proposed approach boasts a reduction of parameters by 99.32%, and thus, offers a much-improved prediction time by nearly 60%. △ Less

Submitted 15 February, 2024; originally announced February 2024.

arXiv:2402.06846 [pdf, other]

System-level Analysis of Adversarial Attacks and Defenses on Intelligence in O-RAN based Cellular Networks

Authors: Azuka Chiejina, Brian Kim, Kaushik Chowhdury, Vijay K. Shah

Abstract: While the open architecture, open interfaces, and integration of intelligence within Open Radio Access Network technology hold the promise of transforming 5G and 6G networks, they also introduce cybersecurity vulnerabilities that hinder its widespread adoption. In this paper, we conduct a thorough system-level investigation of cyber threats, with a specific focus on machine learning (ML) intellige… ▽ More While the open architecture, open interfaces, and integration of intelligence within Open Radio Access Network technology hold the promise of transforming 5G and 6G networks, they also introduce cybersecurity vulnerabilities that hinder its widespread adoption. In this paper, we conduct a thorough system-level investigation of cyber threats, with a specific focus on machine learning (ML) intelligence components known as xApps within the O-RAN's near-real-time RAN Intelligent Controller (near-RT RIC) platform. Our study begins by developing a malicious xApp designed to execute adversarial attacks on two types of test data - spectrograms and key performance metrics (KPMs), stored in the RIC database within the near-RT RIC. To mitigate these threats, we utilize a distillation technique that involves training a teacher model at a high softmax temperature and transferring its knowledge to a student model trained at a lower softmax temperature, which is deployed as the robust ML model within xApp. We prototype an over-the-air LTE/5G O-RAN testbed to assess the impact of these attacks and the effectiveness of the distillation defense technique by leveraging an ML-based Interference Classification (InterClass) xApp as an example. We examine two versions of InterClass xApp under distinct scenarios, one based on Convolutional Neural Networks (CNNs) and another based on Deep Neural Networks (DNNs) using spectrograms and KPMs as input data respectively. Our findings reveal up to 100% and 96.3% degradation in the accuracy of both the CNN and DNN models respectively resulting in a significant decline in network performance under considered adversarial attacks. Under the strict latency constraints of the near-RT RIC closed control loop, our analysis shows that the distillation technique outperforms classical adversarial training by achieving an accuracy of up to 98.3% for mitigating such attacks. △ Less

Submitted 13 February, 2024; v1 submitted 9 February, 2024; originally announced February 2024.

Comments: his paper has been accepted for publication in ACM WiSec 2024

arXiv:2402.04447 [pdf, other]

Context-Aware Spectrum Coexistence of Terrestrial Beyond 5G Networks in Satellite Bands

Authors: Ta Seen Reaz Niloy, Zoheb Hasan, Rob Smith, Vikram R. Anapana, Vijay K. Shah

Abstract: Spectrum sharing between terrestrial 5G and incumbent networks in the satellite bands presents a promising avenue to satisfy the ever-increasing bandwidth demand of the next-generation wireless networks. However, protecting incumbent operations from harmful interference poses a fundamental challenge in accommodating terrestrial broadband cellular networks in the satellite bands. State-of-the-art s… ▽ More Spectrum sharing between terrestrial 5G and incumbent networks in the satellite bands presents a promising avenue to satisfy the ever-increasing bandwidth demand of the next-generation wireless networks. However, protecting incumbent operations from harmful interference poses a fundamental challenge in accommodating terrestrial broadband cellular networks in the satellite bands. State-of-the-art spectrum-sharing policies usually consider several worst-case assumptions and ignore site-specific contextual factors in making spectrum-sharing decisions, and thus, often results in under-utilization of the shared band for the secondary licensees. To address such limitations, this paper introduces CAT3S (Context-Aware Terrestrial-Satellite Spectrum Sharing) framework that empowers the coexisting terrestrial 5G network to maximize utilization of the shared satellite band without creating harmful interference to the incumbent links by exploiting the contextual factors. CAT3S consists of the following two components: (i) context-acquisition unit to collect and process essential contextual information for spectrum sharing and (ii) context-aware base station (BS) control unit to optimize the set of operational BSs and their operation parameters (i.e., transmit power and active beams per sector). To evaluate the performance of the CAT3S, a realistic spectrum coexistence case study over the 12 GHz band is considered. Experiment results demonstrate that the proposed CAT3S achieves notably higher spectrum utilization than state-of-the-art spectrum-sharing policies in different weather contexts. △ Less

Submitted 14 February, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

arXiv:2402.02593 [pdf, other]

Leveraging Continuously Differentiable Activation Functions for Learning in Quantized Noisy Environments

Authors: Vivswan Shah, Nathan Youngblood

Abstract: Real-world analog systems intrinsically suffer from noise that can impede model convergence and accuracy on a variety of deep learning models. We demonstrate that differentiable activations like GELU and SiLU enable robust propagation of gradients which help to mitigate analog quantization error that is ubiquitous to all analog systems. We perform analysis and training of convolutional, linear, an… ▽ More Real-world analog systems intrinsically suffer from noise that can impede model convergence and accuracy on a variety of deep learning models. We demonstrate that differentiable activations like GELU and SiLU enable robust propagation of gradients which help to mitigate analog quantization error that is ubiquitous to all analog systems. We perform analysis and training of convolutional, linear, and transformer networks in the presence of quantized noise. Here, we are able to demonstrate that continuously differentiable activation functions are significantly more noise resilient over conventional rectified activations. As in the case of ReLU, the error in gradients are 100x higher than those in GELU near zero. Our findings provide guidance for selecting appropriate activations to realize performant and reliable hardware implementations across several machine learning domains such as computer vision, signal processing, and beyond. △ Less

Submitted 4 February, 2024; originally announced February 2024.

arXiv:2402.01207 [pdf, other]

Efficient Causal Graph Discovery Using Large Language Models

Authors: Thomas Jiralerspong, Xiaoyin Chen, Yash More, Vedant Shah, Yoshua Bengio

Abstract: We propose a novel framework that leverages LLMs for full causal graph discovery. While previous LLM-based methods have used a pairwise query approach, this requires a quadratic number of queries which quickly becomes impractical for larger causal graphs. In contrast, the proposed framework uses a breadth-first search (BFS) approach which allows it to use only a linear number of queries. We also s… ▽ More We propose a novel framework that leverages LLMs for full causal graph discovery. While previous LLM-based methods have used a pairwise query approach, this requires a quadratic number of queries which quickly becomes impractical for larger causal graphs. In contrast, the proposed framework uses a breadth-first search (BFS) approach which allows it to use only a linear number of queries. We also show that the proposed method can easily incorporate observational data when available, to improve performance. In addition to being more time and data-efficient, the proposed framework achieves state-of-the-art results on real-world causal graphs of varying sizes. The results demonstrate the effectiveness and efficiency of the proposed method in discovering causal relationships, showcasing its potential for broad applicability in causal graph discovery tasks across different domains. △ Less

Submitted 20 July, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

arXiv:2401.06378 [pdf, ps, other]

New Lower Bounds in Merlin-Arthur Communication and Graph Streaming Verification

Authors: Prantar Ghosh, Vihan Shah

Abstract: We show new lower bounds in the \emph{Merlin-Arthur} (MA) communication model and the related \emph{annotated streaming} or stream verification model. The MA communication model is an enhancement of the classical communication model, where in addition to the usual players Alice and Bob, there is an all-powerful but untrusted player Merlin who knows their inputs and tries to convince them about the… ▽ More We show new lower bounds in the \emph{Merlin-Arthur} (MA) communication model and the related \emph{annotated streaming} or stream verification model. The MA communication model is an enhancement of the classical communication model, where in addition to the usual players Alice and Bob, there is an all-powerful but untrusted player Merlin who knows their inputs and tries to convince them about the output. Most functions have MA protocols with total communication significantly smaller than what would be needed without Merlin. We focus on the online MA (OMA) model, which is the MA analogue of one-way communication, and introduce the notion of \emph{non-trivial-OMA} complexity of a function. This is the minimum total communication needed by any non-trivial OMA protocol computing that function, where a trivial OMA protocol is one where Alice sends Bob roughly as many bits as she would have sent without Merlin. We prove a lower bound on the non-trivial-OMA complexity of a natural function \emph{Equals-Index} (basically the well-known Index problem on large domains) and identify it as a canonical problem for proving strong lower bounds on this complexity: reductions from it (i) reproduce and/or improve upon the lower bounds for all functions that were previously known to have large non-trivial-OMA complexity, (ii) exhibit the first explicit functions whose non-trivial-OMA complexity is superlinear, and even exponential, in their classical one-way complexity, and (iii) show functions on input size $n$ for which this complexity is as large as $n/\log n$. While exhibiting a function with $ω(\sqrt{n})$ (standard) OMA complexity is a longstanding open problem, we did not even know of any function with $ω(\sqrt{n})$ non-trivial-OMA complexity. We further extend the lower bounds to a related streaming model called annotated streaming. △ Less

Submitted 12 January, 2024; originally announced January 2024.

Comments: To appear in ITCS 2024

arXiv:2401.03271 [pdf, other]

Analysis and Validation of Image Search Engines in Histopathology

Authors: Isaiah Lahr, Saghir Alfasly, Peyman Nejat, Jibran Khan, Luke Kottom, Vaishnavi Kumbhar, Areej Alsaafin, Abubakr Shafique, Sobhan Hemati, Ghazal Alabtah, Nneka Comfere, Dennis Murphee, Aaron Mangold, Saba Yasir, Chady Meroueh, Lisa Boardman, Vijay H. Shah, Joaquin J. Garcia, H. R. Tizhoosh

Abstract: Searching for similar images in archives of histology and histopathology images is a crucial task that may aid in patient matching for various purposes, ranging from triaging and diagnosis to prognosis and prediction. Whole slide images (WSIs) are highly detailed digital representations of tissue specimens mounted on glass slides. Matching WSI to WSI can serve as the critical method for patient ma… ▽ More Searching for similar images in archives of histology and histopathology images is a crucial task that may aid in patient matching for various purposes, ranging from triaging and diagnosis to prognosis and prediction. Whole slide images (WSIs) are highly detailed digital representations of tissue specimens mounted on glass slides. Matching WSI to WSI can serve as the critical method for patient matching. In this paper, we report extensive analysis and validation of four search methods bag of visual words (BoVW), Yottixel, SISH, RetCCL, and some of their potential variants. We analyze their algorithms and structures and assess their performance. For this evaluation, we utilized four internal datasets ($1269$ patients) and three public datasets ($1207$ patients), totaling more than $200,000$ patches from $38$ different classes/subtypes across five primary sites. Certain search engines, for example, BoVW, exhibit notable efficiency and speed but suffer from low accuracy. Conversely, search engines like Yottixel demonstrate efficiency and speed, providing moderately accurate results. Recent proposals, including SISH, display inefficiency and yield inconsistent outcomes, while alternatives like RetCCL prove inadequate in both accuracy and efficiency. Further research is imperative to address the dual aspects of accuracy and minimal storage requirements in histopathological image search. △ Less

Submitted 8 June, 2024; v1 submitted 6 January, 2024; originally announced January 2024.

arXiv:2312.11805 [pdf, other]

Gemini: A Family of Highly Capable Multimodal Models

Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee , et al. (1325 additional authors not shown)

Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr… ▽ More This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI. △ Less

Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

arXiv:2311.16094 [pdf, other]

Street TryOn: Learning In-the-Wild Virtual Try-On from Unpaired Person Images

Authors: Aiyu Cui, Jay Mahajan, Viraj Shah, Preeti Gomathinayagam, Chang Liu, Svetlana Lazebnik

Abstract: Most virtual try-on research is motivated to serve the fashion business by generating images to demonstrate garments on studio models at a lower cost. However, virtual try-on should be a broader application that also allows customers to visualize garments on themselves using their own casual photos, known as in-the-wild try-on. Unfortunately, the existing methods, which achieve plausible results f… ▽ More Most virtual try-on research is motivated to serve the fashion business by generating images to demonstrate garments on studio models at a lower cost. However, virtual try-on should be a broader application that also allows customers to visualize garments on themselves using their own casual photos, known as in-the-wild try-on. Unfortunately, the existing methods, which achieve plausible results for studio try-on settings, perform poorly in the in-the-wild context. This is because these methods often require paired images (garment images paired with images of people wearing the same garment) for training. While such paired data is easy to collect from shopping websites for studio settings, it is difficult to obtain for in-the-wild scenes. In this work, we fill the gap by (1) introducing a StreetTryOn benchmark to support in-the-wild virtual try-on applications and (2) proposing a novel method to learn virtual try-on from a set of in-the-wild person images directly without requiring paired data. We tackle the unique challenges, including warping garments to more diverse human poses and rendering more complex backgrounds faithfully, by a novel DensePose warping correction method combined with diffusion-based conditional inpainting. Our experiments show competitive performance for standard studio try-on tasks and SOTA performance for street try-on and cross-domain try-on tasks. △ Less

Submitted 16 July, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

Comments: The abstract and intro are updated. Some typos and some pdf rendering errors have been fixed in the version

arXiv:2311.15268 [pdf, other]

Unlearning via Sparse Representations

Authors: Vedant Shah, Frederik Träuble, Ashish Malik, Hugo Larochelle, Michael Mozer, Sanjeev Arora, Yoshua Bengio, Anirudh Goyal

Abstract: Machine \emph{unlearning}, which involves erasing knowledge about a \emph{forget set} from a trained model, can prove to be costly and infeasible by existing techniques. We propose a nearly compute-free zero-shot unlearning technique based on a discrete representational bottleneck. We show that the proposed technique efficiently unlearns the forget set and incurs negligible damage to the model's p… ▽ More Machine \emph{unlearning}, which involves erasing knowledge about a \emph{forget set} from a trained model, can prove to be costly and infeasible by existing techniques. We propose a nearly compute-free zero-shot unlearning technique based on a discrete representational bottleneck. We show that the proposed technique efficiently unlearns the forget set and incurs negligible damage to the model's performance on the rest of the data set. We evaluate the proposed technique on the problem of \textit{class unlearning} using three datasets: CIFAR-10, CIFAR-100, and LACUNA-100. We compare the proposed technique to SCRUB, a state-of-the-art approach which uses knowledge distillation for unlearning. Across all three datasets, the proposed technique performs as well as, if not better than SCRUB while incurring almost no computational cost. △ Less

Submitted 10 October, 2024; v1 submitted 26 November, 2023; originally announced November 2023.

arXiv:2311.13600 [pdf, other]

ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs

Authors: Viraj Shah, Nataniel Ruiz, Forrester Cole, Erika Lu, Svetlana Lazebnik, Yuanzhen Li, Varun Jampani

Abstract: Methods for finetuning generative models for concept-driven personalization generally achieve strong results for subject-driven or style-driven generation. Recently, low-rank adaptations (LoRA) have been proposed as a parameter-efficient way of achieving concept-driven personalization. While recent work explores the combination of separate LoRAs to achieve joint generation of learned styles and su… ▽ More Methods for finetuning generative models for concept-driven personalization generally achieve strong results for subject-driven or style-driven generation. Recently, low-rank adaptations (LoRA) have been proposed as a parameter-efficient way of achieving concept-driven personalization. While recent work explores the combination of separate LoRAs to achieve joint generation of learned styles and subjects, existing techniques do not reliably address the problem; they often compromise either subject fidelity or style fidelity. We propose ZipLoRA, a method to cheaply and effectively merge independently trained style and subject LoRAs in order to achieve generation of any user-provided subject in any user-provided style. Experiments on a wide range of subject and style combinations show that ZipLoRA can generate compelling results with meaningful improvements over baselines in subject and style fidelity while preserving the ability to recontextualize. Project page: https://ziplora.github.io △ Less

Submitted 22 November, 2023; originally announced November 2023.

Comments: Project page: https://ziplora.github.io

arXiv:2309.11510 [pdf, other]

When is a Foundation Model a Foundation Model

Authors: Saghir Alfasly, Peyman Nejat, Sobhan Hemati, Jibran Khan, Isaiah Lahr, Areej Alsaafin, Abubakr Shafique, Nneka Comfere, Dennis Murphree, Chady Meroueh, Saba Yasir, Aaron Mangold, Lisa Boardman, Vijay Shah, Joaquin J. Garcia, H. R. Tizhoosh

Abstract: Recently, several studies have reported on the fine-tuning of foundation models for image-text modeling in the field of medicine, utilizing images from online data sources such as Twitter and PubMed. Foundation models are large, deep artificial neural networks capable of learning the context of a specific domain through training on exceptionally extensive datasets. Through validation, we have obse… ▽ More Recently, several studies have reported on the fine-tuning of foundation models for image-text modeling in the field of medicine, utilizing images from online data sources such as Twitter and PubMed. Foundation models are large, deep artificial neural networks capable of learning the context of a specific domain through training on exceptionally extensive datasets. Through validation, we have observed that the representations generated by such models exhibit inferior performance in retrieval tasks within digital pathology when compared to those generated by significantly smaller, conventional deep networks. △ Less

Submitted 14 September, 2023; originally announced September 2023.

arXiv:2309.03844 [pdf, other]

Experimental Study of Adversarial Attacks on ML-based xApps in O-RAN

Authors: Naveen Naik Sapavath, Brian Kim, Kaushik Chowdhury, Vijay K Shah

Abstract: Open Radio Access Network (O-RAN) is considered as a major step in the evolution of next-generation cellular networks given its support for open interfaces and utilization of artificial intelligence (AI) into the deployment, operation, and maintenance of RAN. However, due to the openness of the O-RAN architecture, such AI models are inherently vulnerable to various adversarial machine learning (ML… ▽ More Open Radio Access Network (O-RAN) is considered as a major step in the evolution of next-generation cellular networks given its support for open interfaces and utilization of artificial intelligence (AI) into the deployment, operation, and maintenance of RAN. However, due to the openness of the O-RAN architecture, such AI models are inherently vulnerable to various adversarial machine learning (ML) attacks, i.e., adversarial attacks which correspond to slight manipulation of the input to the ML model. In this work, we showcase the vulnerability of an example ML model used in O-RAN, and experimentally deploy it in the near-real time (near-RT) RAN intelligent controller (RIC). Our ML-based interference classifier xApp (extensible application in near-RT RIC) tries to classify the type of interference to mitigate the interference effect on the O-RAN system. We demonstrate the first-ever scenario of how such an xApp can be impacted through an adversarial attack by manipulating the data stored in a shared database inside the near-RT RIC. Through a rigorous performance analysis deployed on a laboratory O-RAN testbed, we evaluate the performance in terms of capacity and the prediction accuracy of the interference classifier xApp using both clean and perturbed data. We show that even small adversarial attacks can significantly decrease the accuracy of ML application in near-RT RIC, which can directly impact the performance of the entire O-RAN deployment. △ Less

Submitted 7 September, 2023; originally announced September 2023.

Comments: Accepted for Globecom 2023

arXiv:2308.07071 [pdf, other]

RL-based Variable Horizon Model Predictive Control of Multi-Robot Systems using Versatile On-Demand Collision Avoidance

Authors: Shreyash Gupta, Abhinav Kumar, Niladri S. Tripathy, Suril V. Shah

Abstract: Multi-robot systems have become very popular in recent years because of their wide spectrum of applications, ranging from surveillance to cooperative payload transportation. Model Predictive Control (MPC) is a promising controller for multi-robot control because of its preview capability and ability to handle constraints easily. The performance of the MPC widely depends on many parameters, among w… ▽ More Multi-robot systems have become very popular in recent years because of their wide spectrum of applications, ranging from surveillance to cooperative payload transportation. Model Predictive Control (MPC) is a promising controller for multi-robot control because of its preview capability and ability to handle constraints easily. The performance of the MPC widely depends on many parameters, among which the prediction horizon is the major contributor. Increasing the prediction horizon beyond a limit drastically increases the computation cost. Tuning the value of the prediction horizon can be very time-consuming, and the tuning process must be repeated for every task. Moreover, instead of using a fixed horizon for an entire task, a better balance between performance and computation cost can be established if different prediction horizons can be employed for every robot at each time step. Further, for such variable prediction horizon MPC for multiple robots, on-demand collision avoidance is the key requirement. We propose Versatile On-demand Collision Avoidance (VODCA) strategy to comply with the variable horizon model predictive control. We also present a framework for learning the prediction horizon for the multi-robot system as a function of the states of the robots using the Soft Actor-Critic (SAC) RL algorithm. The results are illustrated and validated numerically for different multi-robot tasks. △ Less

Submitted 14 August, 2023; originally announced August 2023.

arXiv:2308.02053 [pdf, other]

doi 10.1145/3617694.3623257

The Unequal Opportunities of Large Language Models: Revealing Demographic Bias through Job Recommendations

Authors: Abel Salinas, Parth Vipul Shah, Yuzhong Huang, Robert McCormack, Fred Morstatter

Abstract: Large Language Models (LLMs) have seen widespread deployment in various real-world applications. Understanding these biases is crucial to comprehend the potential downstream consequences when using LLMs to make decisions, particularly for historically disadvantaged groups. In this work, we propose a simple method for analyzing and comparing demographic bias in LLMs, through the lens of job recomme… ▽ More Large Language Models (LLMs) have seen widespread deployment in various real-world applications. Understanding these biases is crucial to comprehend the potential downstream consequences when using LLMs to make decisions, particularly for historically disadvantaged groups. In this work, we propose a simple method for analyzing and comparing demographic bias in LLMs, through the lens of job recommendations. We demonstrate the effectiveness of our method by measuring intersectional biases within ChatGPT and LLaMA, two cutting-edge LLMs. Our experiments primarily focus on uncovering gender identity and nationality bias; however, our method can be extended to examine biases associated with any intersection of demographic identities. We identify distinct biases in both models toward various demographic identities, such as both models consistently suggesting low-paying jobs for Mexican workers or preferring to recommend secretarial roles to women. Our study highlights the importance of measuring the bias of LLMs in downstream applications to understand the potential for harm and inequitable outcomes. △ Less

Submitted 9 January, 2024; v1 submitted 3 August, 2023; originally announced August 2023.

Comments: Accepted to EAAMO 2023

arXiv:2307.12473 [pdf, other]

Adaptive RRI Selection Algorithms for Improved Cooperative Awareness in Decentralized NR-V2X

Authors: Avik Dayal, Vijay K. Shah, Harpreet S. Dhillon, Jeffrey H. Reed

Abstract: Decentralized vehicle-to-everything (V2X) networks (i.e., C-V2X Mode-4 and NR-V2X Mode-2) utilize sensing-based semi-persistent scheduling (SPS) where vehicles sense and reserve suitable radio resources for Basic Safety Message (BSM) transmissions at prespecified periodic intervals termed as Resource Reservation Interval (RRI). Vehicles rely on these received periodic BSMs to localize nearby (tran… ▽ More Decentralized vehicle-to-everything (V2X) networks (i.e., C-V2X Mode-4 and NR-V2X Mode-2) utilize sensing-based semi-persistent scheduling (SPS) where vehicles sense and reserve suitable radio resources for Basic Safety Message (BSM) transmissions at prespecified periodic intervals termed as Resource Reservation Interval (RRI). Vehicles rely on these received periodic BSMs to localize nearby (transmitting) vehicles and infrastructure, referred to as cooperative awareness. Cooperative awareness enables line of sight and non-line of sight localization, extending a vehicle's sensing and perception range. In this work, we first show that under high vehicle density scenarios, existing SPS (with prespecified RRIs) suffer from poor cooperative awareness, quantified as tracking error. Decentralized vehicle-to-everything (V2X) networks (i.e., C-V2X Mode-4 and NR-V2X Mode-2) utilize sensing-based semi-persistent scheduling (SPS) where vehicles sense and reserve suitable radio resources for Basic Safety Message (BSM) transmissions at prespecified periodic intervals termed as Resource Reservation Interval (RRI). Vehicles rely on these received periodic BSMs to localize nearby (transmitting) vehicles and infrastructure, referred to as cooperative awareness. Cooperative awareness enables line of sight and non-line of sight localization, extending a vehicle's sensing and perception range. In this work, we first show that under high vehicle density scenarios, existing SPS (with prespecified RRIs) suffer from poor cooperative awareness, quantified as tracking error. △ Less

Submitted 23 July, 2023; originally announced July 2023.

arXiv:2306.06084 [pdf]

doi 10.1109/M2VIP55626.2022.10041089

Machine Vision Using Cellphone Camera: A Comparison of deep networks for classifying three challenging denominations of Indian Coins

Authors: Keyur D. Joshi, Dhruv Shah, Varshil Shah, Nilay Gandhi, Sanket J. Shah, Sanket B. Shah

Abstract: Indian currency coins come in a variety of denominations. Off all the varieties Rs.1, RS.2, and Rs.5 have similar diameters. Majority of the coin styles in market circulation for denominations of Rs.1 and Rs.2 coins are nearly the same except for numerals on its reverse side. If a coin is resting on its obverse side, the correct denomination is not distinguishable by humans. Therefore, it was hypo… ▽ More Indian currency coins come in a variety of denominations. Off all the varieties Rs.1, RS.2, and Rs.5 have similar diameters. Majority of the coin styles in market circulation for denominations of Rs.1 and Rs.2 coins are nearly the same except for numerals on its reverse side. If a coin is resting on its obverse side, the correct denomination is not distinguishable by humans. Therefore, it was hypothesized that a digital image of a coin resting on its either size could be classified into its correct denomination by training a deep neural network model. The digital images were generated by using cheap cell phone cameras. To find the most suitable deep neural network architecture, four were selected based on the preliminary analysis carried out for comparison. The results confirm that two of the four deep neural network models can classify the correct denomination from either side of a coin with an accuracy of 97%. △ Less

Submitted 12 May, 2023; originally announced June 2023.

Comments: 6 Pages, 4 Figures, 6 Tables, Conference paper

arXiv:2305.13582 [pdf, other]

Translation and Fusion Improves Zero-shot Cross-lingual Information Extraction

Authors: Yang Chen, Vedaant Shah, Alan Ritter

Abstract: Large language models (LLMs) combined with instruction tuning have shown significant progress in information extraction (IE) tasks, exhibiting strong generalization capabilities to unseen datasets by following annotation guidelines. However, their applicability to low-resource languages remains limited due to lack of both labeled data for fine-tuning, and unlabeled text for pre-training. In this p… ▽ More Large language models (LLMs) combined with instruction tuning have shown significant progress in information extraction (IE) tasks, exhibiting strong generalization capabilities to unseen datasets by following annotation guidelines. However, their applicability to low-resource languages remains limited due to lack of both labeled data for fine-tuning, and unlabeled text for pre-training. In this paper, we propose TransFusion, a framework in which models are fine-tuned to use English translations of low-resource language data, enabling more precise predictions through annotation fusion. Based on TransFusion, we introduce GoLLIE-TF, a cross-lingual instruction-tuned LLM for IE tasks, designed to close the performance gap between high and low-resource languages. Our experiments across twelve multilingual IE datasets spanning 50 languages demonstrate that GoLLIE-TF achieves better zero-shot cross-lingual transfer over the base model. In addition, we show that TransFusion significantly improves low-resource language named entity recognition when applied to proprietary models such as GPT-4 (+5 F1) with a prompting approach, or fine-tuning different language models including decoder-only (+14 F1) and encoder-only (+13 F1) architectures. △ Less

Submitted 20 June, 2024; v1 submitted 22 May, 2023; originally announced May 2023.

arXiv:2305.11321 [pdf, other]

JoIN: Joint GANs Inversion for Intrinsic Image Decomposition

Authors: Viraj Shah, Svetlana Lazebnik, Julien Philip

Abstract: In this work, we propose to solve ill-posed inverse imaging problems using a bank of Generative Adversarial Networks (GAN) as a prior and apply our method to the case of Intrinsic Image Decomposition for faces and materials. Our method builds on the demonstrated success of GANs to capture complex image distributions. At the core of our approach is the idea that the latent space of a GAN is a well-… ▽ More In this work, we propose to solve ill-posed inverse imaging problems using a bank of Generative Adversarial Networks (GAN) as a prior and apply our method to the case of Intrinsic Image Decomposition for faces and materials. Our method builds on the demonstrated success of GANs to capture complex image distributions. At the core of our approach is the idea that the latent space of a GAN is a well-suited optimization domain to solve inverse problems. Given an input image, we propose to jointly inverse the latent codes of a set of GANs and combine their outputs to reproduce the input. Contrary to most GAN inversion methods which are limited to inverting only a single GAN, we demonstrate that it is possible to maintain distribution priors while inverting several GANs jointly. We show that our approach is modular, allowing various forward imaging models, and that it can successfully decompose both synthetic and real images. △ Less

Submitted 22 January, 2024; v1 submitted 18 May, 2023; originally announced May 2023.

Comments: Project webpage is available at https://virajshah.com/join

arXiv:2304.14403 [pdf, other]

Make It So: Steering StyleGAN for Any Image Inversion and Editing

Authors: Anand Bhattad, Viraj Shah, Derek Hoiem, D. A. Forsyth

Abstract: StyleGAN's disentangled style representation enables powerful image editing by manipulating the latent variables, but accurately mapping real-world images to their latent variables (GAN inversion) remains a challenge. Existing GAN inversion methods struggle to maintain editing directions and produce realistic results. To address these limitations, we propose Make It So, a novel GAN inversion met… ▽ More StyleGAN's disentangled style representation enables powerful image editing by manipulating the latent variables, but accurately mapping real-world images to their latent variables (GAN inversion) remains a challenge. Existing GAN inversion methods struggle to maintain editing directions and produce realistic results. To address these limitations, we propose Make It So, a novel GAN inversion method that operates in the $\mathcal{Z}$ (noise) space rather than the typical $\mathcal{W}$ (latent style) space. Make It So preserves editing capabilities, even for out-of-domain images. This is a crucial property that was overlooked in prior methods. Our quantitative evaluations demonstrate that Make It So outperforms the state-of-the-art method PTI~\cite{roich2021pivotal} by a factor of five in inversion accuracy and achieves ten times better edit quality for complex indoor scenes. △ Less

Submitted 27 April, 2023; originally announced April 2023.

Comments: project: https://anandbhattad.github.io/makeitso/

arXiv:2304.02068 [pdf, other]

Battlefield Transfers in Coalitional Blotto Games

Authors: Vade Shah, Jason R. Marden

Abstract: In competitive resource allocation environments, agents often choose to form alliances; however, for some agents, doing so may not always be beneficial. Is there a method of forming alliances that always reward each of their members? We study this question using the framework of the coalitional Blotto game, in which two players compete against a common adversary by allocating their budgeted resour… ▽ More In competitive resource allocation environments, agents often choose to form alliances; however, for some agents, doing so may not always be beneficial. Is there a method of forming alliances that always reward each of their members? We study this question using the framework of the coalitional Blotto game, in which two players compete against a common adversary by allocating their budgeted resources across disjoint sets of valued battlefields. On any given battlefield, the agent that allocates a greater amount of resources wins the corresponding battlefield value. Existing work has shown the surprising result that in certain game instances, if one player donates a portion of their budget to the other player, then both players win larger amounts in their separate competitions against the adversary. However, this transfer-based method of alliance formation is not always mutually beneficial, which motivates the search for alternate strategies. In this vein, we study a new method of alliance formation referred to as a joint transfer, whereby players publicly transfer battlefields and budgets between one another before they engage in their separate competitions against the adversary. We show that in almost all game instances, there exists a mutually beneficial joint transfer that strictly increases the payoff of each player. △ Less

Submitted 16 October, 2024; v1 submitted 4 April, 2023; originally announced April 2023.

arXiv:2303.06288 [pdf, ps, other]

Generalizing Greenwald-Khanna Streaming Quantile Summaries for Weighted Inputs

Authors: Sepehr Assadi, Nirmit Joshi, Milind Prabhu, Vihan Shah

Abstract: Estimating quantiles, like the median or percentiles, is a fundamental task in data mining and data science. A (streaming) quantile summary is a data structure that can process a set S of n elements in a streaming fashion and at the end, for any phi in (0,1], return a phi-quantile of S up to an eps error, i.e., return a phi'-quantile with phi'=phi +- eps. We are particularly interested in comparis… ▽ More Estimating quantiles, like the median or percentiles, is a fundamental task in data mining and data science. A (streaming) quantile summary is a data structure that can process a set S of n elements in a streaming fashion and at the end, for any phi in (0,1], return a phi-quantile of S up to an eps error, i.e., return a phi'-quantile with phi'=phi +- eps. We are particularly interested in comparison-based summaries that only compare elements of the universe under a total ordering and are otherwise completely oblivious of the universe. The best known deterministic quantile summary is the 20-year old Greenwald-Khanna (GK) summary that uses O((1/eps) log(eps n)) space [SIGMOD'01]. This bound was recently proved to be optimal for all deterministic comparison-based summaries by Cormode and Vesleý [PODS'20]. In this paper, we study weighted quantiles, a generalization of the quantiles problem, where each element arrives with a positive integer weight which denotes the number of copies of that element being inserted. The only known method of handling weighted inputs via GK summaries is the naive approach of breaking each weighted element into multiple unweighted items and feeding them one by one to the summary, which results in a prohibitively large update time (proportional to the maximum weight of input elements). We give the first non-trivial extension of GK summaries for weighted inputs and show that it takes O((1/eps) log(eps n)) space and O(log(1/eps)+ log log(eps n)) update time per element to process a stream of length n (under some quite mild assumptions on the range of weights and eps). En route to this, we also simplify the original GK summaries for unweighted quantiles. △ Less

Submitted 10 March, 2023; originally announced March 2023.

Comments: 33 pages, 7 figures, International Conference on Database Theory 2023

arXiv:2303.03326 [pdf, other]

doi 10.1109/INFOCOMWKSHPS57453.2023.10226045

Keep It Simple: CNN Model Complexity Studies for Interference Classification Tasks

Authors: Taiwo Oyedare, Vijay K. Shah, Daniel J. Jakubisin, Jeffrey H. Reed

Abstract: The growing number of devices using the wireless spectrum makes it important to find ways to minimize interference and optimize the use of the spectrum. Deep learning models, such as convolutional neural networks (CNNs), have been widely utilized to identify, classify, or mitigate interference due to their ability to learn from the data directly. However, there have been limited research on the co… ▽ More The growing number of devices using the wireless spectrum makes it important to find ways to minimize interference and optimize the use of the spectrum. Deep learning models, such as convolutional neural networks (CNNs), have been widely utilized to identify, classify, or mitigate interference due to their ability to learn from the data directly. However, there have been limited research on the complexity of such deep learning models. The major focus of deep learning-based wireless classification literature has been on improving classification accuracy, often at the expense of model complexity. This may not be practical for many wireless devices, such as, internet of things (IoT) devices, which usually have very limited computational resources and cannot handle very complex models. Thus, it becomes important to account for model complexity when designing deep learning-based models for interference classification. To address this, we conduct an analysis of CNN based wireless classification that explores the trade-off amongst dataset size, CNN model complexity, and classification accuracy under various levels of classification difficulty: namely, interference classification, heterogeneous transmitter classification, and homogeneous transmitter classification. Our study, based on three wireless datasets, shows that a simpler CNN model with fewer parameters can perform just as well as a more complex model, providing important insights into the use of CNNs in computationally constrained applications. △ Less

Submitted 6 March, 2023; originally announced March 2023.

Comments: 6 pages, 7 figures, 3 tables

arXiv:2301.13450 [pdf, other]

doi 10.1017/S0263574724000389

Learning Vision-based Robotic Manipulation Tasks Sequentially in Offline Reinforcement Learning Settings

Authors: Sudhir Pratap Yadav, Rajendra Nagar, Suril V. Shah

Abstract: With the rise of deep reinforcement learning (RL) methods, many complex robotic manipulation tasks are being solved. However, harnessing the full power of deep learning requires large datasets. Online-RL does not suit itself readily into this paradigm due to costly and time-taking agent environment interaction. Therefore recently, many offline-RL algorithms have been proposed to learn robotic task… ▽ More With the rise of deep reinforcement learning (RL) methods, many complex robotic manipulation tasks are being solved. However, harnessing the full power of deep learning requires large datasets. Online-RL does not suit itself readily into this paradigm due to costly and time-taking agent environment interaction. Therefore recently, many offline-RL algorithms have been proposed to learn robotic tasks. But mainly, all such methods focus on a single task or multi-task learning, which requires retraining every time we need to learn a new task. Continuously learning tasks without forgetting previous knowledge combined with the power of offline deep-RL would allow us to scale the number of tasks by keep adding them one-after-another. In this paper, we investigate the effectiveness of regularisation-based methods like synaptic intelligence for sequentially learning image-based robotic manipulation tasks in an offline-RL setup. We evaluate the performance of this combined framework against common challenges of sequential learning: catastrophic forgetting and forward knowledge transfer. We performed experiments with different task combinations to analyze the effect of task ordering. We also investigated the effect of the number of object configurations and density of robot trajectories. We found that learning tasks sequentially helps in the propagation of knowledge from previous tasks, thereby reducing the time required to learn a new task. Regularisation based approaches for continuous learning like the synaptic intelligence method although helps in mitigating catastrophic forgetting but has shown only limited transfer of knowledge from previous tasks. △ Less

Submitted 31 January, 2023; originally announced January 2023.

Comments: 7 pages, 5 Figures

arXiv:2211.16047 [pdf, other]

Neural Feature-Adaptation for Symbolic Predictions Using Pre-Training and Semantic Loss

Authors: Vedant Shah, Aditya Agrawal, Lovekesh Vig, Ashwin Srinivasan, Gautam Shroff, Tanmay Verlekar

Abstract: We are interested in neurosymbolic systems consisting of a high-level symbolic layer for explainable prediction in terms of human-intelligible concepts; and a low-level neural layer for extracting symbols required to generate the symbolic explanation. Real data is often imperfect meaning that even if the symbolic theory remains unchanged, we may still need to address the problem of mapping raw dat… ▽ More We are interested in neurosymbolic systems consisting of a high-level symbolic layer for explainable prediction in terms of human-intelligible concepts; and a low-level neural layer for extracting symbols required to generate the symbolic explanation. Real data is often imperfect meaning that even if the symbolic theory remains unchanged, we may still need to address the problem of mapping raw data to high-level symbols, each time there is a change in the data acquisition environment or equipment. Manual (re-)annotation of the raw data each time this happens is laborious and expensive; and automated labelling methods are often imperfect, especially for complex problems. NEUROLOG proposed the use of a semantic loss function that allows an existing feature-based symbolic model to guide the extraction of feature-values from raw data, using `abduction'. However, the experiments demonstrating the use of semantic loss through abduction appear to rely heavily on a domain-specific pre-processing step that enables a prior delineation of feature locations in the raw data. We examine the use of semantic loss in domains where such pre-processing is not possible, or is not obvious. We show that without any prior information about the features, the NEUROLOG approach can continue to predict accurately even with substantially incorrect feature predictions. We show also that prior information about the features in the form of even imperfect pre-training can help correct this situation. These findings are replicated on the original problem considered by NEUROLOG, without the use of feature-delineation. This suggests that symbolic explanations constructed for data in a domain could be re-used in a related domain, by `feature-adaptation' of pre-trained neural extractors using the semantic loss function constrained by abductive feedback. △ Less

Submitted 29 November, 2022; originally announced November 2022.

arXiv:2211.04685 [pdf, ps, other]

Tight Bounds for Vertex Connectivity in Dynamic Streams

Authors: Sepehr Assadi, Vihan Shah

Abstract: We present a streaming algorithm for the vertex connectivity problem in dynamic streams with a (nearly) optimal space bound: for any $n$-vertex graph $G$ and any integer $k \geq 1$, our algorithm with high probability outputs whether or not $G$ is $k$-vertex-connected in a single pass using $\widetilde{O}(k n)$ space. Our upper bound matches the known $Ω(k n)$ lower bound for this problem even i… ▽ More We present a streaming algorithm for the vertex connectivity problem in dynamic streams with a (nearly) optimal space bound: for any $n$-vertex graph $G$ and any integer $k \geq 1$, our algorithm with high probability outputs whether or not $G$ is $k$-vertex-connected in a single pass using $\widetilde{O}(k n)$ space. Our upper bound matches the known $Ω(k n)$ lower bound for this problem even in insertion-only streams -- which we extend to multi-pass algorithms in this paper -- and closes one of the last remaining gaps in our understanding of dynamic versus insertion-only streams. Our result is obtained via a novel analysis of the previous best dynamic streaming algorithm of Guha, McGregor, and Tench [PODS 2015] who obtained an $\widetilde{O}(k^2 n)$ space algorithm for this problem. This also gives a model-independent algorithm for computing a "certificate" of $k$-vertex-connectivity as a union of $O(k^2\log{n})$ spanning forests, each on a random subset of $O(n/k)$ vertices, which may be of independent interest. △ Less

Submitted 9 November, 2022; originally announced November 2022.

Comments: Full version of the paper accepted to SOSA 2023. 15 pages, 3 Figures

arXiv:2210.10048 [pdf]

doi 10.1063/5.0134156

AnalogVNN: A fully modular framework for modeling and optimizing photonic neural networks

Authors: Vivswan Shah, Nathan Youngblood

Abstract: AnalogVNN, a simulation framework built on PyTorch which can simulate the effects of optoelectronic noise, limited precision, and signal normalization present in photonic neural network accelerators. We use this framework to train and optimize linear and convolutional neural networks with up to 9 layers and ~1.7 million parameters, while gaining insights into how normalization, activation function… ▽ More AnalogVNN, a simulation framework built on PyTorch which can simulate the effects of optoelectronic noise, limited precision, and signal normalization present in photonic neural network accelerators. We use this framework to train and optimize linear and convolutional neural networks with up to 9 layers and ~1.7 million parameters, while gaining insights into how normalization, activation function, reduced precision, and noise influence accuracy in analog photonic neural networks. By following the same layer structure design present in PyTorch, the AnalogVNN framework allows users to convert most digital neural network models to their analog counterparts with just a few lines of code, taking full advantage of the open-source optimization, deep learning, and GPU acceleration libraries available through PyTorch. Code is available at https://analogvnn.github.io △ Less

Submitted 6 June, 2023; v1 submitted 14 October, 2022; originally announced October 2022.

Comments: 28 pages; replace figure 6C; better format; updated links

Report number: 026116

Journal ref: APL Mach. Learn. 1 June 2023; 1 (2): 026116

arXiv:2210.08016 [pdf, other]

Prediction of drug effectiveness in rheumatoid arthritis patients based on machine learning algorithms

Authors: Shengjia Chen, Nikunj Gupta, Woodward B. Galbraith, Valay Shah, Jacopo Cirrone

Abstract: Rheumatoid arthritis (RA) is an autoimmune condition caused when patients' immune system mistakenly targets their own tissue. Machine learning (ML) has the potential to identify patterns in patient electronic health records (EHR) to forecast the best clinical treatment to improve patient outcomes. This study introduced a Drug Response Prediction (DRP) framework with two main goals: 1) design a dat… ▽ More Rheumatoid arthritis (RA) is an autoimmune condition caused when patients' immune system mistakenly targets their own tissue. Machine learning (ML) has the potential to identify patterns in patient electronic health records (EHR) to forecast the best clinical treatment to improve patient outcomes. This study introduced a Drug Response Prediction (DRP) framework with two main goals: 1) design a data processing pipeline to extract information from tabular clinical data, and then preprocess it for functional use, and 2) predict RA patient's responses to drugs and evaluate classification models' performance. We propose a novel two-stage ML framework based on European Alliance of Associations for Rheumatology (EULAR) criteria cutoffs to model drug effectiveness. Our model Stacked-Ensemble DRP was developed and cross-validated using data from 425 RA patients. The evaluation used a subset of 124 patients (30%) from the same data source. In the evaluation of the test set, two-stage DRP leads to improved classification accuracy over other end-to-end classification models for binary classification. Our proposed method provides a complete pipeline to predict disease activity scores and identify the group that does not respond well to anti-TNF treatments, thus showing promise in supporting clinical decisions based on EHR information. △ Less

Submitted 21 October, 2022; v1 submitted 14 October, 2022; originally announced October 2022.

Comments: 13 pages, 5 figures, to be published in ICBBE 2022

arXiv:2210.04120 [pdf, other]

MultiStyleGAN: Multiple One-shot Image Stylizations using a Single GAN

Authors: Viraj Shah, Ayush Sarkar, Sudharsan Krishnakumar Anitha, Svetlana Lazebnik

Abstract: Image stylization aims at applying a reference style to arbitrary input images. A common scenario is one-shot stylization, where only one example is available for each reference style. Recent approaches for one-shot stylization such as JoJoGAN fine-tune a pre-trained StyleGAN2 generator on a single style reference image. However, such methods cannot generate multiple stylizations without fine-tuni… ▽ More Image stylization aims at applying a reference style to arbitrary input images. A common scenario is one-shot stylization, where only one example is available for each reference style. Recent approaches for one-shot stylization such as JoJoGAN fine-tune a pre-trained StyleGAN2 generator on a single style reference image. However, such methods cannot generate multiple stylizations without fine-tuning a new model for each style separately. In this work, we present a MultiStyleGAN method that is capable of producing multiple different stylizations at once by fine-tuning a single generator. The key component of our method is a learnable transformation module called Style Transformation Network. It takes latent codes as input, and learns linear mappings to different regions of the latent space to produce distinct codes for each style, resulting in a multistyle space. Our model inherently mitigates overfitting since it is trained on multiple styles, hence improving the quality of stylizations. Our method can learn upwards of $120$ image stylizations at once, bringing $8\times$ to $60\times$ improvement in training time over recent competing methods. We support our results through user studies and quantitative results that indicate meaningful improvements over existing methods. △ Less

Submitted 20 April, 2023; v1 submitted 8 October, 2022; originally announced October 2022.

Comments: Project webpage available at https://virajshah.com/multistyle

arXiv:2210.03022 [pdf, other]

Stateful active facilitator: Coordination and Environmental Heterogeneity in Cooperative Multi-Agent Reinforcement Learning

Authors: Dianbo Liu, Vedant Shah, Oussama Boussif, Cristian Meo, Anirudh Goyal, Tianmin Shu, Michael Mozer, Nicolas Heess, Yoshua Bengio

Abstract: In cooperative multi-agent reinforcement learning, a team of agents works together to achieve a common goal. Different environments or tasks may require varying degrees of coordination among agents in order to achieve the goal in an optimal way. The nature of coordination will depend on the properties of the environment -- its spatial layout, distribution of obstacles, dynamics, etc. We term this… ▽ More In cooperative multi-agent reinforcement learning, a team of agents works together to achieve a common goal. Different environments or tasks may require varying degrees of coordination among agents in order to achieve the goal in an optimal way. The nature of coordination will depend on the properties of the environment -- its spatial layout, distribution of obstacles, dynamics, etc. We term this variation of properties within an environment as heterogeneity. Existing literature has not sufficiently addressed the fact that different environments may have different levels of heterogeneity. We formalize the notions of coordination level and heterogeneity level of an environment and present HECOGrid, a suite of multi-agent RL environments that facilitates empirical evaluation of different MARL approaches across different levels of coordination and environmental heterogeneity by providing a quantitative control over coordination and heterogeneity levels of the environment. Further, we propose a Centralized Training Decentralized Execution learning approach called Stateful Active Facilitator (SAF) that enables agents to work efficiently in high-coordination and high-heterogeneity environments through a differentiable and shared knowledge source used during training and dynamic selection from a shared pool of policies. We evaluate SAF and compare its performance against baselines IPPO and MAPPO on HECOGrid. Our results show that SAF consistently outperforms the baselines across different tasks and different heterogeneity and coordination levels. We release the code for HECOGrid as well as all our experiments. △ Less

Submitted 6 October, 2023; v1 submitted 4 October, 2022; originally announced October 2022.

Comments: Published at ICLR 2023

arXiv:2209.08750 [pdf, other]

Knowledge-based Analogical Reasoning in Neuro-symbolic Latent Spaces

Authors: Vishwa Shah, Aditya Sharma, Gautam Shroff, Lovekesh Vig, Tirtharaj Dash, Ashwin Srinivasan

Abstract: Analogical Reasoning problems challenge both connectionist and symbolic AI systems as these entail a combination of background knowledge, reasoning and pattern recognition. While symbolic systems ingest explicit domain knowledge and perform deductive reasoning, they are sensitive to noise and require inputs be mapped to preset symbolic features. Connectionist systems on the other hand can directly… ▽ More Analogical Reasoning problems challenge both connectionist and symbolic AI systems as these entail a combination of background knowledge, reasoning and pattern recognition. While symbolic systems ingest explicit domain knowledge and perform deductive reasoning, they are sensitive to noise and require inputs be mapped to preset symbolic features. Connectionist systems on the other hand can directly ingest rich input spaces such as images, text or speech and recognize pattern even with noisy inputs. However, connectionist models struggle to include explicit domain knowledge for deductive reasoning. In this paper, we propose a framework that combines the pattern recognition abilities of neural networks with symbolic reasoning and background knowledge for solving a class of Analogical Reasoning problems where the set of attributes and possible relations across them are known apriori. We take inspiration from the 'neural algorithmic reasoning' approach [DeepMind 2020] and use problem-specific background knowledge by (i) learning a distributed representation based on a symbolic model of the problem (ii) training neural-network transformations reflective of the relations involved in the problem and finally (iii) training a neural network encoder from images to the distributed representation in (i). These three elements enable us to perform search-based reasoning using neural networks as elementary functions manipulating distributed representations. We test this on visual analogy problems in RAVENs Progressive Matrices, and achieve accuracy competitive with human performance and, in certain cases, superior to initial end-to-end neural-network based approaches. While recent neural models trained at scale yield SOTA, our novel neuro-symbolic reasoning approach is a promising direction for this problem, and is arguably more general, especially for problems where domain knowledge is available. △ Less

Submitted 19 September, 2022; originally announced September 2022.

Comments: 13 pages, 4 figures, Accepted at 16th International Workshop on Neural-Symbolic Learning and Reasoning as part of the 2nd International Joint Conference on Learning & Reasoning (IJCLR 2022)

arXiv:2209.05623 [pdf, ps, other]

Space Optimal Vertex Cover in Dynamic Streams

Authors: Kheeran K. Naidu, Vihan Shah

Abstract: We optimally resolve the space complexity for the problem of finding an $α$-approximate minimum vertex cover ($α$MVC) in dynamic graph streams. We give a randomised algorithm for $α$MVC which uses $O(n^2/α^2)$ bits of space matching Dark and Konrad's lower bound [CCC 2020] up to constant factors. By computing a random greedy matching, we identify `easy' instances of the problem which can trivially… ▽ More We optimally resolve the space complexity for the problem of finding an $α$-approximate minimum vertex cover ($α$MVC) in dynamic graph streams. We give a randomised algorithm for $α$MVC which uses $O(n^2/α^2)$ bits of space matching Dark and Konrad's lower bound [CCC 2020] up to constant factors. By computing a random greedy matching, we identify `easy' instances of the problem which can trivially be solved by returning the entire vertex set. The remaining `hard' instances, then have sparse induced subgraphs which we exploit to get our space savings and solve $α$MVC. Achieving this type of optimality result is crucial for providing a complete understanding of a problem, and it has been gaining interest within the dynamic graph streaming community. For connectivity, Nelson and Yu [SODA 2019] improved the lower bound showing that $Ω(n \log^3 n)$ bits of space is necessary while Ahn, Guha, and McGregor [SODA 2012] have shown that $O(n \log^3 n)$ bits is sufficient. For finding an $α$-approximate maximum matching, the upper bound was improved by Assadi and Shah [ITCS 2022] showing that $O(n^2/α^3)$ bits is sufficient while Dark and Konrad [CCC 2020] have shown that $Ω(n^2/α^3)$ bits is necessary. The space complexity, however, remains unresolved for many other dynamic graph streaming problems where further improvements can still be made. \end{abstract} △ Less

Submitted 12 September, 2022; originally announced September 2022.

arXiv:2205.13178 [pdf, other]

Prototyping Next-Generation O-RAN Research Testbeds with SDRs

Authors: Pratheek S. Upadhyaya, Aly S. Abdalla, Vuk Marojevic, Jeffrey H. Reed, Vijay K. Shah

Abstract: Open RAN (O-RAN) defines an emerging cellular radio access network (RAN) architecture for future 6G wireless networks, emphasizing openness and intelligence which are considered the foundations of future 6G wireless networks. While the inherent complexity and flexibility of the RAN give rise to many new research problems, progress in developing solutions is hampered due to the lack of end-to-end,… ▽ More Open RAN (O-RAN) defines an emerging cellular radio access network (RAN) architecture for future 6G wireless networks, emphasizing openness and intelligence which are considered the foundations of future 6G wireless networks. While the inherent complexity and flexibility of the RAN give rise to many new research problems, progress in developing solutions is hampered due to the lack of end-to-end, fully developed platforms that can help in pursuing use cases in realistic environments. This has motivated the formation of open-source frameworks available to the wireless community. However, the rapid evolution of dedicated platforms and solutions utilizing various software-based technologies often leaves questions regarding the interoperability and interactions between the components in the framework. This article shows how to build a software-defined radio testbed featuring an open-source 5G system that can interact with the near-real-time (near-RT) RAN intelligent controller (RIC) of the O-RAN architecture through standard interfaces. We focus on the O-RAN E2 interface interactions and outline the procedure to enable a RAN system with E2 capabilities. We demonstrate the working of two xApps on the testbed with detailed E2 message exchange procedures and their role in controlling next-generation RANs. △ Less

Submitted 26 May, 2022; originally announced May 2022.

Comments: This manuscript has been submitted to IEEE Vehicular Technology Magazine for possible publication

arXiv:2205.10607 [pdf, other]

Coordinating Policies Among Multiple Agents via an Intelligent Communication Channel

Authors: Dianbo Liu, Vedant Shah, Oussama Boussif, Cristian Meo, Anirudh Goyal, Tianmin Shu, Michael Mozer, Nicolas Heess, Yoshua Bengio

Abstract: In Multi-Agent Reinforcement Learning (MARL), specialized channels are often introduced that allow agents to communicate directly with one another. In this paper, we propose an alternative approach whereby agents communicate through an intelligent facilitator that learns to sift through and interpret signals provided by all agents to improve the agents' collective performance. To ensure that this… ▽ More In Multi-Agent Reinforcement Learning (MARL), specialized channels are often introduced that allow agents to communicate directly with one another. In this paper, we propose an alternative approach whereby agents communicate through an intelligent facilitator that learns to sift through and interpret signals provided by all agents to improve the agents' collective performance. To ensure that this facilitator does not become a centralized controller, agents are incentivized to reduce their dependence on the messages it conveys, and the messages can only influence the selection of a policy from a fixed set, not instantaneous actions given the policy. We demonstrate the strength of this architecture over existing baselines on several cooperative MARL environments. △ Less

Submitted 25 May, 2022; v1 submitted 21 May, 2022; originally announced May 2022.

arXiv:2204.10743 [pdf, other]

An Evaluation of Intra-Transaction Parallelism in Actor-Relational Database Systems

Authors: Vivek Shah, Marcos Antonio Vaz Salles

Abstract: Over the past decade, we have witnessed a dramatic evolution in main-memory capacity and multi-core parallelism of server hardware. To leverage this hardware potential, multi-core in-memory OLTP database systems have been extensively re-designed. The core objective of this re-design has been scaling up sequential execution of OLTP transactions, wherein alternative database architectures have been… ▽ More Over the past decade, we have witnessed a dramatic evolution in main-memory capacity and multi-core parallelism of server hardware. To leverage this hardware potential, multi-core in-memory OLTP database systems have been extensively re-designed. The core objective of this re-design has been scaling up sequential execution of OLTP transactions, wherein alternative database architectures have been explored to eliminate system bottlenecks and overheads impeding inter-transaction parallelism to fully manifest. However, intra-transaction parallelism has been largely ignored by this previous work. We conjecture this situation to have developed because OLTP workloads are sometimes deemed to have far too little intra-transactional parallelism and, even when this kind of parallelism is available, program analyses to recognize it in arbitrary stored procedures are considered too brittle to be used as a general tool. Recently, however, a new concept of actor-relational database systems has been proposed, raising hopes that application developers can specify intra-transaction parallelism by modeling database applications as communicating actors. In this scenario, a natural question is whether an actor-relational database system with an asynchronous programming model can adequately expose the intra-transaction parallelism available in application logic to modern multi-core hardware. Towards that aim, we conduct in this paper an experimental evaluation of the factors that affect intra-transaction parallelism in the prototype actor-relational database system REACTDB, with an OLTP application designed with task-level parallelism in mind, and with the system running on a multi-core machine. △ Less

Submitted 22 April, 2022; originally announced April 2022.

Showing 1–50 of 130 results for author: Shah, V