-
OMCAT: Omni Context Aware Transformer
Authors:
Arushi Goel,
Karan Sapra,
Matthieu Le,
Rafael Valle,
Andrew Tao,
Bryan Catanzaro
Abstract:
Large Language Models (LLMs) have made significant strides in text generation and comprehension, with recent advancements extending into multimodal LLMs that integrate visual and audio inputs. However, these models continue to struggle with fine-grained, cross-modal temporal understanding, particularly when correlating events across audio and video streams. We address these challenges with two key contributions: a new dataset and a new model, called OCTAV and OMCAT respectively. First, OCTAV (Omni Context and Temporal Audio Video) is a novel dataset designed to capture event transitions across audio and video. Second, OMCAT (Omni Context Aware Transformer) is a powerful model that leverages RoTE (Rotary Time Embeddings), an innovative extension of RoPE, to enhance temporal grounding and computational efficiency in time-anchored tasks. Through a robust three-stage training pipeline (feature alignment, instruction tuning, and OCTAV-specific training), OMCAT excels in cross-modal temporal understanding. Our model demonstrates state-of-the-art performance on Audio-Visual Question Answering (AVQA) tasks and the OCTAV benchmark, showcasing significant gains in temporal reasoning and cross-modal alignment, as validated through comprehensive experiments and ablation studies. Our dataset and code will be made publicly available. The link to our demo page is https://om-cat.github.io.
Submitted 15 October, 2024;
originally announced October 2024.
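The OMCAT abstract above does not spell out how RoTE modifies RoPE. As a rough illustration only, the sketch below applies the standard RoPE pairwise rotation but takes the rotation angle from a timestamp in seconds instead of a token index. That substitution, the `base` value, and the helper names `rotate` and `dot` are assumptions made for this example, not OMCAT's implementation.

```python
import math


def rotate(vec, position, base=10000.0):
    """Apply a RoPE-style rotation to an even-length vector.

    `position` may be a token index (plain RoPE) or, as assumed here for
    illustration, an absolute timestamp in seconds.
    """
    dim = len(vec)
    out = []
    for i in range(0, dim, 2):
        # One rotation frequency per coordinate pair, as in standard RoPE.
        theta = position * base ** (-i / dim)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical query/key vectors tagged with timestamps (seconds).
q = [1.0, 2.0, 3.0, 4.0]
k = [0.5, -1.0, 2.0, 0.25]
score_early = dot(rotate(q, 1.5), rotate(k, 4.0))
score_late = dot(rotate(q, 11.5), rotate(k, 14.0))
```

Because each coordinate pair is rotated by an angle proportional to its timestamp, the attention score depends only on the time difference between query and key, so `score_early` and `score_late` agree up to floating-point error.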
-
Dark Patterns in the Opt-Out Process and Compliance with the California Consumer Privacy Act (CCPA)
Authors:
Van Hong Tran,
Aarushi Mehrotra,
Ranya Sharma,
Marshini Chetty,
Nick Feamster,
Jens Frankenreiter,
Lior Strahilevitz
Abstract:
To protect consumer privacy, the California Consumer Privacy Act (CCPA) mandates that businesses provide consumers with a straightforward way to opt out of the sale and sharing of their personal information. However, the control that businesses enjoy over the opt-out process allows them to impose hurdles on consumers aiming to opt out, including by employing dark patterns. Motivated by the enactment of the California Privacy Rights Act (CPRA), which strengthens the CCPA and explicitly forbids certain dark patterns in the opt-out process, we investigate how dark patterns are used in opt-out processes and assess their compliance with CCPA regulations. Our research reveals that websites employ a variety of dark patterns. Some of these patterns are explicitly prohibited under the CCPA; others evidently take advantage of legal loopholes. Despite the initial efforts to restrict dark patterns by policymakers, there is more work to be done.
Submitted 13 September, 2024;
originally announced September 2024.
-
Enumeration of groups in some special varieties of $A$-groups
Authors:
Arushi,
Geetha Venkataraman
Abstract:
We find an upper bound for the number of groups of order $n$ up to isomorphism in the variety $G = A_pA_qA_r$, where $p$, $q$ and $r$ are distinct primes. We also find a bound on the orders and on the number of conjugacy classes of subgroups that are maximal amongst the subgroups of the general linear group that are also in the variety $A_qA_r$.
Submitted 13 September, 2024;
originally announced September 2024.
-
Grand Unification at the Cosmological Collider with Chemical Potential
Authors:
Arushi Bodas,
Edward Broadberry,
Raman Sundrum
Abstract:
We introduce a tree-level chemical potential mechanism for spin-1 particles within cosmological collider physics, allowing them to be detected in primordial non-Gaussianities for masses above the inflationary Hubble scale. We apply this mechanism to orbifold grand unification and the massive unification partners of the standard model gauge bosons. Our mechanism requires at least a pair of massive vector fields which are singlets of the standard model, a condition which is satisfied in the classic "trinification" scenario. Assuming that the gauge hierarchy problem is solved by supersymmetry, gauge coupling running points to unification partners at $\sim 10^{15}$ GeV. We show that, within high-scale inflation, chemical potential enhancement can lead to observably strong signals for trinification partners in future cosmological surveys.
Submitted 11 September, 2024;
originally announced September 2024.
-
Electron ptychography reveals a ferroelectricity dominated by anion displacements
Authors:
Harikrishnan K. P.,
Ruijuan Xu,
Kinnary Patel,
Kevin J. Crust,
Aarushi Khandelwal,
Chenyu Zhang,
Sergey Prosandeev,
Hua Zhou,
Yu-Tsun Shao,
Laurent Bellaiche,
Harold Y. Hwang,
David A. Muller
Abstract:
Sodium niobate, a lead-free ferroic material, hosts delicately-balanced, competing order parameters, including ferroelectric states that can be stabilized by epitaxial strain. Here, we show that the resulting macroscopic ferroelectricity exhibits an unconventional microscopic structure using multislice electron ptychography. This technique overcomes multiple scattering artifacts limiting conventional electron microscopy, enabling both lateral spatial resolution beyond the diffraction limit and recovery of three-dimensional structural information. These imaging capabilities allow us to separate the ferroelectric interior of the sample from the relaxed surface structure and identify the soft phonon mode and related structural distortions with picometer precision. Unlike conventional ferroelectric perovskites, we find that the polar distortion in this material involves minimal distortions of the cation sublattices and is instead dominated by anion displacements. We establish limits on film thickness for interfacial octahedral rotation engineering and directly visualize an incommensurate octahedral rotation pattern, arising from the flat dispersion of the associated phonon mode.
Submitted 27 August, 2024;
originally announced August 2024.
-
The Role of Privacy Guarantees in Voluntary Donation of Private Data for Altruistic Goals
Authors:
Ruizhe Wang,
Roberta De Viti,
Aarushi Dubey,
Elissa M. Redmiles
Abstract:
Voluntary donation of private information for altruistic purposes, such as advancing research, is common. However, concerns about data misuse and leakage may deter individuals from donating their information. While prior research has indicated that Privacy Enhancement Technologies (PETs) can alleviate these concerns, the extent to which these techniques influence willingness to donate data remains unclear.
This study conducts a vignette survey (N=485) to examine people's willingness to donate medical data for developing new treatments under four privacy guarantees: data expiration, anonymization, use restriction, and access control. The study explores two mechanisms for verifying these guarantees: self-auditing and expert auditing, and evaluates the impact on two types of data recipient entities: for-profit and non-profit institutions.
Our findings reveal that the type of entity collecting data strongly influences respondents' privacy expectations, which in part influence their willingness to donate data. Respondents have such high expectations of the privacy provided by non-profit entities that explicitly stating the privacy protections provided makes little adjustment to those expectations. In contrast, statements about privacy bring respondents' expectations of the privacy provided by for-profit entities nearly in line with non-profit expectations. We highlight the risks of these respective results as well as the need for future research to better align technical community and end-user perceptions about the effectiveness of auditing PETs and to effectively set expectations about the efficacy of PETs in the face of end-user concerns about data breaches.
Submitted 3 July, 2024;
originally announced July 2024.
-
CASE: Efficient Curricular Data Pre-training for Building Assistive Psychology Expert Models
Authors:
Sarthak Harne,
Monjoy Narayan Choudhury,
Madhav Rao,
TK Srikanth,
Seema Mehrotra,
Apoorva Vashisht,
Aarushi Basu,
Manjit Sodhi
Abstract:
The limited availability of psychologists necessitates efficient identification of individuals requiring urgent mental healthcare. This study explores the use of Natural Language Processing (NLP) pipelines to analyze text data from online mental health forums used for consultations. By analyzing forum posts, these pipelines can flag users who may require immediate professional attention. A crucial challenge in this domain is data privacy and scarcity. To address this, we propose utilizing readily available curricular texts used in institutes specializing in mental health for pre-training the NLP pipelines. This helps us mimic the training process of a psychologist. Our work presents CASE-BERT, which flags potential mental health disorders based on forum text. CASE-BERT demonstrates superior performance compared to existing methods, achieving an F1 score of 0.91 for Depression and 0.88 for Anxiety, two of the most commonly reported mental health disorders. Our code and data are publicly available.
Submitted 2 October, 2024; v1 submitted 1 June, 2024;
originally announced June 2024.
-
Machine Unlearning in Large Language Models
Authors:
Saaketh Koundinya Gundavarapu,
Shreya Agarwal,
Arushi Arora,
Chandana Thimmalapura Jagadeeshaiah
Abstract:
Machine unlearning, a novel area within artificial intelligence, focuses on addressing the challenge of selectively forgetting or reducing undesirable knowledge or behaviors in machine learning models, particularly in the context of large language models (LLMs). This paper introduces a methodology to align LLMs, such as Open Pre-trained Transformer Language Models, with ethical, privacy, and safety standards by leveraging the gradient ascent algorithm for knowledge unlearning. Our approach aims to selectively erase or modify learned information in LLMs, targeting harmful responses and copyrighted content. This dual-pronged approach addresses the two issues separately. To mitigate harmful responses, we applied gradient ascent on the PKU dataset, achieving a 75% reduction in harmful responses for Open Pre-trained Transformer Language Models (OPT1.3b and OPT2.7b; Zhang et al., 2022) while retaining previous knowledge using the TruthfulQA dataset (Lin et al., 2021). For handling copyrighted content, we constructed a custom dataset based on the Lord of the Rings corpus and aligned LLMs (OPT1.3b and OPT2.7b) through fine-tuning with LoRA (Low-Rank Adaptation of Large Language Models; Hu et al., 2021). Subsequently, we employed gradient ascent to unlearn the Lord of the Rings content, resulting in a remarkable reduction in the presence of copyrighted material. To maintain a diverse knowledge base, we utilized the Book Corpus dataset. Additionally, we propose a new evaluation technique for assessing the effectiveness of harmful unlearning.
Submitted 23 May, 2024;
originally announced May 2024.
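The gradient-ascent unlearning idea in the abstract above can be illustrated on a toy model. The sketch below is not the paper's OPT setup: it assumes a one-parameter logistic classifier and a hypothetical `forget_set`, and it simply flips the sign of the usual gradient-descent update so that the loss on the examples to be forgotten goes up.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss(w, x, y):
    """Binary cross-entropy of a one-parameter logistic model."""
    p = sigmoid(w * x)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def grad(w, x, y):
    """Derivative of the cross-entropy loss with respect to w."""
    return (sigmoid(w * x) - y) * x


def unlearn(w, forget_set, lr=0.1, steps=20):
    """Gradient ASCENT on the forget set: step along +gradient, not -gradient."""
    for _ in range(steps):
        for x, y in forget_set:
            w += lr * grad(w, x, y)
    return w


# Hypothetical example: the model currently fits (x=1, y=1) well.
w_before = 2.0
forget_set = [(1.0, 1)]
w_after = unlearn(w_before, forget_set)
```

In practice the ascent steps would be paired with a retention objective on data that should be kept, which is the role the abstract assigns to TruthfulQA and the Book Corpus; this toy example omits that part.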
-
SmartFlow: Robotic Process Automation using LLMs
Authors:
Arushi Jain,
Shubham Paliwal,
Monika Sharma,
Lovekesh Vig,
Gautam Shroff
Abstract:
Robotic Process Automation (RPA) systems face challenges in handling complex processes and diverse screen layouts that require advanced human-like decision-making capabilities. These systems typically rely on pixel-level encoding through drag-and-drop or automation frameworks such as Selenium to create navigation workflows, rather than visual understanding of screen elements. In this context, we present SmartFlow, an AI-based RPA system that uses pre-trained large language models (LLMs) coupled with deep-learning based image understanding. Our system can adapt to new scenarios, including changes in the user interface and variations in input data, without the need for human intervention. SmartFlow uses computer vision and natural language processing to perceive visible elements on the graphical user interface (GUI) and convert them into a textual representation. This information is then utilized by LLMs to generate a sequence of actions that are executed by a scripting engine to complete an assigned task. To assess the effectiveness of SmartFlow, we have developed a dataset that includes a set of generic enterprise applications with diverse layouts, which we are releasing for research use. Our evaluations on this dataset demonstrate that SmartFlow exhibits robustness across different layouts and applications. SmartFlow can automate a wide range of business processes such as form filling, customer service, invoice processing, and back-office operations. SmartFlow can thus assist organizations in enhancing productivity by automating an even larger fraction of screen-based workflows. The demo-video and dataset are available at https://smartflow-4c5a0a.webflow.io/.
Submitted 21 May, 2024;
originally announced May 2024.
-
Multi-Subject Personalization
Authors:
Arushi Jain,
Shubham Paliwal,
Monika Sharma,
Vikram Jamwal,
Lovekesh Vig
Abstract:
Creative story illustration requires a consistent interplay of multiple characters or objects. However, conventional text-to-image models face significant challenges while producing images featuring multiple personalized subjects. For example, they distort the subject rendering, or the text descriptions fail to render coherent subject interactions. We present Multi-Subject Personalization (MSP) to alleviate some of these challenges. We implement MSP using Stable Diffusion and assess our approach against other text-to-image models, showcasing its consistent generation of good-quality images representing intended subjects and interactions.
Submitted 21 May, 2024;
originally announced May 2024.
-
CustomText: Customized Textual Image Generation using Diffusion Models
Authors:
Shubham Paliwal,
Arushi Jain,
Monika Sharma,
Vikram Jamwal,
Lovekesh Vig
Abstract:
Textual image generation spans diverse fields like advertising, education, product packaging, social media, information visualization, and branding. Despite recent strides in language-guided image synthesis using diffusion models, current models excel in image generation but struggle with accurate text rendering and offer limited control over font attributes. In this paper, we aim to enhance the synthesis of high-quality images with precise text customization, thereby contributing to the advancement of image generation models. We call our proposed method CustomText. Our implementation leverages a pre-trained TextDiffuser model to enable control over font color, background, and types. Additionally, to address the challenge of accurately rendering small-sized fonts, we train the ControlNet model for a consistency decoder, significantly enhancing text-generation performance. We assess the performance of CustomText in comparison to previous methods of textual image generation on the publicly available CTW-1500 dataset and a self-curated dataset for small-text generation, showcasing superior results.
Submitted 21 May, 2024;
originally announced May 2024.
-
Adaptive Exploration for Data-Efficient General Value Function Evaluations
Authors:
Arushi Jain,
Josiah P. Hanna,
Doina Precup
Abstract:
General Value Functions (GVFs) (Sutton et al., 2011) represent predictive knowledge in reinforcement learning. Each GVF computes the expected return for a given policy, based on a unique reward. Existing methods relying on fixed behavior policies or pre-collected data often face data efficiency issues when learning multiple GVFs in parallel using off-policy methods. To address this, we introduce GVFExplorer, which adaptively learns a single behavior policy that efficiently collects data for evaluating multiple GVFs in parallel. Our method optimizes the behavior policy by minimizing the total variance in return across GVFs, thereby reducing the required environmental interactions. We use an existing temporal-difference-style variance estimator to approximate the return variance. We prove that each behavior policy update decreases the overall mean squared error in GVF predictions. We empirically show our method's performance in tabular and nonlinear function approximation settings, including Mujoco environments, with stationary and non-stationary reward signals, optimizing data usage and reducing prediction errors across multiple GVFs.
Submitted 13 October, 2024; v1 submitted 13 May, 2024;
originally announced May 2024.
-
Zero Shot Context-Based Object Segmentation using SLIP (SAM+CLIP)
Authors:
Saaketh Koundinya Gundavarapu,
Arushi Arora,
Shreya Agarwal
Abstract:
We present SLIP (SAM+CLIP), an enhanced architecture for zero-shot object segmentation. SLIP combines the Segment Anything Model (SAM) (Kirillov et al., 2023) with the Contrastive Language-Image Pretraining (CLIP) model (Radford et al., 2021). By incorporating text prompts into SAM using CLIP, SLIP enables object segmentation without prior training on specific classes or categories. We fine-tune CLIP on a Pokemon dataset, allowing it to learn meaningful image-text representations. SLIP demonstrates the ability to recognize and segment objects in images based on contextual information from text prompts, expanding the capabilities of SAM for versatile object segmentation. Our experiments demonstrate the effectiveness of the SLIP architecture in segmenting objects in images based on textual cues. The integration of CLIP's text-image understanding capabilities into SAM expands the capabilities of the original architecture and enables more versatile and context-aware object segmentation.
Submitted 12 May, 2024;
originally announced May 2024.
-
Multimodal Contextual Dialogue Breakdown Detection for Conversational AI Models
Authors:
Md Messal Monem Miah,
Ulie Schnaithmann,
Arushi Raghuvanshi,
Youngseo Son
Abstract:
Detecting dialogue breakdown in real time is critical for conversational AI systems, because it enables taking corrective action to successfully complete a task. In spoken dialogue systems, this breakdown can be caused by a variety of unexpected situations, including high levels of background noise, which cause speech-to-text (STT) mistranscriptions, and unexpected user flows. In particular, industry settings like healthcare require high precision and high flexibility to navigate differently based on the conversation history and dialogue states. This makes it both more challenging and more critical to accurately detect dialogue breakdown. We found that accurately detecting breakdown requires processing audio inputs along with downstream NLP model inferences on transcribed text in real time. In this paper, we introduce a Multimodal Contextual Dialogue Breakdown (MultConDB) model. This model significantly outperforms other known best models by achieving an F1 of 69.27.
Submitted 11 April, 2024;
originally announced April 2024.
-
Graph Integrated Language Transformers for Next Action Prediction in Complex Phone Calls
Authors:
Amin Hosseiny Marani,
Ulie Schnaithmann,
Youngseo Son,
Akil Iyer,
Manas Paldhe,
Arushi Raghuvanshi
Abstract:
Current Conversational AI systems employ different machine learning pipelines, as well as external knowledge sources and business logic, to predict the next action. Maintaining various components in a dialogue manager's pipeline adds complexity in expansion and updates, increases processing time, and causes additive noise through the pipeline that can lead to incorrect next action prediction. This paper investigates graph integration into language transformers to improve understanding of the relationships between human utterances and previous and next actions, without depending on external sources or components. Experimental analyses on real calls indicate that the proposed Graph Integrated Language Transformer models can achieve higher performance compared to other production-level conversational AI systems in driving interactive calls with human users in real-world settings.
Submitted 11 April, 2024;
originally announced April 2024.
-
Audio Dialogues: Dialogues dataset for audio and music understanding
Authors:
Arushi Goel,
Zhifeng Kong,
Rafael Valle,
Bryan Catanzaro
Abstract:
Existing datasets for audio understanding primarily focus on single-turn interactions (i.e., audio captioning and audio question answering) for describing audio in natural language, thus limiting the understanding of audio via interactive dialogue. To address this gap, we introduce Audio Dialogues: a multi-turn dialogue dataset containing 163.8k samples for general audio sounds and music. In addition to dialogues, Audio Dialogues also has question-answer pairs to understand and compare multiple input audios together. Audio Dialogues leverages a prompting-based approach and caption annotations from existing datasets to generate multi-turn dialogues using a Large Language Model (LLM). We evaluate existing audio-augmented large language models on our proposed dataset to demonstrate the complexity and applicability of Audio Dialogues. Our code for generating the dataset will be made publicly available. Detailed prompts and generated dialogues can be found on the demo website https://audiodialogues.github.io/.
Submitted 11 April, 2024;
originally announced April 2024.
-
Variational Optimization for Constructing Inverse Potentials of Proton-Proton Scattering: A Phase Function Method Study
Authors:
Lalit Kumar,
Arushi Sharma,
Anil Khachi,
Ayushi Awasthi,
O. S. K. S. Sastri
Abstract:
Background: The phase-shift analysis for proton-proton scattering has been studied by various research groups using realistic potentials composed of various internal interactions based on the exchange of pions and mesons, involving a large number of parameters. Purpose: The goal of the research is to construct inverse potentials for various l-channels of proton-proton (pp) elastic scattering using the 3-parameter Morse function in combination with the atomic Hulthen potential, utilizing the phase function method and a variational optimization technique. Methodology: The implementation of variational optimization begins with randomly assigning initial values to the Morse model parameters. Utilizing the Morse + Hulthen potential as input, the phase equations for various l-channels are numerically solved using the RK-5 method to obtain the simulated Scattering Phase Shift (SPS). The mean squared error between simulated and expected SPS is chosen as the cost function. Variational optimization proceeds iteratively by adjusting potential parameters and re-evaluating the cost function until convergence is achieved. Results: All the obtained scattering phase shifts for various l-channels have been found to converge to a mean squared error <= 0.3. The computed cross-sections matched the experimental ones to less than 1% for energies up to 25 MeV. The scattering parameters are also found to closely match the experimental data. Conclusion: The inverse potentials constructed for various l-channels using Morse + atomic Hulthen are on par with the currently available high-precision realistic potentials.
Submitted 9 April, 2024;
originally announced April 2024.
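The phase function method described in the abstract above can be sketched for the simplest case. The example below makes several assumptions that are not from the paper: it treats only the l = 0 channel, works in units where $\hbar^2/2\mu = 1$ so that the phase equation reads $\delta'(r) = -\frac{U(r)}{k}\sin^2(kr + \delta(r))$, omits the Hulthen term, and uses the classical fourth-order Runge-Kutta scheme in place of RK-5. The function names and the Morse parameter names are hypothetical.

```python
import math


def morse(r, depth, r_min, width):
    """3-parameter Morse function; equals -depth at r = r_min."""
    x = (r - r_min) / width
    return depth * (math.exp(-2.0 * x) - 2.0 * math.exp(-x))


def phase_shift(potential, k, r_max, steps=2000):
    """Integrate the l = 0 phase equation from r = 0 to r_max with RK4."""
    def rhs(r, delta):
        return -(potential(r) / k) * math.sin(k * r + delta) ** 2

    h = r_max / steps
    r, delta = 0.0, 0.0  # the phase function starts at zero at the origin
    for _ in range(steps):
        k1 = rhs(r, delta)
        k2 = rhs(r + h / 2, delta + h * k1 / 2)
        k3 = rhs(r + h / 2, delta + h * k2 / 2)
        k4 = rhs(r + h, delta + h * k3)
        delta += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        r += h
    return delta


def mean_squared_error(simulated, expected):
    """Cost function comparing simulated and expected phase shifts."""
    return sum((s - e) ** 2 for s, e in zip(simulated, expected)) / len(expected)


# Illustrative call with made-up Morse parameters (not fitted values).
delta_morse = phase_shift(lambda r: morse(r, 3.0, 1.0, 0.5), k=1.0, r_max=10.0)
```

A variational optimizer would wrap `phase_shift` in a loop that perturbs the Morse parameters and keeps changes that lower `mean_squared_error` against the expected phase shifts.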
-
Measuring Compliance with the California Consumer Privacy Act Over Space and Time
Authors:
Van Tran,
Aarushi Mehrotra,
Marshini Chetty,
Nick Feamster,
Jens Frankenreiter,
Lior Strahilevitz
Abstract:
The widespread sharing of consumers' personal information with third parties raises significant privacy concerns. The California Consumer Privacy Act (CCPA) mandates that online businesses offer consumers the option to opt out of the sale and sharing of personal information. Our study automatically tracks the presence of the opt-out link longitudinally across multiple states after the California Privacy Rights Act (CPRA) went into effect. We categorize websites based on whether they are subject to CCPA and investigate cases of potential non-compliance. We find a number of websites that implement the opt-out link early and across all examined states but also find a significant number of CCPA-subject websites that fail to offer any opt-out methods even when CCPA is in effect. Our findings can shed light on how websites are reacting to the CCPA and identify potential gaps in compliance and opt-out method designs that hinder consumers from exercising CCPA opt-out rights.
Submitted 25 March, 2024;
originally announced March 2024.
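The abstract above does not describe how the study detects opt-out links. As one possible illustration, the sketch below scans the anchor text of a page for wording along the lines of "Do Not Sell or Share My Personal Information". The regular expression, the class `LinkTextCollector`, and the function `has_opt_out_link` are assumptions made for this example and do not reproduce the authors' measurement pipeline.

```python
import re
from html.parser import HTMLParser

# Assumed pattern for common opt-out link wording; real pages vary widely.
OPT_OUT_PATTERN = re.compile(
    r"do not sell( or share)? my (personal )?info", re.IGNORECASE
)


class LinkTextCollector(HTMLParser):
    """Collect the visible text of every <a> element on a page."""

    def __init__(self):
        super().__init__()
        self.inside_link = False
        self.link_texts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.inside_link = True
            self.link_texts.append("")

    def handle_endtag(self, tag):
        if tag == "a":
            self.inside_link = False

    def handle_data(self, data):
        if self.inside_link:
            self.link_texts[-1] += data


def has_opt_out_link(html):
    """Return True if any link text matches the assumed opt-out wording."""
    collector = LinkTextCollector()
    collector.feed(html)
    collector.close()
    # Normalize whitespace so text split across lines still matches.
    return any(
        OPT_OUT_PATTERN.search(" ".join(text.split()))
        for text in collector.link_texts
    )
```

Running such a check on snapshots of the same site fetched from different states and dates would give the kind of longitudinal presence signal the abstract describes, although a real crawler would also need to handle links rendered by scripts, images, and footers hidden behind consent banners.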
-
Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities
Authors:
Zhifeng Kong,
Arushi Goel,
Rohan Badlani,
Wei Ping,
Rafael Valle,
Bryan Catanzaro
Abstract:
Augmenting large language models (LLMs) to understand audio -- including non-speech sounds and non-verbal speech -- is critically important for diverse real-world applications of LLMs. In this paper, we propose Audio Flamingo, a novel audio language model with 1) strong audio understanding abilities, 2) the ability to quickly adapt to unseen tasks via in-context learning and retrieval, and 3) strong multi-turn dialogue abilities. We introduce a series of training techniques, architecture design, and data strategies to enhance our model with these abilities. Extensive evaluations across various audio understanding tasks confirm the efficacy of our method, setting new state-of-the-art benchmarks. Our demo website is https://audioflamingo.github.io/ and the code is open-sourced at https://github.com/NVIDIA/audio-flamingo.
Submitted 28 May, 2024; v1 submitted 2 February, 2024;
originally announced February 2024.
-
A Closer Look in the Mirror: Reflections on the Matter/Dark Matter Coincidence
Authors:
Arushi Bodas,
Manuel A. Buen-Abad,
Anson Hook,
Raman Sundrum
Abstract:
We argue that the striking similarity between the cosmic abundances of baryons and dark matter, despite their very different astrophysical behavior, strongly motivates the scenario in which dark matter resides within a rich dark sector parallel in structure to that of the standard model. The near cosmic coincidence is then explained by an approximate $\mathbb{Z}_2$ exchange symmetry between the two sectors, where dark matter consists of stable dark neutrons, with matter and dark matter asymmetries arising via parallel WIMP baryogenesis mechanisms. Taking a top-down perspective, we point out that an adequate $\mathbb{Z}_2$ symmetry necessitates solving the electroweak hierarchy problem in each sector, without our committing to a specific implementation. A higher-dimensional realization in the far UV is presented, in which the hierarchical couplings of the two sectors and the requisite $\mathbb{Z}_2$-breaking structure arise naturally from extra-dimensional localization and gauge symmetries. We trace the cosmic history, paying attention to potential pitfalls not fully considered in previous literature. Residual $\mathbb{Z}_2$-breaking can very plausibly give rise to the asymmetric reheating of the two sectors, needed to keep the cosmological abundance of relativistic dark particles below tight bounds. We show that, despite the need to keep inter-sector couplings highly suppressed after asymmetric reheating, there can naturally be order-one couplings mediated by TeV scale particles which can allow experimental probes of the dark sector at high energy colliders. Massive mediators can also induce dark matter direct detection signals, but likely at or below the neutrino floor.
Submitted 12 June, 2024; v1 submitted 22 January, 2024;
originally announced January 2024.
-
Broken time-reversal symmetry in a new non-centrosymmetric superconductor Re8NbTa
Authors:
R. K. Kushwaha,
Arushi,
S. Sharma,
S. Srivastava,
P. K. Meena,
M. Pula,
J. Beare,
J. Gautreau,
A. D. Hillier,
G. M. Luke,
R. P. Singh
Abstract:
Re-based superconductors provide a rich platform for the study of unconventional superconductivity. We have investigated the superconducting properties of Re$_{8}$NbTa, a new noncentrosymmetric cubic ($α$-Mn structure) rhenium-based ternary superconductor using transport, magnetization, specific heat, and muon spin rotation/relaxation ($μ$SR) measurements. Specific heat and transverse field $μ$SR measurements suggest moderately coupled fully gapped superconductivity, well described by BCS theory. However, our zero-field $μ$SR measurements reveal a small internal field onsetting around the superconducting T$_c$, indicating that the superconducting order parameter breaks the time-reversal symmetry.
Submitted 15 January, 2024;
originally announced January 2024.
-
Defect-driven tunable electronic and optical properties of two-dimensional silicon carbide
Authors:
Arushi Singh,
Vikram Mahamiya,
Alok Shukla
Abstract:
Recently, an atomic-scale two-dimensional silicon carbide monolayer has been synthesized [Polley et al., Phys. Rev. Lett. 130, 076203 (2023)], which opens up new possibilities for developing next-generation electronic and optoelectronic devices. Our study predicts the pristine SiC monolayer to have an "indirect" band gap of 3.38 eV $(K\rightarrow M)$ and a "direct" band gap of 3.43 eV $(K\rightarrow K)$ calculated using the HSE06 functional. We performed a detailed investigation of the effects of various possible defects (i.e., vacancies, foreign impurities, antisites, and their various combinations) on the structural stability, electronic, and optical properties of the SiC monolayer using first-principles density-functional theory (DFT) and molecular dynamics (MD) simulations. A number of physical quantities, such as the formation energy, electronic band gap, and the effective masses of charge carriers, have been calculated. We report that the SiC monolayer has a very low formation energy of 0.57 eV and, according to surface slab energy and interfacial adhesion energy calculations, can be stabilized on a TaC {111} film. Nitrogen doping is predicted to be the most favorable defect in the silicon carbide monolayer due to its very low formation energy, indicating high thermodynamic stability. An interesting transition from semiconducting to metallic state is observed for $N_{C}$ and $Al_{Si}$ defective systems. For the pristine SiC monolayer, we find that the conduction band is nearly flat in the $M\rightarrow K$ direction, leading to a high effective mass of $3.48m_{o}$. A significant red shift in the absorption edge, as well as the occurrence of additional absorption peaks due to the defects, has been observed in the lower energy range of the spectrum.
Submitted 21 December, 2023; v1 submitted 18 December, 2023;
originally announced December 2023.
-
Ce$_{2}$Ir$_{3}$Ga$_{5}$ : a new locally non-centrosymmetric heavy fermion system
Authors:
Arushi,
Raul Cardoso-Gil,
Christoph Geibel
Abstract:
Recently, a new type of unconventional superconductivity with a field-induced transition between two different superconducting (SC) states was discovered in the heavy fermion system CeRh$_{2}$As$_{2}$. This unusual SC state was proposed to be based on specific symmetries of the underlying structure, i.e., a globally centrosymmetric layered structure, but where the Ce-layers themselves lack inversion symmetry. This new type of SC state has attracted strong interest, prompting the search for further heavy fermion systems crystallizing in structures with appropriate symmetries. We report the discovery and the study of a new Ce-based heavy fermion system with a globally centrosymmetric structure but without inversion symmetry on the Ce-site, Ce$_{2}$Ir$_{3}$Ga$_{5}$. A single crystal X-ray diffraction study revealed an orthorhombic U$_{2}$Co$_{3}$Si$_{5}$ type structure. Resistivity, specific heat, and magnetization measurements indicate a moderate-heavy fermion behavior with a Kondo energy scale of the order of 40 K. Most experimental results suggest the absence of magnetic order, but a tiny anomaly in the specific heat opens the possibility for a very weak, itinerant type of ordering.
Submitted 1 December, 2023; v1 submitted 30 November, 2023;
originally announced November 2023.
-
Language-guided Robot Grasping: CLIP-based Referring Grasp Synthesis in Clutter
Authors:
Georgios Tziafas,
Yucheng Xu,
Arushi Goel,
Mohammadreza Kasaei,
Zhibin Li,
Hamidreza Kasaei
Abstract:
Robots operating in human-centric environments require the integration of visual grounding and grasping capabilities to effectively manipulate objects based on user instructions. This work focuses on the task of referring grasp synthesis, which predicts a grasp pose for an object referred through natural language in cluttered scenes. Existing approaches often employ multi-stage pipelines that first segment the referred object and then propose a suitable grasp, and are evaluated on private datasets or simulators that do not capture the complexity of natural indoor scenes. To address these limitations, we develop a challenging benchmark based on cluttered indoor scenes from the OCID dataset, for which we generate referring expressions and connect them with 4-DoF grasp poses. Further, we propose a novel end-to-end model (CROG) that leverages the visual grounding capabilities of CLIP to learn grasp synthesis directly from image-text pairs. Our results show that vanilla integration of CLIP with pretrained models transfers poorly in our challenging benchmark, while CROG achieves significant improvements both in terms of grounding and grasping. Extensive robot experiments in both simulation and hardware demonstrate the effectiveness of our approach in challenging interactive object grasping scenarios that include clutter.
Submitted 9 November, 2023;
originally announced November 2023.
-
Photometry of the Didymos system across the DART impact apparition
Authors:
Nicholas Moskovitz,
Cristina Thomas,
Petr Pravec,
Tim Lister,
Tom Polakis,
David Osip,
Theodore Kareta,
Agata Rożek,
Steven R. Chesley,
Shantanu P. Naidu,
Peter Scheirich,
William Ryan,
Eileen Ryan,
Brian Skiff,
Colin Snodgrass,
Matthew M. Knight,
Andrew S. Rivkin,
Nancy L. Chabot,
Vova Ayvazian,
Irina Belskaya,
Zouhair Benkhaldoun,
Daniel N. Berteşteanu,
Mariangela Bonavita,
Terrence H. Bressi,
Melissa J. Brucker
, et al. (56 additional authors not shown)
Abstract:
On 26 September 2022, the Double Asteroid Redirection Test (DART) spacecraft impacted Dimorphos, the satellite of binary near-Earth asteroid (65803) Didymos. This demonstrated the efficacy of a kinetic impactor for planetary defense by changing the orbital period of Dimorphos by 33 minutes (Thomas et al. 2023). Measuring the period change relied heavily on a coordinated campaign of lightcurve photometry designed to detect mutual events (occultations and eclipses) as a direct probe of the satellite's orbital period. A total of 28 telescopes contributed 224 individual lightcurves during the impact apparition from July 2022 to February 2023. We focus here on decomposable lightcurves, i.e. those from which mutual events could be extracted. We describe our process of lightcurve decomposition and use that to release the full data set for future analysis. We leverage these data to place constraints on the post-impact evolution of ejecta. The measured depths of mutual events relative to models showed that the ejecta became optically thin within the first ~1 day after impact, and then faded with a decay time of about 25 days. The bulk magnitude of the system showed that ejecta no longer contributed measurable brightness enhancement after about 20 days post-impact. This bulk photometric behavior was not well represented by an HG photometric model. An HG1G2 model did fit the data well across a wide range of phase angles. Lastly, we note the presence of an ejecta tail through at least March 2023. Its persistence implied ongoing escape of ejecta from the system many months after DART impact.
Submitted 3 November, 2023;
originally announced November 2023.
-
Superconducting Properties of Topological Semimetal 1$T$-RhSeTe
Authors:
C. Patra,
T. Agarwal,
Arushi,
P. Manna,
N. Bhatt,
R. S. Singh,
R. P. Singh
Abstract:
Platinum-group transition-metal dichalcogenides have emerged as a subject of considerable interest in condensed matter physics due to their remarkable topological properties and unconventional superconducting behavior. In this study, we report the synthesis and superconducting characteristics of a new Dirac-type topological semimetallic compound 1$T$-RhSeTe. It shows type-II superconductivity with a superconducting transition temperature of 4.72 K and a high upper critical field. The coexistence of superconductivity and topological properties makes it a prime candidate for hosting topological superconductivity.
Submitted 2 November, 2023;
originally announced November 2023.
-
Skill-Mix: a Flexible and Expandable Family of Evaluations for AI models
Authors:
Dingli Yu,
Simran Kaur,
Arushi Gupta,
Jonah Brown-Cohen,
Anirudh Goyal,
Sanjeev Arora
Abstract:
With LLMs shifting their role from statistical modeling of language to serving as general-purpose AI agents, how should LLM evaluations change? Arguably, a key ability of an AI agent is to flexibly combine, as needed, the basic skills it has learned. The capability to combine skills plays an important role in (human) pedagogy and also in a paper on emergence phenomena (Arora & Goyal, 2023).
This work introduces Skill-Mix, a new evaluation to measure ability to combine skills. Using a list of $N$ skills the evaluator repeatedly picks random subsets of $k$ skills and asks the LLM to produce text combining that subset of skills. Since the number of subsets grows like $N^k$, for even modest $k$ this evaluation will, with high probability, require the LLM to produce text significantly different from any text in the training set. The paper develops a methodology for (a) designing and administering such an evaluation, and (b) automatic grading (plus spot-checking by humans) of the results using GPT-4 as well as the open LLaMA-2 70B model.
Administering a version of Skill-Mix to popular chatbots gave results that, while generally in line with prior expectations, contained surprises. Sizeable differences exist among model capabilities that are not captured by their ranking on popular LLM leaderboards ("cramming for the leaderboard"). Furthermore, simple probability calculations indicate that GPT-4's reasonable performance on $k=5$ is suggestive of going beyond "stochastic parrot" behavior (Bender et al., 2021), i.e., it combines skills in ways that it had not seen during training.
We sketch how the methodology can lead to a Skill-Mix based eco-system of open evaluations for AI capabilities of future models.
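As a rough illustration of the counting argument in this abstract, the sketch below computes the exact number of $k$-skill subsets, $\binom{N}{k}$, which grows on the order of $N^k$ for fixed $k$. The function name and the list size of 100 skills are assumptions chosen for illustration, not figures taken from the paper.

```python
from math import comb


def count_skill_subsets(n_skills: int, k: int) -> int:
    """Return the number of distinct k-element subsets of n_skills skills."""
    return comb(n_skills, k)


if __name__ == "__main__":
    n_skills = 100  # hypothetical skill-list size, not taken from the paper
    for k in range(2, 6):
        # The count rises quickly with k, which is the point of the argument.
        print(k, count_skill_subsets(n_skills, k))
```

Even for a modest list, the number of possible skill combinations quickly exceeds what a training corpus could plausibly cover verbatim.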
Submitted 26 October, 2023;
originally announced October 2023.
-
Semi-supervised multimodal coreference resolution in image narrations
Authors:
Arushi Goel,
Basura Fernando,
Frank Keller,
Hakan Bilen
Abstract:
In this paper, we study multimodal coreference resolution, specifically where a longer descriptive text, i.e., a narration, is paired with an image. This poses significant challenges due to fine-grained image-text alignment, the inherent ambiguity present in narrative language, and the unavailability of large annotated training sets. To tackle these challenges, we present a data-efficient semi-supervised approach that utilizes image-narration pairs to resolve coreferences and narrative grounding in a multimodal context. Our approach incorporates losses for both labeled and unlabeled data within a cross-modal framework. Our evaluation shows that the proposed approach outperforms strong baselines both quantitatively and qualitatively, for the tasks of coreference resolution and narrative grounding.
Submitted 20 October, 2023;
originally announced October 2023.
-
Argumentative Stance Prediction: An Exploratory Study on Multimodality and Few-Shot Learning
Authors:
Arushi Sharma,
Abhibha Gupta,
Maneesh Bilalpur
Abstract:
To advance argumentative stance prediction as a multimodal problem, the First Shared Task in Multimodal Argument Mining hosted stance prediction on the crucial social topics of gun control and abortion. Our exploratory study attempts to evaluate the necessity of images for stance prediction in tweets and compare out-of-the-box text-based large language models (LLMs) in few-shot settings against fine-tuned unimodal and multimodal models. Our work suggests an ensemble of fine-tuned text-based language models (0.817 F1-score) outperforms both the multimodal (0.677 F1-score) and text-based few-shot prediction using a recent state-of-the-art LLM (0.550 F1-score). In addition to the differences in performance, our findings suggest that the multimodal models tend to perform better when image content is summarized as natural language rather than presented in its native pixel structure, and that using in-context examples improves few-shot performance of LLMs.
Submitted 10 October, 2023;
originally announced October 2023.
-
A real-time, hardware agnostic framework for close-up branch reconstruction using RGB data
Authors:
Alexander You,
Aarushi Mehta,
Luke Strohbehn,
Jochen Hemming,
Cindy Grimm,
Joseph R. Davidson
Abstract:
Creating accurate 3D models of tree topology is an important task for tree pruning. The 3D model is used to decide which branches to prune and then to execute the pruning cuts. Previous methods for creating 3D tree models have typically relied on point clouds, which are often computationally expensive to process and can suffer from data defects, especially with thin branches. In this paper, we propose a method for actively scanning along a primary tree branch, detecting secondary branches to be pruned, and reconstructing their 3D geometry using just an RGB camera mounted on a robot arm. We experimentally validate that our setup is able to produce primary branch models with 4-5 mm accuracy and secondary branch models with 15 degrees orientation accuracy with respect to the ground truth model. Our framework is real-time and can run up to 10 cm/s with no loss in model accuracy or ability to detect secondary branches.
Submitted 18 June, 2024; v1 submitted 20 September, 2023;
originally announced September 2023.
-
Speeding up charge exchange recombination spectroscopy analysis in support of NERSC/DIII-D realtime workflow
Authors:
Aarushi Jain,
Laurie Stephey,
Erik Linsenmayer,
Colin Chrystal,
Jonathan Dursi,
Hannah Ross
Abstract:
We report optimization work made in support of the development of a realtime Superfacility workflow between DIII-D and NERSC. At DIII-D, the ion properties measured by charge exchange recombination (CER) spectroscopy are required inputs for a Superfacility realtime workflow that computes the full plasma kinetic equilibrium. In this workflow, minutes matter since the results must be ready during the brief 10-15 minute pause between plasma discharges. Prior to this work, a sample CERFIT analysis took approximately 15 minutes. Because the problem consists of many calculations that can be done independently, we were able to restructure the CERFIT code to leverage this parallelism with Slurm job arrays. We reduced the runtime to approximately 51 seconds -- a speedup of roughly 20x, saving valuable time for both the scientists interested in the CER results and also for the larger equilibrium reconstruction workflow.
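The reported speedup can be sanity-checked with a short calculation. The sketch below assumes the rounded figures quoted in the abstract (about 15 minutes before, about 51 seconds after); the helper name is hypothetical.

```python
def speedup(before_seconds: float, after_seconds: float) -> float:
    """Return how many times faster the new runtime is than the old one."""
    return before_seconds / after_seconds


if __name__ == "__main__":
    before = 15 * 60  # "approximately 15 minutes", converted to seconds
    after = 51        # "approximately 51 seconds"
    print(f"{speedup(before, after):.1f}x")
```

With these rounded inputs the ratio comes out near 17.6x, which is in line with the "roughly 20x" quoted in the abstract given that both runtimes are approximate.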
Submitted 18 September, 2023; v1 submitted 15 September, 2023;
originally announced September 2023.
-
Constructing Inverse Scattering Potentials for α-α System using Reference Potential Approach
Authors:
O. S. K. S. Sastri,
Arushi Sharma,
Ayushi Awasthi
Abstract:
Background: An accurate way to incorporate the long-range Coulomb interaction alongside the short-range nuclear interaction has been a challenge for theoretical physicists. Purpose: In this paper, we propose a methodology based on the reference potential approach for constructing inverse potentials of alpha-alpha scattering. Methods: Two smoothly joined Morse potentials, regular for the short-range nuclear interaction and inverted for the long-range Coulomb interaction, are used in tandem as a reference potential in the phase function method to obtain the scattering phase shifts for the S, D and G states of alpha-alpha scattering. The model parameters are optimized by minimizing the mean absolute percentage error between the obtained and experimental scattering phase shift values. Results: The constructed inverse potentials for S, D and G states have resulted in mean absolute percentage errors of 0.8, 0.5, and 0.4 respectively. The obtained resonances for D and G states closely match the experimental ones. Conclusion: The reference potential approach using a combination of smoothly joined Morse functions is successful in accurately accounting for the Coulomb interaction between charged particles in nuclear scattering studies.
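For readers unfamiliar with the cost function named in this abstract, the following is a minimal sketch of a mean absolute percentage error between experimental and obtained phase shifts. The function name, argument names, and sample values are illustrative assumptions; the abstract does not specify the authors' implementation.

```python
from typing import Sequence


def mean_absolute_percentage_error(
    expected: Sequence[float], obtained: Sequence[float]
) -> float:
    """Return the mean of |(expected - obtained) / expected|, in percent."""
    if len(expected) != len(obtained) or len(expected) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((e - o) / e) for e, o in zip(expected, obtained))
    return 100.0 * total / len(expected)


if __name__ == "__main__":
    # Made-up phase shifts in degrees, for illustration only.
    experimental = [100.0, 200.0]
    simulated = [110.0, 180.0]
    print(mean_absolute_percentage_error(experimental, simulated))
```

Note that this measure is undefined when an expected value is zero, so any such data points would need separate handling.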
Submitted 22 August, 2023;
originally announced August 2023.
-
Discrimination of Radiologists Utilizing Eye-Tracking Technology and Machine Learning: A Case Study
Authors:
Stanford Martinez,
Carolina Ramirez-Tamayo,
Syed Hasib Akhter Faruqui,
Kal L. Clark,
Adel Alaeddini,
Nicholas Czarnek,
Aarushi Aggarwal,
Sahra Emamzadeh,
Jeffrey R. Mock,
Edward J. Golob
Abstract:
Perception-related errors comprise most diagnostic mistakes in radiology. To mitigate this problem, radiologists employ personalized and high-dimensional visual search strategies, otherwise known as search patterns. Qualitative descriptions of these search patterns, which involve the physician verbalizing or annotating the order in which they analyze the image, can be unreliable due to discrepancies between what is reported and the actual visual patterns. This discrepancy can interfere with quality improvement interventions and negatively impact patient care. This study presents a novel discretized feature encoding based on spatiotemporal binning of fixation data for efficient geometric alignment and temporal ordering of eye movement when reading chest X-rays. The encoded features of the eye-fixation data are employed by machine learning classifiers to discriminate between faculty and trainee radiologists. We include a clinical trial case study utilizing the Area Under the Curve (AUC), Accuracy, F1, Sensitivity, and Specificity metrics for class separability to evaluate the discriminability between the two subjects in regard to their level of experience. We then compare the classification performance to state-of-the-art methodologies. A repeatability experiment using a separate dataset, experimental protocol, and eye tracker was also performed using eight subjects to evaluate the robustness of the proposed approach. The numerical results from both experiments demonstrate that classifiers employing the proposed feature encoding methods outperform the current state-of-the-art in differentiating between radiologists in terms of experience level. This signifies the potential impact of the proposed method for identifying radiologists' level of expertise and those who would benefit from additional training.
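To give a feel for what spatiotemporal binning of fixation data can look like, here is a generic sketch that counts fixations per (time bin, row, column) cell and flattens the counts into a feature vector. This is not the authors' encoding: the grid size, number of time bins, normalized coordinates, and function name are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# A fixation is (x, y, t): x and y normalized to [0, 1], t in seconds.
Fixation = Tuple[float, float, float]


def bin_fixations(
    fixations: Sequence[Fixation],
    grid: int = 4,
    time_bins: int = 3,
    duration: float = 30.0,
) -> List[int]:
    """Count fixations per (time bin, row, column) cell as a flat vector."""
    features = [0] * (time_bins * grid * grid)
    for x, y, t in fixations:
        # Clamp indices so values on the upper boundary fall in the last bin.
        col = min(int(x * grid), grid - 1)
        row = min(int(y * grid), grid - 1)
        tb = min(int(t / duration * time_bins), time_bins - 1)
        features[(tb * grid + row) * grid + col] += 1
    return features


if __name__ == "__main__":
    # Two made-up fixations: top-left early, bottom-right late.
    sample = [(0.1, 0.1, 1.0), (0.9, 0.9, 29.0)]
    print(bin_fixations(sample, grid=2, time_bins=2, duration=30.0))
```

A fixed-length vector of this kind can be passed to standard classifiers regardless of how many fixations each reading contains.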
Submitted 4 August, 2023;
originally announced August 2023.
-
Does fine-tuning GPT-3 with the OpenAI API leak personally-identifiable information?
Authors:
Albert Yu Sun,
Eliott Zemour,
Arushi Saxena,
Udith Vaidyanathan,
Eric Lin,
Christian Lau,
Vaikkunth Mugunthan
Abstract:
Machine learning practitioners often fine-tune generative pre-trained models like GPT-3 to improve model performance at specific tasks. Previous works, however, suggest that fine-tuned machine learning models memorize and emit sensitive information from the original fine-tuning dataset. Companies such as OpenAI offer fine-tuning services for their models, but no prior work has conducted a memorization attack on any closed-source models. In this work, we simulate a privacy attack on GPT-3 using OpenAI's fine-tuning API. Our objective is to determine if personally identifiable information (PII) can be extracted from this model. We (1) explore the use of naive prompting methods on a GPT-3 fine-tuned classification model, and (2) design a practical word generation task called Autocomplete to investigate the extent of PII memorization in fine-tuned GPT-3 within a real-world context. Our findings reveal that fine-tuning GPT-3 for both tasks led to the model memorizing and disclosing critical PII obtained from the underlying fine-tuning dataset. To encourage further research, we have made our code and datasets publicly available on GitHub at: https://github.com/albertsun1/gpt3-pii-attacks
Submitted 15 April, 2024; v1 submitted 30 July, 2023;
originally announced July 2023.
-
A Coronal Mass Ejection Source Region Catalogue and their Associated Properties
Authors:
Satabdwa Majumdar,
Ritesh Patel,
Vaibhav Pant,
Dipankar Banerjee,
Aarushi Rawat,
Abhas Pradhan,
Paritosh Singh
Abstract:
The primary objective of this study is to connect the coronal mass ejections (CMEs) to their source regions, primarily creating a CME source region (CSR) catalogue, and secondly probing into the influence the source regions have on different statistical properties of CMEs. We create a source region catalogue for 3327 CMEs from 1998 to 2017, thus capturing the different phases of cycles 23 and 24. The identified source regions are segregated into 3 classes, Active Regions (ARs), Prominence Eruptions (PEs) and Active Prominences (APs), while the CMEs are segregated into slow and fast based on their average projected speeds. We find the contribution of these three source region types to the occurrences of slow and fast CMEs to be different in the above period. A study of the distribution of average speeds reveals different power-laws for CMEs originating from different sources, and the power-law is different during the different phases of cycles 23 and 24. A study of statistical latitudinal deflections showed equator-ward deflections, while the magnitude of deflections again bears an imprint of the source regions. An East-West asymmetry is also noted, particularly in the rising phase of cycle 23, with the presence of active longitudes for the CMEs, with a preference towards the Western part of the Sun. Our results show that different aspects of CME kinematics bear a strong imprint of the source regions they originate from, thus indicating the existence of different ejection and/or propagation mechanisms of these CMEs.
Submitted 26 October, 2023; v1 submitted 24 July, 2023;
originally announced July 2023.
-
Encyclopedic VQA: Visual questions about detailed properties of fine-grained categories
Authors:
Thomas Mensink,
Jasper Uijlings,
Lluis Castrejon,
Arushi Goel,
Felipe Cadar,
Howard Zhou,
Fei Sha,
André Araujo,
Vittorio Ferrari
Abstract:
We propose Encyclopedic-VQA, a large-scale visual question answering (VQA) dataset featuring visual questions about detailed properties of fine-grained categories and instances. It contains 221k unique question+answer pairs, each matched with (up to) 5 images, resulting in a total of 1M VQA samples. Moreover, our dataset comes with a controlled knowledge base derived from Wikipedia, marking the evidence to support each answer. Empirically, we show that our dataset poses a hard challenge for large vision+language models: PaLI [14] is state-of-the-art on OK-VQA [37], yet it only achieves 13.0% accuracy on our dataset. Moreover, we experimentally show that progress on answering our encyclopedic questions can be achieved by augmenting large models with a mechanism that retrieves relevant information from the knowledge base. An oracle experiment with perfect retrieval achieves 87.0% accuracy on the single-hop portion of our dataset, and an automatic retrieval-augmented prototype yields 48.8%. We believe that our dataset enables future research on retrieval-augmented vision+language models. It is available at https://github.com/google-research/google-research/tree/master/encyclopedic_vqa .
Submitted 24 July, 2023; v1 submitted 15 June, 2023;
originally announced June 2023.
-
Leveraging Explicit Procedural Instructions for Data-Efficient Action Prediction
Authors:
Julia White,
Arushi Raghuvanshi,
Yada Pruksachatkun
Abstract:
Task-oriented dialogues often require agents to enact complex, multi-step procedures in order to meet user requests. While large language models have found success automating these dialogues in constrained environments, their widespread deployment is limited by the substantial quantities of task-specific data required for training. This paper presents a data-efficient solution to constructing dialogue systems, leveraging explicit instructions derived from agent guidelines, such as company policies or customer service manuals. Our proposed Knowledge-Augmented Dialogue System (KADS) combines a large language model with a knowledge retrieval module that pulls documents outlining relevant procedures from a predefined set of policies, given a user-agent interaction. To train this system, we introduce a semi-supervised pre-training scheme that employs dialogue-document matching and action-oriented masked language modeling with partial parameter freezing. We evaluate the effectiveness of our approach on two prominent task-oriented dialogue datasets, Action-Based Conversations Dataset and Schema-Guided Dialogue, for two dialogue tasks: action state tracking and workflow discovery. Our results demonstrate that procedural knowledge augmentation improves accuracy in predicting in- and out-of-distribution actions while preserving high performance in settings with low or sparse data.
Submitted 6 June, 2023;
originally announced June 2023.
-
Online Nonstochastic Model-Free Reinforcement Learning
Authors:
Udaya Ghai,
Arushi Gupta,
Wenhan Xia,
Karan Singh,
Elad Hazan
Abstract:
We investigate robust model-free reinforcement learning algorithms designed for environments that may be dynamic or even adversarial. Traditional state-based policies often struggle to accommodate the challenges imposed by the presence of unmodeled disturbances in such settings. Moreover, optimizing linear state-based policies poses an obstacle to efficient optimization, leading to nonconvex objectives, even in benign environments like linear dynamical systems.
Drawing inspiration from recent advancements in model-based control, we introduce a novel class of policies centered on disturbance signals. We define several categories of these signals, which we term pseudo-disturbances, and develop corresponding policy classes based on them. We provide efficient and practical algorithms for optimizing these policies.
Next, we examine the task of online adaptation of reinforcement learning agents in the face of adversarial disturbances. Our methods seamlessly integrate with any black-box model-free approach, yielding provable regret guarantees when dealing with linear dynamics. These regret guarantees unconditionally improve the best-known results for bandit linear control in having no dependence on the state-space dimension. We evaluate our method over various standard RL benchmarks and demonstrate improved robustness.
Submitted 31 October, 2023; v1 submitted 27 May, 2023;
originally announced May 2023.
-
Superconducting properties of new hexagonal and noncentrosymmetric cubic high entropy alloys
Authors:
K. Motla,
Arushi,
S. Jangid,
P. Meena,
R. K. Kushwaha,
R. P. Singh
Abstract:
Superconducting high-entropy alloys (HEAs) are a newly burgeoning field of unconventional superconductors and raise intriguing questions about the presence of superconductivity in highly disordered systems, which lack regular phonon modes. In our study, we have synthesized and investigated the superconducting characteristics of two new transition-element-based HEAs: Re$_{0.35}$Os$_{0.35}$Mo$_{0.08}$W$_{0.10}$Zr$_{0.12}$ (ReOMWZ), crystallizing in the noncentrosymmetric $α$-Mn structure, and Ru$_{0.35}$Os$_{0.35}$Mo$_{0.10}$W$_{0.10}$Zr$_{0.10}$ (RuOMWZ), crystallizing in the hexagonal close-packed (hcp) structure. Transition-metal-based hcp HEAs are rare and highly desirable for practical applications due to their high hardness. Bulk magnetization, resistivity, and specific heat measurements confirmed bulk type-II superconductivity in both alloys. Specific heat analysis up to the measured low-temperature range suffices for a BCS explanation. Upper critical fields comparable with the Pauli paramagnetic limit suggest the possibility of unconventional superconductivity in both HEAs.
Submitted 15 May, 2023;
originally announced May 2023.
-
Redundancy and Concept Analysis for Code-trained Language Models
Authors:
Arushi Sharma,
Zefu Hu,
Christopher Quinn,
Ali Jannesari
Abstract:
Code-trained language models have proven to be highly effective for various code intelligence tasks. However, they can be challenging to train and deploy for many software engineering applications due to computational bottlenecks and memory constraints. Implementing effective strategies to address these issues requires a better understanding of these 'black box' models. In this paper, we perform the first neuron-level analysis for source code models to identify important neurons within latent representations. We achieve this by eliminating neurons that are highly similar or irrelevant to the given task. This approach helps us understand which neurons and layers can be eliminated (redundancy analysis) and where important code properties are located within the network (concept analysis). Using redundancy analysis, we make observations relevant to knowledge transfer and model optimization applications. We find that over 95% of the neurons are redundant with respect to our code intelligence tasks and can be eliminated without significant loss in accuracy. We also discover several subsets of neurons that can make predictions with baseline accuracy. Through concept analysis, we explore the traceability and distribution of human-recognizable concepts within latent code representations, which could be used to influence model predictions. We trace individual important neurons and subsets of them to specific code properties and identify 'number' neurons, 'string' neurons, and higher-level 'text' neurons for token-level tasks, as well as higher-level concepts important for sentence-level downstream tasks. This also helps us understand how decomposable and transferable task-related features are and can help devise better techniques for transfer learning, model compression, and the decomposition of deep neural networks into modules.
Submitted 15 February, 2024; v1 submitted 1 May, 2023;
originally announced May 2023.
-
Phase Stability of Hexagonal/cubic Boron Nitride Nanocomposites
Authors:
Abhijit Biswas,
Rui Xu,
Joyce Christiansen-Salameh,
Eugene Jeong,
Gustavo A. Alvarez,
Chenxi Li,
Anand B. Puthirath,
Bin Gao,
Arushi Garg,
Tia Gray,
Harikishan Kannan,
Xiang Zhang,
Jacob Elkins,
Tymofii S. Pieshkov,
Robert Vajtai,
A. Glen Birdwell,
Mahesh R. Neupane,
Bradford B. Pate,
Tony Ivanov,
Elias J. Garratt,
Pengcheng Dai,
Hanyu Zhu,
Zhiting Tian,
Pulickel M. Ajayan
Abstract:
Boron nitride (BN) is an exceptional material and, among its polymorphs, the two-dimensional (2D) hexagonal and three-dimensional (3D) cubic BN (h-BN and c-BN) phases are most common. The phase stability regimes of these BN phases are still under debate and phase transformations of h-BN/c-BN remain a topic of interest. Here, we investigate the phase stability of 2D/3D h-BN/c-BN nanocomposites and show that the co-existence of the two phases can lead to strong non-linear optical properties and low thermal conductivity at room temperature. Furthermore, spark-plasma sintering of the nanocomposite shows complete phase transformation to 2D h-BN with improved crystalline quality, where the 3D c-BN grain sizes govern the nucleation and growth kinetics. Our demonstration might be insightful for phase engineering of BN-polymorph-based nanocomposites with desirable properties for optoelectronics and thermal energy management applications.
Submitted 17 April, 2023;
originally announced April 2023.
-
VEIL: Vetting Extracted Image Labels from In-the-Wild Captions for Weakly-Supervised Object Detection
Authors:
Arushi Rai,
Adriana Kovashka
Abstract:
The use of large-scale vision-language datasets is limited for object detection due to the negative impact of label noise on localization. Prior methods have shown how such large-scale datasets can be used for pretraining, which can provide an initial signal for localization, but this is insufficient without clean bounding-box data for at least some categories. We propose a technique to "vet" labels extracted from noisy captions, and use them for weakly-supervised object detection (WSOD), without any bounding boxes. We analyze and annotate the types of label noise in captions in our Caption Label Noise dataset, and train a classifier that predicts whether an extracted label is actually present in the image. Our classifier generalizes across dataset boundaries and across categories. We compare the classifier to nine baselines on five datasets, and demonstrate that it can improve on WSOD without label vetting by 30% (31.2 to 40.5 mAP when evaluated on PASCAL VOC). See dataset at: https://github.com/arushirai1/CLaNDataset.
Submitted 10 March, 2024; v1 submitted 16 March, 2023;
originally announced March 2023.
-
Controllable Video Generation by Learning the Underlying Dynamical System with Neural ODE
Authors:
Yucheng Xu,
Li Nanbo,
Arushi Goel,
Zijian Guo,
Zonghai Yao,
Hamidreza Kasaei,
Mohammadreze Kasaei,
Zhibin Li
Abstract:
Videos depict the change of complex dynamical systems over time in the form of discrete image sequences. Generating controllable videos by learning the dynamical system is an important yet underexplored topic in the computer vision community. This paper presents a novel framework, TiV-ODE, to generate highly controllable videos from a static image and a text caption. Specifically, our framework leverages the ability of Neural Ordinary Differential Equations (Neural ODEs) to represent complex dynamical systems as a set of nonlinear ordinary differential equations. The resulting framework is capable of generating videos with both desired dynamics and content. Experiments demonstrate the ability of the proposed method to generate highly controllable and visually consistent videos, and its capability of modeling dynamical systems. Overall, this work is a significant step towards developing advanced controllable video generation models that can handle complex and dynamic scenes.
Submitted 4 April, 2023; v1 submitted 9 March, 2023;
originally announced March 2023.
-
Superconducting ground state study of valence skip compound AgSnSe$_2$
Authors:
A. Kataria,
Arushi,
S. Sharma,
T. Agarwal,
M. Pula,
J. Beare,
S. Yoon,
Y. Cai,
K. M. Kojima,
G. M. Luke,
R. P. Singh
Abstract:
Valence-skipped superconductors are natural candidates for unconventional superconductivity, as they can exhibit a negative effective (attractive) interaction for electron pairing. This work reports comprehensive XRD, magnetization, specific heat, and muon spin rotation and relaxation ($μ$SR) measurements on a valence-skipped compound: AgSnSe$_2$. The temperature dependence of the electronic specific heat ($C_{el}(T)$) and of the upper critical field ($H_{c2}(T)$) provides evidence of two-gap superconductivity, which is also confirmed by our transverse-field $μ$SR measurements. Our zero-field $μ$SR measurements suggest preserved time-reversal symmetry in the superconducting ground state of AgSnSe$_2$.
Submitted 15 February, 2023;
originally announced February 2023.
-
Graph Contrastive Learning for Multi-omics Data
Authors:
Nishant Rajadhyaksha,
Aarushi Chitkara
Abstract:
Advancements in technologies related to working with omics data require novel computation methods to fully leverage information and help develop a better understanding of human diseases. This paper studies the effects of introducing graph contrastive learning to help leverage graph structure and information to produce better representations for downstream classification tasks for multi-omics datasets. We present a learning framework named Multi-Omics Graph Contrastive Learner (MOGCL), which outperforms several approaches for integrating multi-omics data for supervised learning tasks. We show that pre-training graph models with a contrastive methodology and then fine-tuning them in a supervised manner is an efficient strategy for multi-omics data classification.
Submitted 3 January, 2023;
originally announced January 2023.
-
Who are you referring to? Coreference resolution in image narrations
Authors:
Arushi Goel,
Basura Fernando,
Frank Keller,
Hakan Bilen
Abstract:
Coreference resolution aims to identify words and phrases which refer to the same entity in a text, a core task in natural language processing. In this paper, we extend this task to resolving coreferences in long-form narrations of visual scenes. First, we introduce a new dataset with annotated coreference chains and their bounding boxes, as most existing image-text datasets only contain short sentences without coreferring expressions or labeled chains. We propose a new technique that learns to identify coreference chains using weak supervision, only from image-text pairs and a regularization using prior linguistic knowledge. Our model yields large performance gains over several strong baselines in resolving coreferences. We also show that coreference resolution helps improve the grounding of narratives in images.
Submitted 17 March, 2023; v1 submitted 26 November, 2022;
originally announced November 2022.
-
Large Primordial Fluctuations in Gravitational Waves from Phase Transitions
Authors:
Arushi Bodas,
Raman Sundrum
Abstract:
It is well known that first-order phase transitions in the early universe can be a powerful source of observable stochastic gravitational wave backgrounds. Any such gravitational wave background must exhibit large-scale anisotropies at least as large as those seen in the CMB, $\sim 10^{-5}$, providing a valuable new window onto the (inflationary) origins of primordial fluctuations. While significantly larger fractional anisotropies are possible (for example, in multi-field inflation) and would be easier to interpret, it has been argued that these can only be consistent with CMB bounds if the gravitational wave signal is correspondingly smaller. In this paper, we show that this argument, which relies on assuming radiation dominance of the very early universe, can be evaded if there is an era of early matter dominance of a certain robust type. This allows large gravitational wave anisotropies to be consistent with observable signals at proposed future gravitational wave detectors. Constraints from the CMB on large scales, as well as primordial black hole and mini-cluster formation on small scales, and secondary scalar-induced gravitational waves are all taken into account.
Submitted 16 November, 2022;
originally announced November 2022.
-
New Definitions and Evaluations for Saliency Methods: Staying Intrinsic, Complete and Sound
Authors:
Arushi Gupta,
Nikunj Saunshi,
Dingli Yu,
Kaifeng Lyu,
Sanjeev Arora
Abstract:
Saliency methods compute heat maps that highlight portions of an input that were most important for the label assigned to it by a deep net. Evaluations of saliency methods convert this heat map into a new masked input by retaining the $k$ highest-ranked pixels of the original input and replacing the rest with "uninformative" pixels, and checking if the net's output is mostly unchanged. This is usually seen as an explanation of the output, but the current paper highlights reasons why this inference of causality may be suspect. Inspired by logic concepts of completeness and soundness, it observes that the above type of evaluation focuses on completeness of the explanation, but ignores soundness. New evaluation metrics are introduced to capture both notions, while staying in an intrinsic framework, i.e., using the dataset and the net, but no separately trained nets, human evaluations, etc. A simple saliency method is described that matches or outperforms prior methods in the evaluations. Experiments also suggest new intrinsic justifications, based on soundness, for popular heuristic tricks such as TV regularization and upsampling.
Submitted 5 November, 2022;
originally announced November 2022.
-
What can we learn about a generated image corrupting its latent representation?
Authors:
Agnieszka Tomczak,
Aarushi Gupta,
Slobodan Ilic,
Nassir Navab,
Shadi Albarqouni
Abstract:
Generative adversarial networks (GANs) offer an effective solution to the image-to-image translation problem, thereby allowing for new possibilities in medical imaging. They can translate images from one imaging modality to another at a low cost. For unpaired datasets, they rely mostly on cycle loss. Despite its effectiveness in learning the underlying data distribution, cycle loss can lead to a discrepancy between input and output data. The purpose of this work is to investigate the hypothesis that we can predict image quality based on its latent representation in the GAN's bottleneck. We achieve this by corrupting the latent representation with noise and generating multiple outputs. The degree of difference between them is interpreted as the strength of the representation: the more robust the latent representation, the fewer changes in the output image the corruption causes. Our results demonstrate that our proposed method has the ability to i) predict uncertain parts of synthesized images, and ii) identify samples that may not be reliable for downstream tasks, e.g., a liver segmentation task.
Submitted 12 October, 2022;
originally announced October 2022.
-
Understanding Influence Functions and Datamodels via Harmonic Analysis
Authors:
Nikunj Saunshi,
Arushi Gupta,
Mark Braverman,
Sanjeev Arora
Abstract:
Influence functions estimate the effect of individual data points on predictions of the model on test data and were adapted to deep learning in Koh and Liang [2017]. They have been used for detecting data poisoning, detecting helpful and harmful examples, estimating the influence of groups of datapoints, etc. Recently, Ilyas et al. [2022] introduced a linear regression method they termed datamodels to predict the effect of training points on outputs on test data. The current paper seeks to provide a better theoretical understanding of such interesting empirical phenomena. The primary tool is harmonic analysis and the idea of noise stability. Contributions include: (a) Exact characterization of the learnt datamodel in terms of Fourier coefficients. (b) An efficient method to estimate the residual error and quality of the optimum linear datamodel without having to train the datamodel. (c) New insights into when influences of groups of datapoints may or may not add up linearly.
Submitted 3 October, 2022;
originally announced October 2022.