-
Natural GaLore: Accelerating GaLore for memory-efficient LLM Training and Fine-tuning
Authors:
Arijit Das
Abstract:
Training LLMs presents significant memory challenges due to the growing size of data, weights, and optimizer states. Techniques such as data and model parallelism, gradient checkpointing, and offloading strategies address this issue but are often infeasible due to hardware constraints. To mitigate memory usage, alternative methods like Parameter-Efficient Fine-Tuning (PEFT) and GaLore approximate weights or optimizer states. PEFT methods, such as LoRA, have gained popularity for fine-tuning LLMs, though they require a full-rank warm start. In contrast, GaLore allows full-parameter learning while being more memory-efficient. This work introduces Natural GaLore, a simple drop-in replacement for AdamW, which efficiently applies the inverse Empirical Fisher Information Matrix to low-rank gradients using Woodbury's Identity. We demonstrate that incorporating second-order information speeds up optimization significantly, especially when the iteration budget is limited. Empirical pretraining on 60M, 130M, 350M, and 1.1B parameter Llama models on C4 data demonstrates significantly lower perplexity than GaLore without additional memory overhead. By fine-tuning RoBERTa on the GLUE benchmark using Natural GaLore, we significantly reduce the gap to full fine-tuning (86.05% vs. 86.28%). Furthermore, fine-tuning the TinyLlama 1.1B model for function calling using the TinyAgent framework shows that Natural GaLore, achieving 83.09% accuracy on the TinyAgent dataset, significantly outperforms 16-bit LoRA at 80.06% and even surpasses GPT4-Turbo by 4%, all while using 30% less memory.
All code to reproduce the results is available at: https://github.com/selfsupervised-ai/Natural-GaLore.git
Submitted 21 October, 2024;
originally announced October 2024.
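The Woodbury step mentioned in the abstract above can be illustrated with a small, dependency-free sketch. This is not the paper's implementation: it assumes a damped empirical Fisher of the form lam*I + G G^T, where the n x s matrix `G` (s much smaller than n), the damping `lam`, and the helper names `solve`, `woodbury_apply`, and `direct_apply` are all hypothetical choices made for illustration. It only shows why the inverse can be applied by solving an s x s system instead of an n x n one.

```python
def solve(A, b):
    """Solve the small dense system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def woodbury_apply(G, g, lam):
    """Return (lam*I + G G^T)^{-1} g for an n x s matrix G (list of rows), lam > 0.

    Woodbury's identity gives
        (lam*I + G G^T)^{-1} = (I - G (lam*I_s + G^T G)^{-1} G^T) / lam,
    so only an s x s system has to be solved.
    """
    n, s = len(G), len(G[0])
    Gt_g = [sum(G[i][k] * g[i] for i in range(n)) for k in range(s)]
    small = [[sum(G[i][a] * G[i][b] for i in range(n)) + (lam if a == b else 0.0)
              for b in range(s)] for a in range(s)]
    y = solve(small, Gt_g)
    return [(g[i] - sum(G[i][k] * y[k] for k in range(s))) / lam for i in range(n)]


def direct_apply(G, g, lam):
    """Reference computation: build the full n x n matrix and solve it directly."""
    n, s = len(G), len(G[0])
    F = [[sum(G[i][k] * G[j][k] for k in range(s)) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return solve(F, g)


# Toy data (assumed): n = 4 parameters, s = 2 stored gradient directions.
G = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0], [0.5, -1.0]]
g = [1.0, 2.0, 3.0, -1.0]
fast = woodbury_apply(G, g, lam=0.5)
slow = direct_apply(G, g, lam=0.5)
print(max(abs(a - b) for a, b in zip(fast, slow)))  # difference between the two routes
```

The two routes agree up to floating-point error; the Woodbury route never forms the n x n matrix, which is the property that makes this kind of preconditioning affordable when the gradient history is low-rank.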
-
Anonymising Elderly and Pathological Speech: Voice Conversion Using DDSP and Query-by-Example
Authors:
Suhita Ghosh,
Melanie Jouaiti,
Arnab Das,
Yamini Sinha,
Tim Polzehl,
Ingo Siegert,
Sebastian Stober
Abstract:
Speech anonymisation aims to protect speaker identity by changing personal identifiers in speech while retaining linguistic content. Current methods fail to retain prosody and unique speech patterns found in elderly and pathological speech domains, which is essential for remote health monitoring. To address this gap, we propose a voice conversion-based method (DDSP-QbE) using differentiable digital signal processing and query-by-example. The proposed method, trained with novel losses, aids in disentangling linguistic, prosodic, and domain representations, enabling the model to adapt to uncommon speech patterns. Objective and subjective evaluations show that DDSP-QbE significantly outperforms the voice conversion state-of-the-art concerning intelligibility, prosody, and domain preservation across diverse datasets, pathologies, and speakers while maintaining quality and speaker anonymity. Experts validate domain preservation by analysing twelve clinically pertinent domain attributes.
Submitted 20 October, 2024;
originally announced October 2024.
-
How Many Van Goghs Does It Take to Van Gogh? Finding the Imitation Threshold
Authors:
Sahil Verma,
Royi Rassin,
Arnav Das,
Gantavya Bhatt,
Preethi Seshadri,
Chirag Shah,
Jeff Bilmes,
Hannaneh Hajishirzi,
Yanai Elazar
Abstract:
Text-to-image models are trained using large datasets collected by scraping image-text pairs from the internet. These datasets often include private, copyrighted, and licensed material. Training models on such datasets enables them to generate images with such content, which might violate copyright laws and individual privacy. This phenomenon is termed imitation -- generation of images with content that has recognizable similarity to its training images. In this work we study the relationship between a concept's frequency in the training dataset and the ability of a model to imitate it. We seek to determine the point at which a model was trained on enough instances to imitate a concept -- the imitation threshold. We pose this question as a new problem, Finding the Imitation Threshold (FIT), and propose an efficient approach that estimates the imitation threshold without incurring the colossal cost of training multiple models from scratch. We experiment with two domains -- human faces and art styles -- for which we create four datasets, and evaluate three text-to-image models which were trained on two pretraining datasets. Our results reveal that the imitation threshold of these models is in the range of 200-600 images, depending on the domain and the model. The imitation threshold can provide an empirical basis for copyright violation claims and acts as a guiding principle for text-to-image model developers that aim to comply with copyright and privacy laws. We release the code and data at https://github.com/vsahil/MIMETIC-2.git and the project's website is hosted at https://how-many-van-goghs-does-it-take.github.io.
Submitted 19 October, 2024;
originally announced October 2024.
-
WorldCuisines: A Massive-Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines
Authors:
Genta Indra Winata,
Frederikus Hudi,
Patrick Amadeus Irawan,
David Anugraha,
Rifki Afina Putri,
Yutong Wang,
Adam Nohejl,
Ubaidillah Ariq Prathama,
Nedjma Ousidhoum,
Afifa Amriani,
Anar Rzayev,
Anirban Das,
Ashmari Pramodya,
Aulia Adila,
Bryan Wilie,
Candy Olivia Mawalim,
Ching Lam Cheng,
Daud Abolade,
Emmanuele Chersoni,
Enrico Santus,
Fariz Ikhwantri,
Garry Kuwanto,
Hanyang Zhao,
Haryo Akbarianto Wibowo,
Holy Lovenia
, et al. (26 additional authors not shown)
Abstract:
Vision Language Models (VLMs) often struggle with culture-specific knowledge, particularly in languages other than English and in underrepresented cultural contexts. To evaluate their understanding of such knowledge, we introduce WorldCuisines, a massive-scale benchmark for multilingual and multicultural, visually grounded language understanding. This benchmark includes a visual question answering (VQA) dataset with text-image pairs across 30 languages and dialects, spanning 9 language families and featuring over 1 million data points, making it the largest multicultural VQA benchmark to date. It includes tasks for identifying dish names and their origins. We provide evaluation datasets in two sizes (12k and 60k instances) alongside a training dataset (1 million instances). Our findings show that while VLMs perform better with correct location context, they struggle with adversarial contexts and predicting specific regional cuisines and languages. To support future research, we release a knowledge base with annotated food entries and images along with the VQA data.
Submitted 16 October, 2024;
originally announced October 2024.
-
Depth Estimation From Monocular Images With Enhanced Encoder-Decoder Architecture
Authors:
Dabbrata Das,
Argho Deb Das,
Farhan Sadaf
Abstract:
Estimating depth from a single 2D image is a challenging task because a single image lacks the stereo or multi-view data that normally provide depth information. This paper addresses this challenge by introducing a novel deep learning-based approach using an encoder-decoder architecture, where the Inception-ResNet-v2 model is utilized as the encoder. According to the available literature, this is the first instance of using Inception-ResNet-v2 as an encoder for monocular depth estimation, and it shows better performance than previous models. The use of Inception-ResNet-v2 enables our model to effectively capture complex objects and fine-grained details that are generally difficult to predict. In addition, our model incorporates multi-scale feature extraction to enhance depth prediction accuracy across different object sizes and distances. We propose a composite loss function consisting of depth loss, gradient edge loss, and SSIM loss, where the weights are fine-tuned to optimize the weighted sum, ensuring better balance across different aspects of depth estimation. Experimental results on the NYU Depth V2 dataset show that our model achieves state-of-the-art performance, with an ARE of 0.064, RMSE of 0.228, and accuracy ($δ < 1.25$) of 89.3%. These metrics demonstrate that our model effectively predicts depth, even in challenging circumstances, providing a scalable solution for real-world applications in robotics, 3D reconstruction, and augmented reality.
Submitted 16 October, 2024; v1 submitted 15 October, 2024;
originally announced October 2024.
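For readers unfamiliar with the three figures quoted in the abstract above, the following sketch shows how such metrics are commonly computed for depth maps. It assumes that "ARE" denotes the mean absolute relative error and that the accuracy is the usual threshold accuracy (fraction of pixels with max(pred/target, target/pred) below 1.25); the abstract does not spell out either definition, and the function name `depth_metrics` and the toy values are hypothetical.

```python
import math


def depth_metrics(pred, target, threshold=1.25):
    """Compute common monocular-depth metrics over flat lists of positive depths.

    Returns a dict with:
      are       - mean absolute relative error, mean(|pred - target| / target)
      rmse      - root mean squared error
      delta_acc - fraction of values with max(pred/target, target/pred) < threshold
    """
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and of equal length")
    if any(p <= 0 or t <= 0 for p, t in zip(pred, target)):
        raise ValueError("depths must be strictly positive")
    n = len(pred)
    are = sum(abs(p - t) / t for p, t in zip(pred, target)) / n
    rmse = math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / n)
    hits = sum(1 for p, t in zip(pred, target) if max(p / t, t / p) < threshold)
    return {"are": are, "rmse": rmse, "delta_acc": hits / n}


# Toy example with made-up depths in metres (not data from the paper).
metrics = depth_metrics(pred=[1.0, 2.0, 4.0], target=[1.0, 2.5, 2.0])
print(metrics)
```

Lower ARE and RMSE are better, while a higher threshold accuracy is better, which is why the abstract reports the first two as small numbers and the last as a percentage.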
-
Dynamical freezing in the thermodynamic limit: the strongly driven ensemble
Authors:
Asmi Haldar,
Anirban Das,
Sagnik Chaudhuri,
Luke Staszewski,
Alexander Wietek,
Frank Pollmann,
Roderich Moessner,
Arnab Das
Abstract:
The ergodicity postulate, a foundational pillar of Gibbsian statistical mechanics, predicts that a periodically driven (Floquet) system in the absence of any conservation law heats to a featureless 'infinite temperature' state. Here, we find -- for a clean and interacting generic spin chain subject to a strong driving field -- that this can be prevented by the emergence of approximate but stable conservation laws not present in the undriven system. We identify their origin: they do not necessarily owe their stability to familiar protections by symmetry, topology, disorder, or even high energy costs. We show numerically, in the thermodynamic limit, that when required by these emergent conservation laws, the entanglement-entropy density of an infinite subsystem remains zero over our entire simulation time of several decades in natural units. We further provide a recipe for designing such conservation laws with high accuracy. Finally, we present an ensemble description incorporating these constraints, which we call the strongly driven ensemble. This provides a way to control many-body chaos through stable Floquet engineering. Strong signatures of these conservation laws should be experimentally accessible since they manifest on all length and time scales. Variants of the spin model we have used have already been realized using Rydberg-dressed atoms.
Submitted 14 October, 2024;
originally announced October 2024.
-
Overview of Factify5WQA: Fact Verification through 5W Question-Answering
Authors:
Suryavardan Suresh,
Anku Rani,
Parth Patwa,
Aishwarya Reganti,
Vinija Jain,
Aman Chadha,
Amitava Das,
Amit Sheth,
Asif Ekbal
Abstract:
Researchers have found that fake news spreads many times faster than real news. This is a major problem, especially in today's world where social media is the key source of news for many among the younger population. Fact verification thus becomes an important task, and many media sites contribute to the cause. Manual fact verification is a tedious task, given the volume of fake news online. The Factify5WQA shared task aims to increase research towards automated fake news detection by providing a dataset with an aspect-based, question-answering-based fact verification method. Each claim and its supporting document are associated with 5W questions that help compare the two information sources. Performance in the task is measured objectively by comparing answers using the BLEU score, followed by the accuracy of the classification. The task had submissions using custom training setups and pre-trained language models, among others. The best performing team posted an accuracy of 69.56%, which is a nearly 35% improvement over the baseline.
Submitted 5 October, 2024;
originally announced October 2024.
-
RainbowPO: A Unified Framework for Combining Improvements in Preference Optimization
Authors:
Hanyang Zhao,
Genta Indra Winata,
Anirban Das,
Shi-Xiong Zhang,
David D. Yao,
Wenpin Tang,
Sambit Sahu
Abstract:
Recently, numerous preference optimization algorithms have been introduced as extensions to the Direct Preference Optimization (DPO) family. While these methods have successfully aligned models with human preferences, there is a lack of understanding regarding the contributions of their additional components. Moreover, fair and consistent comparisons are scarce, making it difficult to discern which components genuinely enhance downstream performance. In this work, we propose RainbowPO, a unified framework that demystifies the effectiveness of existing DPO methods by categorizing their key components into seven broad directions. We integrate these components into a single cohesive objective, enhancing the performance of each individual element. Through extensive experiments, we demonstrate that RainbowPO outperforms existing DPO variants. Additionally, we provide insights to guide researchers in developing new DPO methods and assist practitioners in their implementations.
Submitted 5 October, 2024;
originally announced October 2024.
-
Kubo formula for spin hydrodynamics: spin chemical potential as leading order in gradient expansion
Authors:
Sourav Dey,
Arpan Das
Abstract:
We present a first-order dissipative spin hydrodynamic framework, where the spin chemical potential $ω^{μν}$ is treated as the leading term in the hydrodynamic gradient expansion, i.e., $ω^{μν}\sim \mathcal{O}(1)$. We argue that for the consistency of the theoretical framework, the energy-momentum tensor needs to be symmetric at least up to order $\mathcal{O}(\partial)$. We consider the phenomenological form of the spin tensor, where it is anti-symmetric in the last two indices only. A comprehensive analysis of spin hydrodynamics is conducted using both macroscopic entropy current analysis and microscopic Kubo formalism, establishing consistency between the two approaches. A key finding is the entropy production resulting from spin-orbit coupling, which alters the traditional equivalence between the Landau and Eckart fluid frames. Additionally, we identify cross-diffusion effects, where vector dissipative currents are influenced by gradients of both spin chemical potential and chemical potential corresponding to the conserved charge through off-diagonal transport coefficients. Two distinct methods for decomposing the spin tensor are proposed, and their equivalence is demonstrated through Kubo relations.
Submitted 5 October, 2024;
originally announced October 2024.
-
Volume growth functions of complete Riemannian manifolds with positive scalar curvature
Authors:
Anushree Das,
Soma Maity
Abstract:
Let $M$ be an open manifold of dimension at least $3$, which admits a complete metric of positive scalar curvature. For a function $v$ with bounded growth of derivative, whether $M$ admits a metric of positive scalar curvature with volume growth of the same growth type as $v$ is unknown. We answer this question positively in the case of manifolds, which are infinite connected sums of closed manifolds that admit metrics of positive scalar curvature. To define a metric of positive scalar curvature with a certain volume growth type on $M$, we use the Gromov-Lawson construction of metrics with positive scalar curvature on connected sums and Grimaldi-Pansu's construction of metrics of bounded geometry of certain volume growth type on open manifolds. We generalize this result to manifolds, which are infinite connected sums of similar closed manifolds along lower-dimensional spheres.
Submitted 5 October, 2024;
originally announced October 2024.
-
Half-quantized Hall Plateaus in the Confined Geometry of Graphene
Authors:
Preeti Pandey,
Sourav Manna,
Kristiana N. Frei,
Jerin Saji,
Anne Denis,
Alexander Savin,
Kenji Watanabe,
Takashi Taniguchi,
Pertti J. Hakonen,
Ankur Das,
Manohar Kumar
Abstract:
Since the ground-breaking discovery of the quantum Hall effect, half-quantized quantum Hall plateaus have been some of the most studied and sought-after states. Their importance stems not only from the fact that they transcend the composite fermion framework used to explain fractional quantum Hall states (such as Laughlin states). Crucially, they hold promise for hosting non-Abelian excitations, which are essential for developing topological qubits - key components for fault-tolerant quantum computing. In this work, we show that these coveted half-quantized plateaus can appear in more than one unexpected way. We report the observation of fractional states with conductance quantization at $ν_H = 5/2$ arising due to charge equilibration in the confined region of a quantum point contact in monolayer graphene.
Submitted 4 October, 2024;
originally announced October 2024.
-
Aggregation of Constrained Crowd Opinions for Urban Planning
Authors:
Akanksha Das,
Jyoti Patel,
Malay Bhattacharyya
Abstract:
Collective decision making is a customary practice in government crowdsourcing. Through an ensemble of opinions (popularly known as judgment analysis), governments can satisfy the majority of the people who provided opinions. This has various real-world applications, such as urban planning or participatory budgeting, that require setting up facilities based on the opinions of citizens. Recently, there has been emerging interest in performing judgment analysis on opinions that are constrained. We consider a new dimension of this problem that accommodates background constraints in judgment analysis, which ensures the collection of more responsible opinions. The background constraints refer to the restrictions (with respect to the existing infrastructure) that must be respected when forming a consensus of opinions. In this paper, we address such problems with efficient unsupervised learning approaches, suitably modified to cater to the constraints of urban planning. We demonstrate the effectiveness of this approach in various scenarios where opinions are collected for setting up ATM counters and sewage lines. Our main contributions are a novel approach to collecting data for smart city planning (in the presence of constraints) and the development of methods for opinion aggregation in various formats. As a whole, we present a new dimension of judgment analysis by adding background constraints to the problem.
Submitted 3 October, 2024;
originally announced October 2024.
-
Density based Spatial Clustering of Lines via Probabilistic Generation of Neighbourhood
Authors:
Akanksha Das,
Malay Bhattacharyya
Abstract:
Density based spatial clustering of points in $\mathbb{R}^n$ has a myriad of applications in a variety of industries. We generalise this problem to the density based clustering of lines in high-dimensional spaces, keeping in mind there exists no valid distance measure that follows the triangle inequality for lines. In this paper, we design a clustering algorithm that generates a customised neighbourhood for a line of a fixed volume (given as a parameter), based on an optional parameter as a continuous probability density function. This algorithm is not sensitive to the outliers and can effectively identify the noise in the data using a cardinality parameter. One of the pivotal applications of this algorithm is clustering data points in $\mathbb{R}^n$ with missing entries, while utilising the domain knowledge of the respective data. In particular, the proposed algorithm is able to cluster $n$-dimensional data points that contain at least $(n-1)$-dimensional information. We illustrate the neighbourhoods for the standard probability distributions with continuous probability density functions and demonstrate the effectiveness of our algorithm on various synthetic and real-world datasets (e.g., rail and road networks). The experimental results also highlight its application in clustering incomplete data.
Submitted 3 October, 2024;
originally announced October 2024.
-
Automatic Scene Generation: State-of-the-Art Techniques, Models, Datasets, Challenges, and Future Prospects
Authors:
Awal Ahmed Fime,
Saifuddin Mahmud,
Arpita Das,
Md. Sunzidul Islam,
Hong-Hoon Kim
Abstract:
Automatic scene generation is an essential area of research with applications in robotics, recreation, visual representation, training and simulation, education, and more. This survey provides a comprehensive review of the current state of the art in automatic scene generation, focusing on techniques that leverage machine learning, deep learning, embedded systems, and natural language processing (NLP). We categorize the models into four main types: Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), Transformers, and Diffusion Models. Each category is explored in detail, discussing various sub-models and their contributions to the field.
We also review the most commonly used datasets, such as COCO-Stuff, Visual Genome, and MS-COCO, which are critical for training and evaluating these models. Methodologies for scene generation are examined, including image-to-3D conversion, text-to-3D generation, UI/layout design, graph-based methods, and interactive scene generation. Evaluation metrics such as Frechet Inception Distance (FID), Kullback-Leibler (KL) Divergence, Inception Score (IS), Intersection over Union (IoU), and Mean Average Precision (mAP) are discussed in the context of their use in assessing model performance.
The survey identifies key challenges and limitations in the field, such as maintaining realism, handling complex scenes with multiple objects, and ensuring consistency in object relationships and spatial arrangements. By summarizing recent advances and pinpointing areas for improvement, this survey aims to provide a valuable resource for researchers and practitioners working on automatic scene generation.
Submitted 14 September, 2024;
originally announced October 2024.
-
Self-supervised Auxiliary Learning for Texture and Model-based Hybrid Robust and Fair Featuring in Face Analysis
Authors:
Shukesh Reddy,
Nishit Poddar,
Srijan Das,
Abhijit Das
Abstract:
In this work, we explore Self-supervised Learning (SSL) as an auxiliary task to blend texture-based local descriptors into feature modelling for efficient face analysis. Combining a primary task and a self-supervised auxiliary task is beneficial for robust representation. Therefore, we used the SSL task of a masked auto-encoder (MAE) as an auxiliary task to reconstruct texture features such as local patterns along with the primary task for robust and unbiased face analysis. We tested our hypothesis on three major paradigms of face analysis: face attribute analysis, face-based emotion analysis, and deepfake detection. Our experimental results show that better feature representations can be gleaned from our proposed model for fair and unbiased face analysis.
Submitted 29 September, 2024;
originally announced September 2024.
-
Deformation maps in proto-twilled Leibniz algebras
Authors:
Apurba Das,
Suman Majhi,
Ramkrishna Mandal
Abstract:
This paper aims to find a unified approach to studying the cohomology theories of various operators on Leibniz algebras. To do this, we first introduce deformation maps in a proto-twilled Leibniz algebra. Such maps generalize various well-known operators (such as homomorphisms, derivations, crossed homomorphisms, Rota-Baxter operators, modified Rota-Baxter operators, twisted Rota-Baxter operators, Reynolds operators, etc.) defined on Leibniz algebras and embedding tensors on Lie algebras. We define the cohomology of a deformation map, unifying the existing cohomologies of all the operators mentioned above. Then we construct a curved $L_\infty$-algebra whose Maurer-Cartan elements are precisely deformation maps in a given proto-twilled Leibniz algebra. In particular, we get the Maurer-Cartan characterizations of modified Rota-Baxter operators, twisted Rota-Baxter operators and Reynolds operators on a Leibniz algebra. Finally, given a proto-twilled Leibniz algebra and a deformation map $r$, we construct two governing $L_\infty$-algebras: the first controls the deformations of the operator $r$, while the second controls the simultaneous deformations of both the proto-twilled Leibniz algebra and the operator $r$.
Submitted 7 October, 2024; v1 submitted 27 September, 2024;
originally announced September 2024.
-
Automated Surgical Skill Assessment in Endoscopic Pituitary Surgery using Real-time Instrument Tracking on a High-fidelity Bench-top Phantom
Authors:
Adrito Das,
Bilal Sidiqi,
Laurent Mennillo,
Zhehua Mao,
Mikael Brudfors,
Miguel Xochicale,
Danyal Z. Khan,
Nicola Newall,
John G. Hanrahan,
Matthew J. Clarkson,
Danail Stoyanov,
Hani J. Marcus,
Sophia Bano
Abstract:
Improved surgical skill is generally associated with improved patient outcomes, although assessment is subjective; labour-intensive; and requires domain specific expertise. Automated data driven metrics can alleviate these difficulties, as demonstrated by existing machine learning instrument tracking models in minimally invasive surgery. However, these models have been tested on limited datasets of laparoscopic surgery, with a focus on isolated tasks and robotic surgery. In this paper, a new public dataset is introduced, focusing on simulated surgery, using the nasal phase of endoscopic pituitary surgery as an exemplar. Simulated surgery allows for a realistic yet repeatable environment, meaning the insights gained from automated assessment can be used by novice surgeons to hone their skills on the simulator before moving to real surgery. PRINTNet (Pituitary Real-time INstrument Tracking Network) has been created as a baseline model for this automated assessment. Consisting of DeepLabV3 for classification and segmentation; StrongSORT for tracking; and the NVIDIA Holoscan SDK for real-time performance, PRINTNet achieved 71.9% Multiple Object Tracking Precision running at 22 Frames Per Second. Using this tracking output, a Multilayer Perceptron achieved 87% accuracy in predicting surgical skill level (novice or expert), with the "ratio of total procedure time to instrument visible time" correlated with higher surgical skill. This therefore demonstrates the feasibility of automated surgical skill assessment in simulated endoscopic pituitary surgery. The new publicly available dataset can be found here: https://doi.org/10.5522/04/26511049.
Submitted 25 September, 2024;
originally announced September 2024.
-
PitRSDNet: Predicting Intra-operative Remaining Surgery Duration in Endoscopic Pituitary Surgery
Authors:
Anjana Wijekoon,
Adrito Das,
Roxana R. Herrera,
Danyal Z. Khan,
John Hanrahan,
Eleanor Carter,
Valpuri Luoma,
Danail Stoyanov,
Hani J. Marcus,
Sophia Bano
Abstract:
Accurate intra-operative Remaining Surgery Duration (RSD) predictions allow for anaesthetists to more accurately decide when to administer anaesthetic agents and drugs, as well as to notify hospital staff to send in the next patient. Therefore RSD plays an important role in improving patient care and minimising surgical theatre costs via efficient scheduling. In endoscopic pituitary surgery, it is uniquely challenging due to variable workflow sequences with a selection of optional steps contributing to high variability in surgery duration. This paper presents PitRSDNet for predicting RSD during pituitary surgery, a spatio-temporal neural network model that learns from historical data focusing on workflow sequences. PitRSDNet integrates workflow knowledge into RSD prediction in two forms: 1) multi-task learning for concurrently predicting step and RSD; and 2) incorporating prior steps as context in temporal learning and inference. PitRSDNet is trained and evaluated on a new endoscopic pituitary surgery dataset with 88 videos to show competitive performance improvements over previous statistical and machine learning methods. The findings also highlight how PitRSDNet improves RSD precision on outlier cases utilising the knowledge of prior steps.
Submitted 25 September, 2024;
originally announced September 2024.
-
Do the Right Thing, Just Debias! Multi-Category Bias Mitigation Using LLMs
Authors:
Amartya Roy,
Danush Khanna,
Devanshu Mahapatra,
Vasanthakumar,
Avirup Das,
Kripabandhu Ghosh
Abstract:
This paper tackles the challenge of building robust and generalizable bias mitigation models for language. Recognizing the limitations of existing datasets, we introduce ANUBIS, a novel dataset with 1507 carefully curated sentence pairs encompassing nine social bias categories. We evaluate state-of-the-art models like T5, utilizing Supervised Fine-Tuning (SFT), Reinforcement Learning (PPO, DPO), and In-Context Learning (ICL) for effective bias mitigation. Our analysis focuses on multi-class social bias reduction, cross-dataset generalizability, and environmental impact of the trained models. ANUBIS and our findings offer valuable resources for building more equitable AI systems and contribute to the development of responsible and unbiased technologies with broad societal impact.
Submitted 24 September, 2024;
originally announced September 2024.
-
HM3D-OVON: A Dataset and Benchmark for Open-Vocabulary Object Goal Navigation
Authors:
Naoki Yokoyama,
Ram Ramrakhya,
Abhishek Das,
Dhruv Batra,
Sehoon Ha
Abstract:
We present the Habitat-Matterport 3D Open Vocabulary Object Goal Navigation dataset (HM3D-OVON), a large-scale benchmark that broadens the scope and semantic range of prior Object Goal Navigation (ObjectNav) benchmarks. Leveraging the HM3DSem dataset, HM3D-OVON incorporates over 15k annotated instances of household objects across 379 distinct categories, derived from photo-realistic 3D scans of real-world environments. In contrast to earlier ObjectNav datasets, which limit goal objects to a predefined set of 6-20 categories, HM3D-OVON facilitates the training and evaluation of models with an open set of goals defined through free-form language at test-time. Through this open-vocabulary formulation, HM3D-OVON encourages progress towards learning visuo-semantic navigation behaviors that are capable of searching for any object specified by text in an open-vocabulary manner. Additionally, we systematically evaluate and compare several different types of approaches on HM3D-OVON. We find that HM3D-OVON can be used to train an open-vocabulary ObjectNav agent that both achieves higher performance and is more robust to localization and actuation noise than the state-of-the-art ObjectNav approach. We hope that our benchmark and baseline results will drive interest in developing embodied agents that can navigate real-world spaces to find household objects specified through free-form language, taking a step towards more flexible and human-like semantic visual navigation. Code and videos available at: naoki.io/ovon.
Submitted 21 September, 2024;
originally announced September 2024.
-
LLM Surgery: Efficient Knowledge Unlearning and Editing in Large Language Models
Authors:
Akshaj Kumar Veldanda,
Shi-Xiong Zhang,
Anirban Das,
Supriyo Chakraborty,
Stephen Rawls,
Sambit Sahu,
Milind Naphade
Abstract:
Large language models (LLMs) have revolutionized various domains, yet their utility comes with significant challenges related to outdated or problematic knowledge embedded during pretraining. This paper addresses the challenge of modifying LLMs to unlearn problematic and outdated information while efficiently integrating new knowledge without retraining from scratch. Here, we propose LLM Surgery, a framework to efficiently modify LLM behaviour by optimizing a three-component objective function that: (1) performs reverse gradient on the unlearning dataset (problematic and outdated information), (2) performs gradient descent on the update dataset (new and updated information), and (3) minimizes the KL divergence on the retain dataset (small subset of unchanged text), ensuring alignment between pretrained and modified model outputs. Due to the lack of publicly available datasets specifically tailored for our novel task, we compiled a new dataset and an evaluation benchmark. Using Llama2-7B, we demonstrate that LLM Surgery can achieve significant forgetting on the unlearn set, a 20% increase in accuracy on the update set, and maintain performance on the retain set.
Submitted 19 September, 2024;
originally announced September 2024.
-
Check-probe spectroscopy of lifetime-limited emitters in bulk-grown silicon carbide
Authors:
G. L. van de Stolpe,
L. J. Feije,
S. J. H. Loenen,
A. Das,
G. M. Timmer,
T. W. de Jong,
T. H. Taminiau
Abstract:
Solid-state single-photon emitters provide a versatile platform for exploring quantum technologies such as optically connected quantum networks. A key challenge is to ensure optical coherence and spectral stability of the emitters. Here, we introduce a high-bandwidth `check-probe' scheme to quantitatively measure (laser-induced) spectral diffusion and ionisation rates, as well as homogeneous linewidths. We demonstrate these methods on single V2 centers in commercially available bulk-grown 4H-silicon carbide. Despite observing significant spectral diffusion under laser illumination ($\gtrsim$ GHz/s), the optical transitions are narrow ($\sim$35 MHz), and remain stable in the dark ($\gtrsim$1 s). Through Landau-Zener-Stückelberg interferometry, we determine the optical coherence to be near-lifetime limited ($T_2 = 16.4(4)$ ns), hinting at the potential for using bulk-grown materials for developing quantum technologies. These results advance our understanding of spectral diffusion of quantum emitters in semiconductor materials, and may have applications for studying charge dynamics across other platforms.
Submitted 19 September, 2024;
originally announced September 2024.
-
Preference Tuning with Human Feedback on Language, Speech, and Vision Tasks: A Survey
Authors:
Genta Indra Winata,
Hanyang Zhao,
Anirban Das,
Wenpin Tang,
David D. Yao,
Shi-Xiong Zhang,
Sambit Sahu
Abstract:
Preference tuning is a crucial process for aligning deep generative models with human preferences. This survey offers a thorough overview of recent advancements in preference tuning and the integration of human feedback. The paper is organized into three main sections: 1) introduction and preliminaries: an introduction to reinforcement learning frameworks, preference tuning tasks, models, and datasets across various modalities: language, speech, and vision, as well as different policy approaches, 2) in-depth examination of each preference tuning approach: a detailed analysis of the methods used in preference tuning, and 3) applications, discussion, and future directions: an exploration of the applications of preference tuning in downstream tasks, including evaluation methods for different modalities, and an outlook on future research directions. Our objective is to present the latest methodologies in preference tuning and model alignment, enhancing the understanding of this field for researchers and practitioners. We hope to encourage further engagement and innovation in this area.
Submitted 17 September, 2024;
originally announced September 2024.
-
Communication Lower Bounds and Optimal Algorithms for Symmetric Matrix Computations
Authors:
Hussam Al Daas,
Grey Ballard,
Laura Grigori,
Suraj Kumar,
Kathryn Rouse,
Mathieu Verite
Abstract:
In this article, we focus on the communication costs of three symmetric matrix computations: i) multiplying a matrix with its transpose, known as a symmetric rank-k update (SYRK); ii) adding the result of the multiplication of a matrix with the transpose of another matrix and the transpose of that result, known as a symmetric rank-2k update (SYR2K); and iii) performing matrix multiplication with a symmetric input matrix (SYMM). All three computations appear in the Level 3 Basic Linear Algebra Subroutines (BLAS) and have wide use in applications involving symmetric matrices. We establish communication lower bounds for these kernels using sequential and distributed-memory parallel computational models, and we show that our bounds are tight by presenting communication-optimal algorithms for each setting. Our lower bound proofs rely on applying a geometric inequality for symmetric computations and analytically solving constrained nonlinear optimization problems. The symmetric matrix and its corresponding computations are accessed and performed according to a triangular block partitioning scheme in the optimal algorithms.
Submitted 17 September, 2024;
originally announced September 2024.
-
DroneDiffusion: Robust Quadrotor Dynamics Learning with Diffusion Models
Authors:
Avirup Das,
Rishabh Dev Yadav,
Sihao Sun,
Mingfei Sun,
Samuel Kaski,
Wei Pan
Abstract:
An inherent fragility of quadrotor systems stems from model inaccuracies and external disturbances. These factors hinder performance and compromise the stability of the system, making precise control challenging. Existing model-based approaches either make deterministic assumptions, utilize Gaussian-based representations of uncertainty, or rely on nominal models, all of which often fall short in capturing the complex, multimodal nature of real-world dynamics. This work introduces DroneDiffusion, a novel framework that leverages conditional diffusion models to learn quadrotor dynamics, formulated as a sequence generation task. DroneDiffusion achieves superior generalization to unseen, complex scenarios by capturing the temporal nature of uncertainties and mitigating error propagation. We integrate the learned dynamics with an adaptive controller for trajectory tracking with stability guarantees. Extensive experiments in both simulation and real-world flights demonstrate the robustness of the framework across a range of scenarios, including unfamiliar flight paths and varying payloads, velocities, and wind disturbances.
Submitted 17 September, 2024;
originally announced September 2024.
-
Challenging Fairness: A Comprehensive Exploration of Bias in LLM-Based Recommendations
Authors:
Shahnewaz Karim Sakib,
Anindya Bijoy Das
Abstract:
Large Language Model (LLM)-based recommendation systems provide more comprehensive recommendations than traditional systems by deeply analyzing content and user behavior. However, these systems often exhibit biases, favoring mainstream content while marginalizing non-traditional options due to skewed training data. This study investigates the intricate relationship between bias and LLM-based recommendation systems, with a focus on music, song, and book recommendations across diverse demographic and cultural groups. Through a comprehensive analysis conducted over different LLMs, this paper evaluates the impact of bias on recommendation outcomes. Our findings reveal that bias is so deeply ingrained within these systems that even a simple intervention like prompt engineering can significantly reduce bias, underscoring the pervasive nature of the issue. Moreover, factors like intersecting identities and contextual information, such as socioeconomic status, further amplify these biases, demonstrating the complexity and depth of the challenges faced in creating fair recommendations across different groups.
Submitted 16 September, 2024;
originally announced September 2024.
-
Absence of heat flow in ν = 0 quantum Hall ferromagnet in bilayer graphene
Authors:
Ravi Kumar,
Saurabh Kumar Srivastav,
Ujjal Roy,
Ujjawal Singhal,
K. Watanabe,
T. Taniguchi,
Vibhor Singh,
P. Roulleau,
Anindya Das
Abstract:
The charge neutrality point of bilayer graphene, denoted as the ν = 0 state, manifests competing phases marked by spontaneously broken isospin (spin/valley/layer) symmetries under external magnetic and electric fields. However, due to their electrically insulating nature, identifying these phases through electrical conductance measurements remains challenging. A recent theoretical proposal introduces a novel approach, employing thermal transport measurements to detect these competing phases. Here, we experimentally explore the bulk thermal transport of the ν = 0 state in bilayer graphene to investigate its ground states and collective excitations associated with isospin. While the theory anticipates a finite thermal conductance in the ν = 0 state, our findings unveil an absence of detectable thermal conductance. Through variations in the external electric field and temperature-dependent measurements, our results point towards gapped collective excitations in the ν = 0 state. Our findings underscore the necessity for further investigations into the nature of ν = 0.
Submitted 15 September, 2024;
originally announced September 2024.
-
$n-\overline{n}$ Oscillation in $S^1/Z_2\times Z_2'$ Orbifold $SU(5)$ GUT
Authors:
Ankit Das,
Sarthak Duary,
Utpal Sarkar
Abstract:
We explore the possibility of $B$ and $B-L$ violating processes, specifically proton decay and neutron-antineutron oscillation, using explicit realization of operators in the $SU(5)$ grand unified theory with an $S^1/Z_2 \times Z_2'$ orbifold space.
Submitted 14 September, 2024;
originally announced September 2024.
-
Matrix perturbation analysis of methods for extracting singular values from approximate singular subspaces
Authors:
Lorenzo Lazzarino,
Hussam Al Daas,
Yuji Nakatsukasa
Abstract:
Given (orthonormal) approximations $\tilde{U}$ and $\tilde{V}$ to the left and right subspaces spanned by the leading singular vectors of a matrix $A$, we discuss methods to approximate the leading singular values of $A$ and study their accuracy. In particular, we focus our analysis on the generalized Nyström approximation, as surprisingly, it is able to obtain significantly better accuracy than classical methods, namely Rayleigh-Ritz and (one-sided) projected SVD.
A key idea of the analysis is to view the methods as finding the exact singular values of a perturbation of $A$. In this context, we derive a matrix perturbation result that exploits the structure of such $2\times2$ block matrix perturbation. Furthermore, we extend it to block tridiagonal matrices. We then obtain bounds on the accuracy of the extracted singular values. This leads to sharp bounds that predict well the approximation error trends and explain the difference in the behavior of these methods. Finally, we present an approach to derive an a-posteriori version of those bounds, which are more amenable to computation in practice.
Submitted 13 September, 2024;
originally announced September 2024.
-
Stretched-Exponential Melting of a Dynamically Frozen State Under Imprinted Phase Noise in the Ising Chain in a Transverse Field
Authors:
Krishanu Roychowdhury,
Arnab Das
Abstract:
Dynamical freezing is a phenomenon where a set of local observables emerges as approximate but stable conserved quantities (freezes) under a strong periodic drive in a closed quantum system. The expectation values of these emergent conserved quantities exhibit small fluctuations around their respective initial values. These fluctuations do not grow with time, and their magnitude can be tuned down sharply by tuning the drive parameters. In this work, we probe the resilience of dynamical freezing to random perturbations added to the relative phases between the interfering states (elements of a natural basis) in the time-evolving wave function after each drive cycle. We study this in an integrable Ising chain in a time-periodic transverse field. Our key finding is that the imprinted phase noise melts the dynamically frozen state, but the decay is "slow": a stretched-exponential decay rather than an exponential one. Stretched-exponential decays (also known as Kohlrausch relaxation) are usually expected in complex systems with time-scale hierarchies due to strong disorders or other inhomogeneities resulting in jamming, glassiness, or localization.
Submitted 20 October, 2024; v1 submitted 13 September, 2024;
originally announced September 2024.
-
AI data transparency: an exploration through the lens of AI incidents
Authors:
Sophia Worth,
Ben Snaith,
Arunav Das,
Gefion Thuermer,
Elena Simperl
Abstract:
Knowing more about the data used to build AI systems is critical for allowing different stakeholders to play their part in ensuring responsible and appropriate deployment and use. Meanwhile, a 2023 report shows that data transparency lags significantly behind other areas of AI transparency in popular foundation models. In this research, we sought to build on these findings, exploring the status of public documentation about data practices within AI systems generating public concern.
Our findings demonstrate that low data transparency persists across a wide range of systems, and further that issues of transparency and explainability at model- and system- level create barriers for investigating data transparency information to address public concerns about AI systems. We highlight a need to develop systematic ways of monitoring AI data transparency that account for the diversity of AI system types, and for such efforts to build on further understanding of the needs of those both supplying and using data transparency information.
Submitted 5 September, 2024;
originally announced September 2024.
-
SITAR: Semi-supervised Image Transformer for Action Recognition
Authors:
Owais Iqbal,
Omprakash Chakraborty,
Aftab Hussain,
Rameswar Panda,
Abir Das
Abstract:
Recognizing actions from a limited set of labeled videos remains a challenge, as annotating visual data is not only tedious but can also be expensive due to its classified nature. Moreover, handling spatio-temporal data using deep 3D transformers for this can introduce significant computational complexity. In this paper, our objective is to address video action recognition in a semi-supervised setting by leveraging only a handful of labeled videos along with a collection of unlabeled videos in a compute-efficient manner. Specifically, we rearrange multiple frames from the input videos in row-column form to construct super images. Subsequently, we capitalize on the vast pool of unlabeled samples and employ contrastive learning on the encoded super images. Our proposed approach employs two pathways to generate representations for temporally augmented super images originating from the same video. Specifically, we utilize a 2D image-transformer to generate representations and apply a contrastive loss function to minimize the similarity between representations from different videos while maximizing the similarity between representations from the same video. Our method demonstrates superior performance compared to existing state-of-the-art approaches for semi-supervised action recognition across various benchmark datasets, all while significantly reducing computational costs.
Submitted 4 September, 2024;
originally announced September 2024.
-
Generalized Symmetry Resolution of Entanglement in CFT for Twisted and Anyonic sectors
Authors:
Arpit Das,
Javier Molina-Vilaplana,
Pablo Saura-Bastida
Abstract:
A comprehensive symmetry resolution of the entanglement entropy (EE) in $(1+1)$-d rational conformal field theories (RCFT) with categorical non-invertible symmetries is presented. This amounts to symmetry resolving the entanglement with respect to the generalized twisted and anyonic charge sectors of the theory. The anyonic sectors label the irreducible representations of a modular fusion category defining the symmetry and can be understood through the $(2+1)$-d symmetry topological field theory (SymTFT) that encodes the symmetry features of the CFT. Using this, we define the corresponding generalized boundary dependent charged moments necessary for the symmetry resolution of the entanglement entropy, which is the main result of this work. Furthermore, contrary to the case of invertible symmetries, we observe the breakdown of entanglement equipartition between different charged sectors at the next-to-leading order in the ultraviolet cutoff.
Submitted 9 September, 2024; v1 submitted 3 September, 2024;
originally announced September 2024.
-
Cup product, Frölicher-Nijenhuis bracket and the derived bracket associated to Hom-Lie algebras
Authors:
Anusuiya Baishya,
Apurba Das
Abstract:
In this paper, we introduce some new graded Lie algebras associated with a Hom-Lie algebra. At first, we define the cup product bracket and its application to the deformation theory of Hom-Lie algebra morphisms. We observe an action of the well-known Hom-analogue of the Nijenhuis-Richardson graded Lie algebra on the cup product graded Lie algebra. Using the corresponding semidirect product, we define the Frölicher-Nijenhuis bracket and study its application to Nijenhuis operators. We show that the Nijenhuis-Richardson graded Lie algebra and the Frölicher-Nijenhuis algebra constitute a matched pair of graded Lie algebras. Finally, we define another graded Lie bracket, called the derived bracket that is useful to study Rota-Baxter operators on Hom-Lie algebras.
Submitted 3 September, 2024;
originally announced September 2024.
-
Exact computation of Transfer Entropy with Path Weight Sampling
Authors:
Avishek Das,
Pieter Rein ten Wolde
Abstract:
The ability to quantify the directional flow of information is vital to understanding natural systems and designing engineered information-processing systems. A widely used measure to quantify this information flow is the transfer entropy. However, until now, this quantity could only be obtained in dynamical models using approximations that are typically uncontrolled. Here we introduce a computational algorithm called Transfer Entropy-Path Weight Sampling (TE-PWS), which makes it possible, for the first time, to quantify the transfer entropy and its variants exactly for any stochastic model, including those with multiple hidden variables, nonlinearity, transient conditions, and feedback. By leveraging techniques from polymer and path sampling, TE-PWS efficiently computes the transfer entropy as a Monte-Carlo average over signal trajectory space. We apply TE-PWS to linear and nonlinear systems to reveal how transfer entropy can overcome naive applications of the data processing inequality in the presence of feedback.
Submitted 17 October, 2024; v1 submitted 3 September, 2024;
originally announced September 2024.
-
Experimental and computational study of ethanolamine ices at astrochemical conditions
Authors:
R Ramachandran,
Milan Sil,
Prasanta Gorai,
J K Meka,
S Pavithraa,
J -I Lo,
S -L Chou,
Y -J Wu,
P Janardhan,
B -M Cheng,
Anil Bhardwaj,
Víctor M. Rivilla,
N J Mason,
B Sivaraman,
Ankan Das
Abstract:
Ethanolamine (NH2CH2CH2OH) has recently been identified in the molecular cloud G+0.693-0.027, situated in the SgrB2 complex in the Galactic center. However, its presence in other regions, and in particular in star-forming sites, is still elusive. Given its likely role as a precursor to simple amino acids, understanding its presence in star-forming regions is required. Here, we present the experimentally obtained temperature-dependent spectral features and morphological behavior of pure ethanolamine ices under astrochemical conditions in the 2-12 micrometer (MIR) and 120-230 nm (VUV) regions for the first time. These features would help in understanding its photochemical behavior. In addition, we present the first chemical models specifically dedicated to ethanolamine. These models include all the discussed chemical routes from the literature, along with the estimated binding energies and activation energies from quantum chemical calculations reported in this work. We have found that the surface reactions CH2OH + NH2CH2 --> NH2CH2CH2OH and NH2 + C2H4OH --> NH2CH2CH2OH in warmer regions (60-90 K) could play a significant role in the formation of ethanolamine. Our modeled abundance of ethanolamine complements the upper limit of ethanolamine column density estimated in earlier observations in hot core/corino regions. Furthermore, we provide a theoretical estimation of the rotational and distortional constants for various species (such as HNCCO, NH2CHCO, and NH2CH2CO) related to ethanolamine that have not been studied in existing literature. This study could be valuable for identifying these species in the future.
Submitted 2 September, 2024;
originally announced September 2024.
-
PitVis-2023 Challenge: Workflow Recognition in videos of Endoscopic Pituitary Surgery
Authors:
Adrito Das,
Danyal Z. Khan,
Dimitrios Psychogyios,
Yitong Zhang,
John G. Hanrahan,
Francisco Vasconcelos,
You Pang,
Zhen Chen,
Jinlin Wu,
Xiaoyang Zou,
Guoyan Zheng,
Abdul Qayyum,
Moona Mazher,
Imran Razzak,
Tianbin Li,
Jin Ye,
Junjun He,
Szymon Płotka,
Joanna Kaleta,
Amine Yamlahi,
Antoine Jund,
Patrick Godau,
Satoshi Kondo,
Satoshi Kasai,
Kousuke Hirasawa
, et al. (7 additional authors not shown)
Abstract:
The field of computer vision applied to videos of minimally invasive surgery is ever-growing. Workflow recognition pertains to the automated recognition of various aspects of a surgery, including which surgical steps are performed and which surgical instruments are used. This information can later be used to assist clinicians when learning the surgery, during live surgery, and when writing operation notes. The Pituitary Vision (PitVis) 2023 Challenge tasks the community with step and instrument recognition in videos of endoscopic pituitary surgery. This is a unique task when compared to other minimally invasive surgeries due to the smaller working space, which limits and distorts vision, and the higher frequency of instrument and step switching, which requires more precise model predictions. Participants were provided with 25 videos, with results presented at the MICCAI-2023 conference as part of the Endoscopic Vision 2023 Challenge in Vancouver, Canada, on 08-Oct-2023. There were 18 submissions from 9 teams across 6 countries, using a variety of deep learning models. A commonality between the top performing models was incorporating spatio-temporal and multi-task methods, with greater than 50% and 10% macro-F1-score improvement over purely spatial single-task models in step and instrument recognition respectively. The PitVis-2023 Challenge therefore demonstrates that state-of-the-art computer vision models in minimally invasive surgery are transferable to a new dataset, with surgery-specific techniques used to enhance performance, progressing the field further. Benchmark results are provided in the paper, and the dataset is publicly available at: https://doi.org/10.5522/04/26531686.
Submitted 2 September, 2024;
originally announced September 2024.
-
Quasi-twilled associative algebras, deformation maps and their governing algebras
Authors:
Apurba Das,
Ramkrishna Mandal
Abstract:
A quasi-twilled associative algebra is an associative algebra $\mathbb{A}$ whose underlying vector space has a decomposition $\mathbb{A} = A \oplus B$ such that $B \subset \mathbb{A}$ is a subalgebra. In the first part of this paper, we give the Maurer-Cartan characterization and introduce the cohomology of a quasi-twilled associative algebra.
In a quasi-twilled associative algebra $\mathbb{A}$, a linear map $D: A \rightarrow B$ is called a strong deformation map if $\mathrm{Gr}(D) \subset \mathbb{A}$ is a subalgebra. Such a map generalizes associative algebra homomorphisms, derivations, crossed homomorphisms and the associative analogue of modified {\sf r}-matrices. We introduce the cohomology of a strong deformation map $D$ unifying the cohomologies of all the operators mentioned above. We also define the governing algebra for the pair $(\mathbb{A}, D)$ to study simultaneous deformations of both $\mathbb{A}$ and $D$.
On the other hand, a linear map $r: B \rightarrow A$ is called a weak deformation map if $\mathrm{Gr} (r) \subset \mathbb{A}$ is a subalgebra. Such a map generalizes relative Rota-Baxter operators of any weight, twisted Rota-Baxter operators, Reynolds operators, left-averaging operators and right-averaging operators. Here we define the cohomology and governing algebra of a weak deformation map $r$ (which unifies the cohomologies of all the operators mentioned above) and also of the pair $(\mathbb{A}, r)$, which governs simultaneous deformations.
Submitted 31 August, 2024;
originally announced September 2024.
-
Corrections to Hawking radiation from asteroid-mass primordial black holes: Numerical evaluation of dissipative effects
Authors:
Emily Koivu,
John Kushan,
Makana Silva,
Gabriel Vasquez,
Arijit Das,
Christopher M Hirata
Abstract:
Primordial black holes (PBHs) are theorized objects that may make up some - or all - of the dark matter in the universe. At the lowest allowed masses, Hawking radiation (in the form of photons or electrons and positrons) is the primary tool to search for PBHs. This paper is part of an ongoing series in which we aim to calculate the $O(α)$ corrections to Hawking radiation from asteroid-mass primordial black holes, based on a perturbative quantum electrodynamics (QED) calculation on a Schwarzschild background. Silva et al. (2023) divided the corrections into dissipative and conservative parts; this work focuses on the numerical computation of the dissipative $O(α)$ corrections to the photon spectrum. We generate spectra for primordial black holes of mass $M=1$-$8 \times 10^{21} m_{\rm planck}$. This calculation confirms the expectation that at low energies, the inner bremsstrahlung radiation is the dominant contribution to the Hawking radiation spectrum. At high energies, the main $O(α)$ effect is a suppression of the photon spectrum due to pair production (emitted $γ\rightarrow e^+e^-$), but this is small compared to the overall spectrum. We compare the low-energy tail in our curved spacetime QED calculation to several approximation schemes in the literature, and find deviations that could have important implications for constraints from Hawking radiation on primordial black holes as dark matter.
Submitted 30 August, 2024;
originally announced August 2024.
-
Lyapunov spectra and fluctuation relations: Insights from the Galerkin-truncated Burgers equation
Authors:
Arunava Das,
Pinaki Dutta,
Kamal L. Panigrahi,
Vishwanath Shukla
Abstract:
The imposition of a global constraint of the conservation of total kinetic energy on a forced-dissipative Burgers equation yields a governing equation that is invariant under the time-reversal symmetry operation, $\{\mathcal{T}: t \to -t; u \to -u \}$, where $u$ is the velocity field. Moreover, the dissipation term gets strongly modified, as the viscosity is no longer a constant, but a fluctuating, state-dependent quantity, which can even become negative in certain dynamical regimes. Despite these differences, the statistical properties of different dynamical regimes of the time-reversible Burgers equation and the standard forced-dissipative Burgers equation are equivalent, à la Gallavotti's conjecture of \textit{equivalence of nonequilibrium ensembles}. We show that the negative viscosity events occur only in the thermalized regime described by the time-reversible equation. This quasi-equilibrium regime is examined by calculating the local Lyapunov spectra and fluctuation relations. A pairing symmetry among the spectra is observed, indicating that the dynamics is chaotic and has an attractor spanning the entire phase space of the system. The violations of the second law of thermodynamics are found to be in accordance with the fluctuation relations, namely the Gallavotti-Cohen relation based on the phase-space contraction rate and the Cohen-Searles fluctuation relation based on the energy production rate. It is also argued that these violations are associated with the effects of the Galerkin truncation, which is responsible for the thermalization.
Submitted 30 August, 2024;
originally announced August 2024.
-
Diffusive transport of a 2-D magnetized dusty plasma cloud
Authors:
Aman Singh Katariya,
Amita Das,
Animesh Sharma,
Bibhuti Bhushan Sahu
Abstract:
Dusty plasma medium turns out to be an ideal system for studying the strongly coupled behavior of matter. The large size and slow response make their dynamics suitable to be captured through simple diagnostic tools. Furthermore, as the charge on individual particles is significantly higher than the electronic charge, the interaction amongst them can be in a strong coupling regime even at room temperatures and normal densities. Such charged dust particles are often present in several industrial plasma-based processes and can have a detrimental influence. For instance, in magnetrons, the sputtering phenomena may be affected by the accumulation of charged impurity clusters. The objective here is to understand the transport behavior of these particles in the presence of an externally applied magnetic field. For this purpose, Molecular Dynamics (MD) simulations are performed using an open-source large-scale atomic/molecular massively parallel simulator (LAMMPS). The dependence of the transport coefficient on the applied magnetic field and prevalent collisional processes has been discerned through simulations in detail.
Submitted 29 August, 2024;
originally announced August 2024.
-
FastTextSpotter: A High-Efficiency Transformer for Multilingual Scene Text Spotting
Authors:
Alloy Das,
Sanket Biswas,
Umapada Pal,
Josep Lladós,
Saumik Bhattacharya
Abstract:
The proliferation of scene text in both structured and unstructured environments presents significant challenges in optical character recognition (OCR), necessitating more efficient and robust text spotting solutions. This paper presents FastTextSpotter, a framework that integrates a Swin Transformer visual backbone with a Transformer Encoder-Decoder architecture, enhanced by a novel, faster self-attention unit, SAC2, to improve processing speeds while maintaining accuracy. FastTextSpotter has been validated across multiple datasets, including ICDAR2015 for regular texts and CTW1500 and TotalText for arbitrary-shaped texts, benchmarking against current state-of-the-art models. Our results indicate that FastTextSpotter not only achieves superior accuracy in detecting and recognizing multilingual scene text (English and Vietnamese) but also improves model efficiency, thereby setting new benchmarks in the field. This study underscores the potential of advanced transformer architectures in improving the adaptability and speed of text spotting applications in diverse real-world settings. The dataset, code, and pre-trained models have been released on our GitHub.
Submitted 27 August, 2024;
originally announced August 2024.
-
Automatic Detection of COVID-19 from Chest X-ray Images Using Deep Learning Model
Authors:
Alloy Das,
Rohit Agarwal,
Rituparna Singh,
Arindam Chowdhury,
Debashis Nandi
Abstract:
The infectious disease caused by the novel coronavirus (2019-nCoV) has been spreading widely since last year and has shaken the entire world. It has had an unprecedented effect on daily life, the global economy, and public health. Hence, detecting this disease has life-saving importance for both patients and doctors. Due to limited test kits, it is also a daunting task to test every patient with severe respiratory problems using conventional techniques (RT-PCR). Thus, an automatic diagnosis system is urgently required to overcome the scarcity of Covid-19 test kits in hospitals and health care systems. Diagnostic approaches fall mainly into two categories: laboratory-based and chest radiography approaches. In this paper, a novel approach for computerized coronavirus (2019-nCoV) detection from lung X-ray images is presented. Here, we propose models using deep learning to show the effectiveness of diagnostic systems. In the experimental results, we evaluate the proposed models on a publicly available dataset, where they exhibit satisfactory performance and promising results compared with previous methods.
Submitted 27 August, 2024;
originally announced August 2024.
-
Assessing realistic binding energies of some essential interstellar radicals with amorphous solid water. A fully quantum chemical approach
Authors:
Milan Sil,
Arghyadeb Roy,
Prasanta Gorai,
Naoki Nakatani,
Takashi Shimonishi,
Kenji Furuya,
Natalia Inostroza-Pino,
Paola Caselli,
Ankan Das
Abstract:
In the absence of laboratory data, state-of-the-art quantum chemical approaches can provide estimates of the binding energy (BE) of interstellar species with grains. Without BE values, contemporary astrochemical models are compelled to utilize wild guesses, often delivering misleading information. Here, we employed a fully quantum chemical approach to estimate the BE of seven diatomic radicals - CH, NH, OH, SH, CN, NS, and NO - that play a crucial role in shaping the interstellar chemical composition, using a suitable amorphous solid water model as a substrate since water is the principal constituent of interstellar ice in dense and shielded regions. While the BEs are compatible with physisorption, the binding of CH in some sites shows chemisorption, in which a chemical bond to an oxygen atom of a water molecule is formed. While no structural change has been observed for the CN radical, it is believed that the formation of a hemibonded system between the outer layer of the water cluster and the radical is the reason for the unusually large BE in one of the binding sites considered in our study. A significantly lower BE for NO, consistent with recent calculations, is obtained, which helps explain the recently observed HONO/NH$_2$OH and HONO/HNO ratios in the low-mass hot corino IRAS 16293-2422 B with chemical models.
Submitted 16 September, 2024; v1 submitted 26 August, 2024;
originally announced August 2024.
-
Detection of a Transient Quasi-periodic Oscillation in $γ$-Rays from Blazar PKS 2255-282
Authors:
Ajay Sharma,
Anuvab Banerjee,
Avik Kumar Das,
Avijit Mandal,
Debanjan Bose
Abstract:
We conducted a comprehensive variability analysis of the blazar PKS 2255-282 using Fermi-LAT observations spanning over four years, from MJD 57783.5 to 59358.5. Our analysis revealed a transient quasi-periodic oscillation (QPO) with a period of 93$\pm$2.6 days. We employed a variety of Fourier-based methods, including the Lomb-Scargle Periodogram (LSP) and Weighted Wavelet Z-Transform (WWZ), as well as time domain analysis techniques such as Seasonal and Non-Seasonal Autoregressive Integrated Moving Average (ARIMA) models and the Stochastic modeling with Stochastically Driven Damped Harmonic Oscillator (SHO) models. Consistently, the QPO with a period of 93 days was detected across all methods used. The observed peak in LSP and time-averaged WWZ plots has a significance level of 4.06$σ$ and 3.96$σ$, respectively. To understand the source of flux modulations in the light curve, we explored various physical models. A plausible scenario involves the precession of the jet with a high Lorentz factor or the movement of a plasma blob along a helical trajectory within the relativistic jet.
Submitted 23 August, 2024;
originally announced August 2024.
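The abstract above names the Lomb-Scargle periodogram as one of the methods used to find the 93-day QPO. As a rough illustration of how such a periodogram picks out a period from unevenly sampled data, here is a minimal pure-Python sketch of the classical Lomb-Scargle power. It is not the authors' pipeline: the synthetic light curve, the roughly weekly jittered sampling, the noise level, the trial-period grid, and the names `lomb_scargle` and `make_synthetic_light_curve` are all assumptions made for this example, and no significance estimation is attempted.

```python
import math
import random


def lomb_scargle(times, values, periods):
    """Classical Lomb-Scargle power for each trial period (same units as times)."""
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    variance = sum(c * c for c in centered) / (n - 1)
    powers = []
    for period in periods:
        omega = 2.0 * math.pi / period
        # Time offset tau that makes the sine and cosine terms orthogonal.
        tau = math.atan2(
            sum(math.sin(2.0 * omega * t) for t in times),
            sum(math.cos(2.0 * omega * t) for t in times),
        ) / (2.0 * omega)
        cos_terms = [math.cos(omega * (t - tau)) for t in times]
        sin_terms = [math.sin(omega * (t - tau)) for t in times]
        cos_proj = sum(c * k for c, k in zip(centered, cos_terms))
        sin_proj = sum(c * k for c, k in zip(centered, sin_terms))
        power = 0.5 * (
            cos_proj ** 2 / sum(k * k for k in cos_terms)
            + sin_proj ** 2 / sum(k * k for k in sin_terms)
        ) / variance
        powers.append(power)
    return powers


def make_synthetic_light_curve(period=93.0, n_points=200, noise=0.3, seed=0):
    """Hypothetical test signal: a sinusoid sampled at jittered ~7-day steps."""
    rng = random.Random(seed)
    times = [7.0 * i + 3.0 * math.sin(i) for i in range(n_points)]
    flux = [
        math.sin(2.0 * math.pi * t / period) + rng.gauss(0.0, noise)
        for t in times
    ]
    return times, flux


times, flux = make_synthetic_light_curve()
trial_periods = [float(p) for p in range(20, 301)]  # trial periods in days
powers = lomb_scargle(times, flux, trial_periods)
best_period = trial_periods[powers.index(max(powers))]
print(f"strongest trial period: {best_period:.0f} days")
```

On real light curves the peak would still need a significance estimate against a noise model, which is the role the abstract assigns to the quoted 4.06σ and 3.96σ levels.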
-
Evidence-backed Fact Checking using RAG and Few-Shot In-Context Learning with LLMs
Authors:
Ronit Singhal,
Pransh Patwa,
Parth Patwa,
Aman Chadha,
Amitava Das
Abstract:
Given the widespread dissemination of misinformation on social media, implementing fact-checking mechanisms for online claims is essential. Manually verifying every claim is very challenging, underscoring the need for an automated fact-checking system. This paper presents our system designed to address this issue. We utilize the Averitec dataset (Schlichtkrull et al., 2023) to assess the performance of our fact-checking system. In addition to veracity prediction, our system provides supporting evidence, which is extracted from the dataset. We develop a Retrieve and Generate (RAG) pipeline to extract relevant evidence sentences from a knowledge base, which are then input along with the claim into a large language model (LLM) for classification. We also evaluate the few-shot In-Context Learning (ICL) capabilities of multiple LLMs. Our system achieves an 'Averitec' score of 0.33, which is a 22% absolute improvement over the baseline. Our code is publicly available at https://github.com/ronit-singhal/evidence-backed-fact-checking-using-rag-and-few-shot-in-context-learning-with-llms.
Submitted 4 October, 2024; v1 submitted 21 August, 2024;
originally announced August 2024.
-
Fractional Quantum Hall phases of graphene beyond ultra-short range intervalley-anisotropic interaction
Authors:
Oleg Grigorev,
Ankur Das
Abstract:
Recent experimental and theoretical developments in the Quantum Hall effect in monolayer graphene showed that the previous model of the valley-anisotropy interaction is incomplete, as it was assumed to be ultra-short range (USR). In this work, we use exact diagonalization to go beyond the ultra-short range to find the different phases for $ν=2/3$. We model the interaction as Yukawa so that we can control the range as a proof of concept. Even in this simple setting, we discovered how dropping the USR condition shifts the transition borders in favour of certain phases, leads to a new bond-ordered phase appearing, and breaks the ferromagnetic phase into two competing states as a result of lifting the USR-driven degeneracy.
Submitted 21 August, 2024;
originally announced August 2024.
-
The Brittleness of AI-Generated Image Watermarking Techniques: Examining Their Robustness Against Visual Paraphrasing Attacks
Authors:
Niyar R Barman,
Krish Sharma,
Ashhar Aziz,
Shashwat Bajpai,
Shwetangshu Biswas,
Vasu Sharma,
Vinija Jain,
Aman Chadha,
Amit Sheth,
Amitava Das
Abstract:
The rapid advancement of text-to-image generation systems, exemplified by models like Stable Diffusion, Midjourney, Imagen, and DALL-E, has heightened concerns about their potential misuse. In response, companies like Meta and Google have intensified their efforts to implement watermarking techniques on AI-generated images to curb the circulation of potentially misleading visuals. However, in this paper, we argue that current image watermarking methods are fragile and susceptible to being circumvented through visual paraphrase attacks. The proposed visual paraphraser operates in two steps. First, it generates a caption for the given image using KOSMOS-2, one of the latest state-of-the-art image captioning systems. Second, it passes both the original image and the generated caption to an image-to-image diffusion system. During the denoising step of the diffusion pipeline, the system generates a visually similar image that is guided by the text caption. The resulting image is a visual paraphrase and is free of any watermarks. Our empirical findings demonstrate that visual paraphrase attacks can effectively remove watermarks from images. This paper provides a critical assessment, empirically revealing the vulnerability of existing watermarking techniques to visual paraphrase attacks. While we do not propose solutions to this issue, this paper serves as a call to action for the scientific community to prioritize the development of more robust watermarking techniques. Our first-of-its-kind visual paraphrase dataset and accompanying code are publicly available.
Submitted 19 August, 2024;
originally announced August 2024.
-
Enhancing ASL Recognition with GCNs and Successive Residual Connections
Authors:
Ushnish Sarkar,
Archisman Chakraborti,
Tapas Samanta,
Sarbajit Pal,
Amitabha Das
Abstract:
This study presents a novel approach for enhancing American Sign Language (ASL) recognition using Graph Convolutional Networks (GCNs) integrated with successive residual connections. The method leverages the MediaPipe framework to extract key landmarks from each hand gesture, which are then used to construct graph representations. A robust preprocessing pipeline, including translational and scale normalization techniques, ensures consistency across the dataset. The constructed graphs are fed into a GCN-based neural architecture with residual connections to improve network stability. The architecture achieves state-of-the-art results, demonstrating superior generalization capabilities with a validation accuracy of 99.14%.
Submitted 18 August, 2024;
originally announced August 2024.
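The abstract above mentions translational and scale normalization of hand landmarks before graph construction, without giving the exact recipe. One common way to do this, sketched here purely as an assumption, is to shift every landmark so that a reference point sits at the origin and then divide by the largest distance from that point. The function name `normalize_landmarks` and the choice of index 0 as the reference (the wrist in MediaPipe's 21-point hand layout) are illustrative choices, not details taken from the paper.

```python
import math


def normalize_landmarks(points, origin_index=0):
    """Translate landmarks so points[origin_index] is the origin, then scale
    so the farthest landmark lies at distance 1 from it.

    points: list of (x, y, z) tuples, for example one hand's landmarks.
    """
    ox, oy, oz = points[origin_index]
    # Translational normalization: remove the hand's position in the frame.
    centered = [(x - ox, y - oy, z - oz) for x, y, z in points]
    # Scale normalization: remove the hand's apparent size.
    scale = max(math.sqrt(x * x + y * y + z * z) for x, y, z in centered)
    if scale == 0.0:
        # Degenerate input (all landmarks coincide): nothing to scale.
        return centered
    return [(x / scale, y / scale, z / scale) for x, y, z in centered]


# Tiny hypothetical example with three landmarks instead of 21.
hand = [(1.0, 1.0, 0.0), (4.0, 5.0, 0.0), (1.0, 3.0, 0.0)]
normalized = normalize_landmarks(hand)
print(normalized)
```

With this choice, the same gesture recorded at a different position or distance from the camera maps to the same normalized coordinates, which is the consistency property the abstract attributes to its preprocessing pipeline.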
-
Towards Effective Authorship Attribution: Integrating Class-Incremental Learning
Authors:
Mostafa Rahgouy,
Hamed Babaei Giglou,
Mehnaz Tabassum,
Dongji Feng,
Amit Das,
Taher Rahgooy,
Gerry Dozier,
Cheryl D. Seals
Abstract:
Authorship attribution (AA) is the process of attributing an unidentified document to its true author from a predefined group of known candidates, each possessing multiple samples. The nature of AA necessitates accommodating emerging new authors, as each individual must be considered unique. This uniqueness can be attributed to various factors, including their stylistic preferences, areas of expertise, gender, cultural background, and other personal characteristics that influence their writing. These diverse attributes contribute to the distinctiveness of each author, making it essential for AA systems to recognize and account for these variations. However, current AA benchmarks commonly overlook this uniqueness and frame the problem as a closed-world classification, assuming a fixed number of authors throughout the system's lifespan and neglecting the inclusion of emerging new authors. This oversight renders the majority of existing approaches ineffective for real-world applications of AA, where continuous learning is essential. These inefficiencies manifest when current models either resist learning new authors or experience catastrophic forgetting, where the introduction of new data causes the models to lose previously acquired knowledge. To address these inefficiencies, we propose redefining AA as class-incremental learning (CIL), where new authors are introduced incrementally after the initial training phase, allowing the system to adapt and learn continuously. To achieve this, we briefly examine subsequent CIL approaches introduced in other domains. Moreover, we have adopted several well-known CIL methods, along with an examination of their strengths and weaknesses in the context of AA. Additionally, we outline potential future directions for advancing CIL AA systems. As a result, our paper can serve as a starting point for evolving AA systems from closed-world models to continual learning through CIL paradigms.
Submitted 12 August, 2024;
originally announced August 2024.