Mitigating Embedding Collapse in Diffusion Models for Categorical Data

Authors: Bac Nguyen, Chieh-Hsin Lai, Yuhta Takida, Naoki Murata, Toshimitsu Uesaka, Stefano Ermon, Yuki Mitsufuji

Abstract: Latent diffusion models have enabled continuous-state diffusion models to handle a variety of datasets, including categorical data. However, most methods rely on fixed pretrained embeddings, limiting the benefits of joint training with the diffusion model. While jointly learning the embedding (via reconstruction loss) and the latent diffusion model (via score matching loss) could enhance performance, our analysis shows that end-to-end training risks embedding collapse, degrading generation quality. To address this issue, we introduce CATDM, a continuous diffusion framework within the embedding space that stabilizes training. We propose a novel objective combining the joint embedding-diffusion variational lower bound with a Consistency-Matching (CM) regularizer, alongside a shifted cosine noise schedule and random dropping strategy. The CM regularizer ensures the recovery of the true data distribution. Experiments on benchmarks show that CATDM mitigates embedding collapse, yielding superior results on FFHQ, LSUN Churches, and LSUN Bedrooms. In particular, CATDM achieves an FID of 6.81 on ImageNet $256\times256$ with 50 steps. It outperforms non-autoregressive models in machine translation and is on a par with previous methods in text generation.

Submitted 18 October, 2024; originally announced October 2024.

arXiv:2410.14710 [pdf, other]

G2D2: Gradient-guided Discrete Diffusion for image inverse problem solving

Authors: Naoki Murata, Chieh-Hsin Lai, Yuhta Takida, Toshimitsu Uesaka, Bac Nguyen, Stefano Ermon, Yuki Mitsufuji

Abstract: Recent literature has effectively utilized diffusion models trained on continuous variables as priors for solving inverse problems. Notably, discrete diffusion models with discrete latent codes have shown strong performance, particularly in modalities suited for discrete compressed representations, such as image and motion generation. However, their discrete and non-differentiable nature has limited their application to inverse problems formulated in continuous spaces. This paper presents a novel method for addressing linear inverse problems by leveraging image-generation models based on discrete diffusion as priors. We overcome these limitations by approximating the true posterior distribution with a variational distribution constructed from categorical distributions and continuous relaxation techniques. Furthermore, we employ a star-shaped noise process to mitigate the drawbacks of traditional discrete diffusion models with absorbing states, demonstrating that our method performs comparably to continuous diffusion techniques. To the best of our knowledge, this is the first approach to use discrete diffusion model-based priors for solving image inverse problems.

Submitted 9 October, 2024; originally announced October 2024.

arXiv:2410.03507 [pdf, other]

LOO-PIT: A sensitive posterior test

Authors: Alan B. H. Nguyen, Marco Bonici, Glen McGee, Will J. Percival

Abstract: With the advent of the next generation of astrophysics experiments, the volume of data available to researchers will be greater than ever. As these projects will significantly drive down statistical uncertainties in measurements, it is crucial to develop novel tools to assess the ability of our models to fit these data within the specified errors. We introduce to astronomy the Leave One Out-Probability Integral Transform (LOO-PIT) technique. This first estimates the LOO posterior predictive distributions based on the model and likelihood distribution specified, then evaluates the quality of the match between the model and data by applying the PIT to each estimated distribution and data point, outputting a LOO-PIT distribution. Deviations between this output distribution and that expected can be characterised visually and with a standard Kolmogorov--Smirnov distribution test. We compare LOO-PIT and the more common $χ^2$ test using both a simplified model and a more realistic astrophysics problem, where we consider fitting Baryon Acoustic Oscillations in galaxy survey data with contamination from emission line interlopers. LOO-PIT and $χ^2$ tend to find different signals from the contaminants, and using these tests in conjunction increases the statistical power compared to using either test alone. We also show that LOO-PIT outperforms $χ^2$ in certain realistic test cases.
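The LOO-PIT procedure summarized in this abstract (leave one point out, form the predictive distribution, apply the PIT, compare with a uniform distribution) can be illustrated on a toy problem. The following is a minimal sketch, not the authors' implementation: it assumes a Gaussian model with known noise `sigma` and a flat prior on the mean, so the leave-one-out predictive distribution has a closed form. The function names `loo_pit_gaussian`, `ks_uniform_statistic`, and `synthetic_data` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def loo_pit_gaussian(y, sigma):
    """LOO-PIT values for a toy model (assumption): y_i ~ N(mu, sigma^2), flat prior on mu."""
    n = len(y)
    total = sum(y)
    pit = []
    for yi in y:
        # Posterior mean of mu when y_i is held out.
        loo_mean = (total - yi) / (n - 1)
        # Predictive spread for the held-out point: noise plus uncertainty on mu.
        pred_sd = sigma * sqrt(1.0 + 1.0 / (n - 1))
        # PIT: evaluate the predictive CDF at the held-out observation.
        pit.append(NormalDist(loo_mean, pred_sd).cdf(yi))
    return pit


def ks_uniform_statistic(u):
    """One-sample Kolmogorov-Smirnov distance between the values u and Uniform(0, 1)."""
    s = sorted(u)
    n = len(s)
    return max(max((i + 1) / n - v, v - i / n) for i, v in enumerate(s))


def synthetic_data(n):
    """Deterministic stand-in for a standard-normal sample (evenly spaced quantiles)."""
    return [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]


if __name__ == "__main__":
    y = synthetic_data(100)
    # A well-specified error model gives near-uniform PIT values (small distance);
    # overestimated errors pile the PIT values up around 0.5 (large distance).
    print(ks_uniform_statistic(loo_pit_gaussian(y, sigma=1.0)))
    print(ks_uniform_statistic(loo_pit_gaussian(y, sigma=5.0)))
```

In this toy setting, a large KS distance flags a mismatch between the assumed errors and the scatter in the data; the paper's realistic use case involves estimated posterior predictive distributions rather than this closed form.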

Submitted 4 October, 2024; originally announced October 2024.

Comments: 28 pages, 13 figures. Prepared for submission to JCAP. Comments welcomed

arXiv:2410.00020 [pdf, other]

doi 10.1109/BSN58485.2023.10331561

Loneliness Forecasting Using Multi-modal Wearable and Mobile Sensing in Everyday Settings

Authors: Zhongqi Yang, Iman Azimi, Salar Jafarlou, Sina Labbaf, Brenda Nguyen, Hana Qureshi, Christopher Marcotullio, Jessica L. Borelli, Nikil Dutt, Amir M. Rahmani

Abstract: The adverse effects of loneliness on both physical and mental well-being are profound. Although previous research has utilized mobile sensing techniques to detect mental health issues, few studies have utilized state-of-the-art wearable devices to forecast loneliness and estimate the physiological manifestations of loneliness and its predictive nature. The primary objective of this study is to examine the feasibility of forecasting loneliness by employing wearable devices, such as smart rings and watches, to monitor early physiological indicators of loneliness. Furthermore, smartphones are employed to capture initial behavioral signs of loneliness. To accomplish this, we employed personalized machine learning techniques, leveraging a comprehensive dataset comprising physiological and behavioral information obtained during our study involving the monitoring of college students. Through the development of personalized models, we achieved a notable accuracy of 0.82 and an F-1 score of 0.82 in forecasting loneliness levels seven days in advance. Additionally, the application of Shapley values facilitated model explainability. The wealth of data provided by this study, coupled with the forecasting methodology employed, possesses the potential to augment interventions and facilitate the early identification of loneliness within populations at risk.

Submitted 15 September, 2024; originally announced October 2024.

Journal ref: 2023 IEEE 19th International Conference on Body Sensor Networks (BSN), 1-4

arXiv:2409.18476 [pdf]

Underwater Image Enhancement with Physical-based Denoising Diffusion Implicit Models

Authors: Nguyen Gia Bach, Chanh Minh Tran, Eiji Kamioka, Phan Xuan Tan

Abstract: Underwater vision is crucial for autonomous underwater vehicles (AUVs), and enhancing degraded underwater images in real-time on a resource-constrained AUV is a key challenge due to factors like light absorption and scattering, or the sufficient model computational complexity to resolve such factors. Traditional image enhancement techniques lack adaptability to varying underwater conditions, while learning-based methods, particularly those using convolutional neural networks (CNNs) and generative adversarial networks (GANs), offer more robust solutions but face limitations such as inadequate enhancement, unstable training, or mode collapse. Denoising diffusion probabilistic models (DDPMs) have emerged as a state-of-the-art approach in image-to-image tasks but require intensive computational complexity to achieve the desired underwater image enhancement (UIE) using the recent UW-DDPM solution. To address these challenges, this paper introduces UW-DiffPhys, a novel physical-based and diffusion-based UIE approach. UW-DiffPhys combines light-computation physical-based UIE network components with a denoising U-Net to replace the computationally intensive distribution transformation U-Net in the existing UW-DDPM framework, reducing complexity while maintaining performance. Additionally, the Denoising Diffusion Implicit Model (DDIM) is employed to accelerate the inference process through non-Markovian sampling. Experimental results demonstrate that UW-DiffPhys achieved a substantial reduction in computational complexity and inference time compared to UW-DDPM, with competitive performance in key metrics such as PSNR, SSIM, UCIQE, and an improvement in the overall underwater image quality UIQM metric. The implementation code can be found at the following repository: https://github.com/bachzz/UW-DiffPhys

Submitted 27 September, 2024; originally announced September 2024.

arXiv:2409.10955 [pdf, other]

Investigating Context-Faithfulness in Large Language Models: The Roles of Memory Strength and Evidence Style

Authors: Yuepei Li, Kang Zhou, Qiao Qiao, Bach Nguyen, Qing Wang, Qi Li

Abstract: Retrieval-augmented generation (RAG) improves Large Language Models (LLMs) by incorporating external information into the response generation process. However, how context-faithful LLMs are and what factors influence LLMs' context-faithfulness remain largely unexplored. In this study, we investigate the impact of memory strength and evidence presentation on LLMs' receptiveness to external evidence. We introduce a method to quantify the memory strength of LLMs by measuring the divergence in LLMs' responses to different paraphrases of the same question, which is not considered by previous works. We also generate evidence in various styles to evaluate the effects of evidence in different styles. Two datasets are used for evaluation: Natural Questions (NQ) with popular questions and popQA featuring long-tail questions. Our results show that for questions with high memory strength, LLMs are more likely to rely on internal memory, particularly for larger LLMs such as GPT-4. On the other hand, presenting paraphrased evidence significantly increases LLMs' receptiveness compared to simple repetition or adding details.

Submitted 17 September, 2024; originally announced September 2024.

arXiv:2409.10445 [pdf, other]

Deep-Wide Learning Assistance for Insect Pest Classification

Authors: Toan Nguyen, Huy Nguyen, Huy Ung, Hieu Ung, Binh Nguyen

Abstract: Accurate insect pest recognition plays a critical role in agriculture. It is a challenging problem due to the intricate characteristics of insects. In this paper, we present DeWi, a novel learning assistance for insect pest classification. With a one-stage and alternating training strategy, DeWi simultaneously improves several Convolutional Neural Networks in two perspectives: discrimination (by optimizing a triplet margin loss in a supervised training manner) and generalization (via data augmentation). From that, DeWi can learn discriminative and in-depth features of insect pests (deep) yet still generalize well to a large number of insect categories (wide). Experimental results show that DeWi achieves the highest performances on two insect pest classification benchmarks (76.44% accuracy on the IP102 dataset and 99.79% accuracy on the D0 dataset, respectively). In addition, extensive evaluations and ablation studies are conducted to thoroughly investigate our DeWi and demonstrate its superiority. Our source code is available at https://github.com/toannguyen1904/DeWi.

Submitted 16 September, 2024; originally announced September 2024.

arXiv:2409.04598 [pdf, other]

A Novel Dataset for Video-Based Autism Classification Leveraging Extra-Stimulatory Behavior

Authors: Manuel Serna-Aguilera, Xuan Bac Nguyen, Han-Seok Seo, Khoa Luu

Abstract: Autism Spectrum Disorder (ASD) can affect individuals at varying degrees of intensity, from challenges in overall health, communication, and sensory processing, and this often begins at a young age. Thus, it is critical for medical professionals to be able to accurately diagnose ASD in young children, but doing so is difficult. Deep learning can be responsibly leveraged to improve productivity in addressing this task. The availability of data, however, remains a considerable obstacle. Hence, in this work, we introduce the Video ASD dataset--a dataset that contains video frame convolutional and attention map feature data--to foster further progress in the task of ASD classification. The original videos showcase children reacting to chemo-sensory stimuli, among auditory, touch, and vision. This dataset contains the features of the frames spanning 2,467 videos, for a total of approximately 1.4 million frames. Additionally, head pose angles are included to account for head movement noise, as well as full-sentence text labels for the taste and smell videos that describe how the facial expression changes before, immediately after, and long after interaction with the stimuli. In addition to providing features, we also test foundation models on this data to showcase how movement noise affects performance and the need for more data and more complex labels.

Submitted 6 September, 2024; originally announced September 2024.

arXiv:2409.00045 [pdf, other]

PolypDB: A Curated Multi-Center Dataset for Development of AI Algorithms in Colonoscopy

Authors: Debesh Jha, Nikhil Kumar Tomar, Vanshali Sharma, Quoc-Huy Trinh, Koushik Biswas, Hongyi Pan, Ritika K. Jha, Gorkem Durak, Alexander Hann, Jonas Varkey, Hang Viet Dao, Long Van Dao, Binh Phuc Nguyen, Khanh Cong Pham, Quang Trung Tran, Nikolaos Papachrysos, Brandon Rieders, Peter Thelin Schmidt, Enrik Geissler, Tyler Berzin, Pål Halvorsen, Michael A. Riegler, Thomas de Lange, Ulas Bagci

Abstract: Colonoscopy is the primary method for examination, detection, and removal of polyps. Regular screening helps detect and prevent colorectal cancer at an early curable stage. However, challenges such as variation in endoscopists' skills, bowel preparation quality, and the complex nature of the large intestine cause a high polyp miss rate. These missed polyps can develop into cancer later on, which underscores the importance of improving the detection methods. A computer-aided diagnosis system can support physicians by assisting in detecting overlooked polyps. However, one of the important challenges for developing novel deep learning models for automatic polyp detection and segmentation is the lack of publicly available, multi-center large and diverse datasets. To address this gap, we introduce PolypDB, a large scale publicly available dataset that contains 3934 still polyp images and their corresponding ground truth from real colonoscopy videos to design efficient polyp detection and segmentation architectures. The dataset has been developed and verified by a team of 10 gastroenterologists. PolypDB comprises images from five modalities: Blue Light Imaging (BLI), Flexible Imaging Color Enhancement (FICE), Linked Color Imaging (LCI), Narrow Band Imaging (NBI), and White Light Imaging (WLI) and three medical centers from Norway, Sweden and Vietnam. Thus, we split the dataset based on modality and medical center for modality-wise and center-wise analysis. We provide a benchmark on each modality using eight popular segmentation methods and six standard benchmark polyp detection methods. Furthermore, we also provide a center-wise benchmark under federated learning settings. Our dataset is public and can be downloaded at https://osf.io/pr7ms/.

Submitted 19 August, 2024; originally announced September 2024.

arXiv:2408.12064 [pdf, other]

A Practical Introduction to Benchmarking and Characterization of Quantum Computers

Authors: Akel Hashim, Long B. Nguyen, Noah Goss, Brian Marinelli, Ravi K. Naik, Trevor Chistolini, Jordan Hines, J. P. Marceaux, Yosep Kim, Pranav Gokhale, Teague Tomesh, Senrui Chen, Liang Jiang, Samuele Ferracin, Kenneth Rudinger, Timothy Proctor, Kevin C. Young, Robin Blume-Kohout, Irfan Siddiqi

Abstract: Rapid progress in quantum technology has transformed quantum computing and quantum information science from theoretical possibilities into tangible engineering challenges. Breakthroughs in quantum algorithms, quantum simulations, and quantum error correction are bringing useful quantum computation closer to fruition. These remarkable achievements have been facilitated by advances in quantum characterization, verification, and validation (QCVV). QCVV methods and protocols enable scientists and engineers to scrutinize, understand, and enhance the performance of quantum information-processing devices. In this Tutorial, we review the fundamental principles underpinning QCVV, and introduce a diverse array of QCVV tools used by quantum researchers. We define and explain QCVV's core models and concepts -- quantum states, measurements, and processes -- and illustrate how these building blocks are leveraged to examine a target system or operation. We survey and introduce protocols ranging from simple qubit characterization to advanced benchmarking methods. Along the way, we provide illustrated examples and detailed descriptions of the protocols, highlight the advantages and disadvantages of each, and discuss their potential scalability to future large-scale quantum computers. This Tutorial serves as a guidebook for researchers unfamiliar with the benchmarking and characterization of quantum computers, and also as a detailed reference for experienced practitioners.

Submitted 21 August, 2024; originally announced August 2024.

arXiv:2408.09339 [pdf, other]

doi 10.1016/j.physd.2024.134342

Power-law localization in one-dimensional systems with nonlinear disorder under fixed input conditions

Authors: Ba Phi Nguyen, Kihong Kim

Abstract: We conduct a numerical investigation into wave propagation and localization in one-dimensional lattices subject to nonlinear disorder, focusing on cases with fixed input conditions. Utilizing a discrete nonlinear Schrödinger equation with Kerr-type nonlinearity and a random coefficient, we compute the averages and variances of the transmittance, $T$, and its logarithm, as functions of the system size $L$, while maintaining constant intensity for the incident wave. In cases of purely nonlinear disorder, we observe power-law localization characterized by $\langle T \rangle \propto L^{-γ_a}$ and $\langle \ln T \rangle \approx -γ_g \ln L$ for sufficiently large $L$. At low input intensities, a transition from exponential to power-law decay in $\langle T \rangle$ occurs as $L$ increases. The exponents $γ_a$ and $γ_g$ are nearly identical, converging to approximately 0.5 as the strength of the nonlinear disorder, $β$, increases. Additionally, the variance of $T$ decays according to a power law with an exponent close to 1, and the variance of $\ln T$ approaches a small constant as $L$ increases. These findings are consistent with an underlying log-normal distribution of $T$ and suggest that wave propagation behavior becomes nearly deterministic as the system size increases. When both linear and nonlinear disorders are present, we observe a transition from power-law to exponential decay in transmittance with increasing $L$ when the strength of linear disorder, $V$, is less than $β$. As $V$ increases, the region exhibiting power-law localization diminishes and eventually disappears when $V$ exceeds $β$, leading to standard Anderson localization.
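The power-law scaling $\langle T \rangle \propto L^{-γ_a}$ quoted in this abstract suggests a simple way to estimate the exponent from tabulated averages: fit a straight line to $\ln \langle T \rangle$ against $\ln L$ and read off the negative slope. The following is a minimal sketch under that assumption, not the authors' analysis code. The function name `fit_power_law_exponent` and the synthetic input values are hypothetical.

```python
from math import log


def fit_power_law_exponent(sizes, mean_T):
    """Least-squares slope of ln<T> versus ln L; returns gamma in <T> ~ L^(-gamma)."""
    xs = [log(L) for L in sizes]
    ys = [log(T) for T in mean_T]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # The fitted slope is -gamma, so flip the sign.
    return -sxy / sxx


if __name__ == "__main__":
    # Synthetic averages that follow an exact power law with exponent 0.5 (illustrative only).
    sizes = [100, 200, 400, 800, 1600]
    mean_T = [3.0 * L ** -0.5 for L in sizes]
    print(fit_power_law_exponent(sizes, mean_T))
```

The same regression applied directly to tabulated $\langle \ln T \rangle$ values against $\ln L$ (without taking the logarithm of the second argument) would give an estimate of $γ_g$; on real data the fit should be restricted to the large-$L$ range where the power law holds.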

Submitted 17 August, 2024; originally announced August 2024.

Comments: 8 pages, 7 figures

Journal ref: Physica D 469, 134342 (2024)

arXiv:2408.03596 [pdf, other]

Hierarchical Quantum Control Gates for Functional MRI Understanding

Authors: Xuan-Bac Nguyen, Hoang-Quan Nguyen, Hugh Churchill, Samee U. Khan, Khoa Luu

Abstract: Quantum computing has emerged as a powerful tool for solving complex problems intractable for classical computers, particularly in popular fields such as cryptography, optimization, and neurocomputing. In this paper, we present a new quantum-based approach named the Hierarchical Quantum Control Gates (HQCG) method for efficient understanding of Functional Magnetic Resonance Imaging (fMRI) data. This approach includes two novel modules: the Local Quantum Control Gate (LQCG) and the Global Quantum Control Gate (GQCG), which are designed to extract local and global features of fMRI signals, respectively. Our method operates end-to-end on a quantum machine, leveraging quantum mechanics to learn patterns within extremely high-dimensional fMRI signals, such as 30,000 samples, which is a challenge for classical computers. Empirical results demonstrate that our approach significantly outperforms classical methods. Additionally, we found that the proposed quantum model is more stable and less prone to overfitting than the classical methods.

Submitted 22 September, 2024; v1 submitted 7 August, 2024; originally announced August 2024.

Comments: Accepted to IEEE Workshop on Signal Processing Systems (SiPS 2024)

arXiv:2407.13159 [pdf, other]

Attenuation-Aware Weighted Optical Flow with Medium Transmission Map for Learning-based Visual Odometry in Underwater terrain

Authors: Bach Nguyen Gia, Chanh Minh Tran, Kamioka Eiji, Tan Phan Xuan

Abstract: This paper addresses the challenge of improving learning-based monocular visual odometry (VO) in underwater environments by integrating principles of underwater optical imaging to manipulate optical flow estimation. Leveraging the inherent properties of underwater imaging, the novel wflow-TartanVO is introduced, enhancing the accuracy of VO systems for autonomous underwater vehicles (AUVs). The proposed method utilizes a normalized medium transmission map as a weight map to adjust the estimated optical flow for emphasizing regions with lower degradation and suppressing uncertain regions affected by underwater light scattering and absorption. wflow-TartanVO does not require fine-tuning of pre-trained VO models, thus promoting its adaptability to different environments and camera models. Evaluation of different real-world underwater datasets demonstrates the outperformance of wflow-TartanVO over baseline VO methods, as evidenced by the considerably reduced Absolute Trajectory Error (ATE). The implementation code is available at: https://github.com/bachzz/wflow-TartanVO

Submitted 18 July, 2024; originally announced July 2024.

arXiv:2407.11078 [pdf, other]

Overcoming Catastrophic Forgetting in Federated Class-Incremental Learning via Federated Global Twin Generator

Authors: Thinh Nguyen, Khoa D Doan, Binh T. Nguyen, Danh Le-Phuoc, Kok-Seng Wong

Abstract: Federated Class-Incremental Learning (FCIL) increasingly becomes important in the decentralized setting, where it enables multiple participants to collaboratively train a global model to perform well on a sequence of tasks without sharing their private data. In FCIL, conventional Federated Learning algorithms such as FedAVG often suffer from catastrophic forgetting, resulting in significant performance declines on earlier tasks. Recent works, based on generative models, produce synthetic images to help mitigate this issue across all classes, but these approaches' testing accuracy on previous classes is still much lower than recent classes, i.e., having better plasticity than stability. To overcome these issues, this paper presents Federated Global Twin Generator (FedGTG), an FCIL framework that exploits privacy-preserving generative-model training on the global side without accessing client data. Specifically, the server trains a data generator and a feature generator to create two types of information from all seen classes, and then it sends the synthetic data to the client side. The clients then use feature-direction-controlling losses to make the local models retain knowledge and learn new tasks well. We extensively analyze the robustness of FedGTG on natural images, as well as its ability to converge to flat local minima and achieve better-predicting confidence (calibration). Experimental results on CIFAR-10, CIFAR-100, and tiny-ImageNet demonstrate the improvements in accuracy and forgetting measures of FedGTG compared to previous frameworks.

Submitted 13 July, 2024; originally announced July 2024.

MSC Class: 68T07 (Primary); 68T45 (Secondary)

arXiv:2407.08215 [pdf, other]

Enhancing Performance and User Engagement in Everyday Stress Monitoring: A Context-Aware Active Reinforcement Learning Approach

Authors: Seyed Amir Hossein Aqajari, Ziyu Wang, Ali Tazarv, Sina Labbaf, Salar Jafarlou, Brenda Nguyen, Nikil Dutt, Marco Levorato, Amir M. Rahmani

Abstract: In today's fast-paced world, accurately monitoring stress levels is crucial. Sensor-based stress monitoring systems often need large datasets for training effective models. However, individual-specific models are necessary for personalized and interactive scenarios. Traditional methods like Ecological Momentary Assessments (EMAs) assess stress but struggle with efficient data collection without burdening users. The challenge is to timely send EMAs, especially during stress, balancing monitoring efficiency and user convenience. This paper introduces a novel context-aware active reinforcement learning (RL) algorithm for enhanced stress detection using Photoplethysmography (PPG) data from smartwatches and contextual data from smartphones. Our approach dynamically selects optimal times for deploying EMAs, utilizing the user's immediate context to maximize label accuracy and minimize intrusiveness. Initially, the study was executed in an offline environment to refine the label collection process, aiming to increase accuracy while reducing user burden. Later, we integrated a real-time label collection mechanism, transitioning to an online methodology. This shift resulted in an 11% improvement in stress detection efficiency. Incorporating contextual data improved model accuracy by 4%. Personalization studies indicated a 10% enhancement in AUC-ROC scores, demonstrating better stress level differentiation. This research marks a significant move towards personalized, context-driven real-time stress monitoring methods.

Submitted 11 July, 2024; originally announced July 2024.

arXiv:2407.05607 [pdf, other]

Weakly Supervised Test-Time Domain Adaptation for Object Detection

Authors: Anh-Dzung Doan, Bach Long Nguyen, Terry Lim, Madhuka Jayawardhana, Surabhi Gupta, Christophe Guettier, Ian Reid, Markus Wagner, Tat-Jun Chin

Abstract: Prior to deployment, an object detector is trained on a dataset compiled from a previous data collection campaign. However, the environment in which the object detector is deployed will invariably evolve, particularly in outdoor settings where changes in lighting, weather and seasons will significantly affect the appearance of the scene and target objects. It is almost impossible for all potential scenarios that the object detector may come across to be present in a finite training dataset. This necessitates continuous updates to the object detector to maintain satisfactory performance. Test-time domain adaptation techniques enable machine learning models to self-adapt based on the distributions of the testing data. However, existing methods mainly focus on fully automated adaptation, which makes sense for applications such as self-driving cars. Despite the prevalence of fully automated approaches, in some applications such as surveillance, there is usually a human operator overseeing the system's operation. We propose to involve the operator in test-time domain adaptation to raise the performance of object detection beyond what is achievable by fully automated adaptation. To reduce manual effort, the proposed method only requires the operator to provide weak labels, which are then used to guide the adaptation process. Furthermore, the proposed method can be performed in a streaming setting, where each online sample is observed only once.
We show that the proposed method outperforms existing works, demonstrating a great benefit of human-in-the-loop test-time domain adaptation. Our code is publicly available at https://github.com/dzungdoan6/WSTTA

Submitted 8 July, 2024; originally announced July 2024.

arXiv:2407.04893 [pdf, other]

Qudit Dynamical Decoupling on a Superconducting Quantum Processor

Authors: Vinay Tripathi, Noah Goss, Arian Vezvaee, Long B. Nguyen, Irfan Siddiqi, Daniel A. Lidar

Abstract: Multi-level qudit systems are increasingly being explored as alternatives to traditional qubit systems due to their denser information storage and processing potential. However, qudits are more susceptible to decoherence than qubits due to increased loss channels, noise sensitivity, and crosstalk. To address these challenges, we develop protocols for dynamical decoupling (DD) of qudit systems based on the Heisenberg-Weyl group. We implement and experimentally verify these DD protocols on a superconducting transmon processor that supports qudit operation based on qutrits $(d=3)$ and ququarts $(d=4)$. Specifically, we demonstrate single-qudit DD sequences to decouple qutrits and ququarts from system-bath-induced decoherence. We also introduce two-qudit DD sequences designed to suppress the detrimental cross-Kerr couplings between coupled qudits. This allows us to demonstrate a significant improvement in the fidelity of time-evolved qutrit Bell states. Our results highlight the utility of leveraging DD to enable scalable qudit-based quantum computing.

Submitted 5 July, 2024; originally announced July 2024.

Comments: 12 pages, 5 figures, comments are welcome

arXiv:2407.03036 [pdf, other]

SAFT: Towards Out-of-Distribution Generalization in Fine-Tuning

Authors: Bac Nguyen, Stefan Uhlich, Fabien Cardinaux, Lukas Mauch, Marzieh Edraki, Aaron Courville

Abstract: Handling distribution shifts from training data, known as out-of-distribution (OOD) generalization, poses a significant challenge in the field of machine learning. While a pre-trained vision-language model like CLIP has demonstrated remarkable zero-shot performance, further adaptation of the model to downstream tasks leads to undesirable degradation for OOD data. In this work, we introduce Sparse Adaptation for Fine-Tuning (SAFT), a method that prevents fine-tuning from forgetting the general knowledge in the pre-trained model. SAFT only updates a small subset of important parameters whose gradient magnitude is large, while keeping the other parameters frozen. SAFT is straightforward to implement and conceptually simple. Extensive experiments show that with only 0.1% of the model parameters, SAFT can significantly improve the performance of CLIP. It consistently outperforms baseline methods across several benchmarks. On the few-shot learning benchmark of ImageNet and its variants, SAFT gives a gain of 5.15% on average over the conventional fine-tuning method in OOD settings.

Submitted 3 July, 2024; originally announced July 2024.
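
Editorial illustration for the SAFT entry above: the abstract says only that a small subset of parameters with large gradient magnitude is updated while the rest stay frozen. The following is a minimal sketch of that idea on plain Python lists. The function names, the `keep_fraction` parameter, the top-k selection rule, and the plain SGD step are all assumptions made for illustration, not the authors' implementation.

```python
def sparse_update_mask(grads, keep_fraction):
    """Return a 0/1 mask selecting the entries with the largest |gradient|.

    Hypothetical helper: the top-k rule is an assumed selection criterion.
    """
    k = max(1, int(len(grads) * keep_fraction))
    # Indices ordered by gradient magnitude, largest first.
    ranked = sorted(range(len(grads)), key=lambda i: abs(grads[i]), reverse=True)
    selected = set(ranked[:k])
    return [1 if i in selected else 0 for i in range(len(grads))]


def masked_sgd_step(params, grads, mask, lr):
    """Apply an SGD step only where mask == 1; other parameters stay frozen."""
    return [p - lr * g * m for p, g, m in zip(params, grads, mask)]


# Toy example: four parameters, half of them selected for updating.
params = [0.5, -1.0, 2.0, 0.1]
grads = [0.01, -0.9, 0.05, 0.4]
mask = sparse_update_mask(grads, keep_fraction=0.5)
new_params = masked_sgd_step(params, grads, mask, lr=0.1)
```

In this toy run only the second and fourth parameters change, because they carry the two largest gradient magnitudes; the other two keep their initial values.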

arXiv:2406.10783 [pdf, ps, other]

Odd-even mass differences of well and rigidly deformed nuclei in the rare earth region: A test of a newly proposed fit of average pairing matrix elements

Authors: T. V. Nhan Hao, N. N. Bao Nguyen, D. Quang Tam, P. Quentin, Meng-Hock Koh, L. Bonneau

Abstract: We discuss a test of a recently proposed approach to determine average pairing matrix elements within a given interval of single-particle states (sp) around the Fermi level $\lambda$ as obtained in the so-called uniform gap method (UGM). It takes stock of the crucial role played by the averaged sp level density $\tilde{\rho}(e)$. These matrix elements are deduced within the UGM approach, from microscopically calculated $\tilde{\rho}(e)$ and gaps obtained from analytical formulae of a semi-classical nature. Two effects generally ignored in similar fits have been taken care of. They are: (a) the correction for a systematic bias in choosing to fit pairing gaps corresponding to equilibrium deformation solutions as discussed by Möller and Nix [Nucl. Phys. A 476, 1 (1992)] and (b) the correction for a systematic spurious enhancement of $\tilde{\rho}(e)$ for protons in the vicinity of $\lambda$, because of the local Slater approximation used for the treatment of the Coulomb exchange terms in most calculations (see e.g. [Phys. Rev. C 84, 014310 (2011)]). This approach has been deemed to be very efficient upon performing Hartree-Fock + BCS (with seniority force and self-consistent blocking when dealing with odd nuclei) calculations of a large sample of well and rigidly deformed even-even rare-earth nuclei. The reproduction of their experimental moments of inertia has been found to be at least of the same quality as what has been obtained in a direct fit of these data [Phys. Rev. C 99, 064306 (2019)].
We extend here the test of our approach to the reproduction, in the same region, of three-point odd-even mass differences centered on odd-$N$ or odd-$Z$ nuclei. The agreement with the data is again roughly of the same quality as what has been obtained in a direct fit, as performed in [Phys. Rev. C 99, 064306 (2019)].

Submitted 16 September, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

Comments: 3 tables, submitted to Chinese Physics C

arXiv:2406.06239 [pdf, other]

I-MPN: Inductive Message Passing Network for Efficient Human-in-the-Loop Annotation of Mobile Eye Tracking Data

Authors: Hoang H. Le, Duy M. H. Nguyen, Omair Shahzad Bhatti, Laszlo Kopacsi, Thinh P. Ngo, Binh T. Nguyen, Michael Barz, Daniel Sonntag

Abstract: Comprehending how humans process visual information in dynamic settings is crucial for psychology and designing user-centered interactions. While mobile eye-tracking systems combining egocentric video and gaze signals can offer valuable insights, manual analysis of these recordings is time-intensive. In this work, we present a novel human-centered learning algorithm designed for automated object recognition within mobile eye-tracking settings. Our approach seamlessly integrates an object detector with a spatial relation-aware inductive message-passing network (I-MPN), harnessing node profile information and capturing object correlations. Such mechanisms enable us to learn embedding functions capable of generalizing to new object angle views, facilitating rapid adaptation and efficient reasoning in dynamic contexts as users navigate their environment. Through experiments conducted on three distinct video sequences, our interactive-based method showcases significant performance improvements over fixed training/testing algorithms, even when trained on considerably smaller annotated samples collected through user feedback. Furthermore, we demonstrate exceptional efficiency in data annotation processes and surpass prior interactive methods that use complete object detectors, combine detectors with convolutional networks, or employ interactive video segmentation.

Submitted 7 July, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

Comments: Updated version

arXiv:2406.05108 [pdf, other]

Adapting Physics-Informed Neural Networks To Optimize ODEs in Mosquito Population Dynamics

Authors: Dinh Viet Cuong, Branislava Lalić, Mina Petrić, Binh Nguyen, Mark Roantree

Abstract: Physics-informed neural networks (PINNs) have been gaining popularity due to their unique ability to incorporate physics laws into data-driven models, ensuring that the predictions are not only consistent with empirical data but also align with domain-specific knowledge in the form of physics equations. The integration of physics principles enables the method to require less data while maintaining the robustness of deep learning in modeling complex dynamical systems. However, current PINN frameworks are not sufficiently mature for real-world ODE systems, especially those with extreme multi-scale behavior such as mosquito population dynamical modelling. In this research, we propose a PINN framework with several improvements for forward and inverse problems for ODE systems with a case study application in modelling the dynamics of mosquito populations. The framework tackles the gradient imbalance and stiff problems posed by mosquito ordinary differential equations. The method offers a simple but effective way to resolve the time causality issue in PINNs by gradually expanding the training time domain until it covers the entire domain of interest. As part of a robust evaluation, we conduct experiments using simulated data to evaluate the effectiveness of the approach. Preliminary results indicate that physics-informed machine learning holds significant potential for advancing the study of ecological systems.

Submitted 7 June, 2024; originally announced June 2024.
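
Editorial illustration for the PINN entry above: the abstract mentions gradually expanding the training time domain until it covers the whole domain of interest, but gives no schedule. The sketch below assumes a linear ramp of the window's upper bound and evenly spaced collocation points; `window_end`, `collocation_points`, and `ramp_fraction` are hypothetical names, and the actual expansion rule in the paper may differ.

```python
def window_end(step, total_steps, t_start, t_final, ramp_fraction=0.8):
    """Upper bound of the training time window at a given optimisation step.

    Assumed schedule: the window grows linearly during the first
    `ramp_fraction` of training and then stays at the full domain.
    """
    ramp_steps = max(1, int(total_steps * ramp_fraction))
    progress = min(1.0, step / ramp_steps)
    return t_start + progress * (t_final - t_start)


def collocation_points(step, total_steps, t_start, t_final, n_points):
    """Evenly spaced collocation times inside the current training window."""
    t_end = window_end(step, total_steps, t_start, t_final)
    if n_points == 1:
        return [t_start]
    spacing = (t_end - t_start) / (n_points - 1)
    return [t_start + i * spacing for i in range(n_points)]


# Halfway through the ramp, the ODE residual would only be enforced on [0, 5].
points = collocation_points(step=400, total_steps=1000,
                            t_start=0.0, t_final=10.0, n_points=5)
```

Early in training the residual loss would see only times close to the initial condition; later times are added once the earlier part of the trajectory has had a chance to fit.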

arXiv:2406.02317 [pdf, other]

Generative Conditional Distributions by Neural (Entropic) Optimal Transport

Authors: Bao Nguyen, Binh Nguyen, Hieu Trung Nguyen, Viet Anh Nguyen

Abstract: Learning conditional distributions is challenging because the desired outcome is not a single distribution but multiple distributions that correspond to multiple instances of the covariates. We introduce a novel neural entropic optimal transport method designed to effectively learn generative models of conditional distributions, particularly in scenarios characterized by limited sample sizes. Our method relies on the minimax training of two neural networks: a generative network parametrizing the inverse cumulative distribution functions of the conditional distributions and another network parametrizing the conditional Kantorovich potential. To prevent overfitting, we regularize the objective function by penalizing the Lipschitz constant of the network output. Our experiments on real-world datasets show the effectiveness of our algorithm compared to state-of-the-art conditional distribution learning techniques. Our implementation can be found at https://github.com/nguyenngocbaocmt02/GENTLE.

Submitted 4 June, 2024; originally announced June 2024.

Comments: 15 pages, 8 figures

arXiv:2406.00843 [pdf, other]

Diffusion-Inspired Quantum Noise Mitigation in Parameterized Quantum Circuits

Authors: Hoang-Quan Nguyen, Xuan Bac Nguyen, Samuel Yen-Chi Chen, Hugh Churchill, Nicholas Borys, Samee U. Khan, Khoa Luu

Abstract: Parameterized Quantum Circuits (PQCs) have been acknowledged as a leading strategy to utilize near-term quantum advantages in multiple problems, including machine learning and combinatorial optimization. When applied to specific tasks, the parameters in the quantum circuits are trained to minimize the target function. Although there have been comprehensive studies to improve the performance of the PQCs on practical tasks, the errors caused by the quantum noise downgrade the performance when running on real quantum computers. In particular, when the quantum state is transformed through multiple quantum circuit layers, the effect of the quantum noise happens cumulatively and becomes closer to the maximally mixed state or complete noise. This paper studies the relationship between the quantum noise and the diffusion model. Then, we propose a novel diffusion-inspired learning approach to mitigate the quantum noise in the PQCs and reduce the error for specific tasks. Through our experiments, we illustrate the efficiency of the learning strategy and achieve state-of-the-art performance on classification tasks in the quantum noise scenarios.

Submitted 2 June, 2024; originally announced June 2024.

arXiv:2405.20882 [pdf, other]

Sheaf HyperNetworks for Personalized Federated Learning

Authors: Bao Nguyen, Lorenzo Sani, Xinchi Qiu, Pietro Liò, Nicholas D. Lane

Abstract: Graph hypernetworks (GHNs), constructed by combining graph neural networks (GNNs) with hypernetworks (HNs), leverage relational data across various domains such as neural architecture search, molecular property prediction and federated learning. Despite GNNs and HNs being individually successful, we show that GHNs present problems compromising their performance, such as over-smoothing and heterophily. Moreover, we cannot apply GHNs directly to personalized federated learning (PFL) scenarios, where a priori client relation graph may be absent, private, or inaccessible. To mitigate these limitations in the context of PFL, we propose a novel class of HNs, sheaf hypernetworks (SHNs), which combine cellular sheaf theory with HNs to improve parameter sharing for PFL. We thoroughly evaluate SHNs across diverse PFL tasks, including multi-class classification, traffic and weather forecasting. Additionally, we provide a methodology for constructing client relation graphs in scenarios where such graphs are unavailable. We show that SHNs consistently outperform existing PFL solutions in complex non-IID scenarios. While the baselines' performance fluctuates depending on the task, SHNs show improvements of up to 2.7% in accuracy and 5.3% in lower mean squared error over the best-performing baseline.

Submitted 31 May, 2024; originally announced May 2024.

Comments: 25 pages, 12 figures, 7 tables, pre-print under review

arXiv:2405.19725 [pdf, other]

Quantum Visual Feature Encoding Revisited

Authors: Xuan-Bac Nguyen, Hoang-Quan Nguyen, Hugh Churchill, Samee U. Khan, Khoa Luu

Abstract: Although quantum machine learning has been introduced for a while, its applications in computer vision are still limited. This paper, therefore, revisits the quantum visual encoding strategies, the initial step in quantum machine learning. Investigating the root cause, we uncover that the existing quantum encoding design fails to ensure information preservation of the visual features after the encoding process, thus complicating the learning process of the quantum machine learning models. In particular, the problem, termed "Quantum Information Gap" (QIG), leads to a gap of information between classical and corresponding quantum features. We provide theoretical proof and practical demonstrations of this finding and underscore the significance of QIG, as it directly impacts the performance of quantum machine learning algorithms. To tackle this challenge, we introduce a simple but efficient new loss function named Quantum Information Preserving (QIP) to minimize this gap, resulting in enhanced performance of quantum machine learning algorithms. Extensive experiments validate the effectiveness of our approach, showcasing superior performance compared to current methodologies and consistently achieving state-of-the-art results in quantum modeling.

Submitted 20 August, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

Comments: Accepted to Quantum Machine Intelligence

arXiv:2405.19722 [pdf, other]

QClusformer: A Quantum Transformer-based Framework for Unsupervised Visual Clustering

Authors: Xuan-Bac Nguyen, Hoang-Quan Nguyen, Samuel Yen-Chi Chen, Samee U. Khan, Hugh Churchill, Khoa Luu

Abstract: Unsupervised vision clustering, a cornerstone in computer vision, has been studied for decades, yielding significant outcomes across numerous vision tasks. However, these algorithms involve substantial computational demands when confronted with vast amounts of unlabeled data. Conversely, quantum computing holds promise in expediting unsupervised algorithms when handling large-scale databases. In this study, we introduce QClusformer, a pioneering Transformer-based framework leveraging quantum machines to tackle unsupervised vision clustering challenges. Specifically, we design the Transformer architecture, including the self-attention module and transformer blocks, from a quantum perspective to enable execution on quantum hardware. In addition, we present QClusformer, a variant based on the Transformer architecture, tailored for unsupervised vision clustering tasks. By integrating these elements into an end-to-end framework, QClusformer consistently outperforms previous methods running on classical computers. Empirical evaluations across diverse benchmarks, including MS-Celeb-1M and DeepFashion, underscore the superior performance of QClusformer compared to state-of-the-art methods.

Submitted 7 August, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

arXiv:2405.18808 [pdf, other]

BRACTIVE: A Brain Activation Approach to Human Visual Brain Learning

Authors: Xuan-Bac Nguyen, Hojin Jang, Xin Li, Samee U. Khan, Pawan Sinha, Khoa Luu

Abstract: The human brain is a highly efficient processing unit, and understanding how it works can inspire new algorithms and architectures in machine learning. In this work, we introduce a novel framework named Brain Activation Network (BRACTIVE), a transformer-based approach to studying the human visual brain. The main objective of BRACTIVE is to align the visual features of subjects with corresponding brain representations via fMRI signals. It allows us to identify the brain's Regions of Interest (ROI) of the subjects. Unlike previous brain research methods, which can only identify ROIs for one subject at a time and are limited by the number of subjects, BRACTIVE automatically extends this identification to multiple subjects and ROIs. Our experiments demonstrate that BRACTIVE effectively identifies person-specific regions of interest, such as face and body-selective areas, aligning with neuroscience findings and indicating potential applicability to various object categories. More importantly, we found that leveraging human visual brain activity to guide deep neural networks enhances performance across various benchmarks. It encourages the potential of BRACTIVE in both neuroscience and machine intelligence studies.

Submitted 29 May, 2024; originally announced May 2024.

arXiv:2405.18040 [pdf, other]

Fast-FedUL: A Training-Free Federated Unlearning with Provable Skew Resilience

Authors: Thanh Trung Huynh, Trong Bang Nguyen, Phi Le Nguyen, Thanh Tam Nguyen, Matthias Weidlich, Quoc Viet Hung Nguyen, Karl Aberer

Abstract: Federated learning (FL) has recently emerged as a compelling machine learning paradigm, prioritizing the protection of privacy for training data. The increasing demand to address issues such as "the right to be forgotten" and combat data poisoning attacks highlights the importance of techniques, known as unlearning, which facilitate the removal of specific training data from trained FL models. Despite numerous unlearning methods proposed for centralized learning, they often prove inapplicable to FL due to fundamental differences in the operation of the two learning paradigms. Consequently, unlearning in FL remains in its early stages, presenting several challenges. Many existing unlearning solutions in FL require a costly retraining process, which can be burdensome for clients. Moreover, these methods are primarily validated through experiments, lacking theoretical assurances. In this study, we introduce Fast-FedUL, a tailored unlearning method for FL, which eliminates the need for retraining entirely. Through meticulous analysis of the target client's influence on the global model in each round, we develop an algorithm to systematically remove the impact of the target client from the trained model. In addition to presenting empirical findings, we offer a theoretical analysis delineating the upper bound of our unlearned model and the exact retrained model (the one obtained through retraining using untargeted clients).
Experimental results with backdoor attack scenarios indicate that Fast-FedUL effectively removes almost all traces of the target client, while retaining the knowledge of untargeted clients (obtaining a high accuracy of up to 98% on the main task). Significantly, Fast-FedUL attains the lowest time complexity, providing a speed that is 1000 times faster than retraining. Our source code is publicly available at https://github.com/thanhtrunghuynh93/fastFedUL.

Submitted 28 May, 2024; originally announced May 2024.

Comments: Accepted in ECML PKDD 2024

arXiv:2405.16148 [pdf, other]

Accelerating Transformers with Spectrum-Preserving Token Merging

Authors: Hoai-Chau Tran, Duy M. H. Nguyen, Duy M. Nguyen, Trung-Tin Nguyen, Ngan Le, Pengtao Xie, Daniel Sonntag, James Y. Zou, Binh T. Nguyen, Mathias Niepert

Abstract: Increasing the throughput of the Transformer architecture, a foundational component used in numerous state-of-the-art models for vision and language tasks (e.g., GPT, LLaVa), is an important problem in machine learning. One recent and effective strategy is to merge token representations within Transformer models, aiming to reduce computational and memory requirements while maintaining accuracy. Prior works have proposed algorithms based on Bipartite Soft Matching (BSM), which divides tokens into distinct sets and merges the top k similar tokens. However, these methods have significant drawbacks, such as sensitivity to token-splitting strategies and damage to informative tokens in later layers. This paper presents a novel paradigm called PiToMe, which prioritizes the preservation of informative tokens using an additional metric termed the energy score. This score identifies large clusters of similar tokens as high-energy, indicating potential candidates for merging, while smaller (unique and isolated) clusters are considered as low-energy and preserved. Experimental findings demonstrate that PiToMe saves 40-60% of the FLOPs of the base models while exhibiting superior off-the-shelf performance on image classification (0.5% average performance drop of ViT-MAE-H compared to 2.6% for baselines), image-text retrieval (0.3% average performance drop of CLIP on Flickr30k compared to 4.5% for other methods), and analogously in visual question answering with LLaVa-7B.
Furthermore, PiToMe is theoretically shown to preserve intrinsic spectral properties of the original token space under mild conditions.

Submitted 25 May, 2024; originally announced May 2024.

Comments: Version 1

arXiv:2405.09583 [pdf, other]

doi 10.18429/JACoW-IPAC2024-WEPC84

Comparison of WarpX and GUINEA-PIG for electron positron collisions

Authors: Bao Nguyen, Arianna Formenti, Remi Lehe, Jean-Luc Vay, Spencer Gessner, Luca Fedeli

Abstract: As part of the Snowmass'21 planning exercise, the Advanced Accelerator Concepts community proposed developing multi-TeV linear colliders and considered beam-beam effects for these machines. Such colliders operate under a high disruption regime with an enormous number of electron-positron pairs produced from QED effects. Thus, it requires a self-consistent treatment of the fields produced by the pairs, which is not implemented in state-of-the-art beam-beam codes such as GUINEA-PIG. WarpX is a parallel, open-source, and portable particle-in-cell code with an active developer community that models QED processes with photon and pair generation in relativistic laser-beam interactions. However, its application to beam-beam collisions has yet to be fully explored. In this work, we benchmark the luminosity spectra, photon spectra, and coherent production process from WarpX against GUINEA-PIG in the ILC and ultra-tight collision scenarios. Our performance comparison demonstrates a significant speed-up advantage of WarpX, ensuring a more robust and efficient modeling of electron-positron collisions at multi-TeV energies.

Submitted 14 May, 2024; originally announced May 2024.

Comments: 3 pages conference proceeding. 15th International Particle Accelerator Conference (IPAC'24). Paper ID is WEPC84

Report number: doi:10.18429/JACoW-IPAC2024-WEPC84

Journal ref: International Particle Accelerator Conference 2024

arXiv:2405.03206 [pdf, other]

Vietnamese AI Generated Text Detection

Authors: Quang-Dan Tran, Van-Quan Nguyen, Quang-Huy Pham, K. B. Thang Nguyen, Trong-Hop Do

Abstract: In recent years, Large Language Models (LLMs) have become integrated into our daily lives, serving as invaluable assistants in completing tasks. Widely embraced by users, the abuse of LLMs is inevitable, particularly in using them to generate text content for various purposes, leading to difficulties in distinguishing between text generated by LLMs and that written by humans. In this study, we present a dataset named ViDetect, comprising 6,800 samples of Vietnamese essays, with 3,400 samples authored by humans and the remainder generated by LLMs, serving the purpose of detecting text generated by AI. We conducted evaluations using state-of-the-art methods, including ViT5, BartPho, PhoBERT, mDeberta V3, and mBERT. These results contribute not only to the growing body of research on detecting text generated by AI but also demonstrate the adaptability and effectiveness of different methods in the Vietnamese language context. This research lays the foundation for future advancements in AI-generated text detection and provides valuable insights for researchers in the field of natural language processing.

Submitted 6 May, 2024; originally announced May 2024.

arXiv:2405.00722 [pdf, other]

LLMs for Generating and Evaluating Counterfactuals: A Comprehensive Study

Authors: Van Bach Nguyen, Paul Youssef, Jörg Schlötterer, Christin Seifert

Abstract: As NLP models become more complex, understanding their decisions becomes more crucial. Counterfactuals (CFs), where minimal changes to inputs flip a model's prediction, offer a way to explain these models. While Large Language Models (LLMs) have shown remarkable performance in NLP tasks, their efficacy in generating high-quality CFs remains uncertain. This work fills this gap by investigating how well LLMs generate CFs for two NLU tasks. We conduct a comprehensive comparison of several common LLMs, and evaluate their CFs, assessing both intrinsic metrics, and the impact of these CFs on data augmentation. Moreover, we analyze differences between human and LLM-generated CFs, providing insights for future research directions. Our results show that LLMs generate fluent CFs, but struggle to keep the induced changes minimal. Generating CFs for Sentiment Analysis (SA) is less challenging than NLI where LLMs show weaknesses in generating CFs that flip the original label. This also reflects on the data augmentation performance, where we observe a large gap between augmenting with human and LLMs CFs. Furthermore, we evaluate LLMs' ability to assess CFs in a mislabelled data setting, and show that they have a strong bias towards agreeing with the provided labels. GPT4 is more robust against this bias and its scores correlate well with automatic metrics. Our findings reveal several limitations and point to potential future work directions.

Submitted 26 April, 2024; originally announced May 2024.

arXiv:2404.17475 [pdf, other]

CEval: A Benchmark for Evaluating Counterfactual Text Generation

Authors: Van Bach Nguyen, Jörg Schlötterer, Christin Seifert

Abstract: Counterfactual text generation aims to minimally change a text, such that it is classified differently. Judging advancements in method development for counterfactual text generation is hindered by a non-uniform usage of data sets and metrics in related work. We propose CEval, a benchmark for comparing counterfactual text generation methods. CEval unifies counterfactual and text quality metrics, includes common counterfactual datasets with human annotations, standard baselines (MICE, GDBA, CREST) and the open-source language model LLAMA-2. Our experiments found no perfect method for generating counterfactual text. Methods that excel at counterfactual metrics often produce lower-quality text while LLMs with simple prompts generate high-quality text but struggle with counterfactual criteria. By making CEval available as an open-source Python library, we encourage the community to contribute more methods and maintain consistent evaluation in future work.

Submitted 13 August, 2024; v1 submitted 26 April, 2024; originally announced April 2024.

Journal ref: INLG 2024

arXiv:2404.16697 [pdf, other]

High-Coherence Kerr-cat qubit in 2D architecture

Authors: Ahmed Hajr, Bingcheng Qing, Ke Wang, Gerwin Koolstra, Zahra Pedramrazi, Ziqi Kang, Larry Chen, Long B. Nguyen, Christian Junger, Noah Goss, Irwin Huang, Bibek Bhandari, Nicholas E. Frattini, Shruti Puri, Justin Dressel, Andrew N. Jordan, David Santiago, Irfan Siddiqi

Abstract: The Kerr-cat qubit is a bosonic qubit in which multi-photon Schrodinger cat states are stabilized by applying a two-photon drive to an oscillator with a Kerr nonlinearity. The suppressed bit-flip rate with increasing cat size makes this qubit a promising candidate to implement quantum error correction codes tailored for noise-biased qubits. However, achieving strong light-matter interactions necessary for stabilizing and controlling this qubit has traditionally required strong microwave drives that heat the qubit and degrade its performance. In contrast, increasing the coupling to the drive port removes the need for strong drives at the expense of large Purcell decay. By integrating an effective band-block filter on-chip, we overcome this trade-off and realize a Kerr-cat qubit in a scalable 2D superconducting circuit with high coherence. This filter provides 30 dB of isolation at the qubit frequency with negligible attenuation at the frequencies required for stabilization and readout. We experimentally demonstrate quantum non-demolition readout fidelity of 99.6% for a cat with 8 photons. Also, to have high-fidelity universal control over this qubit, we combine fast Rabi oscillations with a new demonstration of the X(90) gate through phase modulation of the stabilization drive. Finally, the lifetime in this architecture is examined as a function of the cat size of up to 10 photons in the oscillator achieving a bit-flip time higher than 1 ms and only a linear decrease in the phase-flip time, in good agreement with the theoretical analysis of the circuit. Our qubit shows promise as a building block for fault-tolerant quantum processors with a small footprint.

Submitted 19 May, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

arXiv:2404.15721 [pdf, other]

SPARO: Selective Attention for Robust and Compositional Transformer Encodings for Vision

Authors: Ankit Vani, Bac Nguyen, Samuel Lavoie, Ranjay Krishna, Aaron Courville

Abstract: Selective attention helps us focus on task-relevant aspects in the constant flood of our sensory input. This constraint in our perception allows us to robustly generalize under distractions and to new compositions of perceivable concepts. Transformers employ a similar notion of attention in their architecture, but representation learning models with transformer backbones like CLIP and DINO often fail to demonstrate robustness and compositionality. We highlight a missing architectural prior: unlike human perception, transformer encodings do not separately attend over individual concepts. In response, we propose SPARO, a read-out mechanism that partitions encodings into separately-attended slots, each produced by a single attention head. Using SPARO with CLIP imparts an inductive bias that the vision and text modalities are different views of a shared compositional world with the same corresponding concepts. Using SPARO, we demonstrate improvements on downstream recognition, robustness, retrieval, and compositionality benchmarks with CLIP (up to +14% for ImageNet, +4% for SugarCrepe), and on nearest neighbors and linear probe for ImageNet with DINO (+3% each). We also showcase a powerful ability to intervene and select individual SPARO concepts to further improve downstream task performance (up from +4% to +9% for SugarCrepe) and use this ability to study the robustness of SPARO's representation structure. Finally, we provide insights through ablation experiments and visualization of learned concepts.

Submitted 14 September, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

Comments: Conference paper at ECCV 2024. 11 pages main, 23 pages total including references and appendix

arXiv:2403.16613 [pdf, other]

Technical Development of a Semi-Autonomous Robotic Partition

Authors: Binh Vinh Duc Nguyen, Andrew Vande Moere

Abstract: This technical description details the design and engineering process of a semi-autonomous robotic partition. This robotic partition prototype was subsequently employed in a longer-term evaluation in-the-wild study conducted by the authors in a real-world office setting.