-
Mitigating Embedding Collapse in Diffusion Models for Categorical Data
Authors:
Bac Nguyen,
Chieh-Hsin Lai,
Yuhta Takida,
Naoki Murata,
Toshimitsu Uesaka,
Stefano Ermon,
Yuki Mitsufuji
Abstract:
Latent diffusion models have enabled continuous-state diffusion models to handle a variety of datasets, including categorical data. However, most methods rely on fixed pretrained embeddings, limiting the benefits of joint training with the diffusion model. While jointly learning the embedding (via reconstruction loss) and the latent diffusion model (via score matching loss) could enhance performance, our analysis shows that end-to-end training risks embedding collapse, degrading generation quality. To address this issue, we introduce CATDM, a continuous diffusion framework within the embedding space that stabilizes training. We propose a novel objective combining the joint embedding-diffusion variational lower bound with a Consistency-Matching (CM) regularizer, alongside a shifted cosine noise schedule and random dropping strategy. The CM regularizer ensures the recovery of the true data distribution. Experiments on benchmarks show that CATDM mitigates embedding collapse, yielding superior results on FFHQ, LSUN Churches, and LSUN Bedrooms. In particular, CATDM achieves an FID of 6.81 on ImageNet $256\times256$ with 50 steps. It outperforms non-autoregressive models in machine translation and is on a par with previous methods in text generation.
Submitted 18 October, 2024;
originally announced October 2024.
-
G2D2: Gradient-guided Discrete Diffusion for image inverse problem solving
Authors:
Naoki Murata,
Chieh-Hsin Lai,
Yuhta Takida,
Toshimitsu Uesaka,
Bac Nguyen,
Stefano Ermon,
Yuki Mitsufuji
Abstract:
Recent literature has effectively utilized diffusion models trained on continuous variables as priors for solving inverse problems. Notably, discrete diffusion models with discrete latent codes have shown strong performance, particularly in modalities suited for discrete compressed representations, such as image and motion generation. However, their discrete and non-differentiable nature has limited their application to inverse problems formulated in continuous spaces. This paper presents a novel method for addressing linear inverse problems by leveraging image-generation models based on discrete diffusion as priors. We overcome these limitations by approximating the true posterior distribution with a variational distribution constructed from categorical distributions and continuous relaxation techniques. Furthermore, we employ a star-shaped noise process to mitigate the drawbacks of traditional discrete diffusion models with absorbing states, demonstrating that our method performs comparably to continuous diffusion techniques. To the best of our knowledge, this is the first approach to use discrete diffusion model-based priors for solving image inverse problems.
Submitted 9 October, 2024;
originally announced October 2024.
-
LOO-PIT: A sensitive posterior test
Authors:
Alan B. H. Nguyen,
Marco Bonici,
Glen McGee,
Will J. Percival
Abstract:
With the advent of the next generation of astrophysics experiments, the volume of data available to researchers will be greater than ever. As these projects will significantly drive down statistical uncertainties in measurements, it is crucial to develop novel tools to assess the ability of our models to fit these data within the specified errors. We introduce to astronomy the Leave One Out-Probability Integral Transform (LOO-PIT) technique. This first estimates the LOO posterior predictive distributions based on the model and likelihood distribution specified, then evaluates the quality of the match between the model and data by applying the PIT to each estimated distribution and data point, outputting a LOO-PIT distribution. Deviations between this output distribution and that expected can be characterised visually and with a standard Kolmogorov--Smirnov distribution test. We compare LOO-PIT and the more common $χ^2$ test using both a simplified model and a more realistic astrophysics problem, where we consider fitting Baryon Acoustic Oscillations in galaxy survey data with contamination from emission line interlopers. LOO-PIT and $χ^2$ tend to find different signals from the contaminants, and using these tests in conjunction increases the statistical power compared to using either test alone. We also show that LOO-PIT outperforms $χ^2$ in certain realistic test cases.
Submitted 4 October, 2024;
originally announced October 2024.
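The LOO-PIT procedure described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a toy model (Gaussian data with known noise `sigma` and a flat prior on the mean), for which the leave-one-out posterior predictive distribution is available in closed form, and it computes a plain Kolmogorov-Smirnov statistic against the uniform distribution. The function names `loo_pit_gaussian` and `ks_statistic_uniform` are hypothetical.

```python
import random
from statistics import NormalDist


def loo_pit_gaussian(y, sigma):
    """LOO-PIT values for the toy model y_i ~ N(mu, sigma^2), flat prior on mu.

    Assumption for illustration: with point i held out, the posterior
    predictive for y_i is N(mean of the other points, sigma^2 * (1 + 1/(n-1))).
    """
    n = len(y)
    total = sum(y)
    # Predictive spread = measurement noise plus posterior uncertainty on mu.
    pred_sd = sigma * (1.0 + 1.0 / (n - 1)) ** 0.5
    pits = []
    for yi in y:
        loo_mean = (total - yi) / (n - 1)  # posterior mean without point i
        pits.append(NormalDist(loo_mean, pred_sd).cdf(yi))  # PIT of held-out point
    return pits


def ks_statistic_uniform(u):
    """Kolmogorov-Smirnov distance between the sample u and Uniform(0, 1)."""
    u = sorted(u)
    n = len(u)
    return max(max((i + 1) / n - ui, ui - i / n) for i, ui in enumerate(u))


# Synthetic data (not from the paper): true noise level is 2.0.
rng = random.Random(0)
data = [rng.gauss(1.0, 2.0) for _ in range(500)]

# Correctly specified errors give near-uniform PIT values (small KS distance);
# underestimated errors pile PIT values up near 0 and 1 (large KS distance).
d_good = ks_statistic_uniform(loo_pit_gaussian(data, sigma=2.0))
d_bad = ks_statistic_uniform(loo_pit_gaussian(data, sigma=0.5))
```

In this sketch a well-specified model yields a LOO-PIT distribution close to uniform, so `d_good` is small, while the misspecified error bar produces a visibly larger `d_bad`.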
-
Loneliness Forecasting Using Multi-modal Wearable and Mobile Sensing in Everyday Settings
Authors:
Zhongqi Yang,
Iman Azimi,
Salar Jafarlou,
Sina Labbaf,
Brenda Nguyen,
Hana Qureshi,
Christopher Marcotullio,
Jessica L. Borelli,
Nikil Dutt,
Amir M. Rahmani
Abstract:
The adverse effects of loneliness on both physical and mental well-being are profound. Although previous research has utilized mobile sensing techniques to detect mental health issues, few studies have utilized state-of-the-art wearable devices to forecast loneliness and estimate the physiological manifestations of loneliness and its predictive nature. The primary objective of this study is to examine the feasibility of forecasting loneliness by employing wearable devices, such as smart rings and watches, to monitor early physiological indicators of loneliness. Furthermore, smartphones are employed to capture initial behavioral signs of loneliness. To accomplish this, we employed personalized machine learning techniques, leveraging a comprehensive dataset comprising physiological and behavioral information obtained during our study involving the monitoring of college students. Through the development of personalized models, we achieved a notable accuracy of 0.82 and an F-1 score of 0.82 in forecasting loneliness levels seven days in advance. Additionally, the application of Shapley values facilitated model explainability. The wealth of data provided by this study, coupled with the forecasting methodology employed, possesses the potential to augment interventions and facilitate the early identification of loneliness within populations at risk.
Submitted 15 September, 2024;
originally announced October 2024.
-
Underwater Image Enhancement with Physical-based Denoising Diffusion Implicit Models
Authors:
Nguyen Gia Bach,
Chanh Minh Tran,
Eiji Kamioka,
Phan Xuan Tan
Abstract:
Underwater vision is crucial for autonomous underwater vehicles (AUVs), and enhancing degraded underwater images in real time on a resource-constrained AUV is a key challenge, owing both to factors like light absorption and scattering and to the model computational complexity needed to resolve such factors. Traditional image enhancement techniques lack adaptability to varying underwater conditions, while learning-based methods, particularly those using convolutional neural networks (CNNs) and generative adversarial networks (GANs), offer more robust solutions but face limitations such as inadequate enhancement, unstable training, or mode collapse. Denoising diffusion probabilistic models (DDPMs) have emerged as a state-of-the-art approach in image-to-image tasks, but the recent UW-DDPM solution requires intensive computation to achieve the desired underwater image enhancement (UIE). To address these challenges, this paper introduces UW-DiffPhys, a novel physical-based and diffusion-based UIE approach. UW-DiffPhys combines light-computation physical-based UIE network components with a denoising U-Net to replace the computationally intensive distribution transformation U-Net in the existing UW-DDPM framework, reducing complexity while maintaining performance. Additionally, the Denoising Diffusion Implicit Model (DDIM) is employed to accelerate the inference process through non-Markovian sampling. Experimental results demonstrate that UW-DiffPhys achieved a substantial reduction in computational complexity and inference time compared to UW-DDPM, with competitive performance in key metrics such as PSNR, SSIM, and UCIQE, and an improvement in the overall underwater image quality UIQM metric. The implementation code can be found at the following repository: https://github.com/bachzz/UW-DiffPhys
Submitted 27 September, 2024;
originally announced September 2024.
-
Investigating Context-Faithfulness in Large Language Models: The Roles of Memory Strength and Evidence Style
Authors:
Yuepei Li,
Kang Zhou,
Qiao Qiao,
Bach Nguyen,
Qing Wang,
Qi Li
Abstract:
Retrieval-augmented generation (RAG) improves Large Language Models (LLMs) by incorporating external information into the response generation process. However, how context-faithful LLMs are and what factors influence LLMs' context-faithfulness remain largely unexplored. In this study, we investigate the impact of memory strength and evidence presentation on LLMs' receptiveness to external evidence. We introduce a method to quantify the memory strength of LLMs by measuring the divergence in LLMs' responses to different paraphrases of the same question, which is not considered by previous works. We also generate evidence in various styles to evaluate the effects of evidence in different styles. Two datasets are used for evaluation: Natural Questions (NQ) with popular questions and popQA featuring long-tail questions. Our results show that for questions with high memory strength, LLMs are more likely to rely on internal memory, particularly for larger LLMs such as GPT-4. On the other hand, presenting paraphrased evidence significantly increases LLMs' receptiveness compared to simple repetition or adding details.
Submitted 17 September, 2024;
originally announced September 2024.
-
Deep-Wide Learning Assistance for Insect Pest Classification
Authors:
Toan Nguyen,
Huy Nguyen,
Huy Ung,
Hieu Ung,
Binh Nguyen
Abstract:
Accurate insect pest recognition plays a critical role in agriculture. It is a challenging problem due to the intricate characteristics of insects. In this paper, we present DeWi, a novel learning assistance for insect pest classification. With a one-stage and alternating training strategy, DeWi simultaneously improves several Convolutional Neural Networks in two perspectives: discrimination (by optimizing a triplet margin loss in a supervised training manner) and generalization (via data augmentation). From that, DeWi can learn discriminative and in-depth features of insect pests (deep) yet still generalize well to a large number of insect categories (wide). Experimental results show that DeWi achieves the highest performance on two insect pest classification benchmarks (76.44% accuracy on the IP102 dataset and 99.79% accuracy on the D0 dataset). In addition, extensive evaluations and ablation studies are conducted to thoroughly investigate DeWi and demonstrate its superiority. Our source code is available at https://github.com/toannguyen1904/DeWi.
Submitted 16 September, 2024;
originally announced September 2024.
-
A Novel Dataset for Video-Based Autism Classification Leveraging Extra-Stimulatory Behavior
Authors:
Manuel Serna-Aguilera,
Xuan Bac Nguyen,
Han-Seok Seo,
Khoa Luu
Abstract:
Autism Spectrum Disorder (ASD) can affect individuals at varying degrees of intensity, with challenges in overall health, communication, and sensory processing, and this often begins at a young age. Thus, it is critical for medical professionals to be able to accurately diagnose ASD in young children, but doing so is difficult. Deep learning can be responsibly leveraged to improve productivity in addressing this task. The availability of data, however, remains a considerable obstacle. Hence, in this work, we introduce the Video ASD dataset--a dataset that contains video frame convolutional and attention map feature data--to foster further progress in the task of ASD classification. The original videos showcase children reacting to chemo-sensory stimuli, among auditory, touch, and vision stimuli. This dataset contains the features of the frames spanning 2,467 videos, for a total of approximately 1.4 million frames. Additionally, head pose angles are included to account for head movement noise, as well as full-sentence text labels for the taste and smell videos that describe how the facial expression changes before, immediately after, and long after interaction with the stimuli. In addition to providing features, we also test foundation models on this data to showcase how movement noise affects performance and the need for more data and more complex labels.
Submitted 6 September, 2024;
originally announced September 2024.
-
PolypDB: A Curated Multi-Center Dataset for Development of AI Algorithms in Colonoscopy
Authors:
Debesh Jha,
Nikhil Kumar Tomar,
Vanshali Sharma,
Quoc-Huy Trinh,
Koushik Biswas,
Hongyi Pan,
Ritika K. Jha,
Gorkem Durak,
Alexander Hann,
Jonas Varkey,
Hang Viet Dao,
Long Van Dao,
Binh Phuc Nguyen,
Khanh Cong Pham,
Quang Trung Tran,
Nikolaos Papachrysos,
Brandon Rieders,
Peter Thelin Schmidt,
Enrik Geissler,
Tyler Berzin,
Pål Halvorsen,
Michael A. Riegler,
Thomas de Lange,
Ulas Bagci
Abstract:
Colonoscopy is the primary method for examination, detection, and removal of polyps. Regular screening helps detect and prevent colorectal cancer at an early curable stage. However, challenges such as variation in endoscopists' skills, the quality of bowel preparation, and the complex nature of the large intestine cause a high polyp miss rate. These missed polyps can develop into cancer later on, which underscores the importance of improving the detection methods. A computer-aided diagnosis system can support physicians by assisting in detecting overlooked polyps. However, one of the important challenges for developing novel deep learning models for automatic polyp detection and segmentation is the lack of publicly available, large, diverse, multi-center datasets. To address this gap, we introduce PolypDB, a large-scale, publicly available dataset that contains 3934 still polyp images and their corresponding ground truth from real colonoscopy videos to design efficient polyp detection and segmentation architectures. The dataset has been developed and verified by a team of 10 gastroenterologists. PolypDB comprises images from five modalities: Blue Light Imaging (BLI), Flexible Imaging Color Enhancement (FICE), Linked Color Imaging (LCI), Narrow Band Imaging (NBI), and White Light Imaging (WLI), and from three medical centers in Norway, Sweden, and Vietnam. Thus, we split the dataset based on modality and medical center for modality-wise and center-wise analysis. We provide a benchmark on each modality using eight popular segmentation methods and six standard benchmark polyp detection methods. Furthermore, we also provide a center-wise benchmark under federated learning settings. Our dataset is public and can be downloaded at https://osf.io/pr7ms/.
Submitted 19 August, 2024;
originally announced September 2024.
-
A Practical Introduction to Benchmarking and Characterization of Quantum Computers
Authors:
Akel Hashim,
Long B. Nguyen,
Noah Goss,
Brian Marinelli,
Ravi K. Naik,
Trevor Chistolini,
Jordan Hines,
J. P. Marceaux,
Yosep Kim,
Pranav Gokhale,
Teague Tomesh,
Senrui Chen,
Liang Jiang,
Samuele Ferracin,
Kenneth Rudinger,
Timothy Proctor,
Kevin C. Young,
Robin Blume-Kohout,
Irfan Siddiqi
Abstract:
Rapid progress in quantum technology has transformed quantum computing and quantum information science from theoretical possibilities into tangible engineering challenges. Breakthroughs in quantum algorithms, quantum simulations, and quantum error correction are bringing useful quantum computation closer to fruition. These remarkable achievements have been facilitated by advances in quantum characterization, verification, and validation (QCVV). QCVV methods and protocols enable scientists and engineers to scrutinize, understand, and enhance the performance of quantum information-processing devices. In this Tutorial, we review the fundamental principles underpinning QCVV, and introduce a diverse array of QCVV tools used by quantum researchers. We define and explain QCVV's core models and concepts -- quantum states, measurements, and processes -- and illustrate how these building blocks are leveraged to examine a target system or operation. We survey and introduce protocols ranging from simple qubit characterization to advanced benchmarking methods. Along the way, we provide illustrated examples and detailed descriptions of the protocols, highlight the advantages and disadvantages of each, and discuss their potential scalability to future large-scale quantum computers. This Tutorial serves as a guidebook for researchers unfamiliar with the benchmarking and characterization of quantum computers, and also as a detailed reference for experienced practitioners.
Submitted 21 August, 2024;
originally announced August 2024.
-
Power-law localization in one-dimensional systems with nonlinear disorder under fixed input conditions
Authors:
Ba Phi Nguyen,
Kihong Kim
Abstract:
We conduct a numerical investigation into wave propagation and localization in one-dimensional lattices subject to nonlinear disorder, focusing on cases with fixed input conditions. Utilizing a discrete nonlinear Schrödinger equation with Kerr-type nonlinearity and a random coefficient, we compute the averages and variances of the transmittance, $T$, and its logarithm, as functions of the system size $L$, while maintaining constant intensity for the incident wave. In cases of purely nonlinear disorder, we observe power-law localization characterized by $\langle T \rangle \propto L^{-γ_a}$ and $\langle \ln T \rangle \approx -γ_g \ln L$ for sufficiently large $L$. At low input intensities, a transition from exponential to power-law decay in $\langle T \rangle$ occurs as $L$ increases. The exponents $γ_a$ and $γ_g$ are nearly identical, converging to approximately 0.5 as the strength of the nonlinear disorder, $β$, increases. Additionally, the variance of $T$ decays according to a power law with an exponent close to 1, and the variance of $\ln T$ approaches a small constant as $L$ increases. These findings are consistent with an underlying log-normal distribution of $T$ and suggest that wave propagation behavior becomes nearly deterministic as the system size increases. When both linear and nonlinear disorders are present, we observe a transition from power-law to exponential decay in transmittance with increasing $L$ when the strength of linear disorder, $V$, is less than $β$. As $V$ increases, the region exhibiting power-law localization diminishes and eventually disappears when $V$ exceeds $β$, leading to standard Anderson localization.
Submitted 17 August, 2024;
originally announced August 2024.
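The power-law scaling $\langle T \rangle \propto L^{-γ_a}$ reported in the abstract above is the kind of relation whose exponent is typically read off from a straight-line fit in log-log space. The following is a minimal sketch of that fit, not the authors' analysis code: the helper name `fit_power_law_exponent`, the prefactor 3.0, and the system sizes are illustrative assumptions, and the transmittance values are synthetic, generated with an exponent of 0.5 to match the approximate value quoted in the abstract.

```python
import math


def fit_power_law_exponent(sizes, values):
    """Least-squares slope of ln(value) against ln(size), returned as gamma.

    If value = C * size**(-gamma), then ln(value) = ln(C) - gamma * ln(size),
    so gamma is minus the slope of the log-log line.
    """
    xs = [math.log(s) for s in sizes]
    ys = [math.log(v) for v in values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    return -cov_xy / var_x


# Synthetic averaged transmittance (illustrative only): <T> = 3.0 * L**(-0.5).
sizes = [100, 200, 400, 800, 1600, 3200]
mean_T = [3.0 * L ** -0.5 for L in sizes]

gamma_a = fit_power_law_exponent(sizes, mean_T)
```

On noiseless synthetic input the fit recovers the generating exponent up to floating-point error; with simulated data, the fit would be restricted to the large-$L$ regime where the power law holds.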
-
Hierarchical Quantum Control Gates for Functional MRI Understanding
Authors:
Xuan-Bac Nguyen,
Hoang-Quan Nguyen,
Hugh Churchill,
Samee U. Khan,
Khoa Luu
Abstract:
Quantum computing has emerged as a powerful tool for solving complex problems intractable for classical computers, particularly in popular fields such as cryptography, optimization, and neurocomputing. In this paper, we present a new quantum-based approach named the Hierarchical Quantum Control Gates (HQCG) method for efficient understanding of Functional Magnetic Resonance Imaging (fMRI) data. This approach includes two novel modules: the Local Quantum Control Gate (LQCG) and the Global Quantum Control Gate (GQCG), which are designed to extract local and global features of fMRI signals, respectively. Our method operates end-to-end on a quantum machine, leveraging quantum mechanics to learn patterns within extremely high-dimensional fMRI signals, such as 30,000 samples, which is a challenge for classical computers. Empirical results demonstrate that our approach significantly outperforms classical methods. Additionally, we found that the proposed quantum model is more stable and less prone to overfitting than the classical methods.
Submitted 22 September, 2024; v1 submitted 7 August, 2024;
originally announced August 2024.
-
Attenuation-Aware Weighted Optical Flow with Medium Transmission Map for Learning-based Visual Odometry in Underwater terrain
Authors:
Bach Nguyen Gia,
Chanh Minh Tran,
Kamioka Eiji,
Tan Phan Xuan
Abstract:
This paper addresses the challenge of improving learning-based monocular visual odometry (VO) in underwater environments by integrating principles of underwater optical imaging to manipulate optical flow estimation. Leveraging the inherent properties of underwater imaging, the novel wflow-TartanVO is introduced, enhancing the accuracy of VO systems for autonomous underwater vehicles (AUVs). The proposed method utilizes a normalized medium transmission map as a weight map to adjust the estimated optical flow, emphasizing regions with lower degradation and suppressing uncertain regions affected by underwater light scattering and absorption. wflow-TartanVO does not require fine-tuning of pre-trained VO models, thus promoting its adaptability to different environments and camera models. Evaluation on different real-world underwater datasets demonstrates that wflow-TartanVO outperforms baseline VO methods, as evidenced by the considerably reduced Absolute Trajectory Error (ATE). The implementation code is available at: https://github.com/bachzz/wflow-TartanVO
Submitted 18 July, 2024;
originally announced July 2024.
-
Overcoming Catastrophic Forgetting in Federated Class-Incremental Learning via Federated Global Twin Generator
Authors:
Thinh Nguyen,
Khoa D Doan,
Binh T. Nguyen,
Danh Le-Phuoc,
Kok-Seng Wong
Abstract:
Federated Class-Incremental Learning (FCIL) is becoming increasingly important in the decentralized setting, where it enables multiple participants to collaboratively train a global model to perform well on a sequence of tasks without sharing their private data. In FCIL, conventional Federated Learning algorithms such as FedAVG often suffer from catastrophic forgetting, resulting in significant performance declines on earlier tasks. Recent works, based on generative models, produce synthetic images to help mitigate this issue across all classes, but these approaches' testing accuracy on previous classes is still much lower than on recent classes, i.e., they have better plasticity than stability. To overcome these issues, this paper presents Federated Global Twin Generator (FedGTG), an FCIL framework that exploits privacy-preserving generative-model training on the global side without accessing client data. Specifically, the server trains a data generator and a feature generator to create two types of information from all seen classes, and then it sends the synthetic data to the client side. The clients then use feature-direction-controlling losses to make the local models retain knowledge and learn new tasks well. We extensively analyze the robustness of FedGTG on natural images, as well as its ability to converge to flat local minima and achieve better-predicting confidence (calibration). Experimental results on CIFAR-10, CIFAR-100, and tiny-ImageNet demonstrate the improvements in accuracy and forgetting measures of FedGTG compared to previous frameworks.
Submitted 13 July, 2024;
originally announced July 2024.
-
Enhancing Performance and User Engagement in Everyday Stress Monitoring: A Context-Aware Active Reinforcement Learning Approach
Authors:
Seyed Amir Hossein Aqajari,
Ziyu Wang,
Ali Tazarv,
Sina Labbaf,
Salar Jafarlou,
Brenda Nguyen,
Nikil Dutt,
Marco Levorato,
Amir M. Rahmani
Abstract:
In today's fast-paced world, accurately monitoring stress levels is crucial. Sensor-based stress monitoring systems often need large datasets for training effective models. However, individual-specific models are necessary for personalized and interactive scenarios. Traditional methods like Ecological Momentary Assessments (EMAs) assess stress but struggle with efficient data collection without burdening users. The challenge is to send EMAs at the right time, especially during stress, balancing monitoring efficiency and user convenience. This paper introduces a novel context-aware active reinforcement learning (RL) algorithm for enhanced stress detection using Photoplethysmography (PPG) data from smartwatches and contextual data from smartphones. Our approach dynamically selects optimal times for deploying EMAs, utilizing the user's immediate context to maximize label accuracy and minimize intrusiveness. Initially, the study was executed in an offline environment to refine the label collection process, aiming to increase accuracy while reducing user burden. Later, we integrated a real-time label collection mechanism, transitioning to an online methodology. This shift resulted in an 11% improvement in stress detection efficiency. Incorporating contextual data improved model accuracy by 4%. Personalization studies indicated a 10% enhancement in AUC-ROC scores, demonstrating better stress level differentiation. This research marks a significant move towards personalized, context-driven real-time stress monitoring methods.
Submitted 11 July, 2024;
originally announced July 2024.
-
Weakly Supervised Test-Time Domain Adaptation for Object Detection
Authors:
Anh-Dzung Doan,
Bach Long Nguyen,
Terry Lim,
Madhuka Jayawardhana,
Surabhi Gupta,
Christophe Guettier,
Ian Reid,
Markus Wagner,
Tat-Jun Chin
Abstract:
Prior to deployment, an object detector is trained on a dataset compiled from a previous data collection campaign. However, the environment in which the object detector is deployed will invariably evolve, particularly in outdoor settings where changes in lighting, weather and seasons will significantly affect the appearance of the scene and target objects. It is almost impossible for all potential scenarios that the object detector may come across to be present in a finite training dataset. This necessitates continuous updates to the object detector to maintain satisfactory performance. Test-time domain adaptation techniques enable machine learning models to self-adapt based on the distributions of the testing data. However, existing methods mainly focus on fully automated adaptation, which makes sense for applications such as self-driving cars. Despite the prevalence of fully automated approaches, in some applications such as surveillance, there is usually a human operator overseeing the system's operation. We propose to involve the operator in test-time domain adaptation to raise the performance of object detection beyond what is achievable by fully automated adaptation. To reduce manual effort, the proposed method only requires the operator to provide weak labels, which are then used to guide the adaptation process. Furthermore, the proposed method can be performed in a streaming setting, where each online sample is observed only once. We show that the proposed method outperforms existing works, demonstrating a great benefit of human-in-the-loop test-time domain adaptation. Our code is publicly available at https://github.com/dzungdoan6/WSTTA
Submitted 8 July, 2024;
originally announced July 2024.
-
Qudit Dynamical Decoupling on a Superconducting Quantum Processor
Authors:
Vinay Tripathi,
Noah Goss,
Arian Vezvaee,
Long B. Nguyen,
Irfan Siddiqi,
Daniel A. Lidar
Abstract:
Multi-level qudit systems are increasingly being explored as alternatives to traditional qubit systems due to their denser information storage and processing potential. However, qudits are more susceptible to decoherence than qubits due to increased loss channels, noise sensitivity, and crosstalk. To address these challenges, we develop protocols for dynamical decoupling (DD) of qudit systems based on the Heisenberg-Weyl group. We implement and experimentally verify these DD protocols on a superconducting transmon processor that supports qudit operation based on qutrits $(d=3)$ and ququarts $(d=4)$. Specifically, we demonstrate single-qudit DD sequences to decouple qutrits and ququarts from system-bath-induced decoherence. We also introduce two-qudit DD sequences designed to suppress the detrimental cross-Kerr couplings between coupled qudits. This allows us to demonstrate a significant improvement in the fidelity of time-evolved qutrit Bell states. Our results highlight the utility of leveraging DD to enable scalable qudit-based quantum computing.
Submitted 5 July, 2024;
originally announced July 2024.
-
SAFT: Towards Out-of-Distribution Generalization in Fine-Tuning
Authors:
Bac Nguyen,
Stefan Uhlich,
Fabien Cardinaux,
Lukas Mauch,
Marzieh Edraki,
Aaron Courville
Abstract:
Handling distribution shifts from training data, known as out-of-distribution (OOD) generalization, poses a significant challenge in the field of machine learning. While a pre-trained vision-language model like CLIP has demonstrated remarkable zero-shot performance, further adaptation of the model to downstream tasks leads to undesirable degradation for OOD data. In this work, we introduce Sparse Adaptation for Fine-Tuning (SAFT), a method that prevents fine-tuning from forgetting the general knowledge in the pre-trained model. SAFT only updates a small subset of important parameters whose gradient magnitude is large, while keeping the other parameters frozen. SAFT is straightforward to implement and conceptually simple. Extensive experiments show that with only 0.1% of the model parameters, SAFT can significantly improve the performance of CLIP. It consistently outperforms baseline methods across several benchmarks. On the few-shot learning benchmark of ImageNet and its variants, SAFT gives a gain of 5.15% on average over the conventional fine-tuning method in OOD settings.
Submitted 3 July, 2024;
originally announced July 2024.
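
The selection rule described in the SAFT abstract (update only the parameters whose gradient magnitude is large, freeze the rest) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `top_gradient_mask` and `sparse_sgd_step`, the flat-list parameter layout, the plain SGD update, and the `keep_fraction` argument are all assumptions made for illustration.

```python
def top_gradient_mask(grads, keep_fraction):
    """Return a 0/1 mask marking the largest-magnitude gradient entries.

    Hypothetical helper: `keep_fraction` is the share of parameters that
    stay trainable (the abstract reports results with 0.1% of parameters).
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    k = max(1, int(round(keep_fraction * len(grads))))
    # Sort indices by decreasing |gradient|; ties are broken by index.
    order = sorted(range(len(grads)), key=lambda i: (-abs(grads[i]), i))
    selected = set(order[:k])
    return [1 if i in selected else 0 for i in range(len(grads))]


def sparse_sgd_step(params, grads, mask, lr):
    """Apply one SGD step to masked-in parameters; others stay frozen."""
    return [p - lr * g * m for p, g, m in zip(params, grads, mask)]


if __name__ == "__main__":
    params = [1.0, 1.0, 1.0, 1.0]
    grads = [0.1, -2.0, 0.05, 1.5]
    mask = top_gradient_mask(grads, keep_fraction=0.5)
    print(mask)
    print(sparse_sgd_step(params, grads, mask, lr=0.1))
```

In this sketch the mask is computed once and then reused for every update step; whether and how often the mask is recomputed is a design choice the abstract does not specify.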
-
Odd-even mass differences of well and rigidly deformed nuclei in the rare earth region: A test of a newly proposed fit of average pairing matrix elements
Authors:
T. V. Nhan Hao,
N. N. Bao Nguyen,
D. Quang Tam,
P. Quentin,
Meng-Hock Koh,
L. Bonneau
Abstract:
We discuss a test of a recently proposed approach to determine average pairing matrix elements within a given interval of single-particle states (sp) around the Fermi level $λ$ as obtained in the so-called uniform gap method (UGM). It takes stock of the crucial role played by the averaged sp level density $\tildeρ(e)$. These matrix elements are deduced within the UGM approach, from microscopically calculated $\tildeρ(e)$ and gaps obtained from analytical formulae of a semi-classical nature. Two effects generally ignored in similar fits have been taken care of. They are: (a) the correction for a systematic bias in choosing to fit pairing gaps corresponding to equilibrium deformation solutions as discussed by Möller and Nix [Nucl. Phys. A 476, 1 (1992)] and (b) the correction for a systematic spurious enhancement of $\tildeρ(e)$ for protons in the vicinity of $λ$, because of the local Slater approximation used for the treatment of the Coulomb exchange terms in most calculations (see e.g. [Phys. Rev C 84, 014310 (2011)]). This approach has been deemed to be very efficient upon performing Hartree-Fock + BCS (with seniority force and self-consistent blocking when dealing with odd nuclei) calculations of a large sample of well and rigidly deformed even-even rare-earth nuclei. The reproduction of their experimental moments of inertia has been found to be at least of the same quality as what has been obtained in a direct fit of these data [Phys. Rev C 99, 064306 (2019)]. We extend here the test of our approach to the reproduction, in the same region, of three-point odd-even mass differences centered on odd-$N$ or odd-$Z$ nuclei. The agreement with the data is again roughly of the same quality as what has been obtained in a direct fit, as performed in [Phys. Rev C 99, 064306 (2019)].
Submitted 16 September, 2024; v1 submitted 15 June, 2024;
originally announced June 2024.
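
For readers unfamiliar with the quantity tested in the abstract above, a commonly used form of the three-point odd-even mass difference is written out below. The abstract does not state the authors' exact sign or centering convention, so this is an assumed, standard form: $E(N)$ denotes the total (negative) energy of the nucleus with $N$ neutrons at fixed $Z$, and the proton quantity is obtained by exchanging the roles of $N$ and $Z$.

```latex
\Delta^{(3)}_{n}(N) \;=\; \frac{(-1)^{N}}{2}\,
  \bigl[\, E(N+1) \;-\; 2\,E(N) \;+\; E(N-1) \,\bigr]
```

For a difference centered on an odd-$N$ nucleus the prefactor is $-1/2$, so the expression compares the energy of the odd system with the average of its two even neighbors.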
-
I-MPN: Inductive Message Passing Network for Efficient Human-in-the-Loop Annotation of Mobile Eye Tracking Data
Authors:
Hoang H. Le,
Duy M. H. Nguyen,
Omair Shahzad Bhatti,
Laszlo Kopacsi,
Thinh P. Ngo,
Binh T. Nguyen,
Michael Barz,
Daniel Sonntag
Abstract:
Comprehending how humans process visual information in dynamic settings is crucial for psychology and designing user-centered interactions. While mobile eye-tracking systems combining egocentric video and gaze signals can offer valuable insights, manual analysis of these recordings is time-intensive. In this work, we present a novel human-centered learning algorithm designed for automated object recognition within mobile eye-tracking settings. Our approach seamlessly integrates an object detector with a spatial relation-aware inductive message-passing network (I-MPN), harnessing node profile information and capturing object correlations. Such mechanisms enable us to learn embedding functions capable of generalizing to new object angle views, facilitating rapid adaptation and efficient reasoning in dynamic contexts as users navigate their environment. Through experiments conducted on three distinct video sequences, our interactive-based method showcases significant performance improvements over fixed training/testing algorithms, even when trained on considerably smaller annotated samples collected through user feedback. Furthermore, we demonstrate exceptional efficiency in data annotation processes and surpass prior interactive methods that use complete object detectors, combine detectors with convolutional networks, or employ interactive video segmentation.
Submitted 7 July, 2024; v1 submitted 10 June, 2024;
originally announced June 2024.
-
Adapting Physics-Informed Neural Networks To Optimize ODEs in Mosquito Population Dynamics
Authors:
Dinh Viet Cuong,
Branislava Lalić,
Mina Petrić,
Binh Nguyen,
Mark Roantree
Abstract:
Physics-informed neural networks (PINNs) have been gaining popularity due to their unique ability to incorporate physics laws into data-driven models, ensuring that the predictions are not only consistent with empirical data but also align with domain-specific knowledge in the form of physics equations. The integration of physics principles enables the method to require less data while maintaining the robustness of deep learning in modeling complex dynamical systems. However, current PINN frameworks are not sufficiently mature for real-world ODE systems, especially those with extreme multi-scale behavior such as mosquito population dynamical modelling. In this research, we propose a PINN framework with several improvements for forward and inverse problems for ODE systems, with a case study application in modelling the dynamics of mosquito populations. The framework tackles the gradient imbalance and stiffness problems posed by mosquito ordinary differential equations. The method offers a simple but effective way to resolve the time causality issue in PINNs by gradually expanding the training time domain until it covers the entire domain of interest. As part of a robust evaluation, we conduct experiments using simulated data to evaluate the effectiveness of the approach. Preliminary results indicate that physics-informed machine learning holds significant potential for advancing the study of ecological systems.
Submitted 7 June, 2024;
originally announced June 2024.
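
The idea of gradually expanding the training time domain, mentioned in the abstract above, can be sketched as a schedule for the upper end of the time window from which collocation points are drawn. The abstract does not describe the schedule itself, so everything here is an assumption for illustration: the linear growth rule, the `initial_fraction` parameter, and the function names `training_horizon` and `collocation_points`.

```python
def training_horizon(step, total_steps, t_start, t_end, initial_fraction=0.1):
    """Upper end of the training time window at a given optimisation step.

    Assumed linear schedule: the window starts at `initial_fraction` of the
    full interval and grows until it covers [t_start, t_end].
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    progress = min(max(step / total_steps, 0.0), 1.0)
    fraction = initial_fraction + (1.0 - initial_fraction) * progress
    return t_start + fraction * (t_end - t_start)


def collocation_points(step, total_steps, t_start, t_end, n_points):
    """Evenly spaced collocation times inside the current window."""
    if n_points < 2:
        raise ValueError("n_points must be at least 2")
    horizon = training_horizon(step, total_steps, t_start, t_end)
    width = (horizon - t_start) / (n_points - 1)
    return [t_start + i * width for i in range(n_points)]


if __name__ == "__main__":
    for step in (0, 50, 100):
        print(step, training_horizon(step, 100, 0.0, 10.0))
```

Training on early times first means the network fits the initial dynamics before residuals at later times enter the loss, which is the causality concern the abstract refers to.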
-
Generative Conditional Distributions by Neural (Entropic) Optimal Transport
Authors:
Bao Nguyen,
Binh Nguyen,
Hieu Trung Nguyen,
Viet Anh Nguyen
Abstract:
Learning conditional distributions is challenging because the desired outcome is not a single distribution but multiple distributions that correspond to multiple instances of the covariates. We introduce a novel neural entropic optimal transport method designed to effectively learn generative models of conditional distributions, particularly in scenarios characterized by limited sample sizes. Our method relies on the minimax training of two neural networks: a generative network parametrizing the inverse cumulative distribution functions of the conditional distributions and another network parametrizing the conditional Kantorovich potential. To prevent overfitting, we regularize the objective function by penalizing the Lipschitz constant of the network output. Our experiments on real-world datasets show the effectiveness of our algorithm compared to state-of-the-art conditional distribution learning techniques. Our implementation can be found at https://github.com/nguyenngocbaocmt02/GENTLE.
Submitted 4 June, 2024;
originally announced June 2024.
-
Diffusion-Inspired Quantum Noise Mitigation in Parameterized Quantum Circuits
Authors:
Hoang-Quan Nguyen,
Xuan Bac Nguyen,
Samuel Yen-Chi Chen,
Hugh Churchill,
Nicholas Borys,
Samee U. Khan,
Khoa Luu
Abstract:
Parameterized Quantum Circuits (PQCs) have been acknowledged as a leading strategy to utilize near-term quantum advantages in multiple problems, including machine learning and combinatorial optimization. When applied to specific tasks, the parameters in the quantum circuits are trained to minimize the target function. Although there have been comprehensive studies to improve the performance of PQCs on practical tasks, the errors caused by quantum noise degrade performance when running on real quantum computers. In particular, when the quantum state is transformed through multiple quantum circuit layers, the effect of the quantum noise accumulates and the state moves closer to the maximally mixed state, i.e., complete noise. This paper studies the relationship between quantum noise and the diffusion model. Then, we propose a novel diffusion-inspired learning approach to mitigate the quantum noise in PQCs and reduce the error for specific tasks. Through our experiments, we illustrate the efficiency of the learning strategy and achieve state-of-the-art performance on classification tasks in the quantum noise scenarios.
Submitted 2 June, 2024;
originally announced June 2024.
-
Sheaf HyperNetworks for Personalized Federated Learning
Authors:
Bao Nguyen,
Lorenzo Sani,
Xinchi Qiu,
Pietro Liò,
Nicholas D. Lane
Abstract:
Graph hypernetworks (GHNs), constructed by combining graph neural networks (GNNs) with hypernetworks (HNs), leverage relational data across various domains such as neural architecture search, molecular property prediction and federated learning. Despite GNNs and HNs being individually successful, we show that GHNs present problems compromising their performance, such as over-smoothing and heterophily. Moreover, we cannot apply GHNs directly to personalized federated learning (PFL) scenarios, where an a priori client relation graph may be absent, private, or inaccessible. To mitigate these limitations in the context of PFL, we propose a novel class of HNs, sheaf hypernetworks (SHNs), which combine cellular sheaf theory with HNs to improve parameter sharing for PFL. We thoroughly evaluate SHNs across diverse PFL tasks, including multi-class classification, traffic and weather forecasting. Additionally, we provide a methodology for constructing client relation graphs in scenarios where such graphs are unavailable. We show that SHNs consistently outperform existing PFL solutions in complex non-IID scenarios. While the baselines' performance fluctuates depending on the task, SHNs show improvements over the best-performing baseline of up to 2.7% in accuracy and a reduction of up to 5.3% in mean squared error.
Submitted 31 May, 2024;
originally announced May 2024.
-
Quantum Visual Feature Encoding Revisited
Authors:
Xuan-Bac Nguyen,
Hoang-Quan Nguyen,
Hugh Churchill,
Samee U. Khan,
Khoa Luu
Abstract:
Although quantum machine learning was introduced some time ago, its applications in computer vision are still limited. This paper, therefore, revisits quantum visual encoding strategies, the initial step in quantum machine learning. Investigating the root cause, we uncover that the existing quantum encoding design fails to ensure information preservation of the visual features after the encoding process, thus complicating the learning process of quantum machine learning models. In particular, the problem, termed "Quantum Information Gap" (QIG), leads to a gap of information between classical and corresponding quantum features. We provide a theoretical proof and practical demonstrations of this finding and underscore the significance of QIG, as it directly impacts the performance of quantum machine learning algorithms. To tackle this challenge, we introduce a simple but efficient new loss function named Quantum Information Preserving (QIP) to minimize this gap, resulting in enhanced performance of quantum machine learning algorithms. Extensive experiments validate the effectiveness of our approach, showcasing superior performance compared to current methodologies and consistently achieving state-of-the-art results in quantum modeling.
Submitted 20 August, 2024; v1 submitted 30 May, 2024;
originally announced May 2024.
-
QClusformer: A Quantum Transformer-based Framework for Unsupervised Visual Clustering
Authors:
Xuan-Bac Nguyen,
Hoang-Quan Nguyen,
Samuel Yen-Chi Chen,
Samee U. Khan,
Hugh Churchill,
Khoa Luu
Abstract:
Unsupervised vision clustering, a cornerstone in computer vision, has been studied for decades, yielding significant outcomes across numerous vision tasks. However, these algorithms involve substantial computational demands when confronted with vast amounts of unlabeled data. Conversely, quantum computing holds promise in expediting unsupervised algorithms when handling large-scale databases. In this study, we introduce QClusformer, a pioneering Transformer-based framework leveraging quantum machines to tackle unsupervised vision clustering challenges. Specifically, we design the Transformer architecture, including the self-attention module and transformer blocks, from a quantum perspective to enable execution on quantum hardware. In addition, we present QClusformer, a variant based on the Transformer architecture, tailored for unsupervised vision clustering tasks. By integrating these elements into an end-to-end framework, QClusformer consistently outperforms previous methods running on classical computers. Empirical evaluations across diverse benchmarks, including MS-Celeb-1M and DeepFashion, underscore the superior performance of QClusformer compared to state-of-the-art methods.
Submitted 7 August, 2024; v1 submitted 30 May, 2024;
originally announced May 2024.
-
BRACTIVE: A Brain Activation Approach to Human Visual Brain Learning
Authors:
Xuan-Bac Nguyen,
Hojin Jang,
Xin Li,
Samee U. Khan,
Pawan Sinha,
Khoa Luu
Abstract:
The human brain is a highly efficient processing unit, and understanding how it works can inspire new algorithms and architectures in machine learning. In this work, we introduce a novel framework named Brain Activation Network (BRACTIVE), a transformer-based approach to studying the human visual brain. The main objective of BRACTIVE is to align the visual features of subjects with corresponding brain representations via fMRI signals. It allows us to identify the brain's Regions of Interest (ROI) of the subjects. Unlike previous brain research methods, which can only identify ROIs for one subject at a time and are limited by the number of subjects, BRACTIVE automatically extends this identification to multiple subjects and ROIs. Our experiments demonstrate that BRACTIVE effectively identifies person-specific regions of interest, such as face and body-selective areas, aligning with neuroscience findings and indicating potential applicability to various object categories. More importantly, we found that leveraging human visual brain activity to guide deep neural networks enhances performance across various benchmarks. It encourages the potential of BRACTIVE in both neuroscience and machine intelligence studies.
Submitted 29 May, 2024;
originally announced May 2024.
-
Fast-FedUL: A Training-Free Federated Unlearning with Provable Skew Resilience
Authors:
Thanh Trung Huynh,
Trong Bang Nguyen,
Phi Le Nguyen,
Thanh Tam Nguyen,
Matthias Weidlich,
Quoc Viet Hung Nguyen,
Karl Aberer
Abstract:
Federated learning (FL) has recently emerged as a compelling machine learning paradigm, prioritizing the protection of privacy for training data. The increasing demand to address issues such as "the right to be forgotten" and combat data poisoning attacks highlights the importance of techniques, known as unlearning, which facilitate the removal of specific training data from trained FL models. Despite numerous unlearning methods proposed for centralized learning, they often prove inapplicable to FL due to fundamental differences in the operation of the two learning paradigms. Consequently, unlearning in FL remains in its early stages, presenting several challenges. Many existing unlearning solutions in FL require a costly retraining process, which can be burdensome for clients. Moreover, these methods are primarily validated through experiments, lacking theoretical assurances. In this study, we introduce Fast-FedUL, a tailored unlearning method for FL, which eliminates the need for retraining entirely. Through meticulous analysis of the target client's influence on the global model in each round, we develop an algorithm to systematically remove the impact of the target client from the trained model. In addition to presenting empirical findings, we offer a theoretical analysis delineating the upper bound of our unlearned model and the exact retrained model (the one obtained through retraining using untargeted clients). Experimental results with backdoor attack scenarios indicate that Fast-FedUL effectively removes almost all traces of the target client, while retaining the knowledge of untargeted clients (obtaining a high accuracy of up to 98% on the main task). Significantly, Fast-FedUL attains the lowest time complexity, providing a speed that is 1000 times faster than retraining. Our source code is publicly available at https://github.com/thanhtrunghuynh93/fastFedUL.
Submitted 28 May, 2024;
originally announced May 2024.
-
Accelerating Transformers with Spectrum-Preserving Token Merging
Authors:
Hoai-Chau Tran,
Duy M. H. Nguyen,
Duy M. Nguyen,
Trung-Tin Nguyen,
Ngan Le,
Pengtao Xie,
Daniel Sonntag,
James Y. Zou,
Binh T. Nguyen,
Mathias Niepert
Abstract:
Increasing the throughput of the Transformer architecture, a foundational component used in numerous state-of-the-art models for vision and language tasks (e.g., GPT, LLaVa), is an important problem in machine learning. One recent and effective strategy is to merge token representations within Transformer models, aiming to reduce computational and memory requirements while maintaining accuracy. Prior works have proposed algorithms based on Bipartite Soft Matching (BSM), which divides tokens into distinct sets and merges the top k similar tokens. However, these methods have significant drawbacks, such as sensitivity to token-splitting strategies and damage to informative tokens in later layers. This paper presents a novel paradigm called PiToMe, which prioritizes the preservation of informative tokens using an additional metric termed the energy score. This score identifies large clusters of similar tokens as high-energy, indicating potential candidates for merging, while smaller (unique and isolated) clusters are considered low-energy and preserved. Experimental findings demonstrate that PiToMe saves 40-60% of the base models' FLOPs while exhibiting superior off-the-shelf performance on image classification (0.5% average performance drop of ViT-MAE-H compared to 2.6% for baselines), image-text retrieval (0.3% average performance drop of CLIP on Flickr30k compared to 4.5% for other methods), and analogously in visual question answering with LLaVa-7B. Furthermore, PiToMe is theoretically shown to preserve intrinsic spectral properties of the original token space under mild conditions.
Submitted 25 May, 2024;
originally announced May 2024.
-
Comparison of WarpX and GUINEA-PIG for electron positron collisions
Authors:
Bao Nguyen,
Arianna Formenti,
Remi Lehe,
Jean-Luc Vay,
Spencer Gessner,
Luca Fedeli
Abstract:
As part of the Snowmass'21 planning exercise, the Advanced Accelerator Concepts community proposed developing multi-TeV linear colliders and considered beam-beam effects for these machines. Such colliders operate under a high disruption regime with an enormous number of electron-positron pairs produced from QED effects. Thus, it requires a self-consistent treatment of the fields produced by the pairs, which is not implemented in state-of-the-art beam-beam codes such as GUINEA-PIG. WarpX is a parallel, open-source, and portable particle-in-cell code with an active developer community that models QED processes with photon and pair generation in relativistic laser-beam interactions. However, its application to beam-beam collisions has yet to be fully explored. In this work, we benchmark the luminosity spectra, photon spectra, and coherent production process from WarpX against GUINEA-PIG in the ILC and ultra-tight collision scenarios. Our performance comparison demonstrates a significant speed-up advantage of WarpX, ensuring a more robust and efficient modeling of electron-positron collisions at multi-TeV energies.
Submitted 14 May, 2024;
originally announced May 2024.
-
Vietnamese AI Generated Text Detection
Authors:
Quang-Dan Tran,
Van-Quan Nguyen,
Quang-Huy Pham,
K. B. Thang Nguyen,
Trong-Hop Do
Abstract:
In recent years, Large Language Models (LLMs) have become integrated into our daily lives, serving as invaluable assistants in completing tasks. Because LLMs are so widely embraced by users, their abuse is inevitable, particularly in generating text content for various purposes, which makes it difficult to distinguish text generated by LLMs from text written by humans. In this study, we present a dataset named ViDetect, comprising 6,800 samples of Vietnamese essays, with 3,400 samples authored by humans and the remainder generated by LLMs, for the purpose of detecting AI-generated text. We conducted evaluations using state-of-the-art methods, including ViT5, BartPho, PhoBERT, mDeberta V3, and mBERT. These results not only contribute to the growing body of research on detecting AI-generated text but also demonstrate the adaptability and effectiveness of different methods in the Vietnamese language context. This research lays the foundation for future advancements in AI-generated text detection and provides valuable insights for researchers in the field of natural language processing.
Submitted 6 May, 2024;
originally announced May 2024.
-
LLMs for Generating and Evaluating Counterfactuals: A Comprehensive Study
Authors:
Van Bach Nguyen,
Paul Youssef,
Jörg Schlötterer,
Christin Seifert
Abstract:
As NLP models become more complex, understanding their decisions becomes more crucial. Counterfactuals (CFs), where minimal changes to inputs flip a model's prediction, offer a way to explain these models. While Large Language Models (LLMs) have shown remarkable performance in NLP tasks, their efficacy in generating high-quality CFs remains uncertain. This work fills this gap by investigating how well LLMs generate CFs for two NLU tasks. We conduct a comprehensive comparison of several common LLMs and evaluate their CFs, assessing both intrinsic metrics and the impact of these CFs on data augmentation. Moreover, we analyze differences between human and LLM-generated CFs, providing insights for future research directions. Our results show that LLMs generate fluent CFs but struggle to keep the induced changes minimal. Generating CFs for Sentiment Analysis (SA) is less challenging than for NLI, where LLMs show weaknesses in generating CFs that flip the original label. This is also reflected in the data augmentation performance, where we observe a large gap between augmenting with human and LLM-generated CFs. Furthermore, we evaluate LLMs' ability to assess CFs in a mislabelled data setting, and show that they have a strong bias towards agreeing with the provided labels. GPT4 is more robust against this bias and its scores correlate well with automatic metrics. Our findings reveal several limitations and point to potential future work directions.
Submitted 26 April, 2024;
originally announced May 2024.
-
CEval: A Benchmark for Evaluating Counterfactual Text Generation
Authors:
Van Bach Nguyen,
Jörg Schlötterer,
Christin Seifert
Abstract:
Counterfactual text generation aims to minimally change a text, such that it is classified differently. Judging advancements in method development for counterfactual text generation is hindered by a non-uniform usage of data sets and metrics in related work. We propose CEval, a benchmark for comparing counterfactual text generation methods. CEval unifies counterfactual and text quality metrics, includes common counterfactual datasets with human annotations, standard baselines (MICE, GDBA, CREST) and the open-source language model LLAMA-2. Our experiments found no perfect method for generating counterfactual text. Methods that excel at counterfactual metrics often produce lower-quality text while LLMs with simple prompts generate high-quality text but struggle with counterfactual criteria. By making CEval available as an open-source Python library, we encourage the community to contribute more methods and maintain consistent evaluation in future work.
Submitted 13 August, 2024; v1 submitted 26 April, 2024;
originally announced April 2024.
-
High-Coherence Kerr-cat qubit in 2D architecture
Authors:
Ahmed Hajr,
Bingcheng Qing,
Ke Wang,
Gerwin Koolstra,
Zahra Pedramrazi,
Ziqi Kang,
Larry Chen,
Long B. Nguyen,
Christian Junger,
Noah Goss,
Irwin Huang,
Bibek Bhandari,
Nicholas E. Frattini,
Shruti Puri,
Justin Dressel,
Andrew N. Jordan,
David Santiago,
Irfan Siddiqi
Abstract:
The Kerr-cat qubit is a bosonic qubit in which multi-photon Schrödinger cat states are stabilized by applying a two-photon drive to an oscillator with a Kerr nonlinearity. The suppressed bit-flip rate with increasing cat size makes this qubit a promising candidate to implement quantum error correction codes tailored for noise-biased qubits. However, achieving strong light-matter interactions necessary for stabilizing and controlling this qubit has traditionally required strong microwave drives that heat the qubit and degrade its performance. In contrast, increasing the coupling to the drive port removes the need for strong drives at the expense of large Purcell decay. By integrating an effective band-block filter on-chip, we overcome this trade-off and realize a Kerr-cat qubit in a scalable 2D superconducting circuit with high coherence. This filter provides 30 dB of isolation at the qubit frequency with negligible attenuation at the frequencies required for stabilization and readout. We experimentally demonstrate quantum non-demolition readout fidelity of 99.6% for a cat with 8 photons. Also, to have high-fidelity universal control over this qubit, we combine fast Rabi oscillations with a new demonstration of the X(90) gate through phase modulation of the stabilization drive. Finally, the lifetime in this architecture is examined as a function of the cat size of up to 10 photons in the oscillator achieving a bit-flip time higher than 1 ms and only a linear decrease in the phase-flip time, in good agreement with the theoretical analysis of the circuit. Our qubit shows promise as a building block for fault-tolerant quantum processors with a small footprint.
Submitted 19 May, 2024; v1 submitted 25 April, 2024;
originally announced April 2024.
-
SPARO: Selective Attention for Robust and Compositional Transformer Encodings for Vision
Authors:
Ankit Vani,
Bac Nguyen,
Samuel Lavoie,
Ranjay Krishna,
Aaron Courville
Abstract:
Selective attention helps us focus on task-relevant aspects in the constant flood of our sensory input. This constraint in our perception allows us to robustly generalize under distractions and to new compositions of perceivable concepts. Transformers employ a similar notion of attention in their architecture, but representation learning models with transformer backbones like CLIP and DINO often fail to demonstrate robustness and compositionality. We highlight a missing architectural prior: unlike human perception, transformer encodings do not separately attend over individual concepts. In response, we propose SPARO, a read-out mechanism that partitions encodings into separately-attended slots, each produced by a single attention head. Using SPARO with CLIP imparts an inductive bias that the vision and text modalities are different views of a shared compositional world with the same corresponding concepts. Using SPARO, we demonstrate improvements on downstream recognition, robustness, retrieval, and compositionality benchmarks with CLIP (up to +14% for ImageNet, +4% for SugarCrepe), and on nearest neighbors and linear probe for ImageNet with DINO (+3% each). We also showcase a powerful ability to intervene and select individual SPARO concepts to further improve downstream task performance (up from +4% to +9% for SugarCrepe) and use this ability to study the robustness of SPARO's representation structure. Finally, we provide insights through ablation experiments and visualization of learned concepts.
Submitted 14 September, 2024; v1 submitted 24 April, 2024;
originally announced April 2024.
-
Technical Development of a Semi-Autonomous Robotic Partition
Authors:
Binh Vinh Duc Nguyen,
Andrew Vande Moere
Abstract:
This technical description details the design and engineering process of a semi-autonomous robotic partition. This robotic partition prototype was subsequently employed in a longer-term evaluation in-the-wild study conducted by the authors in a real-world office setting.
Submitted 25 March, 2024;
originally announced March 2024.
-
Research Challenges for Adaptive Architecture: Empowering Occupants of Multi-Occupancy Buildings
Authors:
Binh Vinh Duc Nguyen,
Andrew Vande Moere
Abstract:
This position paper outlines our vision of 'adaptive architecture', which involves the integration of robotic technology to physically change an architectural space to support the changing needs of its occupants, in response to the CHI'24 workshop "HabiTech - Inhabiting Buildings, Data & Technology" call on "How do new technologies enable and empower the inhabitants of multi-occupancy buildings?". Specifically, while adaptive architecture holds promise for enhancing occupant satisfaction, comfort, and overall health and well-being, there remains a range of research challenges concerning how it can (1) effectively support individual occupants, while (2) mediating the conflicting needs of collocated others, and (3) integrating meaningfully into the sociocultural characteristics of their building community.
Submitted 25 March, 2024;
originally announced March 2024.
-
The Adaptive Workplace: Orchestrating Architectural Services around the Wellbeing of Individual Occupants
Authors:
Andrew Vande Moere,
Sara Arko,
Alena Safrova Drasilova,
Tomáš Ondráček,
Ilaria Pigliautile,
Benedetta Pioppi,
Anna Laura Pisello,
Jakub Prochazka,
Paula Acuna Roncancio,
Davide Schaumann,
Marcel Schweiker,
Binh Vinh Duc Nguyen
Abstract:
As the academic consortium members of the EU Horizon project SONATA ("Situation-aware OrchestratioN of AdapTive Architecture"), we respond to the workshop call for "Office Wellbeing by Design: Don't Stand for Anything Less" by proposing the "Adaptive Workplace" concept. In essence, our vision aims to adapt a workplace to the ever-changing needs of individual occupants, rather than expecting occupants to adapt to their workplace.
Submitted 25 March, 2024;
originally announced March 2024.
-
Giant Hall effect in two-dimensional CoSi$_2$ granular arrays
Authors:
Elica Anne Heredia,
Shao-Pin Chiu,
Ba-Anh-Vu Nguyen,
Ruey-Tay Wang,
Chih-Yuan Wu,
Sheng-Shiuan Yeh,
Juhn-Jong Lin
Abstract:
Granular metals offer tailorable electronic properties and play crucial roles in device and sensor applications. We have fabricated a series of nonmagnetic granular CoSi$_2$ thin films and studied the Hall effect and transport properties. We observed a two-orders-of-magnitude enhancement in the Hall coefficient in films that fall slightly above the metal-insulator transition. This giant Hall effect (GHE) is ascribed to the reduction of the charge carriers induced by the local quantum-interference effect. Transmission electron microscopy images and transport properties indicate that our films form two-dimensional granular arrays. The GHE may provide useful and sensitive applications.
Submitted 25 March, 2024;
originally announced March 2024.
-
Spatially temporally distributed informative path planning for multi-robot systems
Authors:
Binh Nguyen,
Linh Nguyen,
Truong X. Nghiem,
Hung La,
Jose Baca,
Pablo Rangel,
Miguel Cid Montoya,
Thang Nguyen
Abstract:
This paper investigates the problem of informative path planning for a mobile robotic sensor network in spatially temporally distributed mapping. The robots are able to gather noisy measurements from an area of interest during their movements to build a Gaussian Process (GP) model of a spatio-temporal field. The model is then utilized to predict the spatio-temporal phenomenon at different points of interest. To spatially and temporally navigate the group of robots so that they can optimally acquire maximal information gains while their connectivity is preserved, we propose a novel multistep prediction informative path planning optimization strategy employing our newly defined local cost functions. By using the dual decomposition method, it is feasible and practical to effectively solve the optimization problem in a distributed manner. The proposed method was validated through synthetic experiments utilizing real-world data sets.
Submitted 25 March, 2024;
originally announced March 2024.
-
Emotic Masked Autoencoder with Attention Fusion for Facial Expression Recognition
Authors:
Bach Nguyen-Xuan,
Thien Nguyen-Hoang,
Thanh-Huy Nguyen,
Nhu Tai-Do
Abstract:
Facial Expression Recognition (FER) is a critical task within computer vision with diverse applications across various domains. Addressing the challenge of limited FER datasets, which hampers the generalization capability of expression recognition models, is imperative for enhancing performance. Our paper presents an innovative approach integrating the MAE-Face self-supervised learning (SSL) method and multi-view Fusion Attention mechanism for expression classification, particularly showcased in the 6th Affective Behavior Analysis in-the-wild (ABAW) competition. By utilizing low-level feature information from the ipsilateral view (auxiliary view) before learning the high-level feature that emphasizes the shift in the human facial expression, our work seeks to provide a straightforward yet innovative way to improve the examined view (main view). We also suggest easy-to-implement and no-training frameworks aimed at highlighting key facial features to determine if such features can serve as guides for the model, focusing on pivotal local elements. The efficacy of this method is validated by improvements in model performance on the Aff-wild2 dataset, as observed in both training and validation contexts.
Submitted 12 May, 2024; v1 submitted 19 March, 2024;
originally announced March 2024.
-
Reference-based Metrics Disprove Themselves in Question Generation
Authors:
Bang Nguyen,
Mengxia Yu,
Yun Huang,
Meng Jiang
Abstract:
Reference-based metrics such as BLEU and BERTScore are widely used to evaluate question generation (QG). In this study, on QG benchmarks such as SQuAD and HotpotQA, we find that using human-written references cannot guarantee the effectiveness of the reference-based metrics. Most QG benchmarks have only one reference; we replicate the annotation process and collect another reference. A good metric is expected to grade a human-validated question no worse than generated questions. However, the results of reference-based metrics on our newly collected reference disproved the metrics themselves. We propose a reference-free metric consisting of multi-dimensional criteria such as naturalness, answerability, and complexity, utilizing large language models. These criteria are not constrained to the syntax or semantics of a single reference question, and the metric does not require a diverse set of references. Experiments reveal that our metric accurately distinguishes between high-quality questions and flawed ones, and achieves state-of-the-art alignment with human judgment.
Submitted 10 October, 2024; v1 submitted 18 March, 2024;
originally announced March 2024.
-
Real-time hybrid controls of energy storage and load shedding for integrated power and energy systems of ships
Authors:
Linh Vu,
Thai-Thanh Nguyen,
Bang Le-Huy Nguyen,
Md Isfakul Anam,
Tuyen Vu
Abstract:
This paper presents an original energy management methodology to enhance the resilience of ship power systems. The integration of various energy storage systems (ESS), including battery energy storage systems (BESS) and super-capacitor energy storage systems (SCESS), in modern ship power systems poses challenges in designing an efficient energy management system (EMS). The EMS proposed in this paper aims to achieve multiple objectives. The primary objective is to minimize shed loads, while the secondary objective is to effectively manage different types of ESS. Considering the diverse ramp-rate characteristics of generators, SCESS, and BESS, the proposed EMS exploits these differences to determine an optimal long-term schedule for minimizing shed loads. Furthermore, the proposed EMS balances the state-of-charge (SoC) of ESS and prioritizes the SCESS's SoC levels to ensure the efficient operation of BESS and SCESS. For better computational efficiency, we introduce the receding horizon optimization method, enabling real-time EMS implementation. A comparison with the fixed horizon optimization (FHO) validates its effectiveness. Simulation studies and results demonstrate that the proposed EMS efficiently manages generators, BESS, and SCESS, ensuring system resilience under generation shortages. Additionally, the proposed methodology significantly reduces the computational burden compared to the FHO technique while maintaining acceptable resilience performance.
Submitted 2 March, 2024;
originally announced March 2024.
-
SKILL: Similarity-aware Knowledge distILLation for Speech Self-Supervised Learning
Authors:
Luca Zampierin,
Ghouthi Boukli Hacene,
Bac Nguyen,
Mirco Ravanelli
Abstract:
Self-supervised learning (SSL) has achieved remarkable success across various speech-processing tasks. To enhance its efficiency, previous works often leverage the use of compression techniques. A notable recent attempt is DPHuBERT, which applies joint knowledge distillation (KD) and structured pruning to learn a significantly smaller SSL model. In this paper, we contribute to this research domain by introducing SKILL, a novel method that conducts distillation across groups of layers instead of distilling individual arbitrarily selected layers within the teacher network. The identification of the layers to distill is achieved through a hierarchical clustering procedure applied to layer similarity measures. Extensive experiments demonstrate that our distilled version of WavLM Base+ not only outperforms DPHuBERT but also achieves state-of-the-art results in the 30M parameters model class across several SUPERB tasks.
Submitted 26 February, 2024;
originally announced February 2024.
-
Video-Based Autism Detection with Deep Learning
Authors:
M. Serna-Aguilera,
X. B. Nguyen,
A. Singh,
L. Rockers,
S. Park,
L. Neely,
H. Seo,
K. Luu
Abstract:
Individuals with Autism Spectrum Disorder (ASD) often experience challenges in health, communication, and sensory processing; therefore, early diagnosis is necessary for proper treatment and care. In this work, we consider the problem of detecting or classifying ASD children to aid medical professionals in early diagnosis. We develop a deep learning model that analyzes video clips of children reacting to sensory stimuli, with the intent of capturing key differences in reactions and behavior between ASD and non-ASD participants. Unlike many recent studies in ASD classification with MRI data, which require expensive specialized equipment, our method utilizes a powerful but relatively affordable GPU, a standard computer setup, and a video camera for inference. Results show that our model effectively generalizes and understands key differences in the distinct movements of the children. It is noteworthy that our model exhibits successful classification performance despite the limited amount of data for a deep learning problem and limited temporal information available for learning, even with the motion artifacts.
Submitted 30 March, 2024; v1 submitted 26 February, 2024;
originally announced February 2024.
-
Cost-Adaptive Recourse Recommendation by Adaptive Preference Elicitation
Authors:
Duy Nguyen,
Bao Nguyen,
Viet Anh Nguyen
Abstract:
Algorithmic recourse recommends a cost-efficient action to a subject to reverse an unfavorable machine learning classification decision. Most existing methods in the literature generate recourse under the assumption of complete knowledge about the cost function. In real-world practice, subjects could have distinct preferences, leading to incomplete information about the underlying cost function of the subject. This paper proposes a two-step approach integrating preference learning into the recourse generation problem. In the first step, we design a question-answering framework to refine the confidence set of the Mahalanobis matrix cost of the subject sequentially. Then, we generate recourse by utilizing two methods: gradient-based and graph-based cost-adaptive recourse that ensures validity while considering the whole confidence set of the cost matrix. The numerical evaluation demonstrates the benefits of our approach over state-of-the-art baselines in delivering cost-efficient recourse recommendations.
Submitted 22 February, 2024;
originally announced February 2024.
-
Combining unsupervised and supervised learning in microscopy enables defect analysis of a full 4H-SiC wafer
Authors:
Binh Duong Nguyen,
Johannes Steiner,
Peter Wellmann,
Stefan Sandfeld
Abstract:
Detecting and analyzing various defect types in semiconductor materials is an important prerequisite for understanding the underlying mechanisms as well as tailoring the production processes. Analysis of microscopy images that reveal defects typically requires image analysis tasks such as segmentation and object detection. With the continually increasing amount of data produced by experiments, handling these tasks manually becomes increasingly infeasible. In this work, we combine various image analysis and data mining techniques to create a robust, accurate, and automated image analysis pipeline. This allows for extracting the type and position of all defects in a microscopy image of a KOH-etched 4H-SiC wafer that was stitched together from approximately 40,000 individual images.
Submitted 20 February, 2024;
originally announced February 2024.
-
Evaluating DTW Measures via a Synthesis Framework for Time-Series Data
Authors:
Kishansingh Rajput,
Duong Binh Nguyen,
Guoning Chen
Abstract:
Time-series data originate from various applications that describe specific observations or quantities of interest over time. Their analysis often involves the comparison across different time-series data sequences, which in turn requires the alignment of these sequences. Dynamic Time Warping (DTW) is the standard approach to achieve an optimal alignment between two temporal signals. Different variations of DTW have been proposed to address various needs for signal alignment or classifications. However, a comprehensive evaluation of their performance in these time-series data processing tasks is lacking. Most DTW measures perform well on certain types of time-series data without a clear explanation of the reason. To address that, we propose a synthesis framework to model the variation between two time-series data sequences for comparison. Our synthesis framework can produce a realistic initial signal and deform it with controllable variations that mimic real-world scenarios. With this synthesis framework, we produce a large number of time-series sequence pairs with different but known variations, which are used to assess the performance of a number of well-known DTW measures for the tasks of alignment and classification. We report their performance on different variations and suggest the proper DTW measure to use based on the type of variations between two time-series sequences. This is the first time such a guideline is presented for selecting a proper DTW measure. To validate our conclusion, we apply our findings to real-world applications, i.e., the detection of the formation top for the oil and gas industry and the pattern search in streamlines for flow visualization.
Submitted 14 February, 2024;
originally announced February 2024.
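For readers unfamiliar with the DTW measure discussed in the entry above, the following is a minimal sketch of the classic dynamic-programming formulation. It is not the authors' synthesis framework, nor any of the DTW variants they evaluate. The function name `dtw_distance` and the absolute-difference local cost are illustrative assumptions.

```python
import math


def dtw_distance(a, b):
    """Classic DTW cost between two numeric sequences (illustrative sketch).

    Assumption: the local cost is the absolute difference |a[i] - b[j]|.
    """
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] is matched to an already-used b element
                cost[i][j - 1],      # b[j-1] is matched to an already-used a element
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return cost[n][m]
```

Under this sketch, a sequence and a time-stretched copy of it (for example `[1, 2, 3]` and `[1, 2, 2, 3]`) align at zero cost, which is the property that makes DTW useful for comparing signals that differ in local timing.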
-
Can Large Language Models Learn Independent Causal Mechanisms?
Authors:
Gaël Gendron,
Bao Trung Nguyen,
Alex Yuxuan Peng,
Michael Witbrock,
Gillian Dobbie
Abstract:
Despite impressive performance on language modelling and complex reasoning tasks, Large Language Models (LLMs) fall short on the same tasks in uncommon settings or with distribution shifts, exhibiting a lack of generalisation ability. By contrast, systems such as causal models, that learn abstract variables and causal relationships, can demonstrate increased robustness against changes in the distribution. One reason for this success is the existence and use of Independent Causal Mechanisms (ICMs) representing high-level concepts that only sparsely interact. In this work, we apply two concepts from causality to learn ICMs within LLMs. We develop a new LLM architecture composed of multiple sparsely interacting language modelling modules. We show that such causal constraints can improve out-of-distribution performance on abstract and causal reasoning tasks. We also investigate the level of independence and domain specialisation and show that LLMs rely on pre-trained partially domain-invariant mechanisms resilient to fine-tuning.
Submitted 9 September, 2024; v1 submitted 4 February, 2024;
originally announced February 2024.
-
CompeteSMoE -- Effective Training of Sparse Mixture of Experts via Competition
Authors:
Quang Pham,
Giang Do,
Huy Nguyen,
TrungTin Nguyen,
Chenghao Liu,
Mina Sartipi,
Binh T. Nguyen,
Savitha Ramasamy,
Xiaoli Li,
Steven Hoi,
Nhat Ho
Abstract:
Sparse mixture of experts (SMoE) offers an appealing solution to scale up the model complexity beyond the means of increasing the network's depth or width. However, effective training of SMoE has proven to be challenging due to the representation collapse issue, which causes parameter redundancy and limited representation potential. In this work, we propose a competition mechanism to address this fundamental challenge of representation collapse. By routing inputs only to experts with the highest neural response, we show that, under mild assumptions, competition enjoys the same convergence rate as the optimal estimator. We further propose CompeteSMoE, an effective and efficient algorithm to train large language models by deploying a simple router that predicts the competition outcomes. Consequently, CompeteSMoE enjoys strong performance gains from the competition routing policy while having low computation overheads. Our extensive empirical evaluations on two transformer architectures and a wide range of tasks demonstrate the efficacy, robustness, and scalability of CompeteSMoE compared to state-of-the-art SMoE strategies.
Submitted 4 February, 2024;
originally announced February 2024.