
Adversarial Neural Networks in Medical Imaging Advancements and Challenges in Semantic Segmentation

Authors: Houze Liu, Bo Zhang, Yanlin Xiang, Yuxiang Hu, Aoran Shen, Yang Lin

Abstract: Recent advancements in artificial intelligence (AI) have precipitated a paradigm shift in medical imaging, particularly revolutionizing the domain of brain imaging. This paper systematically investigates the integration of deep learning, a principal branch of AI, into the semantic segmentation of brain images. Semantic segmentation serves as an indispensable technique for the delineation of discrete anatomical structures and the identification of pathological markers, essential for the diagnosis of complex neurological disorders. Historically, manual interpretation by radiologists, while noted for its accuracy, has been plagued by inherent subjectivity and inter-observer variability. This limitation becomes more pronounced with the exponential increase in imaging data, which traditional methods struggle to process efficiently and effectively. In response to these challenges, this study introduces the application of adversarial neural networks, a novel AI approach that not only automates but also refines the semantic segmentation process. By leveraging these advanced neural networks, our approach enhances the precision of diagnostic outputs, reducing human error and increasing the throughput of imaging data analysis. The paper provides a detailed discussion of how adversarial neural networks facilitate a more robust, objective, and scalable solution, thereby significantly improving diagnostic accuracy in neurological evaluations.
This exploration highlights the transformative impact of AI on medical imaging, setting a new benchmark for future research and clinical practice in neurology.

Submitted 16 October, 2024; originally announced October 2024.

arXiv:2410.12543

LLM-based Translation Inference with Iterative Bilingual Understanding

Authors: Andong Chen, Kehai Chen, Yang Xiang, Xuefeng Bai, Muyun Yang, Tiejun Zhao, Min Zhang

Abstract: The remarkable understanding and generation capabilities of large language models (LLMs) have greatly improved translation performance. However, incorrect understanding of the sentence to be translated can degrade translation quality. To address this issue, we propose a novel Iterative Bilingual Understanding Translation (IBUT) method based on the cross-lingual capabilities of LLMs and the dual characteristics of translation tasks. The cross-lingual capability of LLMs enables the generation of contextual understanding for both the source and target languages separately. Furthermore, the dual characteristics allow IBUT to generate effective cross-lingual feedback, iteratively refining contextual understanding, thereby reducing errors and improving translation performance. Experimental results showed that the proposed IBUT outperforms several strong comparison methods and generalizes to multiple domains (e.g., news, commonsense, and cultural translation benchmarks).

Submitted 16 October, 2024; v1 submitted 16 October, 2024; originally announced October 2024.

Comments: Work in progress

arXiv:2410.10737

Online Statistical Inference for Time-varying Sample-averaged Q-learning

Authors: Saunak Kumar Panda, Ruiqi Liu, Yisha Xiang

Abstract: Reinforcement learning (RL) has emerged as a key approach for training agents in complex and uncertain environments. Incorporating statistical inference in RL algorithms is essential for understanding and managing uncertainty in model performance. This paper introduces a time-varying batch-averaged Q-learning algorithm, termed sample-averaged Q-learning, which improves upon traditional single-sample Q-learning by aggregating samples of rewards and next states to better account for data variability and uncertainty. We leverage the functional central limit theorem (FCLT) to establish a novel framework that provides insights into the asymptotic normality of the sample-averaged algorithm under mild conditions. Additionally, we develop a random scaling method for interval estimation, enabling the construction of confidence intervals without requiring extra hyperparameters. Numerical experiments conducted on classic OpenAI Gym environments show that the time-varying sample-averaged Q-learning method consistently outperforms both single-sample and constant-batch Q-learning methods, achieving superior accuracy while maintaining comparable learning speeds.
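The sample-averaging idea can be illustrated with a minimal tabular sketch. It assumes a hypothetical simulator `env_step(s, a, rng)` that returns one sampled `(reward, next_state)` pair for any state-action pair, and a batch size that grows with the iteration counter to mimic a time-varying schedule. The `lr` and `batch_size` schedules are illustrative choices, not the authors' settings, and the FCLT-based inference and random-scaling intervals are not shown.

```python
import random


def sample_averaged_q_learning(env_step, states, actions, num_iters, gamma=0.9,
                               lr=lambda t: 1.0 / (t + 1) ** 0.6,
                               batch_size=lambda t: 1 + t // 10, seed=0):
    """Tabular Q-learning in which each update averages the Bellman target over a
    batch of sampled (reward, next_state) pairs; the batch size may grow with t.

    `env_step(s, a, rng)` is a hypothetical simulator returning (reward, next_state).
    """
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in states for a in actions}
    for t in range(num_iters):
        for s in states:
            for a in actions:
                m = batch_size(t)
                # Average the one-step Bellman targets over m independent samples.
                target = 0.0
                for _ in range(m):
                    r, s_next = env_step(s, a, rng)
                    target += r + gamma * max(q[(s_next, b)] for b in actions)
                target /= m
                # Stochastic-approximation step toward the averaged target.
                q[(s, a)] += lr(t) * (target - q[(s, a)])
    return q
```

With a single absorbing state that always pays reward 1 and gamma = 0.9, the estimate approaches 1 / (1 - 0.9) = 10; replacing `batch_size` with a constant gives the constant-batch baseline mentioned in the abstract.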

Submitted 14 October, 2024; originally announced October 2024.

arXiv:2410.09072

iTeach: Interactive Teaching for Robot Perception using Mixed Reality

Authors: Jishnu Jaykumar P, Cole Salvato, Vinaya Bomnale, Jikai Wang, Yu Xiang

Abstract: We introduce iTeach, a Mixed Reality (MR) framework to improve robot perception through real-time interactive teaching. By allowing human instructors to dynamically label robot RGB data, iTeach improves both the accuracy and adaptability of robot perception to new scenarios. The framework supports on-the-fly data collection and labeling, enhancing model performance and generalization. Applied to door and handle detection for household tasks, iTeach integrates a HoloLens app with an interactive YOLO model. Furthermore, we introduce the IRVLUTD DoorHandle dataset. DH-YOLO, our efficient detection model, significantly enhances the accuracy and efficiency of door and handle detection, highlighting the potential of MR to make robotic systems more capable and adaptive in real-world environments. The project page is available at https://irvlutd.github.io/iTeach.

Submitted 1 October, 2024; originally announced October 2024.

arXiv:2410.06308

Quantifying Training Difficulty and Accelerating Convergence in Neural Network-Based PDE Solvers

Authors: Chuqi Chen, Qixuan Zhou, Yahong Yang, Yang Xiang, Tao Luo

Abstract: Neural network-based methods have emerged as powerful tools for solving partial differential equations (PDEs) in scientific and engineering applications, particularly when handling complex domains or incorporating empirical data. These methods leverage neural networks as basis functions to approximate PDE solutions. However, training such networks can be challenging, often resulting in limited accuracy. In this paper, we investigate the training dynamics of neural network-based PDE solvers with a focus on the impact of initialization techniques. We assess training difficulty by analyzing the eigenvalue distribution of the kernel and apply the concept of effective rank to quantify this difficulty, where a larger effective rank correlates with faster convergence of the training error. Building upon this, we discover through theoretical analysis and numerical experiments that two initialization techniques, partition of unity (PoU) and variance scaling (VS), enhance the effective rank, thereby accelerating the convergence of training error. Furthermore, comprehensive experiments using popular PDE-solving frameworks, such as PINN, Deep Ritz, and the operator learning framework DeepONet, confirm that these initialization techniques consistently speed up convergence, in line with our theoretical findings.
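To make "effective rank" concrete, the sketch below uses a widely used entropy-based definition: with kernel eigenvalues λ_i and p_i = λ_i / Σ_j λ_j, the effective rank is exp(-Σ_i p_i log p_i). This is an assumption; the paper's exact definition may differ in detail. The function takes precomputed eigenvalues so that it stays dependency-free.

```python
import math


def effective_rank(eigenvalues, eps=1e-12):
    """Entropy-based effective rank: exp of the Shannon entropy of the
    normalized, non-negative eigenvalue distribution."""
    vals = [max(v, 0.0) for v in eigenvalues]   # clip tiny negative numerical noise
    total = sum(vals)
    if total <= eps:
        return 0.0
    probs = [v / total for v in vals if v > eps]
    entropy = -sum(p * math.log(p) for p in probs)
    return math.exp(entropy)
```

A flat spectrum such as `[1, 1, 1, 1]` gives an effective rank of 4, while a spectrum dominated by one eigenvalue gives a value close to 1, matching the intuition that a more evenly spread kernel spectrum is easier to train.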

Submitted 8 October, 2024; originally announced October 2024.

arXiv:2410.04790

GARLIC: LLM-Guided Dynamic Progress Control with Hierarchical Weighted Graph for Long Document QA

Authors: Xinyu Wang, Yanzheng Xiang, Lin Gui, Yulan He

Abstract: In the past, Retrieval-Augmented Generation (RAG) methods split text into chunks to enable language models to handle long documents. Recent tree-based RAG methods are able to retrieve detailed information while preserving global context. However, with the advent of more powerful LLMs, such as Llama 3.1, which offer better comprehension and support for longer inputs, we found that even recent tree-based RAG methods perform worse than directly feeding the entire document into Llama 3.1, although RAG methods still hold an advantage in reducing computational costs. In this paper, we propose a new retrieval method, called LLM-Guided Dynamic Progress Control with Hierarchical Weighted Graph (GARLIC), which outperforms previous state-of-the-art baselines, including Llama 3.1, while retaining the computational efficiency of RAG methods. Our method introduces several improvements: (1) Rather than using a tree structure, we construct a Hierarchical Weighted Directed Acyclic Graph with many-to-many summarization, where the graph edges are derived from attention mechanisms, and each node focuses on a single event or very few events. (2) We introduce a novel retrieval method that leverages the attention weights of LLMs rather than dense embedding similarity. Our method allows for searching the graph along multiple paths and can terminate at any depth. (3) We use the LLM to control the retrieval process, enabling it to dynamically adjust the amount and depth of information retrieved for different queries.
Experimental results show that our method outperforms previous state-of-the-art baselines, including Llama 3.1, on two single-document and two multi-document QA datasets, while maintaining similar computational complexity to traditional RAG methods.

Submitted 7 October, 2024; originally announced October 2024.

arXiv:2409.19563

CLIP-based Camera-Agnostic Feature Learning for Intra-camera Person Re-Identification

Authors: Xuan Tan, Xun Gong, Yang Xiang

Abstract: The Contrastive Language-Image Pre-Training (CLIP) model excels in traditional person re-identification (ReID) tasks due to its inherent advantage in generating textual descriptions for pedestrian images. However, applying CLIP directly to intra-camera supervised person re-identification (ICS ReID) presents challenges. ICS ReID requires independent identity labeling within each camera, without associations across cameras. This limits the effectiveness of text-based enhancements. To address this, we propose a novel framework called CLIP-based Camera-Agnostic Feature Learning (CCAFL) for ICS ReID. Accordingly, two custom modules are designed to guide the model to actively learn camera-agnostic pedestrian features: Intra-Camera Discriminative Learning (ICDL) and Inter-Camera Adversarial Learning (ICAL). Specifically, we first establish learnable textual prompts for intra-camera pedestrian images to obtain crucial semantic supervision signals for subsequent intra- and inter-camera learning. Then, we design ICDL to increase inter-class variation by considering the hard positive and hard negative samples within each camera, thereby learning intra-camera finer-grained pedestrian features. Additionally, we propose ICAL to reduce inter-camera pedestrian feature discrepancies by penalizing the model's ability to predict the camera from which a pedestrian image originates, thus enhancing the model's capability to recognize pedestrians from different viewpoints. Extensive experiments on popular ReID datasets demonstrate the effectiveness of our approach.
In particular, on the challenging MSMT17 dataset, we achieve 58.9% mAP, surpassing state-of-the-art methods by 7.6%. Code will be available at: https://github.com/Trangle12/CCAFL.
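The MSMT17 result is reported as mAP. As a reminder of how that number is formed, the sketch below computes average precision (AP) for one query from a ranked gallery list of match flags, then averages over queries. It assumes every true match for a query appears somewhere in the ranked list; official ReID evaluation scripts also filter same-camera or junk gallery entries, which this sketch omits.

```python
def average_precision(ranked_relevance):
    """AP for one query: mean of precision@k over the ranks k that hold a true match.

    `ranked_relevance` is a list of 0/1 flags in ranked gallery order.
    """
    hits, precisions = 0, []
    for k, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(rankings):
    """mAP: average of per-query AP values."""
    return sum(average_precision(r) for r in rankings) / len(rankings)
```

For example, a query whose matches sit at ranks 1 and 3 has AP = (1/1 + 2/3) / 2 = 5/6.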

Submitted 29 September, 2024; originally announced September 2024.

Comments: Submitted to IEEE TCSVT

arXiv:2409.19510

CoT-ST: Enhancing LLM-based Speech Translation with Multimodal Chain-of-Thought

Authors: Yexing Du, Ziyang Ma, Yifan Yang, Keqi Deng, Xie Chen, Bo Yang, Yang Xiang, Ming Liu, Bing Qin

Abstract: Speech Language Models (SLMs) have demonstrated impressive performance on speech translation tasks. However, existing research primarily focuses on direct instruction fine-tuning and often overlooks the inherent reasoning capabilities of SLMs. In this paper, we introduce a three-stage training framework designed to activate the chain-of-thought (CoT) capabilities of SLMs. We propose CoT-ST, a speech translation model that utilizes multimodal CoT to decompose speech translation into sequential steps of speech recognition and translation. We validated the effectiveness of our method on two datasets: CoVoST-2 and MuST-C. The experimental results demonstrate that CoT-ST outperforms previous state-of-the-art methods, achieving higher BLEU scores (CoVoST-2 en-ja: 30.5->30.8, en-zh: 45.2->47.7, MuST-C en-zh: 19.6->21.2). This work is open-sourced at https://github.com/X-LANCE/SLAM-LLM/tree/main/examples/st_covost2.

Submitted 28 September, 2024; originally announced September 2024.

arXiv:2409.15493

Autonomous Exploration and Semantic Updating of Large-Scale Indoor Environments with Mobile Robots

Authors: Sai Haneesh Allu, Itay Kadosh, Tyler Summers, Yu Xiang

Abstract: We introduce a new robotic system that enables a mobile robot to autonomously explore an unknown environment, build a semantic map of the environment, and subsequently update the semantic map to reflect environment changes, such as location changes of objects. Our system leverages a LiDAR scanner for 2D occupancy grid mapping and an RGB-D camera for object perception. We introduce a semantic map representation that combines a 2D occupancy grid map for geometry with a topological map for object semantics. This map representation enables us to effectively update the semantics by deleting or adding nodes to the topological map. Our system has been tested on a Fetch robot. The robot can semantically map a 93 m × 90 m floor and update the semantic map once objects are moved in the environment.
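The two-layer representation described above (an occupancy grid for geometry plus a topological layer of object nodes that can be deleted or added) can be sketched as a small data structure. This is a minimal illustration under stated assumptions: the detector supplies stable object ids, the currently observed region is an axis-aligned box in map coordinates, and the class and method names are hypothetical rather than taken from the authors' system.

```python
from dataclasses import dataclass, field


@dataclass
class SemanticMap:
    # Geometry layer: 2D occupancy grid (assumed encoding: 0 free, 1 occupied, -1 unknown).
    grid: list
    # Semantic layer: topological nodes, object id -> (label, (x, y) map position).
    objects: dict = field(default_factory=dict)

    def add_object(self, obj_id, label, position):
        # Adding an existing id overwrites it, which also handles a moved object.
        self.objects[obj_id] = (label, position)

    def remove_object(self, obj_id):
        self.objects.pop(obj_id, None)

    def update_from_observation(self, observed_region, detections):
        """Delete nodes inside the observed box that were not re-detected,
        then add (or move) a node for every current detection."""
        x0, y0, x1, y1 = observed_region

        def inside(p):
            return x0 <= p[0] <= x1 and y0 <= p[1] <= y1

        seen_ids = {d[0] for d in detections}
        for obj_id, (_, pos) in list(self.objects.items()):
            if inside(pos) and obj_id not in seen_ids:
                self.remove_object(obj_id)
        for obj_id, label, pos in detections:
            self.add_object(obj_id, label, pos)

    def find(self, label):
        """Return the sorted ids of all nodes carrying the given label."""
        return sorted(oid for oid, (lbl, _) in self.objects.items() if lbl == label)
```

Keeping semantics in a separate node layer means an update only touches the affected nodes; the occupancy grid does not need to be rebuilt when an object moves.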

Submitted 23 September, 2024; originally announced September 2024.

Comments: 7 pages, 7 figures. Project page is available at https://irvlutd.github.io/SemanticMapping/

arXiv:2409.14760

Isometric Immersion Learning with Riemannian Geometry

Authors: Zihao Chen, Wenyong Wang, Yu Xiang

Abstract: Manifold learning has proven to be an effective method for capturing the implicit intrinsic structure of non-Euclidean data, and one of its primary challenges is keeping the data representations distortion-free (isometric). Yet no existing manifold learning method provides a theoretical guarantee of isometry. Inspired by Nash's isometric embedding theorem, we introduce a new concept called isometric immersion learning based on Riemannian geometry principles. Following this concept, we propose an unsupervised neural network-based model that simultaneously achieves metric and manifold learning by integrating Riemannian geometry priors. Furthermore, we theoretically derive and algorithmically implement a maximum likelihood estimation-based training method for the new model. In simulation experiments, we compared the new model with state-of-the-art baselines on various 3-D geometry datasets, and the new model exhibited significantly superior performance across multiple evaluation metrics. Moreover, we applied the Riemannian metric learned by the new model to downstream prediction tasks in real-world scenarios, improving accuracy by an average of 8.8%.

Submitted 23 September, 2024; originally announced September 2024.

arXiv:2409.14519

RobotFingerPrint: Unified Gripper Coordinate Space for Multi-Gripper Grasp Synthesis

Authors: Ninad Khargonkar, Luis Felipe Casas, Balakrishnan Prabhakaran, Yu Xiang

Abstract: We introduce a novel representation, named the unified gripper coordinate space, for grasp synthesis of multiple grippers. The space is the 2D surface of a sphere in 3D, using longitude and latitude as its coordinates, and it is shared by all robotic grippers. We propose a new algorithm to map the palm surface of a gripper into the unified gripper coordinate space, and design a conditional variational autoencoder to predict the unified gripper coordinates given an input object. The predicted unified gripper coordinates establish correspondences between the gripper and the object, which can be used in an optimization problem to solve for the grasp pose and the finger joints for grasp synthesis. We demonstrate that using the unified gripper coordinate space improves the success rate and diversity in the grasp synthesis of multiple grippers.
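The longitude/latitude coordinates of the shared sphere can be illustrated with the standard conversion between a 3D direction and spherical coordinates. This sketch assumes a sphere centered at the origin; how the authors map a specific gripper's palm surface onto the sphere is not reproduced here.

```python
import math


def to_unified_coords(x, y, z):
    """Project a 3D point onto the unit sphere centered at the origin and
    return (longitude, latitude) in radians."""
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0:
        raise ValueError("a point at the sphere center has no direction")
    lon = math.atan2(y, x)                          # in (-pi, pi]
    lat = math.asin(max(-1.0, min(1.0, z / r)))     # clamp guards against rounding; in [-pi/2, pi/2]
    return lon, lat


def from_unified_coords(lon, lat, radius=1.0):
    """Map (longitude, latitude) back to a 3D point on a sphere of the given radius."""
    return (radius * math.cos(lat) * math.cos(lon),
            radius * math.cos(lat) * math.sin(lon),
            radius * math.sin(lat))
```

Because every gripper is expressed in the same two coordinates, a model that predicts (longitude, latitude) for object points does not need a gripper-specific output head.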

Submitted 22 September, 2024; originally announced September 2024.

Comments: 7 pages, 8 figures, 2 tables. Project page available at https://irvlutd.github.io/RobotFingerPrint

arXiv:2409.13440

Differentially Private Multimodal Laplacian Dropout (DP-MLD) for EEG Representative Learning

Authors: Xiaowen Fu, Bingxin Wang, Xinzhou Guo, Guoqing Liu, Yang Xiang

Abstract: Recently, multimodal electroencephalogram (EEG) learning has shown great promise in disease detection. At the same time, ensuring privacy in clinical studies has become increasingly crucial due to legal and ethical concerns. One widely adopted scheme for privacy protection is differential privacy (DP) because of its clear interpretation and ease of implementation. Although numerous methods have been proposed under DP, it has not been extensively studied for multimodal EEG data due to the complexity of the models and signal data involved. In this paper, we propose a novel Differentially Private Multimodal Laplacian Dropout (DP-MLD) scheme for multimodal EEG learning. Our approach proposes a novel multimodal representative learning model that processes EEG data by language models as text and other modal data by vision transformers as images, incorporating well-designed cross-attention mechanisms to effectively extract and integrate cross-modal features. To achieve DP, we design a novel adaptive feature-level Laplacian dropout scheme, where randomness allocation and performance are dynamically optimized within given privacy budgets. In the experiment on an open-source multimodal dataset of Freezing of Gait (FoG) in Parkinson's Disease (PD), our proposed method demonstrates an approximate 4% improvement in classification accuracy, and achieves state-of-the-art performance in multimodal EEG learning under DP.
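For readers unfamiliar with the Laplace mechanism that underlies Laplacian noise schemes, the sketch below adds Laplace noise of scale Δ/ε_i to each feature after splitting a total budget ε across features in proportion to a weight vector, so the per-feature budgets sum to ε under sequential composition. This is the textbook Laplace mechanism, not the adaptive DP-MLD allocation; `weights`, `sensitivity`, and the function names are illustrative assumptions.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize_features(features, sensitivity, epsilon, weights=None, seed=0):
    """Add Laplace noise to each feature, splitting the total budget `epsilon`
    across features in proportion to `weights` (uniform by default).

    Returns (noisy_features, per_feature_budgets)."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    weights = weights or [1.0] * len(features)
    total_w = sum(weights)
    rng = random.Random(seed)
    noisy, budgets = [], []
    for x, w in zip(features, weights):
        eps_i = epsilon * w / total_w        # budget allocated to this feature
        scale = sensitivity / eps_i          # Laplace scale b = sensitivity / eps_i
        noisy.append(x + laplace_noise(scale, rng))
        budgets.append(eps_i)
    return noisy, budgets
```

A feature given a larger share of the budget receives proportionally less noise, which is the trade-off an adaptive allocation scheme would tune.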

Submitted 20 September, 2024; originally announced September 2024.

arXiv:2409.13083

FedAT: Federated Adversarial Training for Distributed Insider Threat Detection

Authors: R G Gayathri, Atul Sajjanhar, Md Palash Uddin, Yong Xiang

Abstract: Insider threats usually occur from within the workplace, where the attacker is an entity closely associated with the organization. The sequence of actions the entities take on the resources to which they have access rights allows us to identify the insiders. Insider Threat Detection (ITD) using Machine Learning (ML)-based approaches has gained attention in the last few years. However, most techniques employed centralized ML methods to perform such an ITD. Organizations operating from multiple locations cannot contribute to the centralized models as the data is generated from various locations. In particular, the user behavior data, which is the primary source of ITD, cannot be shared among the locations due to privacy concerns. Additionally, the data distributed across various locations result in extreme class imbalance due to the rarity of attacks. Federated Learning (FL), a distributed data modeling paradigm, has gained much interest recently. However, FL-enabled ITD has not yet been explored, and research is still needed on the significant issues of its implementation in practical settings. As such, our work investigates an FL-enabled multiclass ITD paradigm that considers non-Independent and Identically Distributed (non-IID) data distribution to detect insider threats from different locations (clients) of an organization. Specifically, we propose a Federated Adversarial Training (FedAT) approach using a generative model to alleviate the extreme data skewness arising from the non-IID data distribution among the clients.
Besides, we propose to utilize a Self-normalized Neural Network-based Multi-Layer Perceptron (SNN-MLP) model to improve ITD. We perform comprehensive experiments and compare the results with the benchmarks to demonstrate the enhanced performance of the proposed FedAT-driven ITD scheme.
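Two standard building blocks related to this abstract can be sketched: the SELU activation that defines self-normalizing networks, and FedAvg-style sample-weighted parameter averaging, a common federated aggregation rule. Whether FedAT aggregates client models exactly this way is an assumption, and the generative oversampling step for class imbalance is not shown. Parameters are flat lists of floats for brevity.

```python
import math

# Fixed SELU constants from the self-normalizing networks formulation.
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805


def selu(x):
    """Scaled exponential linear unit used in self-normalizing MLPs."""
    return SELU_SCALE * (x if x > 0 else SELU_ALPHA * (math.exp(x) - 1.0))


def fedavg(client_weights, client_sizes):
    """Average client parameter vectors, weighting each client by its
    number of local samples (the standard FedAvg aggregation rule)."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(dim)]
```

Weighting by sample count lets clients with more local data pull the global model further, which is one reason skewed, non-IID client data is a concern in this setting.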

Submitted 19 September, 2024; originally announced September 2024.

Comments: 10 pages, 7 figures

arXiv:2409.11652

Relax DARTS: Relaxing the Constraints of Differentiable Architecture Search for Eye Movement Recognition

Authors: Hongyu Zhu, Xin Jin, Hongchao Liao, Yan Xiang, Mounim A. El-Yacoubi, Huafeng Qin

Abstract: Eye movement biometrics is a secure and innovative identification method. Deep learning methods have shown good performance, but their network architectures rely on manual design and prior knowledge. To address these issues, we introduce neural architecture search (NAS) algorithms to the field of eye movement recognition and present Relax DARTS, an improvement of Differentiable Architecture Search (DARTS) that realizes more efficient network search and training. The key idea is to circumvent the issue of weight sharing by independently training the architecture parameters $α$ to achieve a more precise target architecture. Moreover, the introduction of module input weights $β$ allows cells the flexibility to select inputs, alleviating overfitting and improving model performance. Results on four public databases demonstrate that Relax DARTS achieves state-of-the-art recognition performance. Notably, Relax DARTS exhibits adaptability to other multi-feature temporal classification tasks.
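The roles of $α$ and $β$ can be illustrated with the DARTS-style continuous relaxation on scalar inputs. In this sketch, a mixed operation is a softmax($α$)-weighted sum of candidate operations, and a cell's input is a softmax($β$)-weighted sum of candidate inputs. Using softmax for $β$, the toy candidate operations, and all function names are assumptions for illustration; the independent training schedule for $α$ is not shown.

```python
import math


def softmax(xs):
    m = max(xs)                              # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


# Hypothetical candidate operations on a scalar (stand-ins for conv / pool / skip).
CANDIDATE_OPS = [
    lambda x: x,            # identity / skip connection
    lambda x: 2.0 * x,      # stand-in for a learned transform
    lambda x: 0.0,          # "zero" op (no connection)
]


def mixed_op(x, alpha):
    """Continuous relaxation: softmax(alpha)-weighted sum of all candidate ops."""
    return sum(w * op(x) for w, op in zip(softmax(alpha), CANDIDATE_OPS))


def cell_input(inputs, beta):
    """Weight a cell's candidate inputs by softmax(beta) so it can favor some inputs."""
    return sum(w * h for w, h in zip(softmax(beta), inputs))


def derive_op(alpha):
    """After search, keep the index of the op with the largest architecture weight."""
    return max(range(len(alpha)), key=lambda i: alpha[i])
```

With equal $α$ the mixed operation averages all candidates; as one entry of $α$ dominates, the output approaches that single operation, which is what `derive_op` finally selects.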

Submitted 17 September, 2024; originally announced September 2024.

Comments: Accepted by CCBR 2024

arXiv:2408.13985

TF-Attack: Transferable and Fast Adversarial Attacks on Large Language Models

Authors: Zelin Li, Kehai Chen, Lemao Liu, Xuefeng Bai, Mingming Yang, Yang Xiang, Min Zhang

Abstract: With the great advancements in large language models (LLMs), adversarial attacks against LLMs have recently attracted increasing attention. We found that pre-existing adversarial attack methodologies exhibit limited transferability and are notably inefficient, particularly when applied to LLMs. In this paper, we analyze the core mechanisms of previous predominant adversarial attack methods, revealing that 1) the distributions of importance scores differ markedly among victim models, restricting transferability; and 2) the sequential attack process induces substantial time overheads. Based on these two insights, we introduce a new scheme, named TF-Attack, for Transferable and Fast adversarial attacks on LLMs. TF-Attack employs an external LLM as a third-party overseer rather than the victim model to identify critical units within sentences. Moreover, TF-Attack introduces the concept of Importance Level, which allows for parallel substitutions of attacks. We conduct extensive experiments on 6 widely adopted benchmarks, evaluating the proposed method through both automatic and human metrics. Results show that our method consistently surpasses previous methods in transferability and delivers significant speed improvements, up to 20 times faster than earlier attack strategies.

Submitted 8 September, 2024; v1 submitted 25 August, 2024; originally announced August 2024.

Comments: 14 pages, 6 figures

arXiv:2408.09945

Benchmarking LLMs for Translating Classical Chinese Poetry: Evaluating Adequacy, Fluency, and Elegance

Authors: Andong Chen, Lianzhang Lou, Kehai Chen, Xuefeng Bai, Yang Xiang, Muyun Yang, Tiejun Zhao, Min Zhang

Abstract: Large language models (LLMs) have shown remarkable performance in translation tasks. However, there is increasing demand for high-quality translations that are not only adequate but also fluent and elegant. To evaluate the extent to which current LLMs can meet these demands, we introduce a suitable benchmark (PoetMT) for translating classical Chinese poetry into English. This task requires not only adequacy in translating culturally and historically significant content but also strict adherence to linguistic fluency and poetic elegance. To overcome the limitations of traditional evaluation metrics, we propose an automatic evaluation metric based on GPT-4, which better evaluates translation quality in terms of adequacy, fluency, and elegance. Our evaluation study reveals that existing large language models fall short in this task. To address these issues, we propose RAT, a Retrieval-Augmented machine Translation method that enhances the translation process by incorporating knowledge related to classical poetry. Our dataset and code will be made available.

Submitted 16 October, 2024; v1 submitted 19 August, 2024; originally announced August 2024.

Comments: Work in progress

arXiv:2408.09896

Instruction-Based Molecular Graph Generation with Unified Text-Graph Diffusion Model

Authors: Yuran Xiang, Haiteng Zhao, Chang Ma, Zhi-Hong Deng

Abstract: Recent advancements in computational chemistry have increasingly focused on synthesizing molecules based on textual instructions. Integrating graph generation with these instructions is complex, leading most current methods to use molecular sequences with pre-trained large language models. In response to this challenge, we propose a novel framework, named UTGDiff (Unified Text-Graph Diffusion Model), which utilizes language models for discrete graph diffusion to generate molecular graphs from instructions. UTGDiff features a unified text-graph transformer as the denoising network, derived from pre-trained language models and minimally modified to process graph data through attention bias. Our experimental results demonstrate that UTGDiff consistently outperforms sequence-based baselines in tasks involving instruction-based molecule generation and editing, achieving superior performance with fewer parameters given an equivalent level of pretraining corpus. Our code is available at https://github.com/ran1812/UTGDiff.

Submitted 19 August, 2024; originally announced August 2024.

arXiv:2408.07613

Rethinking the Key Factors for the Generalization of Remote Sensing Stereo Matching Networks

Authors: Liting Jiang, Feng Wang, Wenyi Zhang, Peifeng Li, Hongjian You, Yuming Xiang

Abstract: Stereo matching, a critical step of 3D reconstruction, has fully shifted towards deep learning due to its strong feature representation of remote sensing images. However, ground truth for the stereo matching task relies on expensive airborne LiDAR data, making it difficult to obtain enough samples for supervised learning. To improve the generalization ability of stereo matching networks on cross-domain data from different sensors and scenarios, in this paper we study key training factors from three perspectives. (1) For the selection of the training dataset, it is important to select data whose regional target distribution is similar to that of the test set instead of using data from the same sensor. (2) For model structure, a cascaded structure that flexibly adapts to different feature sizes is preferred. (3) For training manner, unsupervised methods generalize better than supervised methods, and we design an unsupervised early-stop strategy to help retain the best model with pre-trained weights as the basis. Extensive experiments are conducted to support these findings, on the basis of which we present an unsupervised stereo matching network with good generalization performance. We release the source code and the datasets at https://github.com/Elenairene/RKF_RSSM to reproduce the results and encourage future work.

Submitted 14 August, 2024; originally announced August 2024.

Comments: submitted to IEEE JSTARS

arXiv:2408.07419

Unsupervised Stereo Matching Network For VHR Remote Sensing Images Based On Error Prediction

Authors: Liting Jiang, Yuming Xiang, Feng Wang, Hongjian You

Abstract: Stereo matching in remote sensing has recently garnered increased attention, primarily focusing on supervised learning. However, datasets with ground truth generated by expensive airborne LiDAR exhibit limited quantity and diversity, constraining the effectiveness of supervised networks. In contrast, unsupervised learning methods can leverage the increasing availability of very-high-resolution (VHR) remote sensing images, offering considerable potential in the realm of stereo matching. Motivated by this intuition, we propose a novel unsupervised stereo matching network for VHR remote sensing images. A lightweight module that bridges confidence with predicted error is introduced to refine the core model. Robust unsupervised losses are formulated to enhance network convergence. The experimental results on the US3D and WHU-Stereo datasets demonstrate that the proposed network achieves superior accuracy compared to other unsupervised networks and exhibits better generalization capabilities than supervised models. Our code will be available at https://github.com/Elenairene/CBEM.

Submitted 14 August, 2024; originally announced August 2024.

Comments: Accepted to International Geoscience and Remote Sensing Symposium (IGARSS), 2024

arXiv:2407.19216

EaTVul: ChatGPT-based Evasion Attack Against Software Vulnerability Detection

Authors: Shigang Liu, Di Cao, Junae Kim, Tamas Abraham, Paul Montague, Seyit Camtepe, Jun Zhang, Yang Xiang

Abstract: Recently, deep learning has demonstrated promising results in enhancing the accuracy of vulnerability detection and identifying vulnerabilities in software. However, these techniques are still vulnerable to attacks. Adversarial examples can exploit vulnerabilities within deep neural networks, posing a significant threat to system security. This study showcases the susceptibility of deep learning models to adversarial attacks, which can achieve a 100% attack success rate (refer to Table 5). The proposed method, EaTVul, encompasses six stages: identification of important samples using support vector machines, identification of important features using the attention mechanism, generation of adversarial data based on these features using ChatGPT, preparation of an adversarial attack pool, selection of seed data using a fuzzy genetic algorithm, and the execution of an evasion attack. Extensive experiments demonstrate the effectiveness of EaTVul, achieving an attack success rate of more than 83% when the snippet size is greater than 2. Furthermore, in most cases with a snippet size of 4, EaTVul achieves a 100% attack success rate. The findings of this research emphasize the necessity of robust defenses against adversarial attacks in software vulnerability detection.

Submitted 27 July, 2024; originally announced July 2024.

arXiv:2407.13911

Continual Distillation Learning

Authors: Qifan Zhang, Yunhui Guo, Yu Xiang

Abstract: We study the problem of Continual Distillation Learning (CDL) that considers Knowledge Distillation (KD) in the Continual Learning (CL) setup. A teacher model and a student model need to learn a sequence of tasks, and the knowledge of the teacher model will be distilled to the student to improve the student model. We introduce a novel method named CDL-Prompt that utilizes prompt-based continual learning models to build the teacher-student model. We investigate how to utilize the prompts of the teacher model in the student model for knowledge distillation, and propose an attention-based prompt mapping scheme to use the teacher prompts for the student. We demonstrate that our method can be applied to different prompt-based continual learning models such as L2P, DualPrompt and CODA-Prompt to improve their performance using powerful teacher models. Although recent CL methods focus on prompt learning, we show that our method can be utilized to build efficient CL models using prompt-based knowledge distillation.

Submitted 18 July, 2024; originally announced July 2024.
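The Continual Distillation Learning entry above builds on knowledge distillation from a teacher to a student. As a minimal sketch, assuming the classic temperature-scaled formulation (KL divergence between softened teacher and student distributions, scaled by T squared), the core distillation loss looks like this. It does not reproduce CDL-Prompt's attention-based prompt mapping, which the abstract does not specify; the function names are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; subtracting the max keeps exp() numerically stable."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, temperature=2.0):
    """Distillation loss: T^2 * KL(p_teacher || p_student) on softened distributions."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s) if pt > 0)
    return temperature ** 2 * kl


teacher = [2.0, 0.5, -1.0]
print(kd_loss(teacher, teacher))          # identical logits give zero loss
print(kd_loss([0.0, 0.0, 0.0], teacher))  # a uniform student is penalised
```

In a continual setup this term would typically be added to the student's task loss for each new task; the T squared factor keeps gradient magnitudes comparable across temperatures.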

arXiv:2407.11529

Cross-Phase Mutual Learning Framework for Pulmonary Embolism Identification on Non-Contrast CT Scans

Authors: Bizhe Bai, Yan-Jie Zhou, Yujian Hu, Tony C. W. Mok, Yilang Xiang, Le Lu, Hongkun Zhang, Minfeng Xu

Abstract: Pulmonary embolism (PE) is a life-threatening condition where rapid and accurate diagnosis is imperative yet difficult due to predominantly atypical symptomatology. Computed tomography pulmonary angiography (CTPA) is acknowledged as the gold standard imaging tool in clinics, yet it can be contraindicated for emergency department (ED) patients and represents an onerous procedure, thus necessitating PE identification through non-contrast CT (NCT) scans. In this work, we explore the feasibility of applying a deep-learning approach to NCT scans for PE identification. We propose a novel Cross-Phase Mutual learNing framework (CPMN) that fosters knowledge transfer from CTPA to NCT, while concurrently conducting embolism segmentation and abnormality classification in a multi-task manner. The proposed CPMN leverages the Inter-Feature Alignment (IFA) strategy that enhances spatial contiguity and mutual learning between the dual-pathway network, while the Intra-Feature Discrepancy (IFD) strategy can facilitate precise segmentation of PE against complex backgrounds for single-pathway networks. For a comprehensive assessment of the proposed approach, a large-scale dual-phase dataset containing 334 PE patients and 1,105 normal subjects has been established. Experimental results demonstrate that CPMN achieves leading identification performance, with 95.4% patient-level sensitivity and 99.6% patient-level specificity on NCT scans, indicating the potential of our approach as an economical, accessible, and precise tool for PE identification in clinical practice.

Submitted 16 July, 2024; originally announced July 2024.

Comments: Early accepted by MICCAI 2024

arXiv:2407.07289

Deformable Feature Alignment and Refinement for Moving Infrared Dim-small Target Detection

Authors: Dengyan Luo, Yanping Xiang, Hu Wang, Luping Ji, Shuai Li, Mao Ye

Abstract: The detection of moving infrared dim-small targets has been a challenging and prevalent research topic. The current state-of-the-art methods are mainly based on ConvLSTM to aggregate information from adjacent frames to facilitate the detection of the current frame. However, these methods implicitly utilize motion information only in the training stage and fail to explicitly explore motion compensation, resulting in poor performance on video sequences with large motion. In this paper, we propose a Deformable Feature Alignment and Refinement (DFAR) method based on deformable convolution to explicitly use motion context in both the training and inference stages. Specifically, a Temporal Deformable Alignment (TDA) module based on the designed Dilated Convolution Attention Fusion (DCAF) block is developed to explicitly align the adjacent frames with the current frame at the feature level. Then, the feature refinement module adaptively fuses the aligned features and further aggregates useful spatio-temporal information by means of the proposed Attention-guided Deformable Fusion (AGDF) block. In addition, to improve the alignment of adjacent frames with the current frame, we extend the traditional loss function by introducing a new motion compensation loss. Extensive experimental results demonstrate that the proposed DFAR method achieves state-of-the-art performance on two benchmark datasets, DAUB and IRDST.

Submitted 9 July, 2024; originally announced July 2024.

arXiv:2407.03945

A fast neural hybrid Newton solver adapted to implicit methods for nonlinear dynamics

Authors: Tianyu Jin, Georg Maierhofer, Katharina Schratz, Yang Xiang

Abstract: The use of implicit time-stepping schemes for the numerical approximation of solutions to stiff nonlinear time-evolution equations brings well-known advantages including, typically, better stability behaviour and corresponding support of larger time steps, and better structure preservation properties. However, this comes at the price of having to solve a nonlinear equation at every time step of the numerical scheme. In this work, we propose a novel operator learning based hybrid Newton's method to accelerate this solution of the nonlinear time step system for stiff time-evolution nonlinear equations. We propose a targeted learning strategy which facilitates robust unsupervised learning in an offline phase and provides a highly efficient initialisation for the Newton iteration leading to consistent acceleration of Newton's method. A quantifiable rate of improvement in Newton's method achieved by improved initialisation is provided and we analyse the upper bound of the generalisation error of our unsupervised learning strategy. These theoretical results are supported by extensive numerical results, demonstrating the efficiency of our proposed neural hybrid solver both in one- and two-dimensional cases.

Submitted 4 July, 2024; originally announced July 2024.
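The neural hybrid Newton entry above rests on the fact that each implicit time step requires solving a nonlinear equation, and that a better initial guess cuts Newton iterations. A minimal sketch of that mechanism, assuming a backward Euler step on a hypothetical stiff scalar problem y' = -50 y^3: the learned operator is not reproduced here, and a perturbed converged root merely stands in for an accurate learned initialisation.

```python
def newton(g, dg, z0, tol=1e-12, max_iter=50):
    """Scalar Newton iteration for g(z) = 0; returns (root, number_of_iterations)."""
    z = z0
    for k in range(1, max_iter + 1):
        step = g(z) / dg(z)
        z -= step
        if abs(step) < tol:
            return z, k
    raise RuntimeError("Newton iteration did not converge")


def backward_euler_step(f, df, y_n, h, z0):
    """One implicit (backward) Euler step: solve z - y_n - h*f(z) = 0 starting from guess z0."""
    return newton(lambda z: z - y_n - h * f(z), lambda z: 1.0 - h * df(z), z0)


# Hypothetical stiff, nonlinear test problem: y' = -50 * y**3.
def f(y):
    return -50.0 * y**3


def df(y):
    return -150.0 * y**2


y_n, h = 1.0, 0.1
z_star, iters_naive = backward_euler_step(f, df, y_n, h, z0=y_n)  # classical guess: previous state
# Stand-in for an accurate learned initialiser: the converged root, perturbed by 1e-3.
_, iters_good = backward_euler_step(f, df, y_n, h, z0=z_star + 1e-3)
print(f"iterations from previous state: {iters_naive}, from near-exact guess: {iters_good}")
```

Because Newton converges quadratically near the root, starting close to it removes the slow early iterations; this is the effect the hybrid solver aims to obtain cheaply from a trained network.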

arXiv:2407.02280

FedIA: Federated Medical Image Segmentation with Heterogeneous Annotation Completeness

Authors: Yangyang Xiang, Nannan Wu, Li Yu, Xin Yang, Kwang-Ting Cheng, Zengqiang Yan

Abstract: Federated learning has emerged as a compelling paradigm for medical image segmentation, particularly in light of increasing privacy concerns. However, most of the existing research relies on relatively stringent assumptions regarding the uniformity and completeness of annotations across clients. Contrary to this, this paper highlights a prevalent challenge in medical practice: incomplete annotations. Such annotations can introduce incorrectly labeled pixels, potentially undermining the performance of neural networks in supervised learning. To tackle this issue, we introduce a novel solution, named FedIA. Our insight is to conceptualize incomplete annotations as noisy data (i.e., low-quality data), with a focus on mitigating their adverse effects. We begin by evaluating the completeness of annotations at the client level using a designed indicator. Subsequently, we enhance the influence of clients with more comprehensive annotations and implement corrections for incomplete ones, thereby ensuring that models are trained on accurate data. Our method's effectiveness is validated through its superior performance on two extensively used medical image segmentation datasets, outperforming existing solutions. The code is available at https://github.com/HUSTxyy/FedIA.

Submitted 3 July, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

Comments: Early accepted by MICCAI 2024
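The FedIA entry above describes enhancing the influence of clients with more complete annotations during federated aggregation. The abstract does not give the indicator or the aggregation rule, so the following is only a minimal sketch under an assumed rule: a FedAvg-style weighted average where each client's weight is proportional to its sample count times a hypothetical completeness score in [0, 1].

```python
def aggregate(client_params, num_samples, completeness):
    """Weighted average of flat client parameter vectors.

    Assumed (hypothetical) rule: weight_k is proportional to n_k * c_k, where n_k is the
    client's sample count and c_k its annotation-completeness score in [0, 1].
    """
    raw = [n * c for n, c in zip(num_samples, completeness)]
    total = sum(raw)
    if total == 0:
        raise ValueError("at least one client needs a positive weight")
    weights = [r / total for r in raw]
    dim = len(client_params[0])
    return [sum(w * p[j] for w, p in zip(weights, client_params)) for j in range(dim)]


clients = [[1.0, 0.0], [0.0, 1.0]]
print(aggregate(clients, [10, 10], [1.0, 1.0]))  # equal completeness: plain average
print(aggregate(clients, [10, 10], [1.0, 0.0]))  # incomplete client contributes nothing
```

With all completeness scores equal to 1 this reduces to standard sample-weighted FedAvg; FedIA additionally corrects incomplete annotations, which this sketch does not attempt.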

arXiv:2406.17969

Encourage or Inhibit Monosemanticity? Revisit Monosemanticity from a Feature Decorrelation Perspective

Authors: Hanqi Yan, Yanzheng Xiang, Guangyi Chen, Yifei Wang, Lin Gui, Yulan He

Abstract: To better interpret the intrinsic mechanism of large language models (LLMs), recent studies focus on the monosemanticity of their basic units. A monosemantic neuron is dedicated to a single and specific concept, which forms a one-to-one correlation between neurons and concepts. Despite extensive research in monosemanticity probing, it remains unclear whether monosemanticity is beneficial or harmful to model capacity. To explore this question, we revisit monosemanticity from the feature decorrelation perspective and advocate for its encouragement. We experimentally observe that the current conclusion by Wang et al. (2024), which suggests that decreasing monosemanticity enhances model performance, does not hold when the model changes. Instead, we demonstrate that monosemanticity consistently exhibits a positive correlation with model capacity in the preference alignment process. Consequently, we apply feature correlation as a proxy for monosemanticity and incorporate a feature decorrelation regularizer into the dynamic preference optimization process. The experiments show that our method not only enhances representation diversity and activation sparsity but also improves preference alignment performance.

Submitted 15 October, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

Comments: EMNLP24, Main, Long
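The monosemanticity entry above uses feature correlation as a proxy for monosemanticity and adds a feature decorrelation regularizer to preference optimization. The exact regularizer is not stated in the abstract; one plausible choice, assumed here, is the mean squared off-diagonal Pearson correlation between feature dimensions over a batch, which is zero when dimensions are uncorrelated.

```python
import math


def correlation_matrix(features, eps=1e-8):
    """Pearson correlation between feature dimensions; `features` is a batch x dim list of rows."""
    n, d = len(features), len(features[0])
    means = [sum(row[j] for row in features) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in features]
    stds = [math.sqrt(sum(r[j] ** 2 for r in centered) / n) for j in range(d)]
    # eps guards against division by zero for constant (dead) dimensions.
    return [
        [sum(r[i] * r[j] for r in centered) / (n * (stds[i] * stds[j] + eps)) for j in range(d)]
        for i in range(d)
    ]


def decorrelation_penalty(features):
    """Mean squared off-diagonal correlation (an assumed regularizer form)."""
    c = correlation_matrix(features)
    d = len(c)
    off = [c[i][j] ** 2 for i in range(d) for j in range(d) if i != j]
    return sum(off) / len(off)


print(decorrelation_penalty([[1, 2], [2, 4], [3, 6]]))             # perfectly correlated: about 1
print(decorrelation_penalty([[1, 1], [1, -1], [-1, 1], [-1, -1]]))  # uncorrelated: 0
```

In training, such a penalty would be added to the preference-optimization loss with a tunable weight; the weighting and where the features are read from are further assumptions.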

arXiv:2406.15222

Rapid and Accurate Diagnosis of Acute Aortic Syndrome using Non-contrast CT: A Large-scale, Retrospective, Multi-center and AI-based Study

Authors: Yujian Hu, Yilang Xiang, Yan-Jie Zhou, Yangyan He, Shifeng Yang, Xiaolong Du, Chunlan Den, Youyao Xu, Gaofeng Wang, Zhengyao Ding, Jingyong Huang, Wenjun Zhao, Xuejun Wu, Donglin Li, Qianqian Zhu, Zhenjiang Li, Chenyang Qiu, Ziheng Wu, Yunjun He, Chen Tian, Yihui Qiu, Zuodong Lin, Xiaolong Zhang, Yuan He, Zhenpeng Yuan, et al. (15 additional authors not shown)

Abstract: Chest pain symptoms are highly prevalent in emergency departments (EDs), where acute aortic syndrome (AAS) is a catastrophic cardiovascular emergency with a high fatality rate, especially when timely and accurate treatment is not administered. However, current triage practices in the ED can cause up to approximately half of patients with AAS to have an initially missed diagnosis or be misdiagnosed as having other acute chest pain conditions. Subsequently, these AAS patients will undergo clinically inaccurate or suboptimal differential diagnosis. Fortunately, even under these suboptimal protocols, nearly all these patients underwent non-contrast CT covering the aorta anatomy at the early stage of differential diagnosis. In this study, we developed an artificial intelligence model (DeepAAS) using non-contrast CT, which is highly accurate for identifying AAS and provides interpretable results to assist in clinical decision-making. Performance was assessed in two major phases: a multi-center retrospective study (n = 20,750) and an exploration in real-world emergency scenarios (n = 137,525). In the multi-center cohort, DeepAAS achieved a mean area under the receiver operating characteristic curve of 0.958 (95% CI 0.950-0.967). In the real-world cohort, DeepAAS detected 109 AAS patients with misguided initial suspicion, achieving 92.6% (95% CI 76.2%-97.5%) in mean sensitivity and 99.2% (95% CI 99.1%-99.3%) in mean specificity.
Our AI model performed well on non-contrast CT at all applicable early stages of differential diagnosis workflows, effectively reduced the overall missed diagnosis and misdiagnosis rate from 48.8% to 4.8% and shortened the diagnosis time for patients with misguided initial suspicion from an average of 681.8 (74-11,820) mins to 68.5 (23-195) mins. DeepAAS could effectively fill the gap in the current clinical workflow without requiring additional tests.

Submitted 16 July, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.07232

DUAL-REFLECT: Enhancing Large Language Models for Reflective Translation through Dual Learning Feedback Mechanisms

Authors: Andong Chen, Lianzhang Lou, Kehai Chen, Xuefeng Bai, Yang Xiang, Muyun Yang, Tiejun Zhao, Min Zhang

Abstract: Recently, large language models (LLMs) enhanced by self-reflection have achieved promising performance on machine translation. The key idea is guiding LLMs to generate translation with human-like feedback. However, existing self-reflection methods lack effective feedback information, limiting the translation performance. To address this, we introduce a DUAL-REFLECT framework, leveraging the dual learning of translation tasks to provide effective feedback, thereby enhancing the models' self-reflective abilities and improving translation performance. The application of this method across various translation tasks has proven its effectiveness in improving translation accuracy and eliminating ambiguities, especially in translation tasks with low-resource language pairs.

Submitted 21 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

Comments: Accepted to ACL 2024 main conference

arXiv:2406.07036

Paying More Attention to Source Context: Mitigating Unfaithful Translations from Large Language Model

Authors: Hongbin Zhang, Kehai Chen, Xuefeng Bai, Yang Xiang, Min Zhang

Abstract: Large language models (LLMs) have showcased impressive multilingual machine translation ability. However, unlike encoder-decoder style models, decoder-only LLMs lack an explicit alignment between source and target contexts. Analyzing contribution scores during generation processes revealed that LLMs can be biased towards previously generated tokens over corresponding source tokens, leading to unfaithful translations. To address this issue, we propose to encourage LLMs to pay more attention to the source context from both source and target perspectives in zero-shot prompting: 1) adjusting source context attention weights and 2) suppressing irrelevant target prefix influence. Additionally, we propose 3) avoiding over-reliance on the target prefix in instruction tuning. Experimental results on both human-collected unfaithfulness test sets focusing on LLM-generated unfaithful translations and general test sets verify our methods' effectiveness across multiple language pairs. Further human evaluation shows our method's efficacy in reducing hallucinatory translations and facilitating faithful translation generation.

Submitted 11 June, 2024; originally announced June 2024.

Comments: Accepted by ACL2024 Findings

arXiv:2406.06843

HO-Cap: A Capture System and Dataset for 3D Reconstruction and Pose Tracking of Hand-Object Interaction

Authors: Jikai Wang, Qifan Zhang, Yu-Wei Chao, Bowen Wen, Xiaohu Guo, Yu Xiang

Abstract: We introduce a data capture system and a new dataset named HO-Cap that can be used to study 3D reconstruction and pose tracking of hands and objects in videos. The capture system uses multiple RGB-D cameras and a HoloLens headset for data collection, avoiding the use of expensive 3D scanners or mocap systems. We propose a semi-automatic method to obtain annotations of shape and pose of hands and objects in the collected videos, which significantly reduces the required annotation time compared to manual labeling. With this system, we captured a video dataset of humans using objects to perform different tasks, as well as simple pick-and-place and handover of an object from one hand to the other, which can be used as human demonstrations for embodied AI and robot manipulation research. Our data capture setup and annotation framework can be used by the community to reconstruct 3D shapes of objects and human hands and track their poses in videos.

Submitted 16 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

arXiv:2406.03880

Memorization in deep learning: A survey

Authors: Jiaheng Wei, Yanjun Zhang, Leo Yu Zhang, Ming Ding, Chao Chen, Kok-Leong Ong, Jun Zhang, Yang Xiang

Abstract: Deep Learning (DL) powered by Deep Neural Networks (DNNs) has revolutionized various domains, yet understanding the intricacies of DNN decision-making and learning processes remains a significant challenge. Recent investigations have uncovered an interesting memorization phenomenon in which DNNs tend to memorize specific details from examples rather than learning general patterns, affecting model generalization, security, and privacy. This raises critical questions about the nature of generalization in DNNs and their susceptibility to security breaches. In this survey, we present a systematic framework to organize memorization definitions based on the generalization and security/privacy domains and summarize memorization evaluation methods at both the example and model levels. Through a comprehensive literature review, we explore DNN memorization behaviors and their impacts on security and privacy. We also introduce privacy vulnerabilities caused by memorization and the phenomenon of forgetting and explore its connection with memorization. Furthermore, we spotlight various applications leveraging memorization and forgetting mechanisms, including noisy label learning, privacy preservation, and model enhancement. This survey offers a first-of-its-kind understanding of memorization in DNNs, providing insights into its challenges and opportunities for enhancing AI development while addressing critical ethical concerns.

Submitted 6 June, 2024; originally announced June 2024.

arXiv:2406.02630

AI Agents Under Threat: A Survey of Key Security Challenges and Future Pathways

Authors: Zehang Deng, Yongjian Guo, Changzhou Han, Wanlun Ma, Junwu Xiong, Sheng Wen, Yang Xiang

Abstract: An Artificial Intelligence (AI) agent is a software entity that autonomously performs tasks or makes decisions based on pre-defined objectives and data inputs. AI agents, capable of perceiving user inputs, reasoning and planning tasks, and executing actions, have seen remarkable advancements in algorithm development and task performance. However, the security challenges they pose remain under-explored and unresolved. This survey delves into the emerging security threats faced by AI agents, categorizing them into four critical knowledge gaps: unpredictability of multi-step user inputs, complexity in internal executions, variability of operational environments, and interactions with untrusted external entities. By systematically reviewing these threats, this paper highlights both the progress made and the existing limitations in safeguarding AI agents. The insights provided aim to inspire further research into addressing the security threats associated with AI agents, thereby fostering the development of more robust and secure AI agent applications.

Submitted 5 September, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

Comments: Submitted to ACM Computing Surveys

arXiv:2405.17859

Adapting Pre-Trained Vision Models for Novel Instance Detection and Segmentation

Authors: Yangxiao Lu, Jishnu Jaykumar P, Yunhui Guo, Nicholas Ruozzi, Yu Xiang

Abstract: Novel Instance Detection and Segmentation (NIDS) aims at detecting and segmenting novel object instances given a few examples of each instance. We propose a unified framework (NIDS-Net) comprising object proposal generation, embedding creation for both instance templates and proposal regions, and embedding matching for instance label assignment. Leveraging recent advancements in large vision methods, we utilize the Grounding DINO and Segment Anything Model (SAM) to obtain object proposals with accurate bounding boxes and masks. Central to our approach is the generation of high-quality instance embeddings. We utilize foreground feature averages of patch embeddings from the DINOv2 ViT backbone, followed by refinement through a weight adapter mechanism that we introduce. We show experimentally that our weight adapter can adjust the embeddings locally within their feature space and effectively limit overfitting. This methodology enables a straightforward matching strategy, resulting in significant performance gains. Our framework surpasses current state-of-the-art methods, demonstrating notable improvements of 22.3, 46.2, 10.3, and 24.0 in average precision (AP) across four detection datasets. In instance segmentation tasks on seven core datasets of the BOP challenge, our method outperforms the top RGB methods by 3.6 AP and remains competitive with the best RGB-D method. Code is available at: https://github.com/YoungSean/NIDS-Net

Submitted 28 May, 2024; originally announced May 2024.

Comments: 22 pages, 9 figures, Code is available at: https://github.com/YoungSean/NIDS-Net
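The matching step described above (foreground-averaged patch embeddings compared against instance templates) can be illustrated with a minimal sketch. It assumes embeddings are plain lists of floats, that matching uses the maximum cosine similarity over each instance's templates, and that a hypothetical `threshold` rejects weak matches; the paper's weight adapter and its exact matching rule are not reproduced here.

```python
import math

def foreground_average(patch_embeddings, mask):
    """Average the patch embeddings whose mask entry is truthy (foreground patches)."""
    selected = [e for e, m in zip(patch_embeddings, mask) if m]
    if not selected:
        raise ValueError("mask selects no foreground patches")
    dim = len(selected[0])
    return [sum(e[d] for e in selected) / len(selected) for d in range(dim)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def assign_labels(proposal_embs, template_embs, threshold=0.5):
    """For each proposal embedding, return (label, score) of the best-matching
    instance, or (None, score) if the best score is below the (hypothetical)
    threshold. template_embs maps instance label -> list of template embeddings."""
    results = []
    for p in proposal_embs:
        best_label, best_score = None, -1.0
        for label, templates in template_embs.items():
            score = max(cosine(p, t) for t in templates)
            if score > best_score:
                best_label, best_score = label, score
        results.append((best_label if best_score >= threshold else None, best_score))
    return results
```

Because each proposal is scored independently against a fixed template bank, new instances can be added by appending templates, without retraining.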

arXiv:2405.16594 [pdf, ps, other]

Training-Conditional Coverage Bounds under Covariate Shift

Authors: Mehrdad Pournaderi, Yu Xiang

Abstract: Training-conditional coverage guarantees in conformal prediction concern the concentration of the error distribution, conditional on the training data, below some nominal level. The conformal prediction methodology has recently been generalized to the covariate shift setting, namely, the covariate distribution changes between the training and test data. In this paper, we study the training-conditional coverage properties of a range of conformal prediction methods under covariate shift via a weighted version of the Dvoretzky-Kiefer-Wolfowitz (DKW) inequality tailored for distribution change. The result for the split conformal method is almost assumption-free, while the results for the full conformal and jackknife+ methods rely on strong assumptions including the uniform stability of the training algorithm.

Submitted 26 May, 2024; originally announced May 2024.

Comments: arXiv admin note: text overlap with arXiv:2404.13731
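For context, the covariate-shift generalization of split conformal prediction mentioned above reweights the calibration scores. The sketch below shows that weighted quantile construction, assuming the likelihood-ratio weights dP_test(x)/dP_train(x) are known and scores are absolute residuals. It does not compute the paper's DKW-based training-conditional bounds.

```python
import math

def weighted_conformal_quantile(scores, weights, test_weight, alpha):
    """(1 - alpha) quantile of sum_i p_i * delta(score_i) + p_test * delta(+inf),
    with p_i = w_i / (sum_j w_j + w_test). The weights stand in for the likelihood
    ratio dP_test(x) / dP_train(x), which is assumed known here."""
    total = sum(weights) + test_weight
    cumulative = 0.0
    for score, weight in sorted(zip(scores, weights)):
        cumulative += weight / total
        if cumulative >= 1 - alpha - 1e-12:   # small tolerance for float round-off
            return score
    return math.inf   # the quantile lands on the +inf atom: interval is unbounded

def weighted_split_interval(prediction, scores, weights, test_weight, alpha=0.1):
    """Symmetric prediction interval built from absolute-residual calibration scores."""
    q = weighted_conformal_quantile(scores, weights, test_weight, alpha)
    return prediction - q, prediction + q
```

With all weights equal to 1, this reduces to ordinary split conformal prediction: the quantile is the ceil((n + 1)(1 - alpha))-th smallest calibration score.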

arXiv:2405.15258 [pdf, other]

Leakage-Resilient and Carbon-Neutral Aggregation Featuring the Federated AI-enabled Critical Infrastructure

Authors: Zehang Deng, Ruoxi Sun, Minhui Xue, Sheng Wen, Seyit Camtepe, Surya Nepal, Yang Xiang

Abstract: AI-enabled critical infrastructures (ACIs) integrate artificial intelligence (AI) technologies into various essential systems and services that are vital to the functioning of society, offering significant implications for efficiency, security and resilience. While adopting decentralized AI approaches (such as federated learning technology) in ACIs is plausible, private and sensitive data are still susceptible to data reconstruction attacks through gradient optimization. In this work, we propose Compressed Differentially Private Aggregation (CDPA), a leakage-resilient, communication-efficient, and carbon-neutral approach for ACI networks. Specifically, CDPA introduces a novel random bit-flipping mechanism as its primary innovation. This mechanism first converts gradients into a specific binary representation and then selectively flips masked bits with a certain probability. The proposed bit-flipping introduces a larger variance to the noise while providing differentially private protection and notable energy savings when vector quantization techniques are applied within the context of federated learning. The experimental evaluation indicates that CDPA can reduce communication cost by half while preserving model utility. Moreover, we demonstrate that CDPA can effectively defend against state-of-the-art data reconstruction attacks in both computer vision and natural language processing tasks. We highlight existing benchmarks that generate 2.6x to over 100x more carbon emissions than CDPA.
We hope that the CDPA developed in this paper can inform the federated AI-enabled critical infrastructure of a more balanced trade-off between utility and privacy, resilience protection, as well as a better carbon offset with less communication overhead.

Submitted 24 May, 2024; originally announced May 2024.
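The bit-flipping idea can be illustrated with classic randomized response on 1-bit (sign) quantized gradients. This is a minimal sketch under stated assumptions: the "specific binary representation" is taken to be one sign bit per coordinate, every bit is flipped (no masking), and the server debiases the average. The paper's actual encoding, masking, and vector-quantization steps are not reproduced.

```python
import math
import random

def to_bits(gradient):
    """Assumed 1-bit quantization: 1 if the coordinate is non-negative, else 0."""
    return [1 if g >= 0 else 0 for g in gradient]

def flip_bits(bits, p, rng):
    """Flip each bit independently with probability p (randomized response)."""
    return [b ^ 1 if rng.random() < p else b for b in bits]

def epsilon_for(p):
    """Per-bit local differential privacy level of randomized response, for p < 0.5."""
    return math.log((1 - p) / p)

def debiased_mean(reports, p):
    """Unbiased per-coordinate estimate of the fraction of clients whose true bit is 1.
    Since E[reported] = p + (1 - 2p) * true, invert that affine map."""
    n, dim = len(reports), len(reports[0])
    return [(sum(r[d] for r in reports) / n - p) / (1 - 2 * p) for d in range(dim)]
```

A larger flip probability p gives stronger privacy (smaller epsilon) but inflates the variance of the server's estimate by a factor of 1 / (1 - 2p)^2.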

arXiv:2405.14099 [pdf, other]

Automatic Differentiation is Essential in Training Neural Networks for Solving Differential Equations

Authors: Chuqi Chen, Yahong Yang, Yang Xiang, Wenrui Hao

Abstract: Neural network-based approaches have recently shown significant promise in solving partial differential equations (PDEs) in science and engineering, especially in scenarios featuring complex domains or incorporation of empirical data. One advantage of neural network methods for PDEs lies in their use of automatic differentiation (AD), which requires only the sample points themselves, unlike traditional finite difference (FD) approximations that require nearby local points to compute derivatives. In this paper, we quantitatively demonstrate the advantage of AD in training neural networks. The concept of truncated entropy is introduced to characterize the training property. Specifically, through comprehensive experimental and theoretical analyses conducted on random feature models and two-layer neural networks, we discover that the defined truncated entropy serves as a reliable metric for quantifying the residual loss of random feature models and the training speed of neural networks for both AD and FD methods. Our experimental and theoretical analyses demonstrate that, from a training perspective, AD outperforms FD in solving PDEs.

Submitted 2 September, 2024; v1 submitted 22 May, 2024; originally announced May 2024.
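The contrast the abstract draws between AD (which needs only the sample point) and FD (which needs neighbouring points) can be seen on a toy network. The sketch below uses forward-mode AD with dual numbers and compares it with a central difference. The two-neuron network and its weights are made up for illustration. This shows only pointwise derivative accuracy, not the paper's truncated-entropy analysis of training.

```python
import math

class Dual:
    """Dual number val + der*eps with eps**2 == 0; `der` carries an exact derivative."""

    def __init__(self, val, der=0.0):
        self.val, self.der = val, der

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.der + other.der)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val * other.val, self.der * other.val + self.val * other.der)

    __rmul__ = __mul__

def tanh_(x):
    """tanh for plain floats and for Dual numbers (applies the chain rule)."""
    if isinstance(x, Dual):
        t = math.tanh(x.val)
        return Dual(t, (1 - t * t) * x.der)
    return math.tanh(x)

def net(x, w=(1.5, -0.7), a=(2.0, 0.5), b=(0.1, 0.3)):
    """Toy network u(x) = sum_k a_k * tanh(w_k * x + b_k) with made-up weights."""
    return sum(ak * tanh_(wk * x + bk) for wk, ak, bk in zip(w, a, b))

def ad_derivative(f, x):
    """du/dx by forward-mode AD: evaluates f only at the sample point x."""
    return f(Dual(x, 1.0)).der

def fd_derivative(f, x, h):
    """du/dx by central finite differences: evaluates f at x - h and x + h."""
    return (f(x + h) - f(x - h)) / (2 * h)
```

The AD result is exact up to round-off, while the FD error scales roughly with h**2 and depends on the chosen step size.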

arXiv:2405.12114 [pdf, other]

A New Cross-Space Total Variation Regularization Model for Color Image Restoration with Quaternion Blur Operator

Authors: Zhigang Jia, Yuelian Xiang, Meixiang Zhao, Tingting Wu, Michael K. Ng

Abstract: The cross-channel deblurring problem in color image processing is difficult to solve due to the complex coupling and structural blurring of color pixels. Until now, there have been few efficient algorithms that can reduce color infection in the deblurring process. To solve this challenging problem, we present a novel cross-space total variation (CSTV) regularization model for color image deblurring by introducing a quaternion blur operator and a cross-color space regularization functional. The existence and uniqueness of the solution are proved and a new L-curve method is proposed to find a sweet balance of regularization functionals on different color spaces. The Euler-Lagrange equation is derived to show that CSTV has taken into account the coupling of all color channels and the local smoothing within each color channel. A quaternion operator splitting method is first proposed to enhance the ability of color infection reduction of the CSTV regularization model. This strategy also applies to the well-known color deblurring models. Numerical experiments on color image databases illustrate the efficiency and manoeuvrability of the new model and algorithms. The color images restored by them successfully maintain the color and spatial information and are of higher quality in terms of PSNR, SSIM, MSE and CIEde2000 than the restorations of state-of-the-art methods.

Submitted 20 May, 2024; originally announced May 2024.

Comments: 15 pages, 10 figures
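To see why a quaternion blur operator couples color channels, one can encode an RGB pixel as the pure quaternion r*i + g*j + b*k and blur with quaternion-valued kernel weights. The sketch below is illustrative only. The left-multiplication convention, the zero padding, and the helper name `quaternion_blur_1d` are assumptions, and the CSTV model, L-curve method, and operator splitting are not implemented.

```python
def qmul(p, q):
    """Hamilton product of quaternions given as (real, i, j, k) tuples."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
            a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
            a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
            a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2)

def pixel(r, g, b):
    """Encode an RGB pixel as the pure quaternion r*i + g*j + b*k."""
    return (0.0, r, g, b)

def quaternion_blur_1d(row, kernel):
    """Hypothetical 1-D quaternion blur: left-multiply neighbouring pixels by
    quaternion kernel weights and sum them (zero padding at the borders)."""
    half = len(kernel) // 2
    out = []
    for x in range(len(row)):
        acc = (0.0, 0.0, 0.0, 0.0)
        for t, w in enumerate(kernel):
            idx = x + t - half
            if 0 <= idx < len(row):
                acc = tuple(s + v for s, v in zip(acc, qmul(w, row[idx])))
        out.append(acc)
    return out
```

A purely real kernel blurs each channel independently, whereas a kernel with imaginary parts moves energy between channels (for example, k * i = j maps red into green), which is the cross-channel coupling the model addresses.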

arXiv:2405.10616 [pdf, other]

Feature-based Low-Rank Compression of Large Language Models via Bayesian Optimization

Authors: Yixin Ji, Yang Xiang, Juntao Li, Wei Chen, Zhongyi Liu, Kehai Chen, Min Zhang

Abstract: In recent years, large language models (LLMs) have driven advances in natural language processing. Still, their growing scale has increased the computational burden, necessitating a balance between efficiency and performance. Low-rank compression, a promising technique, reduces non-essential parameters by decomposing weight matrices into products of two low-rank matrices. Yet, its application in LLMs has not been extensively studied. The key to low-rank compression lies in low-rank factorization and low-rank dimensions allocation. To address the challenges of low-rank compression in LLMs, we conduct empirical research on the low-rank characteristics of large models. We propose a low-rank compression method suitable for LLMs. This approach involves precise estimation of feature distributions through pooled covariance matrices and a Bayesian optimization strategy for allocating low-rank dimensions. Experiments on the LLaMA-2 models demonstrate that our method outperforms existing strong structured pruning and low-rank compression techniques in maintaining model performance at the same compression ratio.

Submitted 17 May, 2024; originally announced May 2024.

Comments: Accepted by 2024 ACL findings
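Feature-based low-rank factorization can be sketched as projecting a layer's inputs onto the top eigenvectors of their covariance, so that W is approximated where the features actually live rather than in a plain weight-space SVD. This minimal NumPy version assumes a single layer and an uncentered sample covariance. The paper's pooled covariance estimation and Bayesian-optimization rank allocation are not reproduced.

```python
import numpy as np

def feature_lowrank(W, X, rank):
    """Factor W (d_out x d_in) as A @ B using the top-`rank` eigenvectors of the
    input-feature covariance. X holds observed layer inputs, shape (n, d_in)."""
    cov = X.T @ X / X.shape[0]               # uncentered feature covariance (assumption)
    _, eigvecs = np.linalg.eigh(cov)         # eigenvalues in ascending order
    U = eigvecs[:, ::-1][:, :rank]           # top-`rank` feature directions, (d_in, rank)
    A = W @ U                                # (d_out, rank)
    B = U.T                                  # (rank, d_in)
    return A, B
```

Storing A and B takes rank * (d_out + d_in) parameters instead of d_out * d_in. The layer output X @ (A @ B).T is exact whenever the features lie in a `rank`-dimensional subspace.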

arXiv:2405.09298 [pdf]

Deep Blur Multi-Model (DeepBlurMM) -- a strategy to mitigate the impact of image blur on deep learning model performance in histopathology image analysis

Authors: Yujie Xiang, Bojing Liu, Mattias Rantalainen

Abstract: AI-based analysis of histopathology whole slide images (WSIs) is central in computational pathology. However, image quality, including unsharp areas of WSIs, impacts model performance. We investigate the impact of blur and propose a multi-model approach to mitigate the negative impact of unsharp image areas. In this study, we use a simulation approach, evaluating model performance under varying levels of added Gaussian blur to image tiles from >900 H&E-stained breast cancer WSIs. To reduce the impact of blur, we propose a novel multi-model approach (DeepBlurMM) where multiple models trained on data with variable amounts of Gaussian blur are used to predict tiles based on their blur levels. Using histological grade as a principal example, we found that models trained with mildly blurred tiles improved performance over the base model when moderate-high blur was present. DeepBlurMM outperformed the base model in the presence of moderate blur across all tiles (AUC: 0.764 vs. 0.710), and in the presence of a mix of low, moderate, and high blur across tiles (AUC: 0.821 vs. 0.789). Unsharp image tiles in WSIs impact prediction performance. DeepBlurMM improved prediction performance under some conditions and has the potential to increase quality in both research and clinical applications.

Submitted 23 May, 2024; v1 submitted 15 May, 2024; originally announced May 2024.

ACM Class: I.4; J.3
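The routing idea behind DeepBlurMM (send each tile to the model trained for its blur level) can be sketched as follows. How blur levels are estimated is not stated in the abstract, so the variance of the Laplacian, a common sharpness heuristic, is used here as an assumption. The model names and thresholds are hypothetical placeholders.

```python
def laplacian_variance(img):
    """Variance of the 4-neighbour Laplacian over interior pixels of a grayscale
    tile (2-D list). Lower values indicate a blurrier tile."""
    vals = []
    for i in range(1, len(img) - 1):
        for j in range(1, len(img[0]) - 1):
            vals.append(img[i - 1][j] + img[i + 1][j] + img[i][j - 1]
                        + img[i][j + 1] - 4 * img[i][j])
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)

def route_tile(tile, models, thresholds):
    """Pick the model matching the tile's estimated blur level.
    models are ordered sharp -> blurry; thresholds are descending sharpness
    cut-offs with len(thresholds) == len(models) - 1 (hypothetical values)."""
    sharpness = laplacian_variance(tile)
    for model, cut in zip(models, thresholds):
        if sharpness >= cut:
            return model
    return models[-1]
```

In practice the cut-offs would be calibrated on tiles with known amounts of added Gaussian blur, matching the simulation setup described in the abstract.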

arXiv:2405.06902

Causal Inference from Slowly Varying Nonstationary Processes

Authors: Kang Du, Yu Xiang

Abstract: Causal inference from observational data following the restricted structural causal models (SCM) framework hinges largely on the asymmetry between cause and effect from the data generating mechanisms, such as non-Gaussianity or non-linearity. This methodology can be adapted to stationary time series, yet inferring causal relationships from nonstationary time series remains a challenging task. In this work, we propose a new class of restricted SCM, via a time-varying filter and stationary noise, and exploit the asymmetry from nonstationarity for causal identification in both bivariate and network settings. We propose efficient procedures by leveraging powerful estimates of the bivariate evolutionary spectra for slowly varying processes. Various synthetic and real datasets that involve high-order and non-smooth filters are evaluated to demonstrate the effectiveness of our proposed methodology.

Submitted 29 May, 2024; v1 submitted 11 May, 2024; originally announced May 2024.

Comments: This work was intended as a replacement of arXiv:2012.13025 and any subsequent updates will appear there

arXiv:2405.05498 [pdf, other]

The RoyalFlush Automatic Speech Diarization and Recognition System for In-Car Multi-Channel Automatic Speech Recognition Challenge

Authors: Jingguang Tian, Shuaishuai Ye, Shunfei Chen, Yang Xiang, Zhaohui Yin, Xinhui Hu, Xinkang Xu

Abstract: This paper presents our system submission for the In-Car Multi-Channel Automatic Speech Recognition (ICMC-ASR) Challenge, which focuses on speaker diarization and speech recognition in complex multi-speaker scenarios. To address these challenges, we develop end-to-end speaker diarization models that notably decrease the diarization error rate (DER) by 49.58% compared to the official baseline on the development set. For speech recognition, we utilize self-supervised learning representations to train end-to-end ASR models. By integrating these models, we achieve a character error rate (CER) of 16.93% on the track 1 evaluation set, and a concatenated minimum permutation character error rate (cpCER) of 25.88% on the track 2 evaluation set.

Submitted 8 May, 2024; originally announced May 2024.
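The cpCER metric reported above concatenates each speaker's transcripts and scores the best speaker mapping. The sketch below follows that definition. It assumes utterances are joined without a separator (as is usual for character-level scoring) and that missing speakers are padded with empty strings. The challenge's official scoring script may differ in details such as text normalization.

```python
from itertools import permutations

def edit_distance(ref, hyp):
    """Character-level Levenshtein distance."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (r != h)))
        prev = cur
    return prev[-1]

def cp_cer(ref_by_speaker, hyp_by_speaker):
    """Concatenated minimum-permutation CER. Each argument is a list of per-speaker
    utterance lists; each speaker's utterances are concatenated, and the
    speaker mapping with the fewest total character errors is chosen."""
    refs = ["".join(u) for u in ref_by_speaker]
    hyps = ["".join(u) for u in hyp_by_speaker]
    n = max(len(refs), len(hyps))
    refs += [""] * (n - len(refs))   # pad missing speakers (assumption)
    hyps += [""] * (n - len(hyps))
    best = min(sum(edit_distance(r, hyps[k]) for r, k in zip(refs, perm))
               for perm in permutations(range(n)))
    return best / sum(len(r) for r in refs)
```

Enumerating permutations is factorial in the number of speakers, which is acceptable for the few speakers in a car cabin. Larger settings typically solve the mapping as an assignment problem instead.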

arXiv:2405.04858 [pdf, other]

Pedestrian Attribute Recognition as Label-balanced Multi-label Learning

Authors: Yibo Zhou, Hai-Miao Hu, Yirong Xiang, Xiaokang Zhang, Haotian Wu

Abstract: Rooted in the scarcity of most attributes, realistic pedestrian attribute datasets exhibit unduly skewed data distributions, from which two types of model failures arise: (1) label imbalance: model predictions lean greatly towards the side of majority labels; (2) semantics imbalance: the model easily overfits the under-represented attributes due to their insufficient semantic diversity. To render perfect label balancing, we propose a novel framework that successfully decouples label-balanced data re-sampling from the curse of attributes co-occurrence, i.e., we equalize the sampling prior of an attribute while not biasing that of the co-occurred others. To diversify the attributes semantics and mitigate the feature noise, we propose a Bayesian feature augmentation method to introduce true in-distribution novelty. Handling both imbalances jointly, our work achieves the best accuracy on various popular benchmarks, and importantly, with minimal computational budget.

Submitted 8 May, 2024; originally announced May 2024.

Comments: Accepted as ICML2024 main conference paper

arXiv:2405.00273 [pdf, other]

Social Life Simulation for Non-Cognitive Skills Learning

Authors: Zihan Yan, Yaohong Xiang, Yun Huang

Abstract: Non-cognitive skills are crucial for personal and social life well-being, and such skill development can be supported by narrative-based (e.g., storytelling) technologies. While generative AI enables interactive and role-playing storytelling, little is known about how users engage with and perceive the use of AI in social life simulation for non-cognitive skills learning. Additionally, the benefits of AI mentorship on self-reflection awareness and ability in this context remain largely underexplored. To this end, we introduced Simulife++, an interactive platform enabled by a large language model (LLM). The system allows users to act as protagonists, creating stories with one or multiple AI-based characters in diverse social scenarios. In particular, we expanded the Human-AI interaction to a Human-AI-AI collaboration by including a Sage Agent, who acts as a bystander, providing users with some perspectives and guidance on their choices and conversations in terms of non-cognitive skills to promote reflection. In a within-subject user study, our quantitative results reveal that, when accompanied by Sage Agent, users exhibit significantly higher levels of reflection on motivation, self-perceptions, and resilience & coping, along with an enhanced experience of narrative transportation. Additionally, our qualitative findings suggest that Sage Agent plays a crucial role in promoting reflection on non-cognitive skills, enhancing social communication and decision-making performance, and improving overall user experience within Simulife++.
Multiple supportive relationships between Sage Agent and users were also reported. We offer design implications for the application of generative AI in narrative solutions and the future potential of Sage Agent for non-cognitive skill development in broader social contexts.

Submitted 19 July, 2024; v1 submitted 30 April, 2024; originally announced May 2024.

arXiv:2405.00026 [pdf]

Enhancing Credit Card Fraud Detection A Neural Network and SMOTE Integrated Approach

Authors: Mengran Zhu, Ye Zhang, Yulu Gong, Changxin Xu, Yafei Xiang

Abstract: Credit card fraud detection is a critical challenge in the financial sector, demanding sophisticated approaches to accurately identify fraudulent transactions. This research proposes an innovative methodology combining Neural Networks (NN) and Synthetic Minority Over-sampling Technique (SMOTE) to enhance the detection performance. The study addresses the inherent imbalance in credit card transaction data, focusing on technical advancements for robust and precise fraud detection. Results demonstrate that the integration of NN and SMOTE exhibits superior precision, recall, and F1-score compared to traditional models, highlighting its potential as an advanced solution for handling imbalanced datasets in credit card fraud detection scenarios. This research contributes to the ongoing efforts to develop effective and efficient mechanisms for safeguarding financial transactions from fraudulent activities.

Submitted 26 February, 2024; originally announced May 2024.
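SMOTE, which the abstract pairs with a neural network, creates synthetic minority samples by interpolating between a minority sample and one of its nearest minority-class neighbours. The following is a minimal standalone sketch of the standard algorithm on lists of floats. It does not reproduce the paper's pipeline or its neural network classifier.

```python
import math
import random

def smote(minority, n_new, k=5, rng=None):
    """Generate n_new synthetic minority samples: pick a minority sample, pick one
    of its k nearest minority neighbours, and interpolate at a random point on the
    segment between them."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    if rng is None:
        rng = random.Random()
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()                      # position along the segment, in [0, 1)
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nb)])
    return synthetic
```

Oversampling should be applied only to the training split, so that synthetic points never leak into the evaluation data used to report precision, recall, and F1-score.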

arXiv:2404.17877 [pdf, ps, other]

doi 10.1007/978-3-031-44693-1_21

PromptCL: Improving Event Representation via Prompt Template and Contrastive Learning

Authors: Yubo Feng, Lishuang Li, Yi Xiang, Xueyang Qin

Abstract: The representation of events in text plays a significant role in various NLP tasks. Recent research demonstrates that contrastive learning has the ability to improve event comprehension capabilities of Pre-trained Language Models (PLMs) and enhance the performance of event representation learning. However, the efficacy of event representation learning based on contrastive learning and PLMs is limited by the short length of event texts. The length of event texts differs significantly from the text length used in the pre-training of PLMs. As a result, there is inconsistency in the distribution of text length between pre-training and event representation learning, which may undermine the learning process of event representation based on PLMs. In this study, we present PromptCL, a novel framework for event representation learning that effectively elicits the capabilities of PLMs to comprehensively capture the semantics of short event texts. PromptCL utilizes a Prompt template borrowed from prompt learning to expand the input text during Contrastive Learning. This helps in enhancing the event representation learning by providing a structured outline of the event components. Moreover, we propose Subject-Predicate-Object (SPO) word order and Event-oriented Masked Language Modeling (EventMLM) to train PLMs to understand the relationships between event components. Our experimental results demonstrate that PromptCL outperforms state-of-the-art baselines on event related tasks.
Additionally, we conduct a thorough analysis and demonstrate that using a prompt results in improved generalization capabilities for event representations. Our code will be available at https://github.com/YuboFeng2023/PromptCL.

Submitted 27 April, 2024; originally announced April 2024.

Comments: NLPCC 2023 Best Student Paper

Journal ref: Natural Language Processing and Chinese Computing (NLPCC 2023)
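The two ingredients named above, a prompt template that lengthens a short SPO-ordered event and a contrastive objective, can be sketched independently of any PLM. The template wording below is a hypothetical placeholder, not PromptCL's template. The loss is the standard InfoNCE form over cosine similarities with an assumed temperature of 0.05, computed on precomputed embedding vectors.

```python
import math

def prompt_expand(subject, predicate, obj):
    """Wrap a short Subject-Predicate-Object event in a prompt template.
    The wording is a hypothetical placeholder, not the paper's template."""
    return (f"The event is that {subject} {predicate} {obj}. "
            f"Subject: {subject}. Predicate: {predicate}. Object: {obj}.")

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def info_nce(anchor, positive, negatives, temperature=0.05):
    """InfoNCE loss: negative log-softmax score of the positive pair among the
    positive and the negatives (computed with a stable log-sum-exp)."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[0]
```

In a full system the anchor and positive would be PLM encodings of the plain event and its prompt-expanded version (or two augmented views), with other events in the batch serving as negatives.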

arXiv:2404.15245 [pdf, other]

Mining Invariance from Nonlinear Multi-Environment Data: Binary Classification

Authors: Austin Goddard, Kang Du, Yu Xiang

Abstract: Making predictions in an unseen environment given data from multiple training environments is a challenging task. We approach this problem from an invariance perspective, focusing on binary classification to shed light on general nonlinear data generation mechanisms. We identify a unique form of invariance that exists solely in a binary setting that allows us to train models invariant over environments. We provide sufficient conditions for such invariance and show it is robust even when environmental conditions vary greatly. Our formulation admits a causal interpretation, allowing us to compare it with various frameworks. Finally, we propose a heuristic prediction method and conduct experiments using real and synthetic datasets.

Submitted 3 July, 2024; v1 submitted 23 April, 2024; originally announced April 2024.

Comments: Accepted to the 2024 International Symposium on Information Theory (ISIT)

arXiv:2404.13731 [pdf, ps, other]

Training-Conditional Coverage Bounds for Uniformly Stable Learning Algorithms

Authors: Mehrdad Pournaderi, Yu Xiang

Abstract: The training-conditional coverage performance of the conformal prediction is known to be empirically sound. Recently, there have been efforts to support this observation with theoretical guarantees. The training-conditional coverage bounds for jackknife+ and full-conformal prediction regions have been established via the notion of $(m,n)$-stability by Liang and Barber [2023]. Although this notion is weaker than uniform stability, it is not clear how to evaluate it for practical models. In this paper, we study the training-conditional coverage bounds of full-conformal, jackknife+, and CV+ prediction regions from a uniform stability perspective which is known to hold for empirical risk minimization over reproducing kernel Hilbert spaces with convex regularization. We derive coverage bounds for finite-dimensional models by a concentration argument for the (estimated) predictor function, and compare the bounds with existing ones under ridge regression.

Submitted 21 April, 2024; originally announced April 2024.

Comments: Accepted to the ISIT 2024 workshop on Information-Theoretic Methods for Trustworthy Machine Learning (IT-TML)
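For reference, the jackknife+ prediction interval discussed above can be computed directly from leave-one-out fits. The sketch below uses one-dimensional ridge regression without an intercept as the (uniformly stable) learning algorithm. That model choice is an assumption for illustration. The paper's coverage bounds are not computed.

```python
import math

def fit_ridge_1d(xs, ys, lam=1.0):
    """Ridge regression without intercept: w = <x, y> / (<x, x> + lam)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

def jackknife_plus(xs, ys, x_new, alpha=0.1, lam=1.0):
    """Jackknife+ interval: lower = floor(alpha(n+1))-th smallest of mu_{-i}(x) - R_i,
    upper = ceil((1-alpha)(n+1))-th smallest of mu_{-i}(x) + R_i, where R_i is the
    leave-one-out residual of point i."""
    n = len(xs)
    lows, highs = [], []
    for i in range(n):
        w = fit_ridge_1d(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:], lam)
        r = abs(ys[i] - w * xs[i])                 # leave-one-out residual R_i
        lows.append(w * x_new - r)
        highs.append(w * x_new + r)
    lows.sort()
    highs.sort()
    k_lo = math.floor(alpha * (n + 1))
    k_hi = math.ceil((1 - alpha) * (n + 1))
    lower = lows[k_lo - 1] if k_lo >= 1 else -math.inf
    upper = highs[k_hi - 1] if k_hi <= n else math.inf
    return lower, upper
```

With too few points for the requested level (for example, n = 5 and alpha = 0.1), the interval is unbounded on both sides, as the definition requires.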

arXiv:2404.12715 [pdf, other]

Ensemble Learning for Heterogeneous Large Language Models with Deep Parallel Collaboration

Authors: Yichong Huang, Xiaocheng Feng, Baohang Li, Yang Xiang, Hui Wang, Bing Qin, Ting Liu

Abstract: Large language models (LLMs) exhibit complementary strengths in various tasks, motivating the research of LLM ensembling. However, existing work focuses on training an extra reward model or fusion model to select or combine all candidate answers, posing a great challenge to the generalization on unseen data distributions. Besides, prior methods use textual responses as communication media, ignoring the valuable information in the internal representations. In this work, we propose a training-free ensemble framework DeePEn, fusing the informative probability distributions yielded by different LLMs at each decoding step. Unfortunately, the vocabulary discrepancy between heterogeneous LLMs directly makes averaging the distributions infeasible due to the token misalignment. To address this challenge, DeePEn maps the probability distribution of each model from its own probability space to a universal relative space based on the relative representation theory, and performs aggregation. Next, we devise a search-based inverse transformation to transform the aggregated result back to the probability space of one of the ensembling LLMs (main model), in order to determine the next token. We conduct extensive experiments on ensembles of different numbers of LLMs, ensembles of LLMs with different architectures, and ensembles between the LLM and the specialist model.
Experimental results show that (i) DeePEn achieves consistent improvements across six benchmarks covering subject examination, reasoning, and knowledge, (ii) a well-performing specialist model can benefit from a less effective LLM through distribution fusion, and (iii) DeePEn has complementary strengths with other ensemble methods such as voting.

Submitted 30 May, 2024; v1 submitted 19 April, 2024; originally announced April 2024.

Comments: 16 pages, 9 figures, 9 tables

arXiv:2404.11667 [pdf, other]

Deep Dependency Networks and Advanced Inference Schemes for Multi-Label Classification

Authors: Shivvrat Arya, Yu Xiang, Vibhav Gogate

Abstract: We present a unified framework called deep dependency networks (DDNs) that combines dependency networks and deep learning architectures for multi-label classification, with a particular emphasis on image and video data. The primary advantage of dependency networks is their ease of training, in contrast to other probabilistic graphical models like Markov networks. In particular, when combined with deep learning architectures, they provide an intuitive, easy-to-use loss function for multi-label classification. A drawback of DDNs compared to Markov networks is their lack of advanced inference schemes, necessitating the use of Gibbs sampling. To address this challenge, we propose novel inference schemes based on local search and integer linear programming for computing the most likely assignment to the labels given observations. We evaluate our novel methods on three video datasets (Charades, TACoS, Wetlab) and three image datasets (MS-COCO, PASCAL VOC, NUS-WIDE), comparing their performance with (a) basic neural architectures and (b) neural architectures combined with Markov networks equipped with advanced inference and learning techniques. Our results demonstrate the superiority of our new DDN methods over the two competing approaches.

Submitted 17 April, 2024; originally announced April 2024.

Comments: Will appear in AISTATS 2024. arXiv admin note: substantial text overlap with arXiv:2302.00633
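Local-search inference over label assignments can be sketched with greedy single-label flips and random restarts. As an assumption, the objective here is the sum of per-label log-conditionals under logistic conditionals P(y_i = 1 | y_-i, x) = sigmoid(bias_i + sum_j W[i][j] * y_j), where `bias` would come from the neural network. The paper's exact objective, its move set, and its ILP formulation are not reproduced.

```python
import math
import random

def log_sigmoid(z):
    """Numerically stable log(sigmoid(z))."""
    return -math.log1p(math.exp(-z)) if z >= 0 else z - math.log1p(math.exp(z))

def score(y, bias, W):
    """Assumed objective: sum_i log P(y_i | y_-i, x) under logistic conditionals.
    W need not be symmetric, since dependency-network conditionals are learned
    independently."""
    total = 0.0
    for i in range(len(y)):
        z = bias[i] + sum(W[i][j] * y[j] for j in range(len(y)) if j != i)
        total += log_sigmoid(z) if y[i] == 1 else log_sigmoid(-z)
    return total

def local_search(bias, W, restarts=5, rng=None):
    """Greedy hill climbing over binary label vectors with random restarts."""
    rng = rng if rng is not None else random.Random(0)
    n = len(bias)
    best_y, best_s = None, -math.inf
    for _ in range(restarts):
        y = [rng.randint(0, 1) for _ in range(n)]
        s = score(y, bias, W)
        improved = True
        while improved:
            improved = False
            for i in range(n):
                y[i] ^= 1                      # try flipping label i
                s_new = score(y, bias, W)
                if s_new > s:
                    s, improved = s_new, True
                else:
                    y[i] ^= 1                  # undo the flip
        if s > best_s:
            best_y, best_s = y[:], s
    return best_y, best_s
```

Each restart ends at a local optimum where no single label flip improves the score. Restarts reduce, but do not eliminate, the risk of missing the global optimum, which is what an exact ILP formulation would guarantee.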

arXiv:2404.08690 [pdf, other]

Towards Building a Robust Toxicity Predictor

Authors: Dmitriy Bespalov, Sourav Bhabesh, Yi Xiang, Liutong Zhou, Yanjun Qi

Abstract: Recent NLP literature pays little attention to the robustness of toxic language predictors, even though these systems are most likely to be used in adversarial contexts. This paper presents a novel adversarial attack, ToxicTrap, introducing small word-level perturbations to fool SOTA text classifiers to predict toxic text samples as benign. ToxicTrap exploits greedy-based search strategies to enable fast and effective generation of toxic adversarial examples. Two novel goal function designs allow ToxicTrap to identify weaknesses in both multiclass and multilabel toxic language detectors. Our empirical results show that SOTA toxicity text classifiers are indeed vulnerable to the proposed attacks, attaining over 98% attack success rates in multilabel cases. We also show how a vanilla adversarial training and its improved version can help increase robustness of a toxicity detector even against unseen attacks.

Submitted 9 April, 2024; originally announced April 2024.

Comments: ACL 2023
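The greedy search skeleton behind word-level attacks of this kind can be illustrated against a toy model. In this minimal sketch the lexicon-based classifier, the substitution table, the threshold, and the swap budget are all hypothetical stand-ins. ToxicTrap's actual transformations, constraints, and multiclass/multilabel goal functions are not reproduced.

```python
def toy_toxicity(words, lexicon):
    """Hypothetical stand-in classifier: fraction of words found in a toxic lexicon."""
    return sum(w in lexicon for w in words) / max(len(words), 1)

def greedy_attack(text, classifier, substitutes, threshold=0.5, max_swaps=3):
    """Greedy word-level attack: rank positions by how much deleting each word
    lowers the toxicity score, then at each position swap in the candidate that
    lowers the score most, until the score drops below `threshold` (predicted
    benign) or the swap budget is spent. Returns (adversarial_text, success)."""
    words = text.split()
    base = classifier(words)
    order = sorted(range(len(words)),
                   key=lambda i: base - classifier(words[:i] + words[i + 1:]),
                   reverse=True)
    swaps = 0
    for i in order:
        if classifier(words) < threshold or swaps >= max_swaps:
            break
        candidates = substitutes.get(words[i], [])
        if not candidates:
            continue
        best = min(candidates, key=lambda c: classifier(words[:i] + [c] + words[i + 1:]))
        if classifier(words[:i] + [best] + words[i + 1:]) < classifier(words):
            words[i] = best
            swaps += 1
    return " ".join(words), classifier(words) < threshold
```

The same loop doubles as a robustness probe: running it against one's own detector with realistic substitutions (typos, character swaps) gives adversarial examples that can be fed back into adversarial training.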

Showing 1–50 of 346 results for author: Xiang, Y