Showing 1–50 of 645 results for author: Zheng, H

  1. arXiv:2410.14919  [pdf, other]

    cs.CV cs.LG

    Adversarial Score identity Distillation: Rapidly Surpassing the Teacher in One Step

    Authors: Mingyuan Zhou, Huangjie Zheng, Yi Gu, Zhendong Wang, Hai Huang

    Abstract: Score identity Distillation (SiD) is a data-free method that has achieved state-of-the-art performance in image generation by leveraging only a pretrained diffusion model, without requiring any training data. However, the ultimate performance of SiD is constrained by the accuracy with which the pretrained model captures the true data scores at different stages of the diffusion process. In this pap… ▽ More

    Submitted 18 October, 2024; originally announced October 2024.

  2. arXiv:2410.14167  [pdf]

    cs.IR

    Optimizing Retrieval-Augmented Generation with Elasticsearch for Enhanced Question-Answering Systems

    Authors: Jiajing Chen, Runyuan Bao, Hongye Zheng, Zhen Qi, Jianjun Wei, Jiacheng Hu

    Abstract: This study aims to improve the accuracy and quality of large-scale language models (LLMs) in answering questions by integrating Elasticsearch into the Retrieval Augmented Generation (RAG) framework. The experiment uses the Stanford Question Answering Dataset (SQuAD) version 2.0 as the test dataset and compares the performance of different retrieval methods, including traditional methods based on k… ▽ More

    Submitted 18 October, 2024; originally announced October 2024.

  3. arXiv:2410.09406  [pdf, other]

    eess.IV cs.ET quant-ph

    Quantum Neural Network for Accelerated Magnetic Resonance Imaging

    Authors: Shuo Zhou, Yihang Zhou, Congcong Liu, Yanjie Zhu, Hairong Zheng, Dong Liang, Haifeng Wang

    Abstract: Magnetic resonance image reconstruction starting from undersampled k-space data requires the recovery of many potential nonlinear features, which is very difficult for algorithms to recover these features. In recent years, the development of quantum computing has discovered that quantum convolution can improve network accuracy, possibly due to potential quantum advantages. This article proposes a… ▽ More

    Submitted 12 October, 2024; originally announced October 2024.

    Comments: Accepted at 2024 IEEE International Conference on Imaging Systems and Techniques (IST 2024)

  4. arXiv:2410.08224  [pdf, other]

    eess.SP cs.AI cs.LG q-bio.NC

    A Survey of Spatio-Temporal EEG data Analysis: from Models to Applications

    Authors: Pengfei Wang, Huanran Zheng, Silong Dai, Yiqiao Wang, Xiaotian Gu, Yuanbin Wu, Xiaoling Wang

    Abstract: In recent years, the field of electroencephalography (EEG) analysis has witnessed remarkable advancements, driven by the integration of machine learning and artificial intelligence. This survey aims to encapsulate the latest developments, focusing on emerging methods and technologies that are poised to transform our comprehension and interpretation of brain activity. We delve into self-supervised… ▽ More

    Submitted 26 September, 2024; originally announced October 2024.

    Comments: submitted to IECE Chinese Journal of Information Fusion

    Report number: X123YZ

  5. arXiv:2410.07688  [pdf, other]

    cs.RO cs.CV

    PokeFlex: A Real-World Dataset of Deformable Objects for Robotics

    Authors: Jan Obrist, Miguel Zamora, Hehui Zheng, Ronan Hinchet, Firat Ozdemir, Juan Zarate, Robert K. Katzschmann, Stelian Coros

    Abstract: Data-driven methods have shown great potential in solving challenging manipulation tasks, however, their application in the domain of deformable objects has been constrained, in part, by the lack of data. To address this, we propose PokeFlex, a dataset featuring real-world paired and annotated multimodal data that includes 3D textured meshes, point clouds, RGB images, and depth maps. Such data can… ▽ More

    Submitted 10 October, 2024; originally announced October 2024.

  6. arXiv:2410.01573  [pdf, other]

    cs.CV

    PASS: Test-Time Prompting to Adapt Styles and Semantic Shapes in Medical Image Segmentation

    Authors: Chuyan Zhang, Hao Zheng, Xin You, Yefeng Zheng, Yun Gu

    Abstract: Test-time adaptation (TTA) has emerged as a promising paradigm to handle the domain shifts at test time for medical images from different institutions without using extra training data. However, existing TTA solutions for segmentation tasks suffer from (1) dependency on modifying the source training stage and access to source priors or (2) lack of emphasis on shape-related semantic knowledge that… ▽ More

    Submitted 2 October, 2024; originally announced October 2024.

    Comments: Submitted to IEEE TMI

  7. arXiv:2409.20175  [pdf, other]

    cs.LG stat.ML

    Ensemble Kalman Diffusion Guidance: A Derivative-free Method for Inverse Problems

    Authors: Hongkai Zheng, Wenda Chu, Austin Wang, Nikola Kovachki, Ricardo Baptista, Yisong Yue

    Abstract: When solving inverse problems, it is increasingly popular to use pre-trained diffusion models as plug-and-play priors. This framework can accommodate different forward models without re-training while preserving the generative capability of diffusion models. Despite their success in many imaging inverse problems, most existing methods rely on privileged information such as derivative, pseudo-inver… ▽ More

    Submitted 30 September, 2024; originally announced September 2024.

  8. arXiv:2409.19272  [pdf, other]

    cs.CL

    Perception Compressor: A training-free prompt compression method in long context scenarios

    Authors: Jiwei Tang, Jin Xu, Tingwei Lu, Hai Lin, Yiming Zhao, Hai-Tao Zheng

    Abstract: Large Language Models (LLMs) demonstrate exceptional capabilities in various scenarios. However, they suffer from much redundant information and tend to be lost in the middle in long context scenarios, leading to inferior performance. To address these challenges, we present Perception Compressor, a training-free prompt compression method. It includes a dual-slope ratio allocator to dynamically ass… ▽ More

    Submitted 28 September, 2024; originally announced September 2024.

    Comments: 9 pages, 2 figures

  9. arXiv:2409.19171  [pdf, other]

    q-bio.QM cs.LG eess.IV

    Reducing Overtreatment of Indeterminate Thyroid Nodules Using a Multimodal Deep Learning Model

    Authors: Shreeram Athreya, Andrew Melehy, Sujit Silas Armstrong Suthahar, Vedrana Ivezić, Ashwath Radhachandran, Vivek Sant, Chace Moleta, Henry Zheng, Maitraya Patel, Rinat Masamed, Corey W. Arnold, William Speier

    Abstract: Objective: Molecular testing (MT) classifies cytologically indeterminate thyroid nodules as benign or malignant with high sensitivity but low positive predictive value (PPV), only using molecular profiles, ignoring ultrasound (US) imaging and biopsy. We address this limitation by applying attention multiple instance learning (AMIL) to US images. Methods: We retrospectively reviewed 333 patients… ▽ More

    Submitted 27 September, 2024; originally announced September 2024.

    Comments: 9 pages, 3 figures

  10. arXiv:2409.18962  [pdf, other]

    cs.CV cs.AI cs.LG

    Exploring Token Pruning in Vision State Space Models

    Authors: Zheng Zhan, Zhenglun Kong, Yifan Gong, Yushu Wu, Zichong Meng, Hangyu Zheng, Xuan Shen, Stratis Ioannidis, Wei Niu, Pu Zhao, Yanzhi Wang

    Abstract: State Space Models (SSMs) have the advantage of keeping linear computational complexity compared to attention modules in transformers, and have been applied to vision tasks as a new type of powerful vision foundation model. Inspired by the observations that the final prediction in vision transformers (ViTs) is only based on a subset of most informative tokens, we take the novel step of enhancing t… ▽ More

    Submitted 27 September, 2024; originally announced September 2024.

    Comments: NeurIPS'24

  11. arXiv:2409.17124  [pdf, other]

    cs.RO

    PokeFlex: Towards a Real-World Dataset of Deformable Objects for Robotic Manipulation

    Authors: Jan Obrist, Miguel Zamora, Hehui Zheng, Juan Zarate, Robert K. Katzschmann, Stelian Coros

    Abstract: Advancing robotic manipulation of deformable objects can enable automation of repetitive tasks across multiple industries, from food processing to textiles and healthcare. Yet robots struggle with the high dimensionality of deformable objects and their complex dynamics. While data-driven methods have shown potential for solving manipulation tasks, their application in the domain of deformable obje… ▽ More

    Submitted 25 September, 2024; originally announced September 2024.

    Comments: Extended Abstract, 40th Anniversary of the IEEE International Conference on Robotics and Automation. (ICRA@40 Rotterdam 2024)

  12. arXiv:2409.16945  [pdf, other]

    cs.CV

    Face Forgery Detection with Elaborate Backbone

    Authors: Zonghui Guo, Yingjie Liu, Jie Zhang, Haiyong Zheng, Shiguang Shan

    Abstract: Face Forgery Detection (FFD), or Deepfake detection, aims to determine whether a digital face is real or fake. Due to different face synthesis algorithms with diverse forgery patterns, FFD models often overfit specific patterns in training datasets, resulting in poor generalization to other unseen forgeries. This severe challenge requires FFD models to possess strong capabilities in representing c… ▽ More

    Submitted 25 September, 2024; originally announced September 2024.

  13. arXiv:2409.16675  [pdf, other]

    cs.CR cs.DB cs.LG

    CryptoTrain: Fast Secure Training on Encrypted Dataset

    Authors: Jiaqi Xue, Yancheng Zhang, Yanshan Wang, Xueqiang Wang, Hao Zheng, Qian Lou

    Abstract: Secure training, while protecting the confidentiality of both data and model weights, typically incurs significant training overhead. Traditional Fully Homomorphic Encryption (FHE)-based non-interactive training models are heavily burdened by computationally demanding bootstrapping. To develop an efficient secure training system, we established a foundational framework, CryptoTrain-B, utilizing a… ▽ More

    Submitted 26 September, 2024; v1 submitted 25 September, 2024; originally announced September 2024.

    Comments: Accepted by CCS-LAMPS 2024

  14. arXiv:2409.15763  [pdf, other]

    cs.IR cs.AI

    IRSC: A Zero-shot Evaluation Benchmark for Information Retrieval through Semantic Comprehension in Retrieval-Augmented Generation Scenarios

    Authors: Hai Lin, Shaoxiong Zhan, Junyou Su, Haitao Zheng, Hui Wang

    Abstract: In Retrieval-Augmented Generation (RAG) tasks using Large Language Models (LLMs), the quality of retrieved information is critical to the final output. This paper introduces the IRSC benchmark for evaluating the performance of embedding models in multilingual RAG tasks. The benchmark encompasses five retrieval tasks: query retrieval, title retrieval, part-of-paragraph retrieval, keyword retrieval,… ▽ More

    Submitted 26 September, 2024; v1 submitted 24 September, 2024; originally announced September 2024.

  15. arXiv:2409.15314  [pdf]

    cs.LG

    Reducing Bias in Deep Learning Optimization: The RSGDM Approach

    Authors: Honglin Qin, Hongye Zheng, Bingxing Wang, Zhizhong Wu, Bingyao Liu, Yuanfang Yang

    Abstract: Currently, widely used first-order deep learning optimizers include non-adaptive learning rate optimizers and adaptive learning rate optimizers. The former is represented by SGDM (Stochastic Gradient Descent with Momentum), while the latter is represented by Adam. Both of these methods use exponential moving averages to estimate the overall gradient. However, estimating the overall gradient using… ▽ More

    Submitted 5 September, 2024; originally announced September 2024.

  16. arXiv:2409.14978  [pdf, other]

    cs.AI

    TS-TCD: Triplet-Level Cross-Modal Distillation for Time-Series Forecasting Using Large Language Models

    Authors: Pengfei Wang, Huanran Zheng, Silong Dai, Wenjing Yue, Wei Zhu, Xiaoling Wang

    Abstract: In recent years, large language models (LLMs) have shown great potential in time-series analysis by capturing complex dependencies and improving predictive performance. However, existing approaches often struggle with modality alignment, leading to suboptimal results. To address these challenges, we present a novel framework, TS-TCD, which introduces a comprehensive three-tiered cross-modal knowle… ▽ More

    Submitted 23 September, 2024; originally announced September 2024.

    Comments: Submitted to ICASSP 2025

    MSC Class: 62M10; 68T07

  17. arXiv:2409.12314  [pdf, other]

    cs.CR cs.AI cs.CV cs.LG

    Understanding Implosion in Text-to-Image Generative Models

    Authors: Wenxin Ding, Cathy Y. Li, Shawn Shan, Ben Y. Zhao, Haitao Zheng

    Abstract: Recent works show that text-to-image generative models are surprisingly vulnerable to a variety of poisoning attacks. Empirical results find that these models can be corrupted by altering associations between individual text prompts and associated visual features. Furthermore, a number of concurrent poisoning attacks can induce "model implosion," where the model becomes unable to produce meaningfu… ▽ More

    Submitted 18 September, 2024; originally announced September 2024.

    Comments: ACM CCS 2024

  18. arXiv:2409.11308  [pdf, other]

    cs.CL

    SpMis: An Investigation of Synthetic Spoken Misinformation Detection

    Authors: Peizhuo Liu, Li Wang, Renqiang He, Haorui He, Lei Wang, Huadi Zheng, Jie Shi, Tong Xiao, Zhizheng Wu

    Abstract: In recent years, speech generation technology has advanced rapidly, fueled by generative models and large-scale training techniques. While these developments have enabled the production of high-quality synthetic speech, they have also raised concerns about the misuse of this technology, particularly for generating synthetic misinformation. Current research primarily focuses on distinguishing machi… ▽ More

    Submitted 17 September, 2024; originally announced September 2024.

    Comments: Accepted in SLT 2024

  19. arXiv:2409.08551  [pdf, other]

    stat.ML cs.LG

    Think Twice Before You Act: Improving Inverse Problem Solving With MCMC

    Authors: Yaxuan Zhu, Zehao Dou, Haoxin Zheng, Yasi Zhang, Ying Nian Wu, Ruiqi Gao

    Abstract: Recent studies demonstrate that diffusion models can serve as a strong prior for solving inverse problems. A prominent example is Diffusion Posterior Sampling (DPS), which approximates the posterior distribution of data given the measure using Tweedie's formula. Despite the merits of being versatile in solving various inverse problems without re-training, the performance of DPS is hindered by the… ▽ More

    Submitted 13 September, 2024; originally announced September 2024.

  20. arXiv:2409.08395  [pdf, other]

    q-bio.QM cs.LG stat.AP

    Graphical Structural Learning of rs-fMRI data in Heavy Smokers

    Authors: Yiru Gong, Qimin Zhang, Huili Zheng, Zheyan Liu, Shaohan Chen

    Abstract: Recent studies revealed structural and functional brain changes in heavy smokers. However, the specific changes in topological brain connections are not well understood. We used Gaussian Undirected Graphs with the graphical lasso algorithm on rs-fMRI data from smokers and non-smokers to identify significant changes in brain connections. Our results indicate high stability in the estimated graphs a… ▽ More

    Submitted 16 September, 2024; v1 submitted 12 September, 2024; originally announced September 2024.

    Comments: Accepted by IEEE CCSB 2024 conference

  21. arXiv:2409.07690  [pdf, other]

    cs.RO

    Characterization and Design of A Hollow Cylindrical Ultrasonic Motor

    Authors: Zhanyue Zhao, Yang Wang, Charles Bales, Daniel Ruiz-Cadalso, Howard Zheng, Cosme Furlong-Vazquez, Gregory Fischer

    Abstract: Piezoelectric ultrasonic motors perform the advantages of compact design, faster reaction time, and simpler setup compared to other motion units such as pneumatic and hydraulic motors, especially its non-ferromagnetic property makes it a perfect match in MRI-compatible robotics systems compared to traditional DC motors. Hollow shaft motors address the advantages of being lightweight and comparable… ▽ More

    Submitted 11 September, 2024; originally announced September 2024.

    Comments: 6 pages, 9 figures, 2 tables

  22. arXiv:2409.06912  [pdf, other]

    cs.RO cs.AI

    A Bayesian Framework for Active Tactile Object Recognition, Pose Estimation and Shape Transfer Learning

    Authors: Haodong Zheng, Andrei Jalba, Raymond H. Cuijpers, Wijnand IJsselsteijn, Sanne Schoenmakers

    Abstract: As humans can explore and understand the world through active touch, similar capability is desired for robots. In this paper, we address the problem of active tactile object recognition, pose estimation and shape transfer learning, where a customized particle filter (PF) and Gaussian process implicit surface (GPIS) is combined in a unified Bayesian framework. Upon new tactile input, the customized… ▽ More

    Submitted 11 October, 2024; v1 submitted 10 September, 2024; originally announced September 2024.

  23. arXiv:2409.03881  [pdf, other]

    cs.RO cs.AI cs.MA

    Multi-agent Path Finding for Mixed Autonomy Traffic Coordination

    Authors: Han Zheng, Zhongxia Yan, Cathy Wu

    Abstract: In the evolving landscape of urban mobility, the prospective integration of Connected and Automated Vehicles (CAVs) with Human-Driven Vehicles (HDVs) presents a complex array of challenges and opportunities for autonomous driving systems. While recent advancements in robotics have yielded Multi-Agent Path Finding (MAPF) algorithms tailored for agent coordination task characterized by simplified ki… ▽ More

    Submitted 5 September, 2024; originally announced September 2024.

  24. arXiv:2409.01579  [pdf, other]

    cs.CL cs.AI

    AdaComp: Extractive Context Compression with Adaptive Predictor for Retrieval-Augmented Large Language Models

    Authors: Qianchi Zhang, Hainan Zhang, Liang Pang, Hongwei Zheng, Zhiming Zheng

    Abstract: Retrieved documents containing noise will hinder RAG from detecting answer clues and make the inference process slow and expensive. Therefore, context compression is necessary to enhance its accuracy and efficiency. Existing context compression methods use extractive or generative models to retain the most query-relevant sentences or apply the information bottleneck theory to preserve sufficient i… ▽ More

    Submitted 2 September, 2024; originally announced September 2024.

    Comments: 8 pages, 5 figures, code available at https://anonymous.4open.science/r/AdaComp-8C0C/

  25. arXiv:2408.17072  [pdf, other]

    cs.CL

    MaFeRw: Query Rewriting with Multi-Aspect Feedbacks for Retrieval-Augmented Large Language Models

    Authors: Yujing Wang, Hainan Zhang, Liang Pang, Hongwei Zheng, Zhiming Zheng

    Abstract: In a real-world RAG system, the current query often involves spoken ellipses and ambiguous references from dialogue contexts, necessitating query rewriting to better describe user's information needs. However, traditional context-based rewriting has minimal enhancement on downstream generation tasks due to the lengthy process from query rewriting to response generation. Some researchers try to uti… ▽ More

    Submitted 30 August, 2024; originally announced August 2024.

  26. arXiv:2408.16068  [pdf, other]

    q-bio.GN cs.AI stat.ML

    Identification of Prognostic Biomarkers for Stage III Non-Small Cell Lung Carcinoma in Female Nonsmokers Using Machine Learning

    Authors: Huili Zheng, Qimin Zhang, Yiru Gong, Zheyan Liu, Shaohan Chen

    Abstract: Lung cancer remains a leading cause of cancer-related deaths globally, with non-small cell lung cancer (NSCLC) being the most common subtype. This study aimed to identify key biomarkers associated with stage III NSCLC in non-smoking females using gene expression profiling from the GDS3837 dataset. Utilizing XGBoost, a machine learning algorithm, the analysis achieved a strong predictive performanc… ▽ More

    Submitted 29 August, 2024; v1 submitted 28 August, 2024; originally announced August 2024.

    Comments: This paper has been accepted for publication in the IEEE ICBASE 2024 conference

  27. arXiv:2408.12615  [pdf, other]

    eess.IV cs.CV cs.LG

    Pediatric TSC-Related Epilepsy Classification from Clinical MR Images Using Quantum Neural Network

    Authors: Ling Lin, Yihang Zhou, Zhanqi Hu, Dian Jiang, Congcong Liu, Shuo Zhou, Yanjie Zhu, Jianxiang Liao, Dong Liang, Hairong Zheng, Haifeng Wang

    Abstract: Tuberous sclerosis complex (TSC) manifests as a multisystem disorder with significant neurological implications. This study addresses the critical need for robust classification models tailored to TSC in pediatric patients, introducing QResNet, a novel deep learning model seamlessly integrating conventional convolutional neural networks with quantum neural networks. The model incorporates a two-lay… ▽ More

    Submitted 26 August, 2024; v1 submitted 8 August, 2024; originally announced August 2024.

    Comments: 5 pages, 4 figures, 2 tables, presented at ISBI 2024

  28. arXiv:2408.11839  [pdf]

    cs.LG cs.AI

    Adaptive Friction in Deep Learning: Enhancing Optimizers with Sigmoid and Tanh Function

    Authors: Hongye Zheng, Bingxing Wang, Minheng Xiao, Honglin Qin, Zhizhong Wu, Lianghao Tan

    Abstract: Adaptive optimizers are pivotal in guiding the weight updates of deep neural networks, yet they often face challenges such as poor generalization and oscillation issues. To counter these, we introduce sigSignGrad and tanhSignGrad, two novel optimizers that integrate adaptive friction coefficients based on the Sigmoid and Tanh functions, respectively. These algorithms leverage short-term gradient i… ▽ More

    Submitted 6 August, 2024; originally announced August 2024.

  29. arXiv:2408.10899  [pdf, other]

    cs.RO

    All Robots in One: A New Standard and Unified Dataset for Versatile, General-Purpose Embodied Agents

    Authors: Zhiqiang Wang, Hao Zheng, Yunshuang Nie, Wenjun Xu, Qingwei Wang, Hua Ye, Zhe Li, Kaidong Zhang, Xuewen Cheng, Wanxi Dong, Chang Cai, Liang Lin, Feng Zheng, Xiaodan Liang

    Abstract: Embodied AI is transforming how AI systems interact with the physical world, yet existing datasets are inadequate for developing versatile, general-purpose agents. These limitations include a lack of standardized formats, insufficient data diversity, and inadequate data volume. To address these issues, we introduce ARIO (All Robots In One), a new data standard that enhances existing datasets by of… ▽ More

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: Project website: https://imaei.github.io/project_pages/ario/

  30. arXiv:2408.04575  [pdf]

    cs.AI cs.CL

    SCENE: Evaluating Explainable AI Techniques Using Soft Counterfactuals

    Authors: Haoran Zheng, Utku Pamuksuz

    Abstract: Explainable Artificial Intelligence (XAI) plays a crucial role in enhancing the transparency and accountability of AI models, particularly in natural language processing (NLP) tasks. However, popular XAI methods such as LIME and SHAP have been found to be unstable and potentially misleading, underscoring the need for a standardized evaluation approach. This paper introduces SCENE (Soft Counterfact… ▽ More

    Submitted 16 August, 2024; v1 submitted 8 August, 2024; originally announced August 2024.

  31. arXiv:2408.02980  [pdf, other]

    cs.CV

    Sample-agnostic Adversarial Perturbation for Vision-Language Pre-training Models

    Authors: Haonan Zheng, Wen Jiang, Xinyang Deng, Wenrui Li

    Abstract: Recent studies on AI security have highlighted the vulnerability of Vision-Language Pre-training (VLP) models to subtle yet intentionally designed perturbations in images and texts. Investigating multimodal systems' robustness via adversarial attacks is crucial in this field. Most multimodal attacks are sample-specific, generating a unique perturbation for each sample to construct adversarial samp… ▽ More

    Submitted 6 August, 2024; originally announced August 2024.

    Comments: 13 pages, 8 figures, published in ACMMM2024

  32. arXiv:2408.01929  [pdf, other]

    eess.IV cs.CV

    Advancing H&E-to-IHC Stain Translation in Breast Cancer: A Multi-Magnification and Attention-Based Approach

    Authors: Linhao Qu, Chengsheng Zhang, Guihui Li, Haiyong Zheng, Chen Peng, Wei He

    Abstract: Breast cancer presents a significant healthcare challenge globally, demanding precise diagnostics and effective treatment strategies, where histopathological examination of Hematoxylin and Eosin (H&E) stained tissue sections plays a central role. Despite its importance, evaluating specific biomarkers like Human Epidermal Growth Factor Receptor 2 (HER2) for personalized treatment remains constraine… ▽ More

    Submitted 4 August, 2024; originally announced August 2024.

    Comments: Accepted by IEEE CIS-RAM 2024 Invited Session Oral

  33. arXiv:2407.21693  [pdf, other]

    cs.AI

    TransferTOD: A Generalizable Chinese Multi-Domain Task-Oriented Dialogue System with Transfer Capabilities

    Authors: Ming Zhang, Caishuang Huang, Yilong Wu, Shichun Liu, Huiyuan Zheng, Yurui Dong, Yujiong Shen, Shihan Dou, Jun Zhao, Junjie Ye, Qi Zhang, Tao Gui, Xuanjing Huang

    Abstract: Task-oriented dialogue (TOD) systems aim to efficiently handle task-oriented conversations, including information collection. How to utilize TOD accurately, efficiently and effectively for information collection has always been a critical and challenging task. Recent studies have demonstrated that Large Language Models (LLMs) excel in dialogue, instruction generation, and reasoning, and can signif… ▽ More

    Submitted 12 October, 2024; v1 submitted 31 July, 2024; originally announced July 2024.

  34. arXiv:2407.21038  [pdf, other]

    cs.CL cs.AI cs.IR

    Advancing Chart Question Answering with Robust Chart Component Recognition

    Authors: Hanwen Zheng, Sijia Wang, Chris Thomas, Lifu Huang

    Abstract: Chart comprehension presents significant challenges for machine learning models due to the diverse and intricate shapes of charts. Existing multimodal methods often overlook these visual features or fail to integrate them effectively for chart question answering (ChartQA). To address this, we introduce Chartformer, a unified framework that enhances chart component recognition by accurately identif… ▽ More

    Submitted 19 July, 2024; originally announced July 2024.

  35. arXiv:2407.20207  [pdf, other]

    cs.CL cs.AI cs.IR

    QAEA-DR: A Unified Text Augmentation Framework for Dense Retrieval

    Authors: Hongming Tan, Shaoxiong Zhan, Hai Lin, Hai-Tao Zheng, Wai Kin, Chan

    Abstract: In dense retrieval, embedding long texts into dense vectors can result in information loss, leading to inaccurate query-text matching. Additionally, low-quality texts with excessive noise or sparse key information are unlikely to align well with relevant queries. Recent studies mainly focus on improving the sentence embedding model or retrieval process. In this work, we introduce a novel text augm… ▽ More

    Submitted 29 July, 2024; originally announced July 2024.

  36. Learning Spectral-Decomposed Tokens for Domain Generalized Semantic Segmentation

    Authors: Jingjun Yi, Qi Bi, Hao Zheng, Haolan Zhan, Wei Ji, Yawen Huang, Yuexiang Li, Yefeng Zheng

    Abstract: The rapid development of Vision Foundation Model (VFM) brings inherent out-domain generalization for a variety of down-stream tasks. Among them, domain generalized semantic segmentation (DGSS) holds unique challenges as the cross-domain images share common pixel-wise content information but vary greatly in terms of the style. In this paper, we present a novel Spectral-dEcomposed Token (SET) learni… ▽ More

    Submitted 28 July, 2024; v1 submitted 26 July, 2024; originally announced July 2024.

    Comments: accepted by ACM MM2024

  37. arXiv:2407.18391  [pdf, other]

    cs.CV

    UOUO: Uncontextualized Uncommon Objects for Measuring Knowledge Horizons of Vision Language Models

    Authors: Xinyu Pi, Mingyuan Wu, Jize Jiang, Haozhen Zheng, Beitong Tian, Chengxiang Zhai, Klara Nahrstedt, Zhiting Hu

    Abstract: Smaller-scale Vision-Language Models (VLMs) often claim to perform on par with larger models in general-domain visual grounding and question-answering benchmarks while offering advantages in computational efficiency and storage. However, their ability to handle rare objects, which fall into the long tail of data distributions, is less understood. To rigorously evaluate this aspect, we introduce th… ▽ More

    Submitted 25 July, 2024; originally announced July 2024.

    Comments: 10 pages

  38. arXiv:2407.17797  [pdf, other]

    cs.CV cs.AI

    A Unified Understanding of Adversarial Vulnerability Regarding Unimodal Models and Vision-Language Pre-training Models

    Authors: Haonan Zheng, Xinyang Deng, Wen Jiang, Wenrui Li

    Abstract: With Vision-Language Pre-training (VLP) models demonstrating powerful multimodal interaction capabilities, the application scenarios of neural networks are no longer confined to unimodal domains but have expanded to more complex multimodal V+L downstream tasks. The security vulnerabilities of unimodal models have been extensively examined, whereas those of VLP models remain challenging. We note th… ▽ More

    Submitted 25 July, 2024; originally announced July 2024.

    Comments: 14 pages, 9 figures, published in ACMMM2024(oral)

  39. arXiv:2407.13181  [pdf, other]

    cs.CV

    Training-Free Large Model Priors for Multiple-in-One Image Restoration

    Authors: Xuanhua He, Lang Li, Yingying Wang, Hui Zheng, Ke Cao, Keyu Yan, Rui Li, Chengjun Xie, Jie Zhang, Man Zhou

    Abstract: Image restoration aims to reconstruct the latent clear images from their degraded versions. Despite the notable achievement, existing methods predominantly focus on handling specific degradation types and thus require specialized models, impeding real-world applications in dynamic degradation scenarios. To address this issue, we propose Large Model Driven Image Restoration framework (LMDIR), a nov… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

  40. arXiv:2407.11686  [pdf, other]

    cs.CL cs.AI

    CCoE: A Compact LLM with Collaboration of Experts

    Authors: Shaomang Huang, Jianfeng Pan, Hanzhong Zheng

    Abstract: In the domain of Large Language Model (LLM), LLMs demonstrate significant capabilities in natural language understanding and generation. With the growing needs of applying LLMs on various domains, it is a research question that how to efficiently train and build a model that has expertise in different domains but with a low training cost. We propose CCoE architecture, a framework of easily couplin… ▽ More

    Submitted 24 July, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

  41. arXiv:2407.08223  [pdf, other]

    cs.CL cs.AI

    Speculative RAG: Enhancing Retrieval Augmented Generation through Drafting

    Authors: Zilong Wang, Zifeng Wang, Long Le, Huaixiu Steven Zheng, Swaroop Mishra, Vincent Perot, Yuwei Zhang, Anush Mattapalli, Ankur Taly, Jingbo Shang, Chen-Yu Lee, Tomas Pfister

    Abstract: Retrieval augmented generation (RAG) combines the generative abilities of large language models (LLMs) with external knowledge sources to provide more accurate and up-to-date responses. Recent RAG advancements focus on improving retrieval outcomes through iterative LLM refinement or self-critique capabilities acquired through additional instruction tuning of LLMs. In this work, we introduce Specul… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: Preprint

  42. arXiv:2407.07835  [pdf, other]

    cs.CV cs.AI

    RoBus: A Multimodal Dataset for Controllable Road Networks and Building Layouts Generation

    Authors: Tao Li, Ruihang Li, Huangnan Zheng, Shanding Ye, Shijian Li, Zhijie Pan

    Abstract: Automated 3D city generation, focusing on road networks and building layouts, is in high demand for applications in urban design, multimedia games and autonomous driving simulations. The surge of generative AI facilitates designing city layouts based on deep learning models. However, the lack of high-quality datasets and benchmarks hinders the progress of these data-driven methods in generating ro…

    Submitted 10 July, 2024; originally announced July 2024.

  43. arXiv:2407.06153  [pdf, other]

    cs.SE cs.CL

    What's Wrong with Your Code Generated by Large Language Models? An Extensive Study

    Authors: Shihan Dou, Haoxiang Jia, Shenxi Wu, Huiyuan Zheng, Weikang Zhou, Muling Wu, Mingxu Chai, Jessica Fan, Caishuang Huang, Yunbo Tao, Yan Liu, Enyu Zhou, Ming Zhang, Yuhao Zhou, Yueming Wu, Rui Zheng, Ming Wen, Rongxiang Weng, Jingang Wang, Xunliang Cai, Tao Gui, Xipeng Qiu, Qi Zhang, Xuanjing Huang

    Abstract: The increasing development of large language models (LLMs) in code generation has drawn significant attention among researchers. To enhance LLM-based code generation ability, current efforts are predominantly directed towards collecting high-quality datasets and leveraging diverse training technologies. However, there is a notable lack of comprehensive studies examining the limitations and boundar…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: 17 pages, 7 figures

  44. arXiv:2407.04888  [pdf, other]

    eess.IV cs.CV

    Unraveling Radiomics Complexity: Strategies for Optimal Simplicity in Predictive Modeling

    Authors: Mahdi Ait Lhaj Loutfi, Teodora Boblea Podasca, Alex Zwanenburg, Taman Upadhaya, Jorge Barrios, David R. Raleigh, William C. Chen, Dante P. I. Capaldi, Hong Zheng, Olivier Gevaert, Jing Wu, Alvin C. Silva, Paul J. Zhang, Harrison X. Bai, Jan Seuntjens, Steffen Löck, Patrick O. Richard, Olivier Morin, Caroline Reinhold, Martin Lepage, Martin Vallières

    Abstract: Background: The high dimensionality of radiomic feature sets, the variability in radiomic feature types and potentially high computational requirements all underscore the need for an effective method to identify the smallest set of predictive features for a given clinical problem. Purpose: Develop a methodology and tools to identify and explain the smallest set of predictive radiomic features. Mat…

    Submitted 5 July, 2024; originally announced July 2024.

  45. arXiv:2407.01146  [pdf, other]

    eess.IV cs.CV

    Cross-Slice Attention and Evidential Critical Loss for Uncertainty-Aware Prostate Cancer Detection

    Authors: Alex Ling Yu Hung, Haoxin Zheng, Kai Zhao, Kaifeng Pang, Demetri Terzopoulos, Kyunghyun Sung

    Abstract: Current deep learning-based models typically analyze medical images in either 2D or 3D albeit disregarding volumetric information or suffering sub-optimal performance due to the anisotropic resolution of MR data. Furthermore, providing an accurate uncertainty estimation is beneficial to clinicians, as it indicates how confident a model is about its prediction. We propose a novel 2.5D cross-slice a…

    Submitted 1 July, 2024; originally announced July 2024.

  46. arXiv:2407.00942  [pdf, other]

    cs.IR cs.AI cs.CL

    ProductAgent: Benchmarking Conversational Product Search Agent with Asking Clarification Questions

    Authors: Jingheng Ye, Yong Jiang, Xiaobin Wang, Yinghui Li, Yangning Li, Hai-Tao Zheng, Pengjun Xie, Fei Huang

    Abstract: This paper introduces the task of product demand clarification within an e-commercial scenario, where the user commences the conversation with ambiguous queries and the task-oriented agent is designed to achieve more accurate and tailored product searching by asking clarification questions. To address this task, we propose ProductAgent, a conversational information seeking agent equipped with abil…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: 17 pages, 13 tables, 6 figures. Under review

  47. arXiv:2407.00934  [pdf, other]

    cs.CL

    CLEME2.0: Towards More Interpretable Evaluation by Disentangling Edits for Grammatical Error Correction

    Authors: Jingheng Ye, Zishan Xu, Yinghui Li, Xuxin Cheng, Linlin Song, Qingyu Zhou, Hai-Tao Zheng, Ying Shen, Xin Su

    Abstract: The paper focuses on improving the interpretability of Grammatical Error Correction (GEC) metrics, which receives little attention in previous studies. To bridge the gap, we propose CLEME2.0, a reference-based evaluation strategy that can describe four elementary dimensions of GEC systems, namely hit-correction, error-correction, under-correction, and over-correction. They collectively contribute…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: 16 pages, 8 tables, 2 figures. Under review

  48. arXiv:2407.00924  [pdf, other]

    cs.CL

    EXCGEC: A Benchmark of Edit-wise Explainable Chinese Grammatical Error Correction

    Authors: Jingheng Ye, Shang Qin, Yinghui Li, Xuxin Cheng, Libo Qin, Hai-Tao Zheng, Peng Xing, Zishan Xu, Guo Cheng, Zhao Wei

    Abstract: Existing studies explore the explainability of Grammatical Error Correction (GEC) in a limited scenario, where they ignore the interaction between corrections and explanations. To bridge the gap, this paper introduces the task of EXplainable GEC (EXGEC), which focuses on the integral role of both correction and explanation tasks. To facilitate the task, we propose EXCGEC, a tailored benchmark for…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: 22 pages, 10 tables, 9 figures. Under review

  49. arXiv:2407.00028  [pdf, other]

    q-bio.NC cs.LG stat.AP

    Harnessing XGBoost for Robust Biomarker Selection of Obsessive-Compulsive Disorder (OCD) from Adolescent Brain Cognitive Development (ABCD) data

    Authors: Xinyu Shen, Qimin Zhang, Huili Zheng, Weiwei Qi

    Abstract: This study evaluates the performance of various supervised machine learning models in analyzing highly correlated neural signaling data from the Adolescent Brain Cognitive Development (ABCD) Study, with a focus on predicting obsessive-compulsive disorder scales. We simulated a dataset to mimic the correlation structures commonly found in imaging data and evaluated logistic regression, elastic netw…

    Submitted 14 May, 2024; originally announced July 2024.

  50. arXiv:2406.17309  [pdf, other]

    cs.CV

    Zero-Shot Long-Form Video Understanding through Screenplay

    Authors: Yongliang Wu, Bozheng Li, Jiawang Cao, Wenbo Zhu, Yi Lu, Weiheng Chi, Chuyun Xie, Haolin Zheng, Ziyue Su, Jay Wu, Xu Yang

    Abstract: The Long-form Video Question-Answering task requires the comprehension and analysis of extended video content to respond accurately to questions by utilizing both temporal and contextual information. In this paper, we present MM-Screenplayer, an advanced video understanding system with multi-modal perception capabilities that can convert any video into textual screenplay representations. Unlike pr…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: Highest Score Award to the CVPR'2024 LOVEU Track 1 Challenge