Showing 1–50 of 110 results for author: Lei, W

  1. arXiv:2410.15744  [pdf, other]

    cs.CV cs.AI

    Unleashing the Potential of Vision-Language Pre-Training for 3D Zero-Shot Lesion Segmentation via Mask-Attribute Alignment

    Authors: Yankai Jiang, Wenhui Lei, Xiaofan Zhang, Shaoting Zhang

    Abstract: Recent advancements in medical vision-language pre-training models have driven significant progress in zero-shot disease recognition. However, transferring image-level knowledge to pixel-level tasks, such as lesion segmentation in 3D CT scans, remains a critical challenge. Due to the complexity and variability of pathological visual characteristics, existing methods struggle to align fine-grained…

    Submitted 21 October, 2024; originally announced October 2024.

  2. arXiv:2409.14399  [pdf, other]

    cs.CL cs.AI

    Beyond Persuasion: Towards Conversational Recommender System with Credible Explanations

    Authors: Peixin Qin, Chen Huang, Yang Deng, Wenqiang Lei, Tat-Seng Chua

    Abstract: With the aid of large language models, current conversational recommender systems (CRSs) have gained strong abilities to persuade users to accept recommended items. While these CRSs are highly persuasive, they can mislead users by incorporating incredible information in their explanations, ultimately damaging the long-term trust between users and the CRS. To address this, we propose a simple yet eff…

    Submitted 7 October, 2024; v1 submitted 22 September, 2024; originally announced September 2024.

    Comments: Findings of EMNLP 2024. Our code is available at https://github.com/mumen798/PC-CRS

  3. arXiv:2409.01459  [pdf, other]

    cs.CV

    3D-LSPTM: An Automatic Framework with 3D-Large-Scale Pretrained Model for Laryngeal Cancer Detection Using Laryngoscopic Videos

    Authors: Meiyu Qiu, Yun Li, Wenjun Huang, Haoyun Zhang, Weiping Zheng, Wenbin Lei, Xiaomao Fan

    Abstract: Laryngeal cancer is a malignant disease with a high mortality rate in otorhinolaryngology, posing a significant threat to human health. Traditionally, laryngologists manually inspect laryngoscopic videos for laryngeal cancer, which is quite time-consuming and subjective. In this study, we propose a novel automatic framework via 3D-large-scale pretrained models termed 3D-LSPTM for laryngeal can…

    Submitted 2 September, 2024; originally announced September 2024.

  4. arXiv:2408.05426  [pdf, other]

    cs.CV

    SAM-FNet: SAM-Guided Fusion Network for Laryngo-Pharyngeal Tumor Detection

    Authors: Jia Wei, Yun Li, Meiyu Qiu, Hongyu Chen, Xiaomao Fan, Wenbin Lei

    Abstract: Laryngo-pharyngeal cancer (LPC) is a highly fatal malignant disease affecting the head and neck region. Previous studies on endoscopic tumor detection, particularly those leveraging dual-branch network architectures, have shown significant advancements in tumor detection. These studies highlight the potential of dual-branch networks in improving diagnostic accuracy by effectively integrating globa…

    Submitted 14 August, 2024; v1 submitted 10 August, 2024; originally announced August 2024.

  5. arXiv:2408.03633  [pdf, other]

    cs.CL

    CARE: A Clue-guided Assistant for CSRs to Read User Manuals

    Authors: Weihong Du, Jia Liu, Zujie Wen, Dingnan Jin, Hongru Liang, Wenqiang Lei

    Abstract: It is time-saving to build a reading assistant for customer service representatives (CSRs) when reading user manuals, especially information-rich ones. Current solutions don't fit the online customer service scenarios well due to the lack of attention to user questions and possible responses. Hence, we propose to develop a time-saving and careful reading assistant for CSRs, named CARE. It can help t…

    Submitted 26 August, 2024; v1 submitted 7 August, 2024; originally announced August 2024.

    Comments: Accepted to The 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024)

  6. arXiv:2408.03630  [pdf, other]

    cs.CL

    PAGED: A Benchmark for Procedural Graphs Extraction from Documents

    Authors: Weihong Du, Wenrui Liao, Hongru Liang, Wenqiang Lei

    Abstract: Automatic extraction of procedural graphs from documents creates a low-cost way for users to easily understand a complex procedure by skimming visual graphs. Despite the progress in recent studies, two questions remain unanswered: whether the existing studies have solved this task well (Q1) and whether the emerging large language models (LLMs) can bring new opportunities to this task (Q2). To this end, we p…

    Submitted 7 August, 2024; v1 submitted 7 August, 2024; originally announced August 2024.

    Comments: Accepted to The 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024)

  7. arXiv:2408.00415  [pdf, other]

    cs.RO cs.AI cs.CV

    DriveArena: A Closed-loop Generative Simulation Platform for Autonomous Driving

    Authors: Xuemeng Yang, Licheng Wen, Yukai Ma, Jianbiao Mei, Xin Li, Tiantian Wei, Wenjie Lei, Daocheng Fu, Pinlong Cai, Min Dou, Botian Shi, Liang He, Yong Liu, Yu Qiao

    Abstract: This paper presents DriveArena, the first high-fidelity closed-loop simulation system designed for driving agents navigating in real scenarios. DriveArena features a flexible, modular architecture, allowing for the seamless interchange of its core components: Traffic Manager, a traffic simulator capable of generating realistic traffic flow on any worldwide street map, and World Dreamer, a high-fi…

    Submitted 1 August, 2024; originally announced August 2024.

    Comments: 19 pages, 9 figures

  8. arXiv:2407.08428  [pdf, other]

    cs.CV cs.AI

    A Comprehensive Survey on Human Video Generation: Challenges, Methods, and Insights

    Authors: Wentao Lei, Jinting Wang, Fengji Ma, Guanjie Huang, Li Liu

    Abstract: Human video generation is a dynamic and rapidly evolving task that aims to synthesize 2D human body video sequences with generative models given control conditions such as text, audio, and pose. With the potential for wide-ranging applications in film, gaming, and virtual communication, the ability to generate natural and realistic human video is critical. Recent advancements in generative models…

    Submitted 11 July, 2024; originally announced July 2024.

  9. arXiv:2407.07314  [pdf, ps, other]

    cs.IT

    Proactive Eavesdropping in Relay Systems via Trajectory and Power Optimization

    Authors: Qian Dan, Hongjiang Lei, Ki-Hong Park, Weijia Lei, Gaofeng Pan

    Abstract: Wireless relays can effectively extend the transmission range of information. However, if relay technology is utilized unlawfully, it can amplify potential harm. Effectively surveilling illegitimate relay links poses a challenging problem. Unmanned aerial vehicles (UAVs) can proactively surveil wireless relay systems due to their flexible mobility. This work focuses on maximizing the eavesdropping…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: 14 pages, 8 figures, submitted to IEEE Journal for review

  10. arXiv:2406.08124  [pdf, other]

    cs.CL cs.AI

    Legend: Leveraging Representation Engineering to Annotate Safety Margin for Preference Datasets

    Authors: Duanyu Feng, Bowen Qin, Chen Huang, Youcheng Huang, Zheng Zhang, Wenqiang Lei

    Abstract: The success of the reward model in distinguishing between responses with subtle safety differences depends critically on a high-quality preference dataset, which should capture the fine-grained nuances of harmful and harmless responses. This motivates the need to develop a dataset involving preference margins, which accurately quantify how harmless one response is compared to another. In this pa…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Our code is available at https://github.com/colfeng/Legend

  11. arXiv:2406.01931  [pdf, other]

    cs.CL

    Dishonesty in Helpful and Harmless Alignment

    Authors: Youcheng Huang, Jingkun Tang, Duanyu Feng, Zheng Zhang, Wenqiang Lei, Jiancheng Lv, Anthony G. Cohn

    Abstract: People tell lies when seeking rewards. Large language models (LLMs) are aligned to human values with reinforcement learning where they get rewards if they satisfy human preference. We find that this also induces dishonesty in helpful and harmless alignment where LLMs tell lies in generating harmless responses. Using the latest interpreting tools, we detect dishonesty, show how LLMs can be harmful…

    Submitted 5 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

  12. arXiv:2406.01601  [pdf, other]

    cs.DC cs.AI cs.LG

    Backpropagation-Free Multi-modal On-Device Model Adaptation via Cloud-Device Collaboration

    Authors: Wei Ji, Li Li, Zheqi Lv, Wenqiao Zhang, Mengze Li, Zhen Wan, Wenqiang Lei, Roger Zimmermann

    Abstract: In our increasingly interconnected world, where intelligent devices continually amass copious personalized multi-modal data, a pressing need arises to deliver high-quality, personalized device-aware services. However, this endeavor presents a multifaceted challenge to prevailing artificial intelligence (AI) systems primarily rooted in the cloud. As these systems grapple with shifting data distribu…

    Submitted 17 August, 2024; v1 submitted 21 May, 2024; originally announced June 2024.

  13. arXiv:2405.12081  [pdf, other]

    cs.CL

    Selective Annotation via Data Allocation: These Data Should Be Triaged to Experts for Annotation Rather Than the Model

    Authors: Chen Huang, Yang Deng, Wenqiang Lei, Jiancheng Lv, Ido Dagan

    Abstract: To obtain high-quality annotations under a limited budget, semi-automatic annotation methods are commonly used, where a portion of the data is annotated by experts and a model is then trained to complete the annotations for the remaining data. However, these methods mainly focus on selecting informative data for expert annotations to improve the model's predictive ability (i.e., triage-to-human data),…

    Submitted 22 September, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

    Comments: Findings of EMNLP 2024

  14. arXiv:2405.12063  [pdf, other]

    cs.CL

    CLAMBER: A Benchmark of Identifying and Clarifying Ambiguous Information Needs in Large Language Models

    Authors: Tong Zhang, Peixin Qin, Yang Deng, Chen Huang, Wenqiang Lei, Junhong Liu, Dingnan Jin, Hongru Liang, Tat-Seng Chua

    Abstract: Large language models (LLMs) are increasingly used to meet user information needs, but their effectiveness in dealing with user queries that contain various types of ambiguity remains unknown, ultimately risking user trust and satisfaction. To this end, we introduce CLAMBER, a benchmark for evaluating LLMs using a well-organized taxonomy. Building upon the taxonomy, we construct ~12K high-quality…

    Submitted 1 June, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

    Comments: Accepted to ACL 2024. Camera Ready. Our dataset is available at https://github.com/zt991211/CLAMBER

  15. arXiv:2405.12059  [pdf, other]

    cs.CL

    STYLE: Improving Domain Transferability of Asking Clarification Questions in Large Language Model Powered Conversational Agents

    Authors: Yue Chen, Chen Huang, Yang Deng, Wenqiang Lei, Dingnan Jin, Jia Liu, Tat-Seng Chua

    Abstract: Equipping a conversational search engine with strategies regarding when to ask clarification questions is becoming increasingly important across various domains. Owing to the context understanding capability of LLMs and their access to domain-specific sources of knowledge, LLM-based clarification strategies feature rapid transfer to various domains in a post-hoc manner. However, they still s…

    Submitted 1 June, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

    Comments: Accepted to Findings of ACL 2024. Camera Ready

  16. arXiv:2405.11912  [pdf, other]

    cs.CL cs.HC

    ARAIDA: Analogical Reasoning-Augmented Interactive Data Annotation

    Authors: Chen Huang, Yiping Jin, Ilija Ilievski, Wenqiang Lei, Jiancheng Lv

    Abstract: Human annotation is a time-consuming task that requires a significant amount of effort. To address this issue, interactive data annotation utilizes an annotation model to provide suggestions for humans to approve or correct. However, annotation models trained with limited labeled data are prone to generating incorrect suggestions, leading to extra human correction effort. To tackle this challenge,…

    Submitted 1 June, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

    Comments: Accepted to ACL 2024. Camera Ready

  17. arXiv:2405.10248  [pdf, other]

    cs.HC cs.IR

    Co-Matching: Towards Human-Machine Collaborative Legal Case Matching

    Authors: Chen Huang, Xinwei Yang, Yang Deng, Wenqiang Lei, JianCheng Lv, Tat-Seng Chua

    Abstract: Recent efforts have aimed to improve AI machines in legal case matching by integrating legal domain knowledge. However, successful legal case matching requires the tacit knowledge of legal practitioners, which is difficult to verbalize and encode into machines. This emphasizes the crucial role of involving legal practitioners in high-stakes legal case matching. To address this, we propose a collab…

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: Draft V1: 23 pages, 7 figures

  18. arXiv:2404.19277  [pdf, other]

    cs.CV

    Bridge to Non-Barrier Communication: Gloss-Prompted Fine-grained Cued Speech Gesture Generation with Diffusion Model

    Authors: Wentao Lei, Li Liu, Jun Wang

    Abstract: Cued Speech (CS) is an advanced visual phonetic encoding system that integrates lip reading with hand codings, enabling people with hearing impairments to communicate efficiently. CS video generation aims to produce specific lip and gesture movements of CS from audio or text inputs. The main challenge is that given limited CS data, we strive to simultaneously generate fine-grained hand and finger…

    Submitted 30 April, 2024; originally announced April 2024.

    Journal ref: IJCAI 2024

  19. arXiv:2404.04626  [pdf, ps, other]

    cs.CL cs.AI

    Towards Analyzing and Understanding the Limitations of DPO: A Theoretical Perspective

    Authors: Duanyu Feng, Bowen Qin, Chen Huang, Zheng Zhang, Wenqiang Lei

    Abstract: Direct Preference Optimization (DPO), which derives reward signals directly from pairwise preference data, has shown its effectiveness on aligning Large Language Models (LLMs) with human preferences. Despite its widespread use across various tasks, DPO has been criticized for its sensitivity to the SFT's effectiveness and its hindrance to the learning capacity towards human-preferred responses, le…

    Submitted 6 April, 2024; originally announced April 2024.

    Comments: Draft version

  20. arXiv:2404.03304  [pdf, other]

    cs.CL cs.AI

    Concept -- An Evaluation Protocol on Conversational Recommender Systems with System-centric and User-centric Factors

    Authors: Chen Huang, Peixin Qin, Yang Deng, Wenqiang Lei, Jiancheng Lv, Tat-Seng Chua

    Abstract: The conversational recommendation system (CRS) has been criticized regarding its user experience in real-world scenarios, despite recent significant progress achieved in academia. Existing evaluation protocols for CRS may prioritize system-centric factors such as effectiveness and fluency in conversation while neglecting user-centric aspects. Thus, we propose a new and inclusive evaluation protoco…

    Submitted 6 May, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

    Comments: 33 pages, 18 tables, and 10 figures. Our code is available at https://github.com/huangzichun/Concept4CRS

  21. arXiv:2403.17770  [pdf, other]

    eess.IV cs.CV

    CT Synthesis with Conditional Diffusion Models for Abdominal Lymph Node Segmentation

    Authors: Yongrui Yu, Hanyu Chen, Zitian Zhang, Qiong Xiao, Wenhui Lei, Linrui Dai, Yu Fu, Hui Tan, Guan Wang, Peng Gao, Xiaofan Zhang

    Abstract: Despite the significant success achieved by deep learning methods in medical image segmentation, researchers still struggle in the computer-aided diagnosis of abdominal lymph nodes due to the complex abdominal environment, small and indistinguishable lesions, and limited annotated data. To address these problems, we present a pipeline that integrates the conditional diffusion model for lymph node…

    Submitted 26 March, 2024; originally announced March 2024.

  22. arXiv:2403.06769  [pdf, other]

    cs.CL

    Strength Lies in Differences! Improving Strategy Planning for Non-collaborative Dialogues via Diversified User Simulation

    Authors: Tong Zhang, Chen Huang, Yang Deng, Hongru Liang, Jia Liu, Zujie Wen, Wenqiang Lei, Tat-Seng Chua

    Abstract: We investigate non-collaborative dialogue agents, which are expected to engage in strategic conversations with diverse users, for securing a mutual agreement that leans favorably towards the system's objectives. This poses two main challenges for existing dialogue agents: 1) The inability to integrate user-specific characteristics into the strategic planning, and 2) The difficulty of training stra…

    Submitted 22 September, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

    Comments: Accepted by EMNLP 2024 (Main)

  23. arXiv:2402.01868  [pdf, other]

    cs.LG math.OC stat.ML

    Challenges in Training PINNs: A Loss Landscape Perspective

    Authors: Pratik Rathore, Weimu Lei, Zachary Frangella, Lu Lu, Madeleine Udell

    Abstract: This paper explores challenges in training Physics-Informed Neural Networks (PINNs), emphasizing the role of the loss landscape in the training process. We examine difficulties in minimizing the PINN loss function, particularly due to ill-conditioning caused by differential operators in the residual term. We compare gradient-based optimizers Adam, L-BFGS, and their combination Adam+L-BFGS, showing…

    Submitted 3 June, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

    Comments: ICML 2024 Oral; 33 pages (including appendices), 10 figures, 3 tables

  24. arXiv:2402.01246  [pdf, other]

    cs.RO eess.SY

    LimSim++: A Closed-Loop Platform for Deploying Multimodal LLMs in Autonomous Driving

    Authors: Daocheng Fu, Wenjie Lei, Licheng Wen, Pinlong Cai, Song Mao, Min Dou, Botian Shi, Yu Qiao

    Abstract: The emergence of Multimodal Large Language Models ((M)LLMs) has ushered in new avenues in artificial intelligence, particularly for autonomous driving by offering enhanced understanding and reasoning capabilities. This paper introduces LimSim++, an extended version of LimSim designed for the application of (M)LLMs in autonomous driving. Acknowledging the limitations of existing simulation platform…

    Submitted 12 April, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

    Comments: Accepted by 35th IEEE Intelligent Vehicles Symposium (IV 2024)

  25. arXiv:2401.14876  [pdf, other]

    cs.LG cs.AI

    Cross-Space Adaptive Filter: Integrating Graph Topology and Node Attributes for Alleviating the Over-smoothing Problem

    Authors: Chen Huang, Haoyang Li, Yifan Zhang, Wenqiang Lei, Jiancheng Lv

    Abstract: The vanilla Graph Convolutional Network (GCN) uses a low-pass filter to extract low-frequency signals from graph topology, which may lead to the over-smoothing problem when GCN goes deep. To this end, various methods have been proposed to create an adaptive filter by incorporating an extra filter (e.g., a high-pass filter) extracted from the graph topology. However, these methods heavily rely on t…

    Submitted 10 February, 2024; v1 submitted 26 January, 2024; originally announced January 2024.

    Comments: Accepted to WWW 2024. V2: update the results on GCN-BC based on our rebuttal on OpenReview. Our code is available at https://github.com/huangzichun/Cross-Space-Adaptive-Filter

  26. arXiv:2401.12540  [pdf, other]

    cs.IR cs.CL

    DREditor: An Time-efficient Approach for Building a Domain-specific Dense Retrieval Model

    Authors: Chen Huang, Duanyu Feng, Wenqiang Lei, Jiancheng Lv

    Abstract: Deploying dense retrieval models efficiently is becoming increasingly important across various industries. This is especially true for enterprise search services, where customizing search engines to meet the time demands of different enterprises in different domains is crucial. Motivated by this, we develop a time-efficient approach called DREditor to edit the matching rule of an off-the-shelf den…

    Submitted 23 January, 2024; originally announced January 2024.

    Comments: 15 pages, 6 figures, Codes are available at https://github.com/huangzichun/DREditor

  27. arXiv:2401.07544  [pdf, other]

    cs.CL

    See the Unseen: Better Context-Consistent Knowledge-Editing by Noises

    Authors: Youcheng Huang, Wenqiang Lei, Zheng Zhang, Jiancheng Lv, Shuicheng Yan

    Abstract: Knowledge-editing updates knowledge of large language models (LLMs) and contributes to the interpretability and application of LLMs. However, knowledge application is context-consistent: LLMs can recall the same knowledge in different contexts. Existing works ignore this property and the editing lacks generalization. In this paper, we empirically find that the effects of different contexts upon LLMs…

    Submitted 17 January, 2024; v1 submitted 15 January, 2024; originally announced January 2024.

  28. arXiv:2312.10053  [pdf, other]

    cs.CY cs.AI cs.IR

    Towards Goal-oriented Intelligent Tutoring Systems in Online Education

    Authors: Yang Deng, Zifeng Ren, An Zhang, Wenqiang Lei, Tat-Seng Chua

    Abstract: Interactive Intelligent Tutoring Systems (ITSs) enhance traditional ITSs by promoting effective learning through interactions and problem resolution in online education. Yet, proactive engagement, prioritizing resource optimization with planning and assessment capabilities, is often overlooked in current ITS designs. In this work, we investigate a new task, named Goal-oriented Intelligent Tutoring…

    Submitted 3 December, 2023; originally announced December 2023.

  29. arXiv:2312.07280  [pdf, other]

    cs.CL

    Towards Equipping Transformer with the Ability of Systematic Compositionality

    Authors: Chen Huang, Peixin Qin, Wenqiang Lei, Jiancheng Lv

    Abstract: One of the key factors in language productivity and human cognition is the ability of systematic compositionality, which refers to understanding composed unseen examples of seen primitives. However, recent evidence reveals that Transformers have difficulty generalizing the composed context based on the seen primitives. To this end, we take the first step to propose a compositionality-aware Tra…

    Submitted 12 December, 2023; originally announced December 2023.

    Comments: Accepted to AAAI 2024. Paper with appendix

  30. arXiv:2311.16081  [pdf, other]

    cs.CV cs.AI

    ViT-Lens: Towards Omni-modal Representations

    Authors: Weixian Lei, Yixiao Ge, Kun Yi, Jianfeng Zhang, Difei Gao, Dylan Sun, Yuying Ge, Ying Shan, Mike Zheng Shou

    Abstract: Aiming to advance AI agents, large foundation models significantly improve reasoning and instruction execution, yet the current focus on vision and language neglects the potential of perceiving diverse modalities in open-world environments. However, the success of data-driven vision and language models is costly or even infeasible to reproduce for rare modalities. In this paper, we present ViT…

    Submitted 26 March, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

    Comments: This work is a follow-up of arXiv:2308.10185. Accepted to CVPR2024

  31. arXiv:2311.14931  [pdf, other]

    cs.LG

    One-Shot Transfer Learning for Nonlinear ODEs

    Authors: Wanzhou Lei, Pavlos Protopapas, Joy Parikh

    Abstract: We introduce a generalizable approach that combines the perturbation method and one-shot transfer learning to solve nonlinear ODEs with a single polynomial term, using Physics-Informed Neural Networks (PINNs). Our method transforms non-linear ODEs into linear ODE systems, trains a PINN across varied conditions, and offers a closed-form solution for new instances within the same non-linear ODE class. W…

    Submitted 25 November, 2023; originally announced November 2023.

    Comments: 7 pages, 3 figures, accepted to 2023 NeurIPS Workshop of The Symbiosis of Deep Learning and Differential Equations

    MSC Class: 68T07 ACM Class: I.2.1

  32. arXiv:2311.13307  [pdf, other]

    cs.CV cs.CL cs.MM

    Rethinking Radiology Report Generation via Causal Inspired Counterfactual Augmentation

    Authors: Xiao Song, Jiafan Liu, Yun Li, Yan Liu, Wenbin Lei, Ruxin Wang

    Abstract: Radiology Report Generation (RRG) draws attention as a vision-and-language interaction of biomedical fields. Previous works inherited the ideology of traditional language generation tasks, aiming to generate paragraphs with high readability as reports. Despite significant progress, the independence between diseases, a property specific to RRG, was neglected, leaving the models confused by the…

    Submitted 30 July, 2024; v1 submitted 22 November, 2023; originally announced November 2023.

    Comments: 10 pages,5 figures

  33. BioImage.IO Chatbot: A Community-Driven AI Assistant for Integrative Computational Bioimaging

    Authors: Wanlu Lei, Caterina Fuster-Barceló, Gabriel Reder, Arrate Muñoz-Barrutia, Wei Ouyang

    Abstract: We present the BioImage.IO Chatbot, an AI assistant powered by Large Language Models and supported by a community-driven knowledge base and toolset. This chatbot is designed to cater to a wide range of user needs through a flexible extension mechanism that spans from information retrieval to AI-enhanced analysis and microscopy control. Embracing open-source principles, the chatbot is designed to…

    Submitted 16 April, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

    Comments: 15 pages, 2 figures

  34. arXiv:2310.16517  [pdf, other]

    cs.CL

    OccuQuest: Mitigating Occupational Bias for Inclusive Large Language Models

    Authors: Mingfeng Xue, Dayiheng Liu, Kexin Yang, Guanting Dong, Wenqiang Lei, Zheng Yuan, Chang Zhou, Jingren Zhou

    Abstract: The emergence of large language models (LLMs) has revolutionized natural language processing tasks. However, existing instruction-tuning datasets suffer from occupational bias: the majority of data relates to only a few occupations, which hampers the ability of instruction-tuned LLMs to generate helpful responses to professional queries from practitioners in specific fields. To mitigate this issue and promot…

    Submitted 25 October, 2023; originally announced October 2023.

  35. arXiv:2308.10185  [pdf, other]

    cs.CV

    ViT-Lens: Initiating Omni-Modal Exploration through 3D Insights

    Authors: Weixian Lei, Yixiao Ge, Jianfeng Zhang, Dylan Sun, Kun Yi, Ying Shan, Mike Zheng Shou

    Abstract: Despite the success of CLIP-based training recipes in vision-language models, their scalability to more modalities (e.g., 3D, audio, etc.) is limited to large-scale data, which is expensive or even inapplicable for rare modalities. In this paper, we present ViT-Lens that facilitates efficient omni-modal representation learning by perceiving novel modalities with a pretrained ViT and aligning to a p…

    Submitted 26 March, 2024; v1 submitted 20 August, 2023; originally announced August 2023.

    Comments: 19 pages, 4 figures and 9 tables

  36. arXiv:2308.08849  [pdf, other]

    cs.CV

    A Survey on Deep Multi-modal Learning for Body Language Recognition and Generation

    Authors: Li Liu, Lufei Gao, Wentao Lei, Fengji Ma, Xiaotian Lin, Jinting Wang

    Abstract: Body language (BL) refers to the non-verbal communication expressed through physical movements, gestures, facial expressions, and postures. It is a form of communication that conveys information, emotions, attitudes, and intentions without the use of spoken or written words. It plays a crucial role in interpersonal interactions and can complement or even override verbal communication. Deep multi-m…

    Submitted 17 August, 2023; originally announced August 2023.

  37. arXiv:2308.04805  [pdf, other]

    cs.IR cs.SD eess.AS

    DiVa: An Iterative Framework to Harvest More Diverse and Valid Labels from User Comments for Music

    Authors: Hongru Liang, Jingyao Liu, Yuanxin Xiang, Jiachen Du, Lanjun Zhou, Shushen Pan, Wenqiang Lei

    Abstract: Towards sufficient music searching, it is vital to form a complete set of labels for each song. However, current solutions fail to resolve it as they cannot produce diverse enough mappings to make up for the information missed by the gold labels. Based on the observation that such missing information may already be present in user comments, we propose to study the automated music labeling in an…

    Submitted 9 August, 2023; originally announced August 2023.

    Comments: 11 pages, 5 figures, published to ACM MM 2023

  38. arXiv:2307.00257  [pdf, other]

    cs.CV cs.AI

    Efficient Subclass Segmentation in Medical Images

    Authors: Linrui Dai, Wenhui Lei, Xiaofan Zhang

    Abstract: As research interests in medical image analysis become increasingly fine-grained, the cost for extensive annotation also rises. One feasible way to reduce the cost is to annotate with coarse-grained superclass labels while using limited fine-grained annotations as a complement. In this way, fine-grained data learning is assisted by ample coarse annotations. Recent studies in classification tasks h…

    Submitted 1 July, 2023; originally announced July 2023.

    Comments: MICCAI 2023 early accept

  39. arXiv:2306.14752  [pdf, other]

    cs.CV

    MedLSAM: Localize and Segment Anything Model for 3D CT Images

    Authors: Wenhui Lei, Xu Wei, Xiaofan Zhang, Kang Li, Shaoting Zhang

    Abstract: Recent advancements in foundation models have shown significant potential in medical image analysis. However, there is still a gap in models specifically designed for medical image localization. To address this, we introduce MedLAM, a 3D medical foundation localization model that accurately identifies any anatomical part within the body using only a few template scans. MedLAM employs two self-supe…

    Submitted 9 October, 2024; v1 submitted 26 June, 2023; originally announced June 2023.

    Comments: MIA 2024. Code is public at https://github.com/openmedlab/MedLSAM

  40. arXiv:2306.04487  [pdf, other]

    cs.IR

    Vague Preference Policy Learning for Conversational Recommendation

    Authors: Gangyi Zhang, Chongming Gao, Wenqiang Lei, Xiaojie Guo, Shijun Li, Hongshen Chen, Zhuozhi Ding, Sulong Xu, Lingfei Wu

    Abstract: Conversational recommendation systems (CRS) commonly assume users have clear preferences, leading to potential over-filtering of relevant alternatives. However, users often exhibit vague, non-binary preferences. We introduce the Vague Preference Multi-round Conversational Recommendation (VPMCR) scenario, employing a soft estimation mechanism to accommodate users' vague and dynamic preferences whil…

    Submitted 1 September, 2024; v1 submitted 7 June, 2023; originally announced June 2023.

  41. Knowing-how & Knowing-that: A New Task for Machine Comprehension of User Manuals

    Authors: Hongru Liang, Jia Liu, Weihong Du, Dingnan Jin, Wenqiang Lei, Zujie Wen, Jiancheng Lv

    Abstract: The machine reading comprehension (MRC) of user manuals has huge potential in customer service. However, current methods have trouble answering complex questions. Therefore, we introduce the Knowing-how & Knowing-that task that requires the model to answer factoid-style, procedure-style, and inconsistent questions about user manuals. We resolve this task by jointly representing the steps and facts…

    Submitted 8 August, 2023; v1 submitted 7 June, 2023; originally announced June 2023.

    Journal ref: Findings of the Association for Computational Linguistics: ACL 2023. (2023)

  42. arXiv:2305.20087  [pdf, other]

    cs.CV

    Too Large; Data Reduction for Vision-Language Pre-Training

    Authors: Alex Jinpeng Wang, Kevin Qinghong Lin, David Junhao Zhang, Stan Weixian Lei, Mike Zheng Shou

    Abstract: This paper examines the problems of severe image-text misalignment and high redundancy in the widely-used large-scale Vision-Language Pre-Training (VLP) datasets. To address these issues, we propose an efficient and straightforward Vision-Language learning algorithm called TL;DR, which aims to compress the existing large VLP data into a small, high-quality set. Our approach consists of two major s…

    Submitted 18 August, 2023; v1 submitted 31 May, 2023; originally announced May 2023.

    Comments: ICCV2023. Code: https://github.com/showlab/datacentric.vlp

  43. arXiv:2305.13626  [pdf, other]

    cs.CL

    Prompting and Evaluating Large Language Models for Proactive Dialogues: Clarification, Target-guided, and Non-collaboration

    Authors: Yang Deng, Lizi Liao, Liang Chen, Hongru Wang, Wenqiang Lei, Tat-Seng Chua

    Abstract: Conversational systems based on Large Language Models (LLMs), such as ChatGPT, show exceptional proficiency in context understanding and response generation. However, despite their impressive capabilities, they still possess limitations, such as providing randomly-guessed answers to ambiguous queries or failing to refuse users' requests, both of which are considered aspects of a conversational age…

    Submitted 14 October, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

    Comments: Accepted by EMNLP 2023 Findings

  44. arXiv:2305.02750  [pdf, other]

    cs.CL cs.AI

    A Survey on Proactive Dialogue Systems: Problems, Methods, and Prospects

    Authors: Yang Deng, Wenqiang Lei, Wai Lam, Tat-Seng Chua

    Abstract: Proactive dialogue systems, related to a wide range of real-world conversational applications, equip the conversational agent with the capability of leading the conversation direction towards achieving pre-defined targets or fulfilling certain goals from the system side. It is empowered by advanced techniques to progress to more complicated tasks that require strategical and motivational interacti…

    Submitted 9 May, 2023; v1 submitted 4 May, 2023; originally announced May 2023.

    Comments: Accepted by IJCAI 2023 Survey Track

  45. arXiv:2303.01707  [pdf, other]

    eess.IV cs.CV

    Spatio-Temporal Structure Consistency for Semi-supervised Medical Image Classification

    Authors: Wentao Lei, Lei Liu, Li Liu

    Abstract: Intelligent medical diagnosis has shown remarkable progress based on the large-scale datasets with precise annotations. However, fewer labeled images are available due to significantly expensive cost for annotating data by experts. To fully exploit the easily available unlabeled data, we propose a novel Spatio-Temporal Structure Consistent (STSC) learning framework. Specifically, a gram matrix is…

    Submitted 2 March, 2023; originally announced March 2023.

  46. arXiv:2302.09786  [pdf]

    physics.app-ph cs.HC

    Ultra-conformable Liquid Metal Particle Monolayer on Air/water Interface for Substrate-free E-tattoo

    Authors: Fali Li, Wenjuan Lei, Yuwei Wang, Xingjian Lu, Shengbin Li, Feng Xu, Zidong He, Jinyun Liu, Huali Yang, Yuanzhao Wu, Jie Shang, Yiwei Liu, Run-Wei Li

    Abstract: Gallium-based liquid metal is getting increased attention in conformal flexible electronics for its high electrical conductivity, intrinsic deformability and biocompatibility. A series of flexible devices are developed based on the micro-particles of liquid metal. But it is still challenging to fabricate conformal liquid metal film with a large area and high uniformity. Interfacial self-assembly i…

    Submitted 20 February, 2023; originally announced February 2023.

  47. arXiv:2302.08975  [pdf, other]

    cs.CL

    Towards Fine-Grained Information: Identifying the Type and Location of Translation Errors

    Authors: Keqin Bao, Yu Wan, Dayiheng Liu, Baosong Yang, Wenqiang Lei, Xiangnan He, Derek F. Wong, Jun Xie

    Abstract: Fine-grained information on translation errors is helpful for the translation evaluation community. Existing approaches can not synchronously consider error position and type, failing to integrate the error information of both. In this paper, we propose Fine-Grained Translation Error Detection (FG-TED) task, aiming at identifying both the position and the type of translation errors on given source…

    Submitted 17 February, 2023; originally announced February 2023.

  48. arXiv:2302.04088  [pdf, other]

    cs.LG cs.IR

    FFHR: Fully and Flexible Hyperbolic Representation for Knowledge Graph Completion

    Authors: Wentao Shi, Junkang Wu, Xuezhi Cao, Jiawei Chen, Wenqiang Lei, Wei Wu, Xiangnan He

    Abstract: Learning hyperbolic embeddings for knowledge graph (KG) has gained increasing attention due to its superiority in capturing hierarchies. However, some important operations in hyperbolic space still lack good definitions, making existing methods unable to fully leverage the merits of hyperbolic space. Specifically, they suffer from two main limitations: 1) existing Graph Convolutional Network (GCN)…

    Submitted 7 February, 2023; originally announced February 2023.

    Comments: Submitted to TKDE

  49. arXiv:2212.11565  [pdf, other]

    cs.CV

    Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation

    Authors: Jay Zhangjie Wu, Yixiao Ge, Xintao Wang, Weixian Lei, Yuchao Gu, Yufei Shi, Wynne Hsu, Ying Shan, Xiaohu Qie, Mike Zheng Shou

    Abstract: To replicate the success of text-to-image (T2I) generation, recent works employ large-scale video datasets to train a text-to-video (T2V) generator. Despite their promising results, such paradigm is computationally expensive. In this work, we propose a new T2V generation setting, One-Shot Video Tuning, where only one text-video pair is presented. Our model is built on state-of-the-a…

    Submitted 17 March, 2023; v1 submitted 22 December, 2022; originally announced December 2022.

    Comments: Preprint

  50. arXiv:2211.15470  [pdf, other]

    cs.CV

    Learning to Learn: How to Continuously Teach Humans and Machines

    Authors: Parantak Singh, You Li, Ankur Sikarwar, Weixian Lei, Daniel Gao, Morgan Bruce Talbot, Ying Sun, Mike Zheng Shou, Gabriel Kreiman, Mengmi Zhang

    Abstract: Curriculum design is a fundamental component of education. For example, when we learn mathematics at school, we build upon our knowledge of addition to learn multiplication. These and other concepts must be mastered before our first algebra lesson, which also reinforces our addition and multiplication skills. Designing a curriculum for teaching either a human or a machine shares the underlying goa…

    Submitted 17 August, 2023; v1 submitted 28 November, 2022; originally announced November 2022.

    Comments: International Conference on Computer Vision (ICCV), 2023