Showing 1–50 of 707 results for author: Chen, R

  1. arXiv:2410.15334 [pdf, other]

    cs.CV

    Modality-Fair Preference Optimization for Trustworthy MLLM Alignment

    Authors: Songtao Jiang, Yan Zhang, Ruizhe Chen, Yeying Jin, Zuozhu Liu

    Abstract: Direct Preference Optimization (DPO) is effective for aligning large language models (LLMs), but when applied to multimodal models (MLLMs), it often favors text over image information, leading to unreliable outputs and visual hallucinations. To address this, we propose Modality-Fair Preference Optimization (MFPO) to balance text and image preferences. First, we found that the lack of image-related…

    Submitted 20 October, 2024; originally announced October 2024.

  2. arXiv:2410.14231 [pdf, other]

    cs.CL

    Unveiling Large Language Models Generated Texts: A Multi-Level Fine-Grained Detection Framework

    Authors: Zhen Tao, Zhiyu Li, Runyu Chen, Dinghao Xi, Wei Xu

    Abstract: Large language models (LLMs) have transformed human writing by enhancing grammar correction, content expansion, and stylistic refinement. However, their widespread use raises concerns about authorship, originality, and ethics, even potentially threatening scholarly integrity. Existing detection methods, which mainly rely on single-feature analysis and binary classification, often fail to effective…

    Submitted 18 October, 2024; originally announced October 2024.

  3. arXiv:2410.14161 [pdf, other]

    cs.CV

    Unlabeled Action Quality Assessment Based on Multi-dimensional Adaptive Constrained Dynamic Time Warping

    Authors: Renguang Chen, Guolong Zheng, Xu Yang, Zhide Chen, Jiwu Shu, Wencheng Yang, Kexin Zhu, Chen Feng

    Abstract: The growing popularity of online sports and exercise necessitates effective methods for evaluating the quality of online exercise executions. Previous action quality assessment methods, which relied on labeled scores from motion videos, exhibited slightly lower accuracy and discriminability. This limitation hindered their rapid application to newly added exercises. To address this problem, this pa…

    Submitted 18 October, 2024; originally announced October 2024.

  4. arXiv:2410.13808 [pdf, other]

    cs.CL

    De-mark: Watermark Removal in Large Language Models

    Authors: Ruibo Chen, Yihan Wu, Junfeng Guo, Heng Huang

    Abstract: Watermarking techniques offer a promising way to identify machine-generated content via embedding covert information into the contents generated from language models (LMs). However, the robustness of the watermarking schemes has not been well explored. In this paper, we present De-mark, an advanced framework designed to remove n-gram-based watermarks effectively. Our method utilizes a novel queryi…

    Submitted 17 October, 2024; originally announced October 2024.

  5. arXiv:2410.13805 [pdf, other]

    cs.CL

    A Watermark for Order-Agnostic Language Models

    Authors: Ruibo Chen, Yihan Wu, Yanshuo Chen, Chenxi Liu, Junfeng Guo, Heng Huang

    Abstract: Statistical watermarking techniques are well-established for sequentially decoded language models (LMs). However, these techniques cannot be directly applied to order-agnostic LMs, as the tokens in order-agnostic LMs are not generated sequentially. In this work, we introduce Pattern-mark, a pattern-based watermarking framework specifically designed for order-agnostic LMs. We develop a Markov-chain…

    Submitted 17 October, 2024; originally announced October 2024.

  6. arXiv:2410.13242 [pdf]

    cs.CV

    Fundus to Fluorescein Angiography Video Generation as a Retinal Generative Foundation Model

    Authors: Weiyi Zhang, Jiancheng Yang, Ruoyu Chen, Siyu Huang, Pusheng Xu, Xiaolan Chen, Shanfu Lu, Hongyu Cao, Mingguang He, Danli Shi

    Abstract: Fundus fluorescein angiography (FFA) is crucial for diagnosing and monitoring retinal vascular issues but is limited by its invasive nature and restricted accessibility compared to color fundus (CF) imaging. Existing methods that convert CF images to FFA are confined to static image generation, missing the dynamic lesional changes. We introduce Fundus2Video, an autoregressive generative adversaria…

    Submitted 18 October, 2024; v1 submitted 17 October, 2024; originally announced October 2024.

  7. arXiv:2410.09854 [pdf, other]

    cs.SE

    A Model Is Not Built By A Single Prompt: LLM-Based Domain Modeling With Question Decomposition

    Authors: Ru Chen, Jingwei Shen, Xiao He

    Abstract: Domain modeling, a crucial part of model-driven engineering, demands extensive domain knowledge and experience from engineers. When the system description is highly complicated, the modeling task can become particularly challenging and time-consuming. Large language models (LLMs) can assist by automatically generating an initial object model from the system description. Although LLMs have demonstra…

    Submitted 13 October, 2024; originally announced October 2024.

  8. arXiv:2410.09804 [pdf, other]

    cs.CR cs.AI cs.CL cs.LG cs.NE

    BlackDAN: A Black-Box Multi-Objective Approach for Effective and Contextual Jailbreaking of Large Language Models

    Authors: Xinyuan Wang, Victor Shea-Jay Huang, Renmiao Chen, Hao Wang, Chengwei Pan, Lei Sha, Minlie Huang

    Abstract: While large language models (LLMs) exhibit remarkable capabilities across various tasks, they encounter potential security risks such as jailbreak attacks, which exploit vulnerabilities to bypass security measures and generate harmful outputs. Existing jailbreak strategies mainly focus on maximizing attack success rate (ASR), frequently neglecting other critical factors, including the relevance of…

    Submitted 18 October, 2024; v1 submitted 13 October, 2024; originally announced October 2024.

  9. arXiv:2410.09403 [pdf, other]

    cs.AI cs.CL cs.CV cs.LG cs.MA

    Two Heads Are Better Than One: A Multi-Agent System Has the Potential to Improve Scientific Idea Generation

    Authors: Haoyang Su, Renqi Chen, Shixiang Tang, Xinzhe Zheng, Jingzhe Li, Zhenfei Yin, Wanli Ouyang, Nanqing Dong

    Abstract: The rapid advancement of scientific progress requires innovative tools that can accelerate discovery. While recent AI methods, particularly large language models (LLMs), have shown promise in tasks such as hypothesis generation and experimental design, they fall short in replicating the collaborative nature of real-world scientific practices, where diverse teams of experts work together to tackle…

    Submitted 12 October, 2024; originally announced October 2024.

  10. arXiv:2410.08260 [pdf, other]

    cs.CV cs.AI

    Koala-36M: A Large-scale Video Dataset Improving Consistency between Fine-grained Conditions and Video Content

    Authors: Qiuheng Wang, Yukai Shi, Jiarong Ou, Rui Chen, Ke Lin, Jiahao Wang, Boyuan Jiang, Haotian Yang, Mingwu Zheng, Xin Tao, Fei Yang, Pengfei Wan, Di Zhang

    Abstract: As visual generation technologies continue to advance, the scale of video datasets has expanded rapidly, and the quality of these datasets is critical to the performance of video generation models. We argue that temporal splitting, detailed captions, and video quality filtering are three key factors that determine dataset quality. However, existing datasets exhibit various limitations in these are…

    Submitted 10 October, 2024; originally announced October 2024.

    Comments: Project page: https://koala36m.github.io/

  11. Enhancing Soccer Camera Calibration Through Keypoint Exploitation

    Authors: Nikolay S. Falaleev, Ruilong Chen

    Abstract: Accurate camera calibration is essential for transforming 2D images from camera sensors into 3D world coordinates, enabling precise scene geometry interpretation and supporting sports analytics tasks such as player tracking, offside detection, and performance analysis. However, obtaining a sufficient number of high-quality point pairs remains a significant challenge for both traditional and deep l…

    Submitted 9 October, 2024; originally announced October 2024.

    Comments: 7th ACM International Workshop on Multimedia Content Analysis in Sports

    Journal ref: In Proceedings of the 7th ACM International Workshop on Multimedia Content Analysis in Sports (MMSports '24). Association for Computing Machinery, New York, NY, USA (2024) 65-73

  12. arXiv:2410.06982 [pdf, other]

    cs.CV

    Structure-Centric Robust Monocular Depth Estimation via Knowledge Distillation

    Authors: Runze Chen, Haiyong Luo, Fang Zhao, Jingze Yu, Yupeng Jia, Juan Wang, Xuepeng Ma

    Abstract: Monocular depth estimation, enabled by self-supervised learning, is a key technique for 3D perception in computer vision. However, it faces significant challenges in real-world scenarios, which encompass adverse weather variations, motion blur, as well as scenes with poor lighting conditions at night. Our research reveals that we can divide monocular depth estimation into three sub-problems: depth…

    Submitted 9 October, 2024; originally announced October 2024.

    Comments: To be published in Asian Conference on Computer Vision 2024

  13. arXiv:2410.04422 [pdf, other]

    cs.CL

    Hyper-multi-step: The Truth Behind Difficult Long-context Tasks

    Authors: Yijiong Yu, Ma Xiufa, Fang Jianwei, Zhi Xu, Su Guangyao, Wang Jiancheng, Yongfeng Huang, Zhixiao Qi, Wei Wang, Weifeng Liu, Ran Chen, Ji Pei

    Abstract: Long-context language models (LCLMs), characterized by their extensive context windows, are becoming increasingly popular. Meanwhile, many long-context benchmarks present challenging tasks that even the most advanced LCLMs struggle to complete. However, the underlying sources of various challenging long-context tasks have seldom been studied. To bridge this gap, we conduct experiments to indicate the…

    Submitted 18 October, 2024; v1 submitted 6 October, 2024; originally announced October 2024.

    Comments: Our code is publicly available at https://github.com/yuyijiong/hard_retrieval_for_llm and the datasets are at https://huggingface.co/datasets/yuyijiong/difficult_retrieval

  14. arXiv:2410.04070 [pdf, other]

    cs.CL cs.AI

    PAD: Personalized Alignment at Decoding-Time

    Authors: Ruizhe Chen, Xiaotian Zhang, Meng Luo, Wenhao Chai, Zuozhu Liu

    Abstract: Aligning with personalized preferences, which vary significantly across cultural, educational, and political differences, poses a significant challenge due to the computational costs and data demands of traditional alignment methods. In response, this paper presents Personalized Alignment at Decoding-time (PAD), a novel framework designed to align LLM outputs with diverse personalized preferences…

    Submitted 16 October, 2024; v1 submitted 5 October, 2024; originally announced October 2024.

    Comments: This paper presents Personalized Alignment at Decoding-time (PAD), a novel framework designed to align LLM outputs with diverse personalized preferences during the inference phase

  15. arXiv:2410.03509 [pdf, other]

    cs.RO

    GAP-RL: Grasps As Points for RL Towards Dynamic Object Grasping

    Authors: Pengwei Xie, Siang Chen, Qianrun Chen, Wei Tang, Dingchang Hu, Yixiang Dai, Rui Chen, Guijin Wang

    Abstract: Dynamic grasping of moving objects in complex, continuous motion scenarios remains challenging. Reinforcement Learning (RL) has been applied in various robotic manipulation tasks, benefiting from its closed-loop property. However, existing RL-based methods do not fully explore the potential for enhancing visual representations. In this letter, we propose a novel framework called Grasps As Points f…

    Submitted 4 October, 2024; originally announced October 2024.

    Comments: Accepted by RA-L for further publication, may be unavailable or updated in the future

  16. arXiv:2410.01212 [pdf, other]

    cs.LG

    Absolute State-wise Constrained Policy Optimization: High-Probability State-wise Constraints Satisfaction

    Authors: Weiye Zhao, Feihan Li, Yifan Sun, Yujie Wang, Rui Chen, Tianhao Wei, Changliu Liu

    Abstract: Enforcing state-wise safety constraints is critical for the application of reinforcement learning (RL) in real-world problems, such as autonomous driving and robot manipulation. However, existing safe RL methods only enforce state-wise constraints in expectation or enforce hard state-wise constraints with strong assumptions. The former does not exclude the probability of safety violations, while t…

    Submitted 1 October, 2024; originally announced October 2024.

    Comments: Submitted to the Journal of Machine Learning Research

  17. arXiv:2410.00425 [pdf, other]

    cs.RO cs.AI

    ManiSkill3: GPU Parallelized Robotics Simulation and Rendering for Generalizable Embodied AI

    Authors: Stone Tao, Fanbo Xiang, Arth Shukla, Yuzhe Qin, Xander Hinrichsen, Xiaodi Yuan, Chen Bao, Xinsong Lin, Yulin Liu, Tse-kai Chan, Yuan Gao, Xuanlin Li, Tongzhou Mu, Nan Xiao, Arnav Gurha, Zhiao Huang, Roberto Calandra, Rui Chen, Shan Luo, Hao Su

    Abstract: Simulation has enabled unprecedented compute-scalable approaches to robot learning. However, many existing simulation frameworks typically support a narrow range of scenes/tasks and lack features critical for scaling generalizable robotics and sim2real. We introduce and open source ManiSkill3, the fastest state-visual GPU parallelized robotics simulator with contact-rich physics targeting generali…

    Submitted 1 October, 2024; originally announced October 2024.

    Comments: Project website: http://maniskill.ai/

  18. arXiv:2410.00316 [pdf, other]

    cs.CL cs.AI cs.HC cs.SD eess.AS

    EmoKnob: Enhance Voice Cloning with Fine-Grained Emotion Control

    Authors: Haozhe Chen, Run Chen, Julia Hirschberg

    Abstract: While recent advances in Text-to-Speech (TTS) technology produce natural and expressive speech, they lack the option for users to select emotion and control intensity. We propose EmoKnob, a framework that allows fine-grained emotion control in speech synthesis with few-shot demonstrative samples of arbitrary emotion. Our framework leverages the expressive speaker representation space made possible…

    Submitted 30 September, 2024; originally announced October 2024.

    Comments: EMNLP 2024 Main

  19. arXiv:2409.18405 [pdf, other]

    cs.RO

    Word2Wave: Language Driven Mission Programming for Efficient Subsea Deployments of Marine Robots

    Authors: Ruo Chen, David Blow, Adnan Abdullah, Md Jahidul Islam

    Abstract: This paper explores the design and development of a language-based interface for dynamic mission programming of autonomous underwater vehicles (AUVs). The proposed 'Word2Wave' (W2W) framework enables interactive programming and parameter configuration of AUVs for remote subsea missions. The W2W framework includes: (i) a set of novel language rules and command structures for efficient language-to-m…

    Submitted 26 September, 2024; originally announced September 2024.

  20. arXiv:2409.14709 [pdf, other]

    eess.AS cs.SD

    Video-to-Audio Generation with Fine-grained Temporal Semantics

    Authors: Yuchen Hu, Yu Gu, Chenxing Li, Rilin Chen, Dong Yu

    Abstract: With recent advances in AIGC, video generation has gained a surge of research interest in both academia and industry (e.g., Sora). However, it remains a challenge to produce temporally aligned audio to synchronize the generated video, considering the complicated semantic information included in the latter. In this work, inspired by the recent success of text-to-audio (TTA) generation, we first in…

    Submitted 23 September, 2024; originally announced September 2024.

  21. arXiv:2409.11414 [pdf, other]

    cs.AR cs.AI cs.SE

    RTLRewriter: Methodologies for Large Models aided RTL Code Optimization

    Authors: Xufeng Yao, Yiwen Wang, Xing Li, Yingzhao Lian, Ran Chen, Lei Chen, Mingxuan Yuan, Hong Xu, Bei Yu

    Abstract: Register Transfer Level (RTL) code optimization is crucial for enhancing the efficiency and performance of digital circuits during early synthesis stages. Currently, optimization relies heavily on manual efforts by skilled engineers, often requiring multiple iterations based on synthesis feedback. In contrast, existing compiler-based methods fall short in addressing complex designs. This paper int…

    Submitted 4 September, 2024; originally announced September 2024.

    Comments: ICCAD 2024

  22. arXiv:2409.11022 [pdf, other]

    cs.CL cs.AI

    GEIC: Universal and Multilingual Named Entity Recognition with Large Language Models

    Authors: Hanjun Luo, Yingbin Jin, Xuecheng Liu, Tong Shang, Ruizhe Chen, Zuozhu Liu

    Abstract: Large Language Models (LLMs) have supplanted traditional methods in numerous natural language processing tasks. Nonetheless, in Named Entity Recognition (NER), existing LLM-based methods underperform compared to baselines and require significantly more computational resources, limiting their application. In this paper, we introduce the task of generation-based extraction and in-context classificat…

    Submitted 25 September, 2024; v1 submitted 17 September, 2024; originally announced September 2024.

  23. arXiv:2409.10064 [pdf, other]

    cs.CL cs.AI cs.HC

    MindGuard: Towards Accessible and Stigma-free Mental Health First Aid via Edge LLM

    Authors: Sijie Ji, Xinzhe Zheng, Jiawei Sun, Renqi Chen, Wei Gao, Mani Srivastava

    Abstract: Mental health disorders are among the most prevalent diseases worldwide, affecting nearly one in four people. Despite their widespread impact, the intervention rate remains below 25%, largely due to the significant cooperation required from patients for both diagnosis and intervention. The core issue behind this low treatment rate is stigma, which discourages over half of those affected from seeki…

    Submitted 16 September, 2024; originally announced September 2024.

  24. arXiv:2409.09975 [pdf, other]

    cs.RO

    Constrained Bandwidth Observation Sharing for Multi-Robot Navigation in Dynamic Environments via Intelligent Knapsack

    Authors: Anirudh Chari, Rui Chen, Changliu Liu

    Abstract: Multi-robot navigation is increasingly crucial in various domains, including disaster response, autonomous vehicles, and warehouse and manufacturing automation. Robot teams often must operate in highly dynamic environments and under strict bandwidth constraints imposed by communication infrastructure, rendering effective observation sharing within the system a challenging problem. This paper prese…

    Submitted 16 September, 2024; originally announced September 2024.

  25. arXiv:2409.09882 [pdf, other]

    eess.SY cs.RO

    Safe Control of Quadruped in Varying Dynamics via Safety Index Adaptation

    Authors: Kai S. Yun, Rui Chen, Chase Dunaway, John M. Dolan, Changliu Liu

    Abstract: Varying dynamics pose a fundamental difficulty when deploying safe control laws in the real world. Safety Index Synthesis (SIS) relies heavily on the system dynamics, and once the dynamics change, the previously synthesized safety index becomes invalid. In this work, we show the real-time efficacy of Safety Index Adaptation (SIA) in varying dynamics. SIA enables real-time adaptation to the changing…

    Submitted 15 September, 2024; originally announced September 2024.

  26. arXiv:2409.09271 [pdf, other]

    cs.SE cs.PL

    Python Symbolic Execution with LLM-powered Code Generation

    Authors: Wenhan Wang, Kaibo Liu, An Ran Chen, Ge Li, Zhi Jin, Gang Huang, Lei Ma

    Abstract: Symbolic execution is a key technology in software testing, which generates test cases by collecting symbolic path constraints and then solving constraints with SMT solvers. Symbolic execution has been proven helpful in generating high-coverage test cases, but its limitations, e.g., the difficulties in solving path constraints, prevent it from broader usage in software testing. Moreover, symbolic…

    Submitted 13 September, 2024; originally announced September 2024.

  27. arXiv:2409.08861 [pdf, other]

    cs.LG math.OC stat.ML

    Adjoint Matching: Fine-tuning Flow and Diffusion Generative Models with Memoryless Stochastic Optimal Control

    Authors: Carles Domingo-Enrich, Michal Drozdzal, Brian Karrer, Ricky T. Q. Chen

    Abstract: Dynamical generative models that produce samples through an iterative process, such as Flow Matching and denoising diffusion models, have seen widespread use, but there have not been many theoretically-sound methods for improving these models with reward fine-tuning. In this work, we cast reward fine-tuning as stochastic optimal control (SOC). Critically, we prove that a very specific memoryless n…

    Submitted 16 October, 2024; v1 submitted 13 September, 2024; originally announced September 2024.

  28. arXiv:2409.08750 [pdf, other]

    cs.RO

    DexSim2Real²: Building Explicit World Model for Precise Articulated Object Dexterous Manipulation

    Authors: Taoran Jiang, Liqian Ma, Yixuan Guan, Jiaojiao Meng, Weihang Chen, Zecui Zeng, Lusong Li, Dan Wu, Jing Xu, Rui Chen

    Abstract: Articulated object manipulation is ubiquitous in daily life. In this paper, we present DexSim2Real², a novel robot learning framework for goal-conditioned articulated object manipulation using both two-finger grippers and multi-finger dexterous hands. The key to our framework is constructing an explicit world model of unseen articulated objects through active one-step interactions. This expli…

    Submitted 13 September, 2024; originally announced September 2024.

    Comments: Project Webpage: https://jiangtaoran.github.io/dexsim2real2_website/. arXiv admin note: text overlap with arXiv:2302.10693

  29. arXiv:2409.08601 [pdf, other]

    cs.SD cs.MM eess.AS

    STA-V2A: Video-to-Audio Generation with Semantic and Temporal Alignment

    Authors: Yong Ren, Chenxing Li, Manjie Xu, Wei Liang, Yu Gu, Rilin Chen, Dong Yu

    Abstract: Visual and auditory perception are two crucial ways humans experience the world. Text-to-video generation has made remarkable progress over the past year, but the absence of harmonious audio in generated video limits its broader applications. In this paper, we propose Semantic and Temporal Aligned Video-to-Audio (STA-V2A), an approach that enhances audio generation from videos by extracting both l…

    Submitted 13 September, 2024; originally announced September 2024.

    Comments: Submitted to ICASSP 2025

  30. arXiv:2409.08562 [pdf, other]

    cs.CV

    CSS: Overcoming Pose and Scene Challenges in Crowd-Sourced 3D Gaussian Splatting

    Authors: Runze Chen, Mingyu Xiao, Haiyong Luo, Fang Zhao, Fan Wu, Hao Xiong, Qi Liu, Meng Song

    Abstract: We introduce Crowd-Sourced Splatting (CSS), a novel 3D Gaussian Splatting (3DGS) pipeline designed to overcome the challenges of pose-free scene reconstruction using crowd-sourced imagery. The dream of reconstructing historically significant but inaccessible scenes from collections of photographs has long captivated researchers. However, traditional 3D techniques struggle with missing camera poses…

    Submitted 13 September, 2024; originally announced September 2024.

  31. arXiv:2409.07642 [pdf, other]

    cs.LG eess.SY

    Deep Learning of Dynamic Systems using System Identification Toolbox(TM)

    Authors: Tianyu Dai, Khaled Aljanaideh, Rong Chen, Rajiv Singh, Alec Stothert, Lennart Ljung

    Abstract: MATLAB(R) releases over the last 3 years have witnessed a continuing growth in the dynamic modeling capabilities offered by the System Identification Toolbox(TM). The emphasis has been on integrating deep learning architectures and training techniques that facilitate the use of deep neural networks as building blocks of nonlinear models. The toolbox offers neural state-space models which can be ex…

    Submitted 11 September, 2024; originally announced September 2024.

    Journal ref: IFAC-PapersOnLine, July 2024, 20th IFAC Symposium on System Identification SYSID 2024

  32. arXiv:2409.07556 [pdf, other]

    eess.AS cs.SD

    SSR-Speech: Towards Stable, Safe and Robust Zero-shot Text-based Speech Editing and Synthesis

    Authors: Helin Wang, Meng Yu, Jiarui Hai, Chen Chen, Yuchen Hu, Rilin Chen, Najim Dehak, Dong Yu

    Abstract: In this paper, we introduce SSR-Speech, a neural codec autoregressive model designed for stable, safe, and robust zero-shot text-based speech editing and text-to-speech synthesis. SSR-Speech is built on a Transformer decoder and incorporates classifier-free guidance to enhance the stability of the generation process. A watermark Encodec is proposed to embed frame-level watermarks into the edited r…

    Submitted 11 September, 2024; originally announced September 2024.

    Comments: Submitted to ICASSP 2025

  33. arXiv:2409.06946 [pdf, other]

    cs.IT eess.SP

    Refracting Reconfigurable Intelligent Surface Assisted URLLC for Millimeter Wave High-Speed Train Communication Coverage Enhancement

    Authors: Changzhu Liu, Ruisi He, Yong Niu, Shiwen Mao, Bo Ai, Ruifeng Chen

    Abstract: High-speed train (HST) has garnered significant attention from both academia and industry due to the rapid development of railways worldwide. Millimeter wave (mmWave) communication, known for its large bandwidth, is an effective way to address performance bottlenecks in cellular network based HST wireless communication systems. However, mmWave signals suffer from significant path loss when traversi…

    Submitted 10 September, 2024; originally announced September 2024.

    Comments: 11 figures, accepted by IEEE Transactions on Vehicular Technology

  34. arXiv:2409.00926 [pdf, other]

    cs.CV

    Towards Student Actions in Classroom Scenes: New Dataset and Baseline

    Authors: Zhuolin Tan, Chenqiang Gao, Anyong Qin, Ruixin Chen, Tiecheng Song, Feng Yang, Deyu Meng

    Abstract: Analyzing student actions is an important and challenging task in educational research. Existing efforts have been hampered by the lack of accessible datasets to capture the nuanced action dynamics in classrooms. In this paper, we present a new multi-label student action video (SAV) dataset for complex classroom scenes. The dataset consists of 4,324 carefully trimmed video clips from 758 different…

    Submitted 1 September, 2024; originally announced September 2024.

  35. arXiv:2408.15903 [pdf, other]

    cs.CL

    LLM-Based Multi-Hop Question Answering with Knowledge Graph Integration in Evolving Environments

    Authors: Ruirui Chen, Weifeng Jiang, Chengwei Qin, Ishaan Singh Rawal, Cheston Tan, Dongkyu Choi, Bo Xiong, Bo Ai

    Abstract: The rapid obsolescence of information in Large Language Models (LLMs) has driven the development of various techniques to incorporate new facts. However, existing methods for knowledge editing still face difficulties with multi-hop questions that require accurate fact identification and sequential logical reasoning, particularly among numerous fact updates. To tackle these challenges, this paper i…

    Submitted 28 August, 2024; originally announced August 2024.

  36. arXiv:2408.15217 [pdf, other]

    eess.IV cs.AI cs.CV

    Fundus2Video: Cross-Modal Angiography Video Generation from Static Fundus Photography with Clinical Knowledge Guidance

    Authors: Weiyi Zhang, Siyu Huang, Jiancheng Yang, Ruoyu Chen, Zongyuan Ge, Yingfeng Zheng, Danli Shi, Mingguang He

    Abstract: Fundus Fluorescein Angiography (FFA) is a critical tool for assessing retinal vascular dynamics and aiding in the diagnosis of eye diseases. However, its invasive nature and lower accessibility compared to Color Fundus (CF) images pose significant challenges. Current CF to FFA translation methods are limited to static generation. In this work, we pioneer dynamic FFA video generation from static CF…

    Submitted 27 August, 2024; originally announced August 2024.

    Comments: The paper has been accepted by Medical Image Computing and Computer Assisted Intervention Society (MICCAI) 2024

  37. arXiv:2408.14754 [pdf, other]

    physics.med-ph cs.AI cs.CV physics.ins-det

    Sequential-Scanning Dual-Energy CT Imaging Using High Temporal Resolution Image Reconstruction and Error-Compensated Material Basis Image Generation

    Authors: Qiaoxin Li, Ruifeng Chen, Peng Wang, Guotao Quan, Yanfeng Du, Dong Liang, Yinsheng Li

    Abstract: Dual-energy computed tomography (DECT) has been widely used to obtain quantitative elemental composition of imaged subjects for personalized and precise medical diagnosis. Compared with DECT leveraging advanced X-ray source and/or detector technologies, the use of the sequential-scanning data acquisition scheme to implement DECT may make a broader impact on clinical practice because this scheme re…

    Submitted 26 August, 2024; originally announced August 2024.

  38. arXiv:2408.13085 [pdf, other]

    cs.CV cs.AI

    Map-Free Visual Relocalization Enhanced by Instance Knowledge and Depth Knowledge

    Authors: Mingyu Xiao, Runze Chen, Haiyong Luo, Fang Zhao, Juan Wang, Xuepeng Ma

    Abstract: Map-free relocalization technology is crucial for applications in autonomous navigation and augmented reality, but relying on pre-built maps is often impractical. It faces significant challenges due to limitations in matching methods and the inherent lack of scale in monocular images. These issues lead to substantial rotational and metric errors and even localization failures in real-world scenari…

    Submitted 18 September, 2024; v1 submitted 23 August, 2024; originally announced August 2024.

    Comments: 17 pages, 6 figures

  39. arXiv:2408.12354 [pdf, other]

    eess.AS cs.SD

    LCM-SVC: Latent Diffusion Model Based Singing Voice Conversion with Inference Acceleration via Latent Consistency Distillation

    Authors: Shihao Chen, Yu Gu, Jianwei Cui, Jie Zhang, Rilin Chen, Lirong Dai

    Abstract: Any-to-any singing voice conversion (SVC) aims to transfer a target singer's timbre to other songs using a short voice sample. However, many diffusion-model-based any-to-any SVC methods, which have achieved impressive results, usually suffer from low efficiency caused by a large number of inference steps. In this paper, we propose LCM-SVC, a latent consistency distillation (LCD) based latent diffusion mo…

    Submitted 22 August, 2024; originally announced August 2024.

    Comments: Accepted to ISCSLP 2024. arXiv admin note: text overlap with arXiv:2406.05325

  40. arXiv:2408.11843 [pdf, other]

    cs.CL cs.AI

    Editable Fairness: Fine-Grained Bias Mitigation in Language Models

    Authors: Ruizhe Chen, Yichen Li, Jianfei Yang, Joey Tianyi Zhou, Zuozhu Liu

    Abstract: Generating fair and accurate predictions plays a pivotal role in deploying large language models (LLMs) in the real world. However, existing debiasing methods inevitably generate unfair or incorrect predictions as they are designed and evaluated to achieve parity across different social groups but leave aside individual commonsense facts, resulting in modified knowledge that elicits unreasonable o…

    Submitted 7 August, 2024; originally announced August 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2405.09341

  41. arXiv:2408.10636 [pdf]

    eess.IV cs.CV

    UWF-RI2FA: Generating Multi-frame Ultrawide-field Fluorescein Angiography from Ultrawide-field Retinal Imaging Improves Diabetic Retinopathy Stratification

    Authors: Ruoyu Chen, Kezheng Xu, Kangyan Zheng, Weiyi Zhang, Yan Lu, Danli Shi, Mingguang He

    Abstract: Ultrawide-field fluorescein angiography (UWF-FA) facilitates diabetic retinopathy (DR) detection by providing a clear visualization of peripheral retinal lesions. However, the intravenous dye injection, with its potential risks, hampers its application. We aim to acquire dye-free UWF-FA images from noninvasive UWF retinal imaging (UWF-RI) using generative artificial intelligence (GenAI) and evaluate its…

    Submitted 27 August, 2024; v1 submitted 20 August, 2024; originally announced August 2024.

    Comments: 22 pages, 2 figures

  42. arXiv:2408.08146  [pdf, other]

    cs.CL

    KOALA: Enhancing Speculative Decoding for LLM via Multi-Layer Draft Heads with Adversarial Learning

    Authors: Kaiqi Zhang, Jing Zhao, Rui Chen

    Abstract: Large Language Models (LLMs) exhibit high inference latency due to their autoregressive decoding nature. While the draft head in speculative decoding mitigates this issue, its full potential remains unexplored. In this paper, we introduce KOALA (K-layer Optimized Adversarial Learning Architecture), an orthogonal approach to the draft head. By transforming the conventional single-layer draft head i…

    Submitted 15 August, 2024; originally announced August 2024.

  43. arXiv:2408.07990  [pdf, other]

    cs.CL

    FuseChat: Knowledge Fusion of Chat Models

    Authors: Fanqi Wan, Longguang Zhong, Ziyi Yang, Ruijun Chen, Xiaojun Quan

    Abstract: While training large language models (LLMs) from scratch can indeed lead to models with distinct capabilities and strengths, it incurs substantial costs and may lead to redundancy in competencies. Knowledge fusion aims to integrate existing LLMs of diverse architectures and capabilities into a more potent LLM through lightweight continual training, thereby reducing the need for costly LLM developm…

    Submitted 15 August, 2024; originally announced August 2024.

    Comments: Work in progress

  44. arXiv:2408.07146  [pdf, other]

    cs.CV cs.AI

    Vision Language Model for Interpretable and Fine-grained Detection of Safety Compliance in Diverse Workplaces

    Authors: Zhiling Chen, Hanning Chen, Mohsen Imani, Ruimin Chen, Farhad Imani

    Abstract: Workplace accidents due to personal protective equipment (PPE) non-compliance raise serious safety concerns and lead to legal liabilities, financial penalties, and reputational damage. While object detection models have shown the capability to address this issue by identifying safety items, most existing models, such as YOLO, Faster R-CNN, and SSD, are limited in verifying the fine-grained attribu…

    Submitted 13 August, 2024; originally announced August 2024.

    Comments: 20 pages, 7 figures

  45. arXiv:2408.06592  [pdf, other]

    cs.CV

    ActiveNeRF: Learning Accurate 3D Geometry by Active Pattern Projection

    Authors: Jianyu Tao, Changping Hu, Edward Yang, Jing Xu, Rui Chen

    Abstract: NeRFs have achieved incredible success in novel view synthesis. However, the accuracy of the implicit geometry is unsatisfactory because the passive static environmental illumination has low spatial frequency and cannot provide enough information for accurate geometry reconstruction. In this work, we propose ActiveNeRF, a 3D geometry reconstruction framework, which improves the geometry quality of…

    Submitted 12 August, 2024; originally announced August 2024.

    Comments: 18 pages, 10 figures

  46. arXiv:2408.03886  [pdf, other]

    cs.IR

    Retrieval Augmentation via User Interest Clustering

    Authors: Hanjia Lyu, Hanqing Zeng, Yinglong Xia, Ren Chen, Jiebo Luo

    Abstract: Many existing industrial recommender systems are sensitive to the patterns of user-item engagement. Light users, who interact less frequently, correspond to a data sparsity problem, making it difficult for the system to accurately learn and represent their preferences. On the other hand, heavy users with rich interaction history often demonstrate a variety of niche interests that are hard to be pr…

    Submitted 7 August, 2024; originally announced August 2024.

  47. arXiv:2408.03262  [pdf, other]

    cs.SE

    Towards Fixing Panic Bugs for Real-world Rust Programs

    Authors: Yunbo Ni, Yang Feng, Zixi Liu, Runtao Chen, Baowen Xu

    Abstract: The Rust programming language has garnered significant attention due to its robust safety features and memory management capabilities. Despite its guaranteed memory safety, Rust programs still suffer from runtime errors that are unmanageable, i.e., panic errors. Notably, over half of the bugs in rustc, Rust's own compiler, are attributable to crashes stemming from panic errors. However, understandin…

    Submitted 6 August, 2024; originally announced August 2024.

  48. arXiv:2408.02859  [pdf, other]

    eess.IV cs.AI cs.CV

    Multistain Pretraining for Slide Representation Learning in Pathology

    Authors: Guillaume Jaume, Anurag Vaidya, Andrew Zhang, Andrew H. Song, Richard J. Chen, Sharifa Sahai, Dandan Mo, Emilio Madrigal, Long Phi Le, Faisal Mahmood

    Abstract: Developing self-supervised learning (SSL) models that can learn universal and transferable representations of H&E gigapixel whole-slide images (WSIs) is becoming increasingly valuable in computational pathology. These models hold the potential to advance critical tasks such as few-shot classification, slide retrieval, and patient stratification. Existing approaches for slide representation learnin…

    Submitted 5 August, 2024; originally announced August 2024.

    Comments: ECCV'24

  49. arXiv:2408.02131  [pdf, other]

    cs.CR cs.LG

    Model Hijacking Attack in Federated Learning

    Authors: Zheng Li, Siyuan Wu, Ruichuan Chen, Paarijaat Aditya, Istemi Ekin Akkus, Manohar Vanga, Min Zhang, Hao Li, Yang Zhang

    Abstract: Machine learning (ML), driven by prominent paradigms such as centralized and federated learning, has made significant progress in various critical applications ranging from autonomous driving to face recognition. However, its remarkable success has been accompanied by various attacks. Recently, the model hijacking attack has shown that ML models can be hijacked to execute tasks different from thei…

    Submitted 4 August, 2024; originally announced August 2024.

  50. arXiv:2407.21490  [pdf, other]

    eess.IV cs.AI cs.CV cs.LG

    Explainable and Controllable Motion Curve Guided Cardiac Ultrasound Video Generation

    Authors: Junxuan Yu, Rusi Chen, Yongsong Zhou, Yanlin Chen, Yaofei Duan, Yuhao Huang, Han Zhou, Tan Tao, Xin Yang, Dong Ni

    Abstract: Echocardiography video is a primary modality for diagnosing heart diseases, but the limited data poses challenges for both clinical teaching and machine learning training. Recently, video generative models have emerged as a promising strategy to alleviate this issue. However, previous methods often relied on holistic conditions during generation, hindering flexible movement control over specif…

    Submitted 31 July, 2024; originally announced July 2024.

    Comments: Accepted by MICCAI MLMI 2024