Showing 1–50 of 479 results for author: Zheng, C

  1. arXiv:2410.13032  [pdf, other]

    cs.AI cs.LG stat.ML

    Hypothesis Testing the Circuit Hypothesis in LLMs

    Authors: Claudia Shi, Nicolas Beltran-Velez, Achille Nazaret, Carolina Zheng, Adrià Garriga-Alonso, Andrew Jesson, Maggie Makar, David M. Blei

    Abstract: Large language models (LLMs) demonstrate surprising capabilities, but we do not understand how they are implemented. One hypothesis suggests that these capabilities are primarily executed by small subnetworks within the LLM, known as circuits. But how can we evaluate this hypothesis? In this paper, we formalize a set of criteria that a circuit is hypothesized to meet and develop a suite of hypothe… ▽ More

    Submitted 16 October, 2024; originally announced October 2024.

    Comments: Code available here: https://github.com/blei-lab/circuitry

  2. arXiv:2410.10527  [pdf, other]

    cs.CV

    Motion-guided small MAV detection in complex and non-planar scenes

    Authors: Hanqing Guo, Canlun Zheng, Shiyu Zhao

    Abstract: In recent years, there has been a growing interest in the visual detection of micro aerial vehicles (MAVs) due to its importance in numerous applications. However, the existing methods based on either appearance or motion features encounter difficulties when the background is complex or the MAV is too small. In this paper, we propose a novel motion-guided MAV detector that can accurately identify… ▽ More

    Submitted 14 October, 2024; originally announced October 2024.

    Comments: 8 pages, 6 figures

    Journal ref: Pattern Recognition Letters 2024

  3. arXiv:2410.10102  [pdf, other]

    cs.GR math.NA

    Trust-Region Eigenvalue Filtering for Projected Newton

    Authors: Honglin Chen, Hsueh-Ti Derek Liu, Alec Jacobson, David I. W. Levin, Changxi Zheng

    Abstract: We introduce a novel adaptive eigenvalue filtering strategy to stabilize and accelerate the optimization of Neo-Hookean energy and its variants under the Projected Newton framework. For the first time, we show that Newton's method, Projected Newton with eigenvalue clamping and Projected Newton with absolute eigenvalue filtering can be unified using ideas from the generalized trust region method. B… ▽ More

    Submitted 13 October, 2024; originally announced October 2024.

    Comments: SIGGRAPH Asia 2024 (Conference track). Project page: https://www.cs.columbia.edu/cg/trust-region/

  4. arXiv:2410.08935  [pdf, other]

    cs.RO

    Voxel-SLAM: A Complete, Accurate, and Versatile LiDAR-Inertial SLAM System

    Authors: Zheng Liu, Haotian Li, Chongjian Yuan, Xiyuan Liu, Jiarong Lin, Rundong Li, Chunran Zheng, Bingyang Zhou, Wenyi Liu, Fu Zhang

    Abstract: In this work, we present Voxel-SLAM: a complete, accurate, and versatile LiDAR-inertial SLAM system that fully utilizes short-term, mid-term, long-term, and multi-map data associations to achieve real-time estimation and high precision mapping. The system consists of five modules: initialization, odometry, local mapping, loop closure, and global mapping, all employing the same map representation,… ▽ More

    Submitted 11 October, 2024; originally announced October 2024.

  5. arXiv:2410.06854  [pdf, other]

    cs.GR cs.HC

    Focal Surface Holographic Light Transport using Learned Spatially Adaptive Convolutions

    Authors: Chuanjun Zheng, Yicheng Zhan, Liang Shi, Ozan Cakmakci, Kaan Akşit

    Abstract: Computer-Generated Holography (CGH) is a set of algorithmic methods for identifying holograms that reconstruct Three-Dimensional (3D) scenes in holographic displays. CGH algorithms decompose 3D scenes into multiplanes at different depth levels and rely on simulations of light that propagated from a source plane to a targeted plane. Thus, for n planes, CGH typically optimizes holograms using n plan… ▽ More

    Submitted 14 October, 2024; v1 submitted 9 October, 2024; originally announced October 2024.

    Comments: SIGGRAPH Asia 2024 Technical Communications

  6. arXiv:2410.05739  [pdf, other]

    cs.SD cs.AI eess.AS

    Array2BR: An End-to-End Noise-immune Binaural Audio Synthesis from Microphone-array Signals

    Authors: Cheng Chi, Xiaoyu Li, Andong Li, Yuxuan Ke, Xiaodong Li, Chengshi Zheng

    Abstract: Telepresence technology aims to provide an immersive virtual presence for remote conference applications, and it is extremely important to synthesize high-quality binaural audio signals for this aim. Because the ambient noise is often inevitable in practical application scenarios, it is highly desired that binaural audio signals without noise can be obtained from microphone-array signals directly.… ▽ More

    Submitted 8 October, 2024; originally announced October 2024.

  7. arXiv:2410.04798  [pdf, other]

    cs.CL

    DAPE V2: Process Attention Score as Feature Map for Length Extrapolation

    Authors: Chuanyang Zheng, Yihang Gao, Han Shi, Jing Xiong, Jiankai Sun, Jingyao Li, Minbin Huang, Xiaozhe Ren, Michael Ng, Xin Jiang, Zhenguo Li, Yu Li

    Abstract: The attention mechanism is a fundamental component of the Transformer model, contributing to interactions among distinct tokens, in contrast to earlier feed-forward neural networks. In general, the attention scores are determined simply by the key-query products. However, this work's occasional trial (combining DAPE and NoPE) of including additional MLPs on attention scores without position encodi… ▽ More

    Submitted 10 October, 2024; v1 submitted 7 October, 2024; originally announced October 2024.

    Comments: Tech Report. Compared to DAPE, this work (DAPE V2) further analyzes the length extrapolation problem and translates the length extrapolation issue into a well-understood feature map processing problem. arXiv admin note: text overlap with arXiv:2405.14722

  8. arXiv:2410.03090  [pdf, other]

    cs.CL cs.LG

    UNComp: Uncertainty-Aware Long-Context Compressor for Efficient Large Language Model Inference

    Authors: Jing Xiong, Jianghan Shen, Fanghua Ye, Chaofan Tao, Zhongwei Wan, Jianqiao Lu, Xun Wu, Chuanyang Zheng, Zhijiang Guo, Lingpeng Kong, Ngai Wong

    Abstract: Deploying large language models (LLMs) is challenging due to their high memory and computational demands, especially during long-context inference. While key-value (KV) caching accelerates inference by reusing previously computed keys and values, it also introduces significant memory overhead. Existing KV cache compression methods such as eviction and merging typically compress the KV cache after… ▽ More

    Submitted 3 October, 2024; originally announced October 2024.

  9. arXiv:2410.02719  [pdf, other]

    cs.CL

    UncertaintyRAG: Span-Level Uncertainty Enhanced Long-Context Modeling for Retrieval-Augmented Generation

    Authors: Zixuan Li, Jing Xiong, Fanghua Ye, Chuanyang Zheng, Xun Wu, Jianqiao Lu, Zhongwei Wan, Xiaodan Liang, Chengming Li, Zhenan Sun, Lingpeng Kong, Ngai Wong

    Abstract: We present UncertaintyRAG, a novel approach for long-context Retrieval-Augmented Generation (RAG) that utilizes Signal-to-Noise Ratio (SNR)-based span uncertainty to estimate similarity between text chunks. This span uncertainty enhances model calibration, improving robustness and mitigating semantic inconsistencies introduced by random chunking. Leveraging this insight, we propose an efficient un… ▽ More

    Submitted 3 October, 2024; originally announced October 2024.

  10. arXiv:2410.00772  [pdf, other]

    cs.CV cs.LG

    On the Generalization and Causal Explanation in Self-Supervised Learning

    Authors: Wenwen Qiang, Zeen Song, Ziyin Gu, Jiangmeng Li, Changwen Zheng, Fuchun Sun, Hui Xiong

    Abstract: Self-supervised learning (SSL) methods learn from unlabeled data and achieve high generalization performance on downstream tasks. However, they may also suffer from overfitting to their training data and lose the ability to adapt to new tasks. To investigate this phenomenon, we conduct experiments on various SSL methods and datasets and make two observations: (1) Overfitting occurs abruptly in lat… ▽ More

    Submitted 1 October, 2024; originally announced October 2024.

  11. arXiv:2409.19676  [pdf, other]

    cs.CV cs.AI

    See Detail Say Clear: Towards Brain CT Report Generation via Pathological Clue-driven Representation Learning

    Authors: Chengxin Zheng, Junzhong Ji, Yanzhao Shi, Xiaodan Zhang, Liangqiong Qu

    Abstract: Brain CT report generation is significant to aid physicians in diagnosing cranial diseases. Recent studies concentrate on handling the consistency between visual and textual pathological features to improve the coherence of report. However, there exist some challenges: 1) Redundant visual representing: Massive irrelevant areas in 3D scans distract models from representing salient visual contexts.… ▽ More

    Submitted 1 October, 2024; v1 submitted 29 September, 2024; originally announced September 2024.

    Comments: Our work has been accepted by EMNLP2024 findings

  12. arXiv:2409.17830  [pdf, other]

    cs.CV

    Unsupervised Learning Based Multi-Scale Exposure Fusion

    Authors: Chaobing Zheng, Shiqian Wu, Zhengguo Li

    Abstract: Unsupervised learning based multi-scale exposure fusion (ULMEF) is efficient for fusing differently exposed low dynamic range (LDR) images into a higher quality LDR image for a high dynamic range (HDR) scene. Unlike supervised learning, loss functions play a crucial role in the ULMEF. In this paper, novel loss functions are proposed for the ULMEF and they are defined by using all the images to be… ▽ More

    Submitted 26 September, 2024; originally announced September 2024.

    Comments: 11 pages

  13. arXiv:2409.16997  [pdf, other]

    cs.LG cs.AI

    INT-FlashAttention: Enabling Flash Attention for INT8 Quantization

    Authors: Shimao Chen, Zirui Liu, Zhiying Wu, Ce Zheng, Peizhuang Cong, Zihan Jiang, Yuhan Wu, Lei Su, Tong Yang

    Abstract: As the foundation of large language models (LLMs), self-attention module faces the challenge of quadratic time and memory complexity with respect to sequence length. FlashAttention accelerates attention computation and reduces its memory usage by leveraging the GPU memory hierarchy. A promising research direction is to integrate FlashAttention with quantization methods. This paper introduces INT-F… ▽ More

    Submitted 26 September, 2024; v1 submitted 25 September, 2024; originally announced September 2024.

  14. arXiv:2409.15269  [pdf, other]

    cs.CV

    ReLoo: Reconstructing Humans Dressed in Loose Garments from Monocular Video in the Wild

    Authors: Chen Guo, Tianjian Jiang, Manuel Kaufmann, Chengwei Zheng, Julien Valentin, Jie Song, Otmar Hilliges

    Abstract: While previous years have seen great progress in the 3D reconstruction of humans from monocular videos, few of the state-of-the-art methods are able to handle loose garments that exhibit large non-rigid surface deformations during articulation. This limits the application of such methods to humans that are dressed in standard pants or T-shirts. Our method, ReLoo, overcomes this limitation and reco… ▽ More

    Submitted 28 September, 2024; v1 submitted 23 September, 2024; originally announced September 2024.

    Comments: Project page: https://moygcc.github.io/ReLoo/

  15. arXiv:2409.14741  [pdf, other]

    cs.CV cs.AI

    Less yet robust: crucial region selection for scene recognition

    Authors: Jianqi Zhang, Mengxuan Wang, Jingyao Wang, Lingyu Si, Changwen Zheng, Fanjiang Xu

    Abstract: Scene recognition, particularly for aerial and underwater images, often suffers from various types of degradation, such as blurring or overexposure. Previous works that focus on convolutional neural networks have been shown to be able to extract panoramic semantic features and perform well on scene recognition tasks. However, low-quality images still impede model performance due to the inappropria… ▽ More

    Submitted 20 October, 2024; v1 submitted 23 September, 2024; originally announced September 2024.

  16. arXiv:2409.14228  [pdf, other]

    cs.HC

    Mentigo: An Intelligent Agent for Mentoring Students in the Creative Problem Solving Process

    Authors: Siyu Zha, Yujia Liu, Chengbo Zheng, Jiaqi XU, Fuze Yu, Jiangtao Gong, Yingqing XU

    Abstract: With the increasing integration of large language models (LLMs) in education, there is growing interest in using AI agents to support student learning in creative tasks. This study presents an interactive Mentor Agent system named Mentigo, which is designed to assist middle school students in the creative problem solving (CPS) process. We created a comprehensive dataset of real classroom interacti… ▽ More

    Submitted 21 September, 2024; originally announced September 2024.

    Comments: 19 pages, 5 figures. Submitted to CHI 2025

    MSC Class: 68U35 (Primary); 68T50 (Secondary) ACM Class: H.5.2; K.3.1

  17. arXiv:2409.11505  [pdf, other]

    cs.IR

    Perceptions of Edinburgh: Capturing Neighbourhood Characteristics by Clustering Geoparsed Local News

    Authors: Andreas Grivas, Claire Grover, Richard Tobin, Clare Llewellyn, Eleojo Oluwaseun Abubakar, Chunyu Zheng, Chris Dibben, Alan Marshall, Jamie Pearce, Beatrice Alex

    Abstract: The communities that we live in affect our health in ways that are complex and hard to define. Moreover, our understanding of the place-based processes affecting health and inequalities is limited. This undermines the development of robust policy interventions to improve local health and well-being. News media provides social and community information that may be useful in health studies. Here we… ▽ More

    Submitted 17 September, 2024; originally announced September 2024.

    Comments: Preprint - paper under submission

  18. arXiv:2409.08474  [pdf, other]

    cs.LG cs.CV

    Rethinking Meta-Learning from a Learning Lens

    Authors: Jingyao Wang, Wenwen Qiang, Jiangmeng Li, Lingyu Si, Changwen Zheng

    Abstract: Meta-learning has emerged as a powerful approach for leveraging knowledge from previous tasks to solve new tasks. The mainstream methods focus on training a well-generalized model initialization, which is then adapted to different tasks with limited data and updates. However, this pushes the model toward overfitting on the training tasks. Previous methods mainly attributed this to the lack of data and used… ▽ More

    Submitted 12 September, 2024; originally announced September 2024.

  19. arXiv:2409.05310  [pdf, other]

    cs.RO cs.CV

    Neural Surface Reconstruction and Rendering for LiDAR-Visual Systems

    Authors: Jianheng Liu, Chunran Zheng, Yunfei Wan, Bowen Wang, Yixi Cai, Fu Zhang

    Abstract: This paper presents a unified surface reconstruction and rendering framework for LiDAR-visual systems, integrating Neural Radiance Fields (NeRF) and Neural Distance Fields (NDF) to recover both appearance and structural information from posed images and point clouds. We address the structural visible gap between NeRF and NDF by utilizing a visible-aware occupancy map to classify space into the fre… ▽ More

    Submitted 8 September, 2024; originally announced September 2024.

  20. arXiv:2409.04679  [pdf, other]

    cs.CV

    Neural Augmentation Based Panoramic High Dynamic Range Stitching

    Authors: Chaobing Zheng, Yilun Xu, Weihai Chen, Shiqian Wu, Zhengguo Li

    Abstract: Due to saturated regions of inputting low dynamic range (LDR) images and large intensity changes among the LDR images caused by different exposures, it is challenging to produce an information enriched panoramic LDR image without visual artifacts for a high dynamic range (HDR) scene through stitching multiple geometrically synchronized LDR images with different exposures and pairwise overlapping f… ▽ More

    Submitted 6 September, 2024; originally announced September 2024.

    Comments: 11 pages

  21. arXiv:2409.02795  [pdf, other]

    cs.CL

    Towards a Unified View of Preference Learning for Large Language Models: A Survey

    Authors: Bofei Gao, Feifan Song, Yibo Miao, Zefan Cai, Zhe Yang, Liang Chen, Helan Hu, Runxin Xu, Qingxiu Dong, Ce Zheng, Wen Xiao, Ge Zhang, Daoguang Zan, Keming Lu, Bowen Yu, Dayiheng Liu, Zeyu Cui, Jian Yang, Lei Sha, Houfeng Wang, Zhifang Sui, Peiyi Wang, Tianyu Liu, Baobao Chang

    Abstract: Large Language Models (LLMs) exhibit remarkably powerful capabilities. One of the crucial factors to achieve success is aligning the LLM's output with human preferences. This alignment process often requires only a small amount of data to efficiently enhance the LLM's performance. While effective, research in this area spans multiple domains, and the methods involved are relatively complex to unde… ▽ More

    Submitted 9 September, 2024; v1 submitted 4 September, 2024; originally announced September 2024.

    Comments: 23 pages, 6 figures

  22. arXiv:2409.00992  [pdf, other]

    cs.RO

    MFCalib: Single-shot and Automatic Extrinsic Calibration for LiDAR and Camera in Targetless Environments Based on Multi-Feature Edge

    Authors: Tianyong Ye, Wei Xu, Chunran Zheng, Yukang Cui

    Abstract: This paper presents MFCalib, an innovative extrinsic calibration technique for LiDAR and RGB camera that operates automatically in targetless environments with a single data capture. At the heart of this method is using a rich set of edge information, significantly enhancing calibration accuracy and robustness. Specifically, we extract both depth-continuous and depth-discontinuous edges, along wit… ▽ More

    Submitted 2 September, 2024; originally announced September 2024.

    Comments: 8 pages, 10 figures, accepted by IROS2024

  23. arXiv:2408.16228  [pdf, other]

    cs.RO cs.LG

    Policy Adaptation via Language Optimization: Decomposing Tasks for Few-Shot Imitation

    Authors: Vivek Myers, Bill Chunyuan Zheng, Oier Mees, Sergey Levine, Kuan Fang

    Abstract: Learned language-conditioned robot policies often struggle to effectively adapt to new real-world tasks even when pre-trained across a diverse set of instructions. We propose a novel approach for few-shot adaptation to unseen tasks that exploits the semantic understanding of task decomposition provided by vision-language models (VLMs). Our method, Policy Adaptation via Language Optimization (PALO)… ▽ More

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: 27 pages, 14 figures

  24. arXiv:2408.14089  [pdf, other]

    cs.IT eess.SP

    Mini-Slot-Assisted Short Packet URLLC: Differential or Coherent Detection?

    Authors: Canjian Zheng, Fu-Chun Zheng, Jingjing Luo, Pengcheng Zhu, Xiaohu You, Daquan Feng

    Abstract: One of the primary challenges in short packet ultra-reliable and low-latency communications (URLLC) is to achieve reliable channel estimation and data detection while minimizing the impact on latency performance. Given the small packet size in mini-slot-assisted URLLC, relying solely on pilot-based coherent detection is almost impossible to meet the seemingly contradictory requirements of high cha… ▽ More

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: 14 pages, 8 figures, journal

  25. arXiv:2408.14035  [pdf, other]

    cs.RO cs.CV

    FAST-LIVO2: Fast, Direct LiDAR-Inertial-Visual Odometry

    Authors: Chunran Zheng, Wei Xu, Zuhao Zou, Tong Hua, Chongjian Yuan, Dongjiao He, Bingyang Zhou, Zheng Liu, Jiarong Lin, Fangcheng Zhu, Yunfan Ren, Rong Wang, Fanle Meng, Fu Zhang

    Abstract: This paper proposes FAST-LIVO2: a fast, direct LiDAR-inertial-visual odometry framework to achieve accurate and robust state estimation in SLAM tasks and provide great potential in real-time, onboard robotic applications. FAST-LIVO2 fuses the IMU, LiDAR and image measurements efficiently through an ESIKF. To address the dimension mismatch between the heterogeneous LiDAR and image measurements, we… ▽ More

    Submitted 28 August, 2024; v1 submitted 26 August, 2024; originally announced August 2024.

    Comments: 30 pages, 31 figures, due to the limitation that 'The abstract field cannot exceed 1,920 characters', the abstract presented here is shorter than the one in the PDF file

  26. arXiv:2408.13912  [pdf, other]

    cs.CV cs.LG

    Splatt3R: Zero-shot Gaussian Splatting from Uncalibrated Image Pairs

    Authors: Brandon Smart, Chuanxia Zheng, Iro Laina, Victor Adrian Prisacariu

    Abstract: In this paper, we introduce Splatt3R, a pose-free, feed-forward method for in-the-wild 3D reconstruction and novel view synthesis from stereo pairs. Given uncalibrated natural images, Splatt3R can predict 3D Gaussian Splats without requiring any camera parameters or depth information. For generalizability, we build Splatt3R upon a ``foundation'' 3D geometry reconstruction method, MASt3R, by extend… ▽ More

    Submitted 27 August, 2024; v1 submitted 25 August, 2024; originally announced August 2024.

    Comments: Our project page can be found at: https://splatt3r.active.vision/

  27. arXiv:2408.10519  [pdf, other]

    cs.DC cs.DS

    Almost Optimal Algorithms for Token Collision in Anonymous Networks

    Authors: Sirui Bai, Xinyu Fu, Xudong Wu, Penghui Yao, Chaodong Zheng

    Abstract: In distributed systems, situations often arise where some nodes each holds a collection of tokens, and all nodes collectively need to determine whether all tokens are distinct. For example, if each token represents a logged-in user, the problem corresponds to checking whether there are duplicate logins. Similarly, if each token represents a data object or a timestamp, the problem corresponds to ch… ▽ More

    Submitted 19 August, 2024; originally announced August 2024.

  28. arXiv:2408.09676  [pdf, other]

    cs.CV

    Image-based Freeform Handwriting Authentication with Energy-oriented Self-Supervised Learning

    Authors: Jingyao Wang, Luntian Mou, Changwen Zheng, Wen Gao

    Abstract: Freeform handwriting authentication verifies a person's identity from their writing style and habits in messy handwriting data. This technique has gained widespread attention in recent years as a valuable tool for various fields, e.g., fraud prevention and cultural heritage protection. However, it still remains a challenging task in reality due to three reasons: (i) severe damage, (ii) complex hig… ▽ More

    Submitted 18 August, 2024; originally announced August 2024.

    Comments: Accepted by TMM

  29. VrdONE: One-stage Video Visual Relation Detection

    Authors: Xinjie Jiang, Chenxi Zheng, Xuemiao Xu, Bangzhen Liu, Weiying Zheng, Huaidong Zhang, Shengfeng He

    Abstract: Video Visual Relation Detection (VidVRD) focuses on understanding how entities interact over time and space in videos, a key step for gaining deeper insights into video scenes beyond basic visual tasks. Traditional methods for VidVRD, challenged by its complexity, typically split the task into two parts: one for identifying what relation categories are present and another for determining their tem… ▽ More

    Submitted 16 October, 2024; v1 submitted 18 August, 2024; originally announced August 2024.

    Comments: 12 pages, 8 figures, accepted by ACM Multimedia 2024

  30. arXiv:2408.07884  [pdf, other]

    cs.CL

    Instruct Large Language Models to Generate Scientific Literature Survey Step by Step

    Authors: Yuxuan Lai, Yupeng Wu, Yidan Wang, Wenpeng Hu, Chen Zheng

    Abstract: Automatically generating scientific literature surveys is a valuable task that can significantly enhance research efficiency. However, the diverse and complex nature of information within a literature survey poses substantial challenges for generative models. In this paper, we design a series of prompts to systematically leverage large language models (LLMs), enabling the creation of com… ▽ More

    Submitted 14 August, 2024; originally announced August 2024.

    Comments: NLPCC 2024

  31. Flexible 3D Lane Detection by Hierarchical Shape Matching

    Authors: Zhihao Guan, Ruixin Liu, Zejian Yuan, Ao Liu, Kun Tang, Tong Zhou, Erlong Li, Chao Zheng, Shuqi Mei

    Abstract: As one of the basic yet vital technologies for HD map construction, 3D lane detection is still an open problem due to varying visual conditions, complex topologies, and strict demands for precision. In this paper, an end-to-end flexible and hierarchical lane detector is proposed to precisely predict 3D lane lines from point clouds. Specifically, we design a hierarchical network predicting flexib… ▽ More

    Submitted 13 August, 2024; originally announced August 2024.

  32. arXiv:2408.04631  [pdf, other]

    cs.CV cs.AI

    Puppet-Master: Scaling Interactive Video Generation as a Motion Prior for Part-Level Dynamics

    Authors: Ruining Li, Chuanxia Zheng, Christian Rupprecht, Andrea Vedaldi

    Abstract: We present Puppet-Master, an interactive video generative model that can serve as a motion prior for part-level dynamics. At test time, given a single image and a sparse set of motion trajectories (i.e., drags), Puppet-Master can synthesize a video depicting realistic part-level motion faithful to the given drag interactions. This is achieved by fine-tuning a large-scale pre-trained video diffusio… ▽ More

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: Project page: https://vgg-puppetmaster.github.io/

  33. arXiv:2408.04276  [pdf, other]

    cs.LG

    Early Risk Assessment Model for ICA Timing Strategy in Unstable Angina Patients Using Multi-Modal Machine Learning

    Authors: Candi Zheng, Kun Liu, Yang Wang, Shiyi Chen, Hongli Li

    Abstract: Background: Invasive coronary arteriography (ICA) is recognized as the gold standard for diagnosing cardiovascular diseases, including unstable angina (UA). The challenge lies in determining the optimal timing for ICA in UA patients, balancing the need for revascularization in high-risk patients against the potential complications in low-risk ones. Unlike myocardial infarction, UA does not have sp… ▽ More

    Submitted 8 August, 2024; originally announced August 2024.

  34. WaitGPT: Monitoring and Steering Conversational LLM Agent in Data Analysis with On-the-Fly Code Visualization

    Authors: Liwenhan Xie, Chengbo Zheng, Haijun Xia, Huamin Qu, Chen Zhu-Tian

    Abstract: Large language models (LLMs) support data analysis through conversational user interfaces, as exemplified in OpenAI's ChatGPT (formerly known as Advanced Data Analysis or Code Interpreter). Essentially, LLMs produce code for accomplishing diverse analysis tasks. However, presenting raw code can obscure the logic and hinder user verification. To empower users with enhanced comprehension and augment… ▽ More

    Submitted 3 August, 2024; originally announced August 2024.

    Comments: Accepted in the 37th Annual ACM Symposium on User Interface Software and Technology (UIST'24)

  35. arXiv:2408.00447  [pdf, other]

    cs.HC cs.AI cs.IR

    DiscipLink: Unfolding Interdisciplinary Information Seeking Process via Human-AI Co-Exploration

    Authors: Chengbo Zheng, Yuanhao Zhang, Zeyu Huang, Chuhan Shi, Minrui Xu, Xiaojuan Ma

    Abstract: Interdisciplinary studies often require researchers to explore literature in diverse branches of knowledge. Yet, navigating through the highly scattered knowledge from unfamiliar disciplines poses a significant challenge. In this paper, we introduce DiscipLink, a novel interactive system that facilitates collaboration between researchers and large language models (LLMs) in interdisciplinary inform… ▽ More

    Submitted 1 August, 2024; originally announced August 2024.

  36. arXiv:2407.17303  [pdf]

    cs.LG

    MoveLight: Enhancing Traffic Signal Control through Movement-Centric Deep Reinforcement Learning

    Authors: Junqi Shao, Chenhao Zheng, Yuxuan Chen, Yucheng Huang, Rui Zhang

    Abstract: This paper introduces MoveLight, a novel traffic signal control system that enhances urban traffic management through movement-centric deep reinforcement learning. By leveraging detailed real-time data and advanced machine learning techniques, MoveLight overcomes the limitations of traditional traffic signal control methods. It employs a lane-level control approach using the FRAP algorithm to achi… ▽ More

    Submitted 24 July, 2024; originally announced July 2024.

  37. arXiv:2407.14686  [pdf, other]

    cs.HC cs.CY

    Using Case Studies to Teach Responsible AI to Industry Practitioners

    Authors: Julia Stoyanovich, Rodrigo Kreis de Paula, Armanda Lewis, Chloe Zheng

    Abstract: Responsible AI (RAI) is the science and the practice of making the design, development, and use of AI socially sustainable: of reaping the benefits of innovation while controlling the risks. Naturally, industry practitioners play a decisive role in our collective ability to achieve the goals of RAI. Unfortunately, we do not yet have consolidated educational materials and effective methodologies fo… ▽ More

    Submitted 23 July, 2024; v1 submitted 19 July, 2024; originally announced July 2024.

  38. arXiv:2407.14069  [pdf, other]

    cs.CV

    Self-Supervised Video Representation Learning in a Heuristic Decoupled Perspective

    Authors: Zeen Song, Jingyao Wang, Jianqi Zhang, Changwen Zheng, Wenwen Qiang

    Abstract: Video contrastive learning (v-CL) has gained prominence as a leading framework for unsupervised video representation learning, showcasing impressive performance across various tasks such as action classification and detection. In the field of video representation learning, a feature extractor should ideally capture both static and dynamic semantics. However, our series of experiments reveals that… ▽ More

    Submitted 19 July, 2024; originally announced July 2024.

  39. arXiv:2407.14058  [pdf, other]

    cs.LG

    On the Causal Sufficiency and Necessity of Multi-Modal Representation Learning

    Authors: Jingyao Wang, Wenwen Qiang, Jiangmeng Li, Lingyu Si, Changwen Zheng, Bing Su

    Abstract: An effective paradigm of multi-modal learning (MML) is to learn unified representations among modalities. From a causal perspective, constraining the consistency between different modalities can mine causal representations that convey primary events. However, such simple consistency may face the risk of learning insufficient or unnecessary information: a necessary but insufficient cause is invaria… ▽ More

    Submitted 30 August, 2024; v1 submitted 19 July, 2024; originally announced July 2024.

  40. arXiv:2407.13541  [pdf, other]

    cs.CV

    On the Discriminability of Self-Supervised Representation Learning

    Authors: Zeen Song, Wenwen Qiang, Changwen Zheng, Fuchun Sun, Hui Xiong

    Abstract: Self-supervised learning (SSL) has recently achieved significant success in downstream visual tasks. However, a notable gap still exists between SSL and supervised learning (SL), especially in complex downstream tasks. In this paper, we show that the features learned by SSL methods suffer from the crowding problem, where features of different classes are not distinctly separated, and features with… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

  41. arXiv:2407.12415  [pdf, other]

    cs.LG

    Not All Frequencies Are Created Equal: Towards a Dynamic Fusion of Frequencies in Time-Series Forecasting

    Authors: Xingyu Zhang, Siyu Zhao, Zeen Song, Huijie Guo, Jianqi Zhang, Changwen Zheng, Wenwen Qiang

    Abstract: Long-term time series forecasting is a long-standing challenge in various applications. A central issue in time series forecasting is that methods should expressively capture long-term dependency. Furthermore, time series forecasting methods should be flexible when applied to different scenarios. Although Fourier analysis offers an alternative to effectively capture reusable and periodic patterns… ▽ More

    Submitted 18 July, 2024; v1 submitted 17 July, 2024; originally announced July 2024.

    Comments: Accepted by ACM MM 2024

  42. arXiv:2407.12322  [pdf, other]

    cs.CV

    Frequency Guidance Matters: Skeletal Action Recognition by Frequency-Aware Mixed Transformer

    Authors: Wenhan Wu, Ce Zheng, Zihao Yang, Chen Chen, Srijan Das, Aidong Lu

    Abstract: Recently, transformers have demonstrated great potential for modeling long-term dependencies from skeleton sequences and thereby gained ever-increasing attention in skeleton action recognition. However, the existing transformer-based approaches heavily rely on the naive attention mechanism for capturing the spatiotemporal features, which falls short in learning discriminative representations that…

    Submitted 29 July, 2024; v1 submitted 17 July, 2024; originally announced July 2024.

    Comments: Accepted by ACM Multimedia 2024

  43. arXiv:2407.07554  [pdf, other]

    cs.GR cs.SD eess.AS

    Beat-It: Beat-Synchronized Multi-Condition 3D Dance Generation

    Authors: Zikai Huang, Xuemiao Xu, Cheng Xu, Huaidong Zhang, Chenxi Zheng, Jing Qin, Shengfeng He

    Abstract: Dance, as an art form, fundamentally hinges on the precise synchronization with musical beats. However, achieving aesthetically pleasing dance sequences from music is challenging, with existing methods often falling short in controllability and beat alignment. To address these shortcomings, this paper introduces Beat-It, a novel framework for beat-specific, key pose-guided dance generation. Unlike…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  44. arXiv:2407.07078  [pdf, other]

    cs.CV

    MoSt-DSA: Modeling Motion and Structural Interactions for Direct Multi-Frame Interpolation in DSA Images

    Authors: Ziyang Xu, Huangxuan Zhao, Ziwei Cui, Wenyu Liu, Chuansheng Zheng, Xinggang Wang

    Abstract: Artificial intelligence has become a crucial tool for medical image analysis. As an advanced cerebral angiography technique, Digital Subtraction Angiography (DSA) poses a challenge where the radiation dose to humans is proportional to the image count. By reducing images and using AI interpolation instead, the radiation can be cut significantly. However, DSA images present more complex motion and s…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: Accepted to ECAI2024

  45. arXiv:2407.02855  [pdf, other]

    cs.CR cs.CL cs.LG

    Safe Unlearning: A Surprisingly Effective and Generalizable Solution to Defend Against Jailbreak Attacks

    Authors: Zhexin Zhang, Junxiao Yang, Pei Ke, Shiyao Cui, Chujie Zheng, Hongning Wang, Minlie Huang

    Abstract: LLMs are known to be vulnerable to jailbreak attacks, even after safety alignment. An important observation is that, while different types of jailbreak attacks can generate significantly different queries, they mostly result in similar responses that are rooted in the same harmful knowledge (e.g., detailed steps to make a bomb). Therefore, we conjecture that directly unlearn the harmful knowledge…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: 15 pages

  46. arXiv:2406.18075  [pdf, other]

    cs.SE

    A Context-Driven Approach for Co-Auditing Smart Contracts with The Support of GPT-4 code interpreter

    Authors: Mohamed Salah Bouafif, Chen Zheng, Ilham Ahmed Qasse, Ed Zulkoski, Mohammad Hamdaqa, Foutse Khomh

    Abstract: The surge in the adoption of smart contracts necessitates rigorous auditing to ensure their security and reliability. Manual auditing, although comprehensive, is time-consuming and heavily reliant on the auditor's expertise. With the rise of Large Language Models (LLMs), there is growing interest in leveraging them to assist auditors in the auditing process (co-auditing). However, the effectivenes…

    Submitted 26 June, 2024; originally announced June 2024.

  47. arXiv:2406.17182  [pdf, other]

    cs.IR cs.LG

    Debiased Recommendation with Noisy Feedback

    Authors: Haoxuan Li, Chunyuan Zheng, Wenjie Wang, Hao Wang, Fuli Feng, Xiao-Hua Zhou

    Abstract: Ratings of a user to most items in recommender systems are usually missing not at random (MNAR), largely because users are free to choose which items to rate. To achieve unbiased learning of the prediction model under MNAR data, three typical solutions have been proposed, including error-imputation-based (EIB), inverse-propensity-scoring (IPS), and doubly robust (DR) methods. However, these method…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: KDD 24 Research Track Paper

  48. arXiv:2406.17098  [pdf, other]

    cs.LG cs.AI

    Learning Temporal Distances: Contrastive Successor Features Can Provide a Metric Structure for Decision-Making

    Authors: Vivek Myers, Chongyi Zheng, Anca Dragan, Sergey Levine, Benjamin Eysenbach

    Abstract: Temporal distances lie at the heart of many algorithms for planning, control, and reinforcement learning that involve reaching goals, allowing one to estimate the transit time between two states. However, prior attempts to define such temporal distances in stochastic settings have been stymied by an important limitation: these prior approaches do not satisfy the triangle inequality. This is not me…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: Proceedings of the 41st International Conference on Machine Learning (ICML 2024)

  49. arXiv:2406.16815  [pdf, other]

    cs.CV

    ClotheDreamer: Text-Guided Garment Generation with 3D Gaussians

    Authors: Yufei Liu, Junshu Tang, Chu Zheng, Shijie Zhang, Jinkun Hao, Junwei Zhu, Dongjin Huang

    Abstract: High-fidelity 3D garment synthesis from text is desirable yet challenging for digital avatar creation. Recent diffusion-based approaches via Score Distillation Sampling (SDS) have enabled new possibilities but either intricately couple with human body or struggle to reuse. We introduce ClotheDreamer, a 3D Gaussian-based method for generating wearable, production-ready 3D garment assets from text p…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: Project Page: https://ggxxii.github.io/clothedreamer

  50. arXiv:2406.16495  [pdf, other]

    cs.CL cs.AI

    OTCE: Hybrid SSM and Attention with Cross Domain Mixture of Experts to construct Observer-Thinker-Conceiver-Expresser

    Authors: Jingze Shi, Ting Xie, Bingheng Wu, Chunjun Zheng, Kai Wang

    Abstract: Recent research has shown that combining Mamba with Transformer architecture, which has selective state space and quadratic self-attention mechanism, outperforms using Mamba or Transformer architecture alone in language modeling tasks. The quadratic self-attention mechanism effectively alleviates the shortcomings of selective state space in handling long-term dependencies of any element in the seq…

    Submitted 19 July, 2024; v1 submitted 24 June, 2024; originally announced June 2024.