Showing 1–50 of 201 results for author: Gong, C

  1. arXiv:2410.13080  [pdf, other]

    cs.CL

    Graph-constrained Reasoning: Faithful Reasoning on Knowledge Graphs with Large Language Models

    Authors: Linhao Luo, Zicheng Zhao, Chen Gong, Gholamreza Haffari, Shirui Pan

    Abstract: Large language models (LLMs) have demonstrated impressive reasoning abilities, but they still struggle with faithful reasoning due to knowledge gaps and hallucinations. To address these issues, knowledge graphs (KGs) have been utilized to enhance LLM reasoning through their structured knowledge. However, existing KG-enhanced methods, either retrieval-based or agent-based, encounter difficulties in…

    Submitted 16 October, 2024; originally announced October 2024.

    Comments: 21 pages, 10 figures

  2. arXiv:2410.10547  [pdf, other]

    cs.CV cs.AI

    Hybrid Transformer for Early Alzheimer's Detection: Integration of Handwriting-Based 2D Images and 1D Signal Features

    Authors: Changqing Gong, Huafeng Qin, Mounîm A. El-Yacoubi

    Abstract: Alzheimer's Disease (AD) is a prevalent neurodegenerative condition where early detection is vital. Handwriting, often affected early in AD, offers a non-invasive and cost-effective way to capture subtle motor changes. State-of-the-art research on handwriting, mostly online, based AD detection has predominantly relied on manually extracted features, fed as input to shallow machine learning models.…

    Submitted 14 October, 2024; originally announced October 2024.

  3. arXiv:2410.07538  [pdf, other]

    cs.LG

    Rank Aggregation in Crowdsourcing for Listwise Annotations

    Authors: Wenshui Luo, Haoyu Liu, Yongliang Ding, Tao Zhou, Sheng Wan, Runze Wu, Minmin Lin, Cong Zhang, Changjie Fan, Chen Gong

    Abstract: Rank aggregation through crowdsourcing has recently gained significant attention, particularly in the context of listwise ranking annotations. However, existing methods primarily focus on a single problem and partial ranks, while the aggregation of listwise full ranks across numerous problems remains largely unexplored. This scenario finds relevance in various applications, such as model quality a…

    Submitted 9 October, 2024; originally announced October 2024.

    Comments: 19 pages

  4. arXiv:2410.00455  [pdf, other]

    cs.DC

    Fine-Grained Vectorized Merge Sorting on RISC-V: From Register to Cache

    Authors: Jin Zhang, Jincheng Zhou, Xiang Zhang, Di Ma, Chunye Gong

    Abstract: Merge sort as a divide-sort-merge paradigm has been widely applied in computer science fields. As modern reduced instruction set computing architectures like the fifth generation (RISC-V) regard multiple registers as a vector register group for wide instruction parallelism, optimizing merge sort with this vectorized property is becoming increasingly common. In this paper, we overhaul the divide-so…

    Submitted 1 October, 2024; originally announced October 2024.

  5. arXiv:2409.18512  [pdf, other]

    cs.SD cs.AI cs.CL eess.AS

    EmoPro: A Prompt Selection Strategy for Emotional Expression in LM-based Speech Synthesis

    Authors: Haoyu Wang, Chunyu Qiang, Tianrui Wang, Cheng Gong, Qiuyu Liu, Yu Jiang, Xiaobao Wang, Chenyang Wang, Chen Zhang

    Abstract: Recent advancements in speech synthesis models, trained on extensive datasets, have demonstrated remarkable zero-shot capabilities. These models can control content, timbre, and emotion in generated speech based on prompt inputs. Despite these advancements, the choice of prompts significantly impacts the output quality, yet most existing selection schemes do not adequately address the control of e…

    Submitted 27 September, 2024; originally announced September 2024.

  6. arXiv:2409.05249  [pdf, other]

    cs.CR cs.DB cs.NI

    NetDPSyn: Synthesizing Network Traces under Differential Privacy

    Authors: Danyu Sun, Joann Qiongna Chen, Chen Gong, Tianhao Wang, Zhou Li

    Abstract: As the utilization of network traces for network measurement research becomes increasingly prevalent, concerns regarding privacy leakage from network traces have garnered the public's attention. To safeguard network traces, researchers have proposed trace synthesis that retains the essential properties of the raw data. However, previous works also show that synthesis traces with generative…

    Submitted 8 September, 2024; originally announced September 2024.

    Comments: IMC 2024

  7. arXiv:2409.03970  [pdf, other]

    cs.DC cs.DS

    A Hybrid Vectorized Merge Sort on ARM NEON

    Authors: Jincheng Zhou, Jin Zhang, Xiang Zhang, Tiaojie Xiao, Di Ma, Chunye Gong

    Abstract: Sorting algorithms are the most extensively researched topics in computer science and serve for numerous practical applications. Although various sorts have been proposed for efficiency, different architectures offer distinct flavors to the implementation of parallel sorting. In this paper, we propose a hybrid vectorized merge sort on ARM NEON, named NEON Merge Sort for short (NEON-MS). In detail,…

    Submitted 5 September, 2024; originally announced September 2024.

    Comments: Accepted by ICA3PP

  8. arXiv:2408.10653  [pdf, other]

    cs.CV

    UIE-UnFold: Deep Unfolding Network with Color Priors and Vision Transformer for Underwater Image Enhancement

    Authors: Yingtie Lei, Jia Yu, Yihang Dong, Changwei Gong, Ziyang Zhou, Chi-Man Pun

    Abstract: Underwater image enhancement (UIE) plays a crucial role in various marine applications, but it remains challenging due to the complex underwater environment. Current learning-based approaches frequently lack explicit incorporation of prior knowledge about the physical processes involved in underwater image formation, resulting in limited optimization despite their impressive enhancement results. T…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: Accepted by DSAA CIVIL 2024

  9. arXiv:2408.10264  [pdf, other]

    cs.LG cs.AI cs.IR

    OPDR: Order-Preserving Dimension Reduction for Semantic Embedding of Multimodal Scientific Data

    Authors: Chengyu Gong, Gefei Shen, Luanzheng Guo, Nathan Tallent, Dongfang Zhao

    Abstract: One of the most common operations in multimodal scientific data management is searching for the $k$ most similar items (or, $k$-nearest neighbors, KNN) from the database after being provided a new item. Although recent advances of multimodal machine learning models offer a semantic index, the so-called embedding vectors mapped from the original multimodal data, the dimension of t…

    Submitted 15 August, 2024; originally announced August 2024.

  10. Prototype Learning Guided Hybrid Network for Breast Tumor Segmentation in DCE-MRI

    Authors: Lei Zhou, Yuzhong Zhang, Jiadong Zhang, Xuejun Qian, Chen Gong, Kun Sun, Zhongxiang Ding, Xing Wang, Zhenhui Li, Zaiyi Liu, Dinggang Shen

    Abstract: Automated breast tumor segmentation on the basis of dynamic contrast-enhancement magnetic resonance imaging (DCE-MRI) has shown great promise in clinical practice, particularly for identifying the presence of breast disease. However, accurate segmentation of breast tumor is a challenging task, often necessitating the development of complex networks. To strike an optimal trade-off between computati…

    Submitted 11 August, 2024; originally announced August 2024.

    Journal ref: 2024,IEEE Transactions on Medical Imaging

  11. arXiv:2408.05758  [pdf, other]

    eess.AS cs.AI cs.CL cs.SD

    VQ-CTAP: Cross-Modal Fine-Grained Sequence Representation Learning for Speech Processing

    Authors: Chunyu Qiang, Wang Geng, Yi Zhao, Ruibo Fu, Tao Wang, Cheng Gong, Tianrui Wang, Qiuyu Liu, Jiangyan Yi, Zhengqi Wen, Chen Zhang, Hao Che, Longbiao Wang, Jianwu Dang, Jianhua Tao

    Abstract: Deep learning has brought significant improvements to the field of cross-modal representation learning. For tasks such as text-to-speech (TTS), voice conversion (VC), and automatic speech recognition (ASR), a cross-modal fine-grained (frame-level) sequence representation is desired, emphasizing the semantic content of the text modality while de-emphasizing the paralinguistic information of the spe…

    Submitted 11 August, 2024; originally announced August 2024.

  12. arXiv:2408.01784  [pdf, other]

    cs.IR

    Graph Stochastic Neural Process for Inductive Few-shot Knowledge Graph Completion

    Authors: Zicheng Zhao, Linhao Luo, Shirui Pan, Chengqi Zhang, Chen Gong

    Abstract: Knowledge graphs (KGs) store an enormous number of facts as relationships between entities. Due to the long-tailed distribution of relations and the incompleteness of KGs, there is growing interest in few-shot knowledge graph completion (FKGC). Existing FKGC methods often assume the existence of all entities in KGs, which may not be practical since new relations and entities can emerge over time. Therefore, we…

    Submitted 3 August, 2024; originally announced August 2024.

  13. arXiv:2408.01548  [pdf, other]

    cs.CV

    Trainable Pointwise Decoder Module for Point Cloud Segmentation

    Authors: Bike Chen, Chen Gong, Antti Tikanmäki, Juha Röning

    Abstract: Point cloud segmentation (PCS) aims to make per-point predictions and enables robots and autonomous driving cars to understand the environment. The range image is a dense representation of a large-scale outdoor point cloud, and segmentation models built upon the image commonly execute efficiently. However, the projection of the point cloud onto the range image inevitably leads to dropping points b…

    Submitted 2 August, 2024; originally announced August 2024.

    Comments: No comments

  14. arXiv:2407.13986  [pdf, other]

    cs.CV

    Deep Feature Surgery: Towards Accurate and Efficient Multi-Exit Networks

    Authors: Cheng Gong, Yao Chen, Qiuyang Luo, Ye Lu, Tao Li, Yuzhi Zhang, Yufei Sun, Le Zhang

    Abstract: Multi-exit network is a promising architecture for efficient model inference by sharing backbone networks and weights among multiple exits. However, the gradient conflict of the shared weights results in sub-optimal accuracy. This paper introduces Deep Feature Surgery (DFS), which consists of feature partitioning and feature referencing approaches to resolve gradient conflict issues during…

    Submitted 9 August, 2024; v1 submitted 18 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  15. arXiv:2407.12857  [pdf, other]

    cs.CL cs.DL cs.IR

    Automated Peer Reviewing in Paper SEA: Standardization, Evaluation, and Analysis

    Authors: Jianxiang Yu, Zichen Ding, Jiaqi Tan, Kangyang Luo, Zhenmin Weng, Chenghua Gong, Long Zeng, Renjing Cui, Chengcheng Han, Qiushi Sun, Zhiyong Wu, Yunshi Lan, Xiang Li

    Abstract: In recent years, the rapid increase in scientific papers has overwhelmed traditional review mechanisms, resulting in varying quality of publications. Although existing methods have explored the capabilities of Large Language Models (LLMs) for automated scientific reviewing, their generated contents are often generic or partial. To address the issues above, we introduce an automated paper reviewing…

    Submitted 1 October, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

    Comments: Accepted by EMNLP 2024

  16. arXiv:2407.12383  [pdf, other]

    cs.CV

    Reliable and Efficient Concept Erasure of Text-to-Image Diffusion Models

    Authors: Chao Gong, Kai Chen, Zhipeng Wei, Jingjing Chen, Yu-Gang Jiang

    Abstract: Text-to-image models encounter safety issues, including concerns related to copyright and Not-Safe-For-Work (NSFW) content. Although several methods have been proposed for erasing inappropriate concepts from diffusion models, they often exhibit incomplete erasure, consume a lot of computing resources, and inadvertently damage generation ability. In this work, we introduce Reliable and Efficient Con…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: ECCV 2024 accepted

  17. arXiv:2407.05869  [pdf, other]

    cs.AI

    PORCA: Root Cause Analysis with Partially Observed Data

    Authors: Chang Gong, Di Yao, Jin Wang, Wenbin Li, Lanting Fang, Yongtao Xie, Kaiyu Feng, Peng Han, Jingping Bi

    Abstract: Root Cause Analysis (RCA) aims at identifying the underlying causes of system faults by uncovering and analyzing the causal structure from complex systems. It has been widely used in many application domains. Reliable diagnostic conclusions are of great importance in mitigating system failures and financial losses. However, previous studies implicitly assume a full observation of the system, which…

    Submitted 11 July, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

  18. arXiv:2407.04029  [pdf, other]

    cs.LG

    Robust Learning under Hybrid Noise

    Authors: Yang Wei, Shuo Chen, Shanshan Ye, Bo Han, Chen Gong

    Abstract: Feature noise and label noise are ubiquitous in practical scenarios, which pose great challenges for training a robust machine learning model. Most previous approaches usually deal with only a single problem of either feature noise or label noise. However, in real-world applications, hybrid noise, which contains both feature noise and label noise, is very common due to the unreliable data collecti…

    Submitted 4 July, 2024; originally announced July 2024.

  19. arXiv:2406.19065  [pdf, other]

    cs.CL

    STBench: Assessing the Ability of Large Language Models in Spatio-Temporal Analysis

    Authors: Wenbin Li, Di Yao, Ruibo Zhao, Wenjie Chen, Zijie Xu, Chengxue Luo, Chang Gong, Quanliang Jing, Haining Tan, Jingping Bi

    Abstract: The rapid evolution of large language models (LLMs) holds promise for reforming the methodology of spatio-temporal data mining. However, current works for evaluating the spatio-temporal understanding capability of LLMs are somewhat limited and biased. These works either fail to incorporate the latest language models or only focus on assessing the memorized spatio-temporal knowledge. To address thi…

    Submitted 27 June, 2024; originally announced June 2024.

  20. arXiv:2406.18924  [pdf, other]

    cs.AI cs.LG cs.RO

    Learning Pareto Set for Multi-Objective Continuous Robot Control

    Authors: Tianye Shu, Ke Shang, Cheng Gong, Yang Nan, Hisao Ishibuchi

    Abstract: For a control problem with multiple conflicting objectives, there exists a set of Pareto-optimal policies called the Pareto set instead of a single optimal policy. When a multi-objective control problem is continuous and complex, traditional multi-objective reinforcement learning (MORL) algorithms search for many Pareto-optimal deep policies to approximate the Pareto set, which is quite resource-c…

    Submitted 27 June, 2024; originally announced June 2024.

  21. CausalMMM: Learning Causal Structure for Marketing Mix Modeling

    Authors: Chang Gong, Di Yao, Lei Zhang, Sheng Chen, Wenbin Li, Yueyang Su, Jingping Bi

    Abstract: In online advertising, marketing mix modeling (MMM) is employed to predict the gross merchandise volume (GMV) of brand shops and help decision-makers to adjust the budget allocation of various advertising channels. Traditional MMM methods leveraging regression techniques can fail in handling the complexity of marketing. Although some efforts try to encode the causal structures for better predictio…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: WSDM 2024, full version

  22. arXiv:2406.15877  [pdf, other]

    cs.SE cs.AI cs.CL

    BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions

    Authors: Terry Yue Zhuo, Minh Chien Vu, Jenny Chim, Han Hu, Wenhao Yu, Ratnadira Widyasari, Imam Nur Bani Yusuf, Haolan Zhan, Junda He, Indraneil Paul, Simon Brunner, Chen Gong, Thong Hoang, Armel Randy Zebaze, Xiaoheng Hong, Wen-Ding Li, Jean Kaddour, Ming Xu, Zhihan Zhang, Prateek Yadav, Naman Jain, Alex Gu, Zhoujun Cheng, Jiawei Liu, Qian Liu, et al. (8 additional authors not shown)

    Abstract: Task automation has been greatly empowered by the recent advances in Large Language Models (LLMs) via Python code, with tasks ranging from software engineering development to general-purpose reasoning. While current benchmarks have shown that LLMs can solve tasks using programs like human developers, the majority of their evaluations are limited to short and self-contained algorithmic tasks o…

    Submitted 7 October, 2024; v1 submitted 22 June, 2024; originally announced June 2024.

    Comments: 44 pages, 14 figures, 7 tables, built with love by the BigCode community :)

  23. DuMapNet: An End-to-End Vectorization System for City-Scale Lane-Level Map Generation

    Authors: Deguo Xia, Weiming Zhang, Xiyan Liu, Wei Zhang, Chenting Gong, Jizhou Huang, Mengmeng Yang, Diange Yang

    Abstract: Generating city-scale lane-level maps faces significant challenges due to the intricate urban environments, such as blurred or absent lane markings. Additionally, a standard lane-level map requires a comprehensive organization of lane groupings, encompassing lane direction, style, boundary, and topology, yet has not been thoroughly examined in prior research. These obstacles result in labor-intens…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Accepted by KDD 2024, camera-ready version

  24. arXiv:2406.08911  [pdf, other]

    cs.CL eess.AS

    An Initial Investigation of Language Adaptation for TTS Systems under Low-resource Scenarios

    Authors: Cheng Gong, Erica Cooper, Xin Wang, Chunyu Qiang, Mengzhe Geng, Dan Wells, Longbiao Wang, Jianwu Dang, Marc Tessier, Aidan Pine, Korin Richmond, Junichi Yamagishi

    Abstract: Self-supervised learning (SSL) representations from massively multilingual models offer a promising solution for low-resource language speech tasks. Despite advancements, language adaptation in TTS systems remains an open problem. This paper explores the language adaptation capability of ZMM-TTS, a recent SSL-based multilingual TTS system proposed in our previous work. We conducted experiments on…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Accepted to Interspeech 2024

  25. arXiv:2405.18739  [pdf, other]

    cs.NI eess.SP

    FlocOff: Data Heterogeneity Resilient Federated Learning with Communication-Efficient Edge Offloading

    Authors: Mulei Ma, Chenyu Gong, Liekang Zeng, Yang Yang, Liantao Wu

    Abstract: Federated Learning (FL) has emerged as a fundamental learning paradigm to harness massive data scattered at geo-distributed edge devices in a privacy-preserving way. Given the heterogeneous deployment of edge devices, however, their data are usually Non-IID, introducing significant challenges to FL including degraded training accuracy, intensive communication costs, and high computing complexity.…

    Submitted 28 May, 2024; originally announced May 2024.

  26. arXiv:2405.16071  [pdf, other]

    cs.CV

    DynRefer: Delving into Region-level Multi-modality Tasks via Dynamic Resolution

    Authors: Yuzhong Zhao, Feng Liu, Yue Liu, Mingxiang Liao, Chen Gong, Qixiang Ye, Fang Wan

    Abstract: Region-level multi-modality methods can translate referred image regions to human preferred language descriptions. Unfortunately, most existing methods using fixed visual inputs still lack the resolution adaptability to find precise language descriptions. In this study, we propose a dynamic resolution approach, referred to as DynRefer, to pursue high-accuracy region-level referring thro…

    Submitted 25 May, 2024; originally announced May 2024.

    Comments: Code is available at https://github.com/callsys/DynRefer

  27. arXiv:2405.10492  [pdf]

    cs.CL cs.LG

    Automatic News Generation and Fact-Checking System Based on Language Processing

    Authors: Xirui Peng, Qiming Xu, Zheng Feng, Haopeng Zhao, Lianghao Tan, Yan Zhou, Zecheng Zhang, Chenwei Gong, Yingqiao Zheng

    Abstract: This paper explores an automatic news generation and fact-checking system based on language processing, aimed at enhancing the efficiency and quality of news production while ensuring the authenticity and reliability of the news content. With the rapid development of Natural Language Processing (NLP) and deep learning technologies, automatic news generation systems are capable of extracting key in…

    Submitted 20 May, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

    ACM Class: I.5; H.4

  28. arXiv:2405.10175  [pdf, other]

    cs.CV cs.RO

    Filling Missing Values Matters for Range Image-Based Point Cloud Segmentation

    Authors: Bike Chen, Chen Gong, Juha Röning

    Abstract: Point cloud segmentation (PCS) plays an essential role in robot perception and navigation tasks. To efficiently understand large-scale outdoor point clouds, their range image representation is commonly adopted. This image-like representation is compact and structured, making range image-based PCS models practical. However, undesirable missing values in the range images damage the shapes and patter…

    Submitted 25 May, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

    Comments: No Comments

  29. arXiv:2405.04513  [pdf, other]

    cs.CL cs.AI cs.LG

    Switchable Decision: Dynamic Neural Generation Networks

    Authors: Shujian Zhang, Korawat Tanwisuth, Chengyue Gong, Pengcheng He, Mingyuan Zhou

    Abstract: Auto-regressive generation models achieve competitive performance across many different NLP tasks such as summarization, question answering, and classifications. However, they are also known for being slow in inference, which makes them challenging to deploy in real-time applications. We propose a switchable decision to accelerate inference by dynamically assigning computation resources for each d…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: Accepted to ICML 2024

  30. arXiv:2404.17820  [pdf, other]

    cs.RO cs.AI cs.LG

    Motion planning for off-road autonomous driving based on human-like cognition and weight adaptation

    Authors: Yuchun Wang, Cheng Gong, Jianwei Gong, Peng Jia

    Abstract: Driving in an off-road environment is challenging for autonomous vehicles due to the complex and varied terrain. To ensure stable and efficient travel, the vehicle requires consideration and balancing of environmental factors, such as undulations, roughness, and obstacles, to generate optimal trajectories that can adapt to changing scenarios. However, traditional motion planners often utilize a fi…

    Submitted 27 April, 2024; originally announced April 2024.

    Journal ref: Journal of Field Robotics,2024,1-22

  31. Beyond Imitation: A Life-long Policy Learning Framework for Path Tracking Control of Autonomous Driving

    Authors: C. Gong, C. Lu, Z. Li, Z. Liu, J. Gong, X. Chen

    Abstract: Model-free learning-based control methods have recently shown significant advantages over traditional control methods in avoiding complex vehicle characteristic estimation and parameter tuning. As a primary policy learning method, imitation learning (IL) is capable of learning control policies directly from expert demonstrations. However, the performance of IL policies is highly dependent on the d…

    Submitted 26 April, 2024; originally announced April 2024.

    Journal ref: IEEE Transactions on Vehicular Technology 2024 Pages 1-14

  32. arXiv:2404.12530  [pdf, other]

    cs.LG cs.CR

    TrajDeleter: Enabling Trajectory Forgetting in Offline Reinforcement Learning Agents

    Authors: Chen Gong, Kecen Li, Jin Yao, Tianhao Wang

    Abstract: Reinforcement learning (RL) trains an agent from experiences interacting with the environment. In scenarios where online interactions are impractical, offline RL, which trains the agent using pre-collected datasets, has become popular. While this new paradigm presents remarkable effectiveness across various real-world domains, like healthcare and energy management, there is a growing demand to ena…

    Submitted 1 September, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

    Comments: Accepted at NDSS 2025. The presented document here is the full version of our paper

  33. arXiv:2404.10464  [pdf, other]

    cs.CL cs.AI

    DESTEIN: Navigating Detoxification of Language Models via Universal Steering Pairs and Head-wise Activation Fusion

    Authors: Yu Li, Han Jiang, Chuanyang Gong, Zhihua Wei

    Abstract: Despite the remarkable achievements of language models (LMs) across a broad spectrum of tasks, their propensity for generating toxic outputs remains a prevalent concern. Current solutions involving finetuning or auxiliary models usually require extensive computational resources, hindering their practicality in large language models (LLMs). In this paper, we propose DeStein, a novel method that det…

    Submitted 10 August, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

  34. arXiv:2404.01892  [pdf, other]

    cs.CV

    Minimize Quantization Output Error with Bias Compensation

    Authors: Cheng Gong, Haoshuai Zheng, Mengting Hu, Zheng Lin, Deng-Ping Fan, Yuzhi Zhang, Tao Li

    Abstract: Quantization is a promising method that reduces memory usage and computational intensity of Deep Neural Networks (DNNs), but it often leads to significant output errors that hinder model deployment. In this paper, we propose Bias Compensation (BC) to minimize the output error, thus realizing ultra-low-precision quantization without model fine-tuning. Instead of optimizing the non-convex quantizatio…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: 10 pages, 6 figures

    Journal ref: CAAI Artificial Intelligence Research, 2024

  35. arXiv:2404.00901  [pdf, other]

    cs.CV

    Slightly Shift New Classes to Remember Old Classes for Video Class-Incremental Learning

    Authors: Jian Jiao, Yu Dai, Hefei Mei, Heqian Qiu, Chuanyang Gong, Shiyuan Tang, Xinpeng Hao, Hongliang Li

    Abstract: Recent video class-incremental learning usually excessively pursues the accuracy of the newly seen classes and relies on memory sets to mitigate catastrophic forgetting of the old classes. However, limited storage only allows storing a few representative videos. So we propose SNRO, which slightly shifts the features of new classes to remember old classes. Specifically, SNRO contains Examples Spars…

    Submitted 31 March, 2024; originally announced April 2024.

  36. arXiv:2403.16995  [pdf, other]

    cs.CL cs.AI cs.LG stat.ML

    Language Rectified Flow: Advancing Diffusion Language Generation with Probabilistic Flows

    Authors: Shujian Zhang, Lemeng Wu, Chengyue Gong, Xingchao Liu

    Abstract: Recent works have demonstrated success in controlling sentence attributes (e.g., sentiment) and structure (e.g., syntactic structure) based on the diffusion language model. A key component that drives the impressive performance for generating high-quality samples from noise is iterative denoising over thousands of steps. While beneficial, the complexity of starting from the noise and the learnin…

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: Accepted to NAACL 2024

  37. arXiv:2403.05100  [pdf, other]

    cs.CR cs.AI cs.CV cs.LG

    Exploring the Adversarial Frontier: Quantifying Robustness via Adversarial Hypervolume

    Authors: Ping Guo, Cheng Gong, Xi Lin, Zhiyuan Yang, Qingfu Zhang

    Abstract: The escalating threat of adversarial attacks on deep learning models, particularly in security-critical fields, has underscored the need for robust deep learning systems. Conventional robustness evaluations have relied on adversarial accuracy, which measures a model's performance under a specific perturbation intensity. However, this singular metric does not fully encapsulate the overall resilienc…

    Submitted 8 March, 2024; originally announced March 2024.

  38. arXiv:2402.15751  [pdf, other]

    cs.LG cs.AI cs.CL

    Sparse MeZO: Less Parameters for Better Performance in Zeroth-Order LLM Fine-Tuning

    Authors: Yong Liu, Zirui Zhu, Chaoyu Gong, Minhao Cheng, Cho-Jui Hsieh, Yang You

    Abstract: While fine-tuning large language models (LLMs) for specific tasks often yields impressive results, it comes at the cost of memory inefficiency due to back-propagation in gradient-based training. Memory-efficient Zeroth-order (MeZO) optimizers, recently proposed to address this issue, only require forward passes during training, making them more memory-friendly. However, the quality of gradient est…

    Submitted 24 February, 2024; originally announced February 2024.

  39. arXiv:2402.03667  [pdf, other]

    cs.CL cs.AI

    Large Language Models as an Indirect Reasoner: Contrapositive and Contradiction for Automated Reasoning

    Authors: Yanfang Zhang, Yiliu Sun, Yibing Zhan, Dapeng Tao, Dacheng Tao, Chen Gong

    Abstract: Recently, increasing attention has been drawn to improving the ability of Large Language Models (LLMs) to perform complex reasoning. However, previous methods, such as Chain-of-Thought and Self-Consistency, mainly follow Direct Reasoning (DR) frameworks, so they have difficulty solving numerous real-world tasks which can hardly be solved via DR. Therefore, to strengthen the reason…

    Submitted 5 February, 2024; originally announced February 2024.

    Comments: 20 pages,13 figures,4 tables

  40. arXiv:2401.17910  [pdf, other]

    cs.CV

    ControlCap: Controllable Region-level Captioning

    Authors: Yuzhong Zhao, Yue Liu, Zonghao Guo, Weijia Wu, Chen Gong, Fang Wan, Qixiang Ye

    Abstract: Region-level captioning is challenged by the caption degeneration issue, in which pre-trained multimodal models tend to predict the most frequent captions but miss the less frequent ones. In this study, we propose a controllable region-level captioning (ControlCap) approach, which introduces control words to a multimodal model to address the caption degeneration issue. Specifically, Con…

    Submitted 9 March, 2024; v1 submitted 31 January, 2024; originally announced January 2024.

    Comments: https://github.com/callsys/ControlCap

  41. arXiv:2401.09769  [pdf, other]

    cs.SI cs.AI cs.LG

    A Survey on Learning from Graphs with Heterophily: Recent Advances and Future Directions

    Authors: Chenghua Gong, Yao Cheng, Jianxiang Yu, Can Xu, Caihua Shan, Siqiang Luo, Xiang Li

    Abstract: Graphs are structured data that model complex relations between real-world entities. Heterophilic graphs, where linked nodes tend to have different labels or dissimilar features, have recently attracted significant attention and found many real-world applications. Meanwhile, increasing efforts have been made to advance learning from graphs with heterophily. Various graph heterophily measu…

    Submitted 30 September, 2024; v1 submitted 18 January, 2024; originally announced January 2024.

    Comments: 64 pages

  42. arXiv:2401.06826  [pdf, other]

    cs.LG cs.AI cs.CV

    Direct Distillation between Different Domains

    Authors: Jialiang Tang, Shuo Chen, Gang Niu, Hongyuan Zhu, Joey Tianyi Zhou, Chen Gong, Masashi Sugiyama

    Abstract: Knowledge Distillation (KD) aims to learn a compact student network using knowledge from a large pre-trained teacher network, where both networks are trained on data from the same distribution. However, in practical applications, the student network may be required to perform in a new scenario (i.e., the target domain), which usually exhibits significant differences from the known scenario of the…

    Submitted 11 January, 2024; originally announced January 2024.

  43. Text2Avatar: Text to 3D Human Avatar Generation with Codebook-Driven Body Controllable Attribute

    Authors: Chaoqun Gong, Yuqin Dai, Ronghui Li, Achun Bao, Jun Li, Jian Yang, Yachao Zhang, Xiu Li

    Abstract: Generating 3D human models directly from text helps reduce the cost and time of character modeling. However, achieving multi-attribute controllable and realistic 3D human avatar generation is still challenging due to feature coupling and the scarcity of realistic 3D human avatar datasets. To address these issues, we propose Text2Avatar, which can generate realistic-style 3D avatars based on the co…

    Submitted 1 January, 2024; originally announced January 2024.

  44. Dual-scale Enhanced and Cross-generative Consistency Learning for Semi-supervised Medical Image Segmentation

    Authors: Yunqi Gu, Tao Zhou, Yizhe Zhang, Yi Zhou, Kelei He, Chen Gong, Huazhu Fu

    Abstract: Medical image segmentation plays a crucial role in computer-aided diagnosis. However, existing methods heavily rely on fully supervised training, which requires a large amount of labeled data with time-consuming pixel-wise annotations. Moreover, accurately segmenting lesions poses challenges due to variations in shape, size, and location. To address these issues, we propose a novel Dual-scale Enha…

    Submitted 2 September, 2024; v1 submitted 26 December, 2023; originally announced December 2023.

    Comments: 12 pages, 10 figures

  45. arXiv:2312.15195  [pdf, other]

    cs.AI cs.LG eess.SY

    Mutual Information as Intrinsic Reward of Reinforcement Learning Agents for On-demand Ride Pooling

    Authors: Xianjie Zhang, Jiahao Sun, Chen Gong, Kai Wang, Yifei Cao, Hao Chen, Hao Chen, Yu Liu

    Abstract: The emergence of on-demand ride pooling services allows each vehicle to serve multiple passengers at a time, thus increasing drivers' income and enabling passengers to travel at lower prices than taxi/car on-demand services (only one passenger can be assigned to a car at a time like UberX and Lyft). Although on-demand ride pooling services can bring so many benefits, ride pooling services need a w…

    Submitted 7 January, 2024; v1 submitted 23 December, 2023; originally announced December 2023.

    Comments: Accepted by AAMAS 2024

  46. arXiv:2312.14398  [pdf, other]

    cs.SD eess.AS

    ZMM-TTS: Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations

    Authors: Cheng Gong, Xin Wang, Erica Cooper, Dan Wells, Longbiao Wang, Jianwu Dang, Korin Richmond, Junichi Yamagishi

    Abstract: Neural text-to-speech (TTS) has achieved human-like synthetic speech for single-speaker, single-language synthesis. Multilingual TTS systems are limited to resource-rich languages due to the lack of large paired text and studio-quality audio data. TTS systems are typically built using a single speaker's voices, but there is growing interest in developing systems that can synthesize voices for new…

    Submitted 26 August, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

    Comments: Accepted by IEEE/ACM TASLP, 16 pages plus 1 page of bio and photos

  47. arXiv:2312.10758  [pdf, other]

    cs.CV

    SHaRPose: Sparse High-Resolution Representation for Human Pose Estimation

    Authors: Xiaoqi An, Lin Zhao, Chen Gong, Nannan Wang, Di Wang, Jian Yang

    Abstract: High-resolution representation is essential for achieving good performance in human pose estimation models. To obtain such features, existing works utilize high-resolution input images or fine-grained image tokens. However, this dense high-resolution representation brings a significant computational burden. In this paper, we address the following question: "Only sparse human keypoint locations are…

    Submitted 17 December, 2023; originally announced December 2023.

    Comments: Accepted to AAAI 2024

  48. arXiv:2311.15200  [pdf, other]

    cs.CV cs.LG

    SpliceMix: A Cross-scale and Semantic Blending Augmentation Strategy for Multi-label Image Classification

    Authors: Lei Wang, Yibing Zhan, Leilei Ma, Dapeng Tao, Liang Ding, Chen Gong

    Abstract: Recently, Mix-style data augmentation methods (e.g., Mixup and CutMix) have shown promising performance in various visual tasks. However, these methods are primarily designed for single-label images, ignoring the considerable discrepancies between single- and multi-label images, i.e., a multi-label image involves multiple co-occurred categories and fickle object scales. On the other hand, previous…

    Submitted 26 November, 2023; originally announced November 2023.

    Comments: 13 pages, 10 figures

  49. arXiv:2311.12850  [pdf, other]

    cs.CV cs.CR cs.LG

    PrivImage: Differentially Private Synthetic Image Generation using Diffusion Models with Semantic-Aware Pretraining

    Authors: Kecen Li, Chen Gong, Zhixiang Li, Yuzhong Zhao, Xinwen Hou, Tianhao Wang

    Abstract: Differential Privacy (DP) image data synthesis, which leverages the DP technique to generate synthetic data to replace the sensitive data, allowing organizations to share and utilize synthetic images without privacy concerns. Previous methods incorporate the advanced techniques of generative models and pre-training on a public dataset to produce exceptional DP image data, but suffer from problems…

    Submitted 7 October, 2024; v1 submitted 19 October, 2023; originally announced November 2023.

    Comments: Accepted at USENIX Security 2024. The first two authors contributed equally. We communicated with the author of DPSDA and have added an explanation for why the FID scores in the DPSDA table are lower than those reported in the original paper.

  50. arXiv:2310.16391  [pdf, other]

    cs.LG cs.CV

    Winning Prize Comes from Losing Tickets: Improve Invariant Learning by Exploring Variant Parameters for Out-of-Distribution Generalization

    Authors: Zhuo Huang, Muyang Li, Li Shen, Jun Yu, Chen Gong, Bo Han, Tongliang Liu

    Abstract: Out-of-Distribution (OOD) Generalization aims to learn robust models that generalize well to various environments without fitting to distribution-specific features. Recent studies based on Lottery Ticket Hypothesis (LTH) address this problem by minimizing the learning target to find some of the parameters that are critical to the task. However, in OOD problems, such solutions are suboptimal as the…

    Submitted 25 October, 2023; originally announced October 2023.

    Comments: 27 pages, 9 figures

    MSC Class: Computer Vision and Pattern Recognition