Showing 1–50 of 1,519 results for author: Wu, H

  1. arXiv:2410.15288  [pdf, other]

    cs.CR

    Attention Is All You Need for LLM-based Code Vulnerability Localization

    Authors: Yue Li, Xiao Li, Hao Wu, Yue Zhang, Xiuzhen Cheng, Sheng Zhong, Fengyuan Xu

    Abstract: The rapid expansion of software systems and the growing number of reported vulnerabilities have emphasized the importance of accurately identifying vulnerable code segments. Traditional methods for vulnerability localization, such as manual code audits or rule-based tools, are often time-consuming and limited in scope, typically focusing on specific programming languages or types of vulnerabilitie…

    Submitted 20 October, 2024; originally announced October 2024.

  2. arXiv:2410.14993  [pdf, other]

    cs.CV

    Making Every Frame Matter: Continuous Video Understanding for Large Models via Adaptive State Modeling

    Authors: Hao Wu, Donglin Bai, Shiqi Jiang, Qianxi Zhang, Yifan Yang, Ting Cao, Fengyuan Xu

    Abstract: Video understanding has become increasingly important with the rise of multi-modality applications. Understanding continuous video poses considerable challenges due to the fast expansion of streaming video, which contains multi-scale and untrimmed events. We introduce a novel system, C-VUE, to overcome these issues through adaptive state modeling. C-VUE has three key designs. The first is a long-r…

    Submitted 19 October, 2024; originally announced October 2024.

  3. arXiv:2410.14442  [pdf, other]

    cs.CL

    A Systematic Study of Cross-Layer KV Sharing for Efficient LLM Inference

    Authors: You Wu, Haoyi Wu, Kewei Tu

    Abstract: Recently, sharing key-value (KV) cache across layers has been found effective in efficient inference of large language models (LLMs). To systematically investigate different techniques of cross-layer KV sharing, we propose a unified framework that covers several recent methods and their novel variants. We conduct comprehensive experiments on all the configurations of the framework, evaluating thei…

    Submitted 18 October, 2024; originally announced October 2024.

  4. arXiv:2410.13798  [pdf, other]

    cs.NE cs.AI cs.LG

    Learning Graph Quantized Tokenizers for Transformers

    Authors: Limei Wang, Kaveh Hassani, Si Zhang, Dongqi Fu, Baichuan Yuan, Weilin Cong, Zhigang Hua, Hao Wu, Ning Yao, Bo Long

    Abstract: Transformers serve as the backbone architectures of Foundational Models, where a domain-specific tokenizer helps them adapt to various domains. Graph Transformers (GTs) have recently emerged as a leading model in geometric deep learning, outperforming Graph Neural Networks (GNNs) in various graph learning tasks. However, the development of tokenizers for graphs has lagged behind other modalities,…

    Submitted 17 October, 2024; originally announced October 2024.

  5. arXiv:2410.13441  [pdf, other]

    cs.AI cs.SE

    Instruction-Driven Game Engine: A Poker Case Study

    Authors: Hongqiu Wu, Xingyuan Liu, Yan Wang, Hai Zhao

    Abstract: The Instruction-Driven Game Engine (IDGE) project aims to democratize game development by enabling a large language model (LLM) to follow free-form game descriptions and generate game-play processes. The IDGE allows users to create games simply by natural language instructions, which significantly lowers the barrier for game development. We approach the learning process for IDGEs as a Next State P…

    Submitted 17 October, 2024; originally announced October 2024.

    Comments: EMNLP 2024 Demo. arXiv admin note: substantial text overlap with arXiv:2404.00276

  6. arXiv:2410.12425  [pdf, other]

    cs.LG

    Perseus: Leveraging Common Data Patterns with Curriculum Learning for More Robust Graph Neural Networks

    Authors: Kaiwen Xia, Huijun Wu, Duanyu Li, Min Xie, Ruibo Wang, Wenzhe Zhang

    Abstract: Graph Neural Networks (GNNs) excel at handling graph data but remain vulnerable to adversarial attacks. Existing defense methods typically rely on assumptions like graph sparsity and homophily to either preprocess the graph or guide structure learning. However, preprocessing methods often struggle to accurately distinguish between normal edges and adversarial perturbations, leading to suboptimal r…

    Submitted 16 October, 2024; originally announced October 2024.

  7. arXiv:2410.12307  [pdf, other]

    cs.LG cs.CV

    DAT: Improving Adversarial Robustness via Generative Amplitude Mix-up in Frequency Domain

    Authors: Fengpeng Li, Kemou Li, Haiwei Wu, Jinyu Tian, Jiantao Zhou

    Abstract: To protect deep neural networks (DNNs) from adversarial attacks, adversarial training (AT) is developed by incorporating adversarial examples (AEs) into model training. Recent studies show that adversarial attacks disproportionately impact the patterns within the phase of the sample's frequency spectrum -- typically containing crucial semantic information -- more than those in the amplitude, resul…

    Submitted 16 October, 2024; originally announced October 2024.

    Journal ref: NeurIPS 2024

  8. arXiv:2410.12130  [pdf, other]

    cs.CL cs.AI

    Iter-AHMCL: Alleviate Hallucination for Large Language Model via Iterative Model-level Contrastive Learning

    Authors: Huiwen Wu, Xiaohan Li, Xiaogang Xu, Jiafei Wu, Deyi Zhang, Zhe Liu

    Abstract: The development of Large Language Models (LLMs) has significantly advanced various AI applications in commercial and scientific research fields, such as scientific literature summarization, writing assistance, and knowledge graph construction. However, a significant challenge is the high risk of hallucination during LLM inference, which can lead to security concerns like factual inaccuracies, inco…

    Submitted 15 October, 2024; originally announced October 2024.

  9. arXiv:2410.12122  [pdf, ps, other]

    cs.IT math.NT

    Explicit Representatives and Sizes of Cyclotomic Cosets and their Application to Cyclic Codes over Finite Fields

    Authors: Li Zhu, Jinle Liu, Hongfeng Wu

    Abstract: Cyclotomic coset is a basic notion which has wide application in various computation problems. Let $q$ be a prime power, and $n$ be a positive integer coprime to $q$. In this paper we determine explicitly the representatives and the sizes of all $q$-cyclotomic cosets modulo $n$ in the general settings. Instead of the $q$-cyclotomic cosets modulo a fixed integer, we consider the profinite spaces of…

    Submitted 15 October, 2024; originally announced October 2024.

    Comments: 30 pages

  10. arXiv:2410.11986  [pdf, ps, other]

    cs.LG cs.DC

    Age-of-Gradient Updates for Federated Learning over Random Access Channels

    Authors: Yu Heng Wu, Houman Asgari, Stefano Rini, Andrea Munari

    Abstract: This paper studies the problem of federated training of a deep neural network (DNN) over a random access channel (RACH) such as in computer networks, wireless networks, and cellular systems. More precisely, a set of remote users participate in training a centralized DNN model using SGD under the coordination of a parameter server (PS). The local model updates are transmitted from the remote users…

    Submitted 15 October, 2024; originally announced October 2024.

  11. arXiv:2410.11766  [pdf, other]

    cs.AR cs.AI cs.CV

    DPD-NeuralEngine: A 22-nm 6.6-TOPS/W/mm$^2$ Recurrent Neural Network Accelerator for Wideband Power Amplifier Digital Pre-Distortion

    Authors: Ang Li, Haolin Wu, Yizhuo Wu, Qinyu Chen, Leo C. N. de Vreede, Chang Gao

    Abstract: The increasing adoption of Deep Neural Network (DNN)-based Digital Pre-distortion (DPD) in modern communication systems necessitates efficient hardware implementations. This paper presents DPD-NeuralEngine, an ultra-fast, tiny-area, and power-efficient DPD accelerator based on a Gated Recurrent Unit (GRU) neural network (NN). Leveraging a co-designed software and hardware approach, our 22 nm CMOS…

    Submitted 15 October, 2024; originally announced October 2024.

    Comments: 5 pages, 5 figures

  12. GS^3: Efficient Relighting with Triple Gaussian Splatting

    Authors: Zoubin Bi, Yixin Zeng, Chong Zeng, Fan Pei, Xiang Feng, Kun Zhou, Hongzhi Wu

    Abstract: We present a spatial and angular Gaussian based representation and a triple splatting process, for real-time, high-quality novel lighting-and-view synthesis from multi-view point-lit input images. To describe complex appearance, we employ a Lambertian plus a mixture of angular Gaussians as an effective reflectance function for each spatial Gaussian. To generate self-shadow, we splat all spatial Ga…

    Submitted 15 October, 2024; originally announced October 2024.

    Comments: Accepted to SIGGRAPH Asia 2024. Project page: https://gsrelight.github.io/

    Journal ref: ACM SIGGRAPH Asia 2024 Conference Papers

  13. arXiv:2410.10775  [pdf, other]

    cs.CR

    Browsing without Third-Party Cookies: What Do You See?

    Authors: Maxwell Lin, Shihan Lin, Helen Wu, Karen Wang, Xiaowei Yang

    Abstract: Third-party web cookies are often used for privacy-invasive behavior tracking. Partly due to privacy concerns, browser vendors have started to block all third-party cookies in recent years. To understand the effects of such third-party cookieless browsing, we crawled and measured the top 10,000 Tranco websites. We developed a framework to remove third-party cookies and analyze the differences betw…

    Submitted 14 October, 2024; originally announced October 2024.

    Comments: To appear in IMC '24

  14. arXiv:2410.10338  [pdf, other]

    cs.NI cs.PF

    On Efficient Topology Management in Service-Oriented 6G Networks: An Edge Video Distribution Case Study

    Authors: Zied Ennaceur, Mounir Bensalem, Admela Jukan, Claus Keuker, Huanzhuo Wu, Rastin Pries

    Abstract: An efficient topology management in future 6G networks is one of the fundamental challenges for a dynamic network creation based on location services, whereby each autonomous network entity, i.e., a sub-network, can be created for a specific application scenario. In this paper, we study the performance of a novel topology changes management system in a sample 6G network being dynamically organized…

    Submitted 14 October, 2024; originally announced October 2024.

  15. arXiv:2410.10260  [pdf, other]

    cs.CV

    Slide-based Graph Collaborative Training for Histopathology Whole Slide Image Analysis

    Authors: Jun Shi, Tong Shu, Zhiguo Jiang, Wei Wang, Haibo Wu, Yushan Zheng

    Abstract: The development of computational pathology lies in the consensus that pathological characteristics of tumors are significant guidance for cancer diagnostics. Most existing research focuses on the inner-contextual information within each WSI yet ignores the possible inter-correlations between slides. As the development of tumors is a continuous process involving a series of histological, morphologi…

    Submitted 14 October, 2024; originally announced October 2024.

  16. arXiv:2410.09738  [pdf]

    cs.SE

    Can Large Language Models Generate Geospatial Code?

    Authors: Shuyang Hou, Zhangxiao Shen, Jianyuan Liang, Anqi Zhao, Zhipeng Gui, Rui Li, Huayi Wu

    Abstract: With the growing demand for spatiotemporal data processing and geospatial modeling, automating geospatial code generation has become essential for productivity. Large language models (LLMs) show promise in code generation but face challenges like domain-specific knowledge gaps and "coding hallucinations." This paper introduces GeoCode-Eval (GCE), a framework for assessing LLMs' ability to generate…

    Submitted 17 October, 2024; v1 submitted 13 October, 2024; originally announced October 2024.

  17. arXiv:2410.07901  [pdf, other]

    cs.CV

    Semi-Supervised Video Desnowing Network via Temporal Decoupling Experts and Distribution-Driven Contrastive Regularization

    Authors: Hongtao Wu, Yijun Yang, Angelica I Aviles-Rivero, Jingjing Ren, Sixiang Chen, Haoyu Chen, Lei Zhu

    Abstract: Snow degradations present formidable challenges to the advancement of computer vision tasks by the undesirable corruption in outdoor scenarios. While current deep learning-based desnowing approaches achieve success on synthetic benchmark datasets, they struggle to restore out-of-distribution real-world snowy videos due to the deficiency of paired real-world training data. To address this bottlenec…

    Submitted 10 October, 2024; originally announced October 2024.

  18. arXiv:2410.07592  [pdf, other]

    cs.AI

    Diversified and Adaptive Negative Sampling on Knowledge Graphs

    Authors: Ran Liu, Zhongzhou Liu, Xiaoli Li, Hao Wu, Yuan Fang

    Abstract: In knowledge graph embedding, aside from positive triplets (i.e., facts in the knowledge graph), the negative triplets used for training also have a direct influence on the model performance. In reality, since knowledge graphs are sparse and incomplete, negative triplets often lack explicit labels, and thus they are often obtained from various sampling strategies (e.g., randomly replacing an entity in…

    Submitted 9 October, 2024; originally announced October 2024.

    Comments: 30 pages, 7 figures, Journal

  19. arXiv:2410.07589  [pdf, other]

    cs.IR cs.CL

    No Free Lunch: Retrieval-Augmented Generation Undermines Fairness in LLMs, Even for Vigilant Users

    Authors: Mengxuan Hu, Hongyi Wu, Zihan Guan, Ronghang Zhu, Dongliang Guo, Daiqing Qi, Sheng Li

    Abstract: Retrieval-Augmented Generation (RAG) is widely adopted for its effectiveness and cost-efficiency in mitigating hallucinations and enhancing the domain-specific generation capabilities of large language models (LLMs). However, is this effectiveness and cost-efficiency truly a free lunch? In this study, we comprehensively investigate the fairness costs associated with RAG by proposing a practical th…

    Submitted 9 October, 2024; originally announced October 2024.

  20. arXiv:2410.06158  [pdf, other]

    cs.RO cs.CV cs.LG

    GR-2: A Generative Video-Language-Action Model with Web-Scale Knowledge for Robot Manipulation

    Authors: Chi-Lam Cheang, Guangzeng Chen, Ya Jing, Tao Kong, Hang Li, Yifeng Li, Yuxiao Liu, Hongtao Wu, Jiafeng Xu, Yichu Yang, Hanbo Zhang, Minzhao Zhu

    Abstract: We present GR-2, a state-of-the-art generalist robot agent for versatile and generalizable robot manipulation. GR-2 is first pre-trained on a vast number of Internet videos to capture the dynamics of the world. This large-scale pre-training, involving 38 million video clips and over 50 billion tokens, equips GR-2 with the ability to generalize across a wide range of robotic tasks and environments…

    Submitted 8 October, 2024; originally announced October 2024.

    Comments: Tech Report. Authors are listed in alphabetical order. Project page: https://gr2-manipulation.github.io

  21. arXiv:2410.06115  [pdf, other]

    cs.IT eess.SP

    A physics-based perspective for understanding and utilizing spatial resources of wireless channels

    Authors: Hui Xu, Jun Wei Wu, Zhen Jie Qi, Hao Tian Wu, Rui Wen Shao, Qiang Cheng, Jieao Zhu, Linglong Dai, Tie Jun Cui

    Abstract: To satisfy the increasing demands for transmission rates of wireless communications, it is necessary to use spatial resources of electromagnetic (EM) waves. In this context, EM information theory (EIT) has become a hot topic by integrating the theoretical framework of deterministic mathematics and stochastic statistics to explore the transmission mechanisms of continuous EM waves. However, the pre…

    Submitted 8 October, 2024; originally announced October 2024.

    Comments: 31 pages, 8 figures

  22. arXiv:2410.05993  [pdf, other]

    cs.CV

    Aria: An Open Multimodal Native Mixture-of-Experts Model

    Authors: Dongxu Li, Yudong Liu, Haoning Wu, Yue Wang, Zhiqi Shen, Bowen Qu, Xinyao Niu, Guoyin Wang, Bei Chen, Junnan Li

    Abstract: Information comes in diverse modalities. Multimodal native AI models are essential to integrate real-world information and deliver comprehensive understanding. While proprietary multimodal native models exist, their lack of openness imposes obstacles for adoptions, let alone adaptations. To fill this gap, we introduce Aria, an open multimodal native model with best-in-class performance across a wi…

    Submitted 10 October, 2024; v1 submitted 8 October, 2024; originally announced October 2024.

  23. arXiv:2410.05863  [pdf, other]

    cs.IR

    Enhancing Playback Performance in Video Recommender Systems with an On-Device Gating and Ranking Framework

    Authors: Yunfei Yang, Zhenghao Qi, Honghuan Wu, Qi Song, Tieyao Zhang, Hao Li, Yimin Tu, Kaiqiao Zhan, Ben Wang

    Abstract: Video recommender systems (RSs) have gained increasing attention in recent years. Existing mainstream RSs focus on optimizing the matching function between users and items. However, we noticed that users frequently encounter playback issues such as slow loading or stuttering while browsing the videos, especially in weak network conditions, which will lead to a subpar browsing experience, and may c…

    Submitted 8 October, 2024; originally announced October 2024.

    Comments: CIKM 2024 applied research track, 7 pages

  24. arXiv:2410.05474  [pdf, other]

    cs.CV cs.MM eess.IV

    R-Bench: Are your Large Multimodal Model Robust to Real-world Corruptions?

    Authors: Chunyi Li, Jianbo Zhang, Zicheng Zhang, Haoning Wu, Yuan Tian, Wei Sun, Guo Lu, Xiaohong Liu, Xiongkuo Min, Weisi Lin, Guangtao Zhai

    Abstract: The outstanding performance of Large Multimodal Models (LMMs) has made them widely applied in vision-related tasks. However, various corruptions in the real world mean that images will not be as ideal as in simulations, presenting significant challenges for the practical application of LMMs. To address this issue, we introduce R-Bench, a benchmark focused on the Real-world Robustness of LMMs.…

    Submitted 7 October, 2024; originally announced October 2024.

  25. arXiv:2410.04636  [pdf, other]

    eess.IV cs.AI cs.CV

    Multi-Tiered Self-Contrastive Learning for Medical Microwave Radiometry (MWR) Breast Cancer Detection

    Authors: Christoforos Galazis, Huiyi Wu, Igor Goryanin

    Abstract: The pursuit of enhanced breast cancer detection and monitoring techniques is a paramount healthcare objective, driving the need for innovative imaging technologies and diagnostic approaches. This study introduces a novel multi-tiered self-contrastive model tailored for the application of microwave radiometry (MWR) breast cancer detection. Our approach encompasses three distinct models: Local-MWR (…

    Submitted 6 October, 2024; originally announced October 2024.

  26. arXiv:2410.03806  [pdf, other]

    cs.LG cs.CL

    Metadata Matters for Time Series: Informative Forecasting with Transformers

    Authors: Jiaxiang Dong, Haixu Wu, Yuxuan Wang, Li Zhang, Jianmin Wang, Mingsheng Long

    Abstract: Time series forecasting is prevalent in extensive real-world applications, such as financial analysis and energy planning. Previous studies primarily focus on time series modality, endeavoring to capture the intricate variations and dependencies inherent in time series. Beyond numerical time series data, we notice that metadata (e.g., dataset and variate descriptions) also carries valuable informat…

    Submitted 4 October, 2024; originally announced October 2024.

  27. arXiv:2410.03777  [pdf, other]

    cs.CL cs.AI

    Determine-Then-Ensemble: Necessity of Top-k Union for Large Language Model Ensembling

    Authors: Yuxuan Yao, Han Wu, Mingyang Liu, Sichun Luo, Xiongwei Han, Jie Liu, Zhijiang Guo, Linqi Song

    Abstract: Large language models (LLMs) exhibit varying strengths and weaknesses across different tasks, prompting recent studies to explore the benefits of ensembling models to leverage their complementary advantages. However, existing LLM ensembling methods often overlook model compatibility and struggle with inefficient alignment of probabilities across the entire vocabulary. In this study, we empirically…

    Submitted 3 October, 2024; originally announced October 2024.

  28. arXiv:2410.02743  [pdf, other]

    cs.CL

    MA-RLHF: Reinforcement Learning from Human Feedback with Macro Actions

    Authors: Yekun Chai, Haoran Sun, Huang Fang, Shuohuan Wang, Yu Sun, Hua Wu

    Abstract: Reinforcement learning from human feedback (RLHF) has demonstrated effectiveness in aligning large language models (LLMs) with human preferences. However, token-level RLHF suffers from the credit assignment problem over long sequences, where delayed rewards make it challenging for the model to discern which actions contributed to successful outcomes. This hinders learning efficiency and slows conv…

    Submitted 3 October, 2024; originally announced October 2024.

  29. arXiv:2410.01610  [pdf, other]

    cs.CL cs.AI cs.LG

    Upcycling Instruction Tuning from Dense to Mixture-of-Experts via Parameter Merging

    Authors: Tingfeng Hui, Zhenyu Zhang, Shuohuan Wang, Yu Sun, Hua Wu, Sen Su

    Abstract: Mixture-of-Experts (MoE) shines brightly in large language models (LLMs) and demonstrates outstanding performance in plentiful natural language processing tasks. However, existing methods transforming LLMs from dense to MoE face significant data requirements and typically rely on large-scale post-training. In this paper, we propose Upcycling Instruction Tuning (UpIT), a data-efficient approach for…

    Submitted 2 October, 2024; originally announced October 2024.

    Comments: work in progress

  30. arXiv:2410.01240  [pdf]

    cs.CL cs.HC

    Automatic deductive coding in discourse analysis: an application of large language models in learning analytics

    Authors: Lishan Zhang, Han Wu, Xiaoshan Huang, Tengfei Duan, Hanxiang Du

    Abstract: Deductive coding is a common discourse analysis method widely used by learning science and learning analytics researchers for understanding teaching and learning interactions. It often requires researchers to manually label all discourses to be analyzed according to a theoretically guided coding scheme, which is time-consuming and labor-intensive. The emergence of large language models such as GPT…

    Submitted 2 October, 2024; originally announced October 2024.

    Comments: 20 pages

  31. arXiv:2410.00428  [pdf, other]

    cs.DC cs.AI cs.LG

    LayerKV: Optimizing Large Language Model Serving with Layer-wise KV Cache Management

    Authors: Yi Xiong, Hao Wu, Changxu Shao, Ziqing Wang, Rui Zhang, Yuhong Guo, Junping Zhao, Ke Zhang, Zhenxuan Pan

    Abstract: The expanding context windows in large language models (LLMs) have greatly enhanced their capabilities in various applications, but they also introduce significant challenges in maintaining low latency, particularly in Time to First Token (TTFT). This paper identifies that the sharp rise in TTFT as context length increases is predominantly driven by queuing delays, which are caused by the growing…

    Submitted 9 October, 2024; v1 submitted 1 October, 2024; originally announced October 2024.

    Comments: 11 pages, 7 figures, 1 table

    ACM Class: I.2.11; C.4

  32. arXiv:2409.20310  [pdf, other]

    cs.LG

    A SSM is Polymerized from Multivariate Time Series

    Authors: Haixiang Wu

    Abstract: For multivariate time series (MTS) tasks, previous state space models (SSMs) followed the modeling paradigm of Transformer-based methods. However, none of them explicitly model the complex dependencies of MTS: the Channel Dependency variations with Time (CDT). In view of this, we delve into the derivation of SSM, which involves approximating continuously updated functions by orthogonal function ba…

    Submitted 30 September, 2024; v1 submitted 30 September, 2024; originally announced September 2024.

  33. arXiv:2409.20063  [pdf, other]

    cs.CV

    Q-Bench-Video: Benchmarking the Video Quality Understanding of LMMs

    Authors: Zicheng Zhang, Ziheng Jia, Haoning Wu, Chunyi Li, Zijian Chen, Yingjie Zhou, Wei Sun, Xiaohong Liu, Xiongkuo Min, Weisi Lin, Guangtao Zhai

    Abstract: With the rising interest in research on Large Multi-modal Models (LMMs) for video understanding, many studies have emphasized general video comprehension capabilities, neglecting the systematic exploration into video quality understanding. To address this oversight, we introduce Q-Bench-Video in this paper, a new benchmark specifically designed to evaluate LMMs' proficiency in discerning video qua…

    Submitted 30 September, 2024; originally announced September 2024.

  34. arXiv:2409.19804  [pdf, other]

    cs.CL

    Does RAG Introduce Unfairness in LLMs? Evaluating Fairness in Retrieval-Augmented Generation Systems

    Authors: Xuyang Wu, Shuowei Li, Hsin-Tai Wu, Zhiqiang Tao, Yi Fang

    Abstract: RAG (Retrieval-Augmented Generation) have recently gained significant attention for their enhanced ability to integrate external knowledge sources in open-domain question answering (QA) tasks. However, it remains unclear how these models address fairness concerns, particularly with respect to sensitive attributes such as gender, geographic location, and other demographic factors. First, as languag…

    Submitted 29 September, 2024; originally announced September 2024.

    Comments: Under review

  35. arXiv:2409.19691  [pdf, other]

    cs.CL

    CERD: A Comprehensive Chinese Rhetoric Dataset for Rhetorical Understanding and Generation in Essays

    Authors: Nuowei Liu, Xinhao Chen, Hongyi Wu, Changzhi Sun, Man Lan, Yuanbin Wu, Xiaopeng Bai, Shaoguang Mao, Yan Xia

    Abstract: Existing rhetorical understanding and generation datasets or corpora primarily focus on single coarse-grained categories or fine-grained categories, neglecting the common interrelations between different rhetorical devices by treating them as independent sub-tasks. In this paper, we propose the Chinese Essay Rhetoric Dataset (CERD), consisting of 4 commonly used coarse-grained categories including…

    Submitted 29 September, 2024; originally announced September 2024.

  36. arXiv:2409.19674  [pdf, other]

    cs.IT math.NA

    Alternating Maximization Algorithm for Mismatch Capacity with Oblivious Relaying

    Authors: Xinwei Li, Lingyi Chen, Shitong Wu, Huihui Wu, Hao Wu, Wenyi Zhang

    Abstract: Reliable communication over a discrete memoryless channel with the help of a relay has aroused interest due to its widespread applications in practical scenarios. By considering the system with a mismatched decoder, previous works have provided optimization models to evaluate the mismatch capacity in these scenarios. The proposed models, however, are difficult due to the complicated structure of t…

    Submitted 15 October, 2024; v1 submitted 29 September, 2024; originally announced September 2024.

  37. arXiv:2409.19608  [pdf, other]

    cs.CV

    Causal Deciphering and Inpainting in Spatio-Temporal Dynamics via Diffusion Model

    Authors: Yifan Duan, Jian Zhao, pengcheng, Junyuan Mao, Hao Wu, Jingyu Xu, shilong wang, Caoyuan Ma, Kai Wang, Kun Wang, Xuelong Li

    Abstract: Spatio-temporal (ST) prediction has garnered a De facto attention in earth sciences, such as meteorological prediction, human mobility perception. However, the scarcity of data coupled with the high expenses involved in sensor deployment results in notable data imbalances. Furthermore, models that are excessively customized and devoid of causal connections further undermine the generalizability an…

    Submitted 29 September, 2024; originally announced September 2024.

  38. arXiv:2409.19592  [pdf, other]

    cs.CV cs.LG cs.MA

    DiffCP: Ultra-Low Bit Collaborative Perception via Diffusion Model

    Authors: Ruiqing Mao, Haotian Wu, Yukuan Jia, Zhaojun Nan, Yuxuan Sun, Sheng Zhou, Deniz Gündüz, Zhisheng Niu

    Abstract: Collaborative perception (CP) is emerging as a promising solution to the inherent limitations of stand-alone intelligence. However, current wireless communication systems are unable to support feature-level and raw-level collaborative algorithms due to their enormous bandwidth demands. In this paper, we propose DiffCP, a novel CP paradigm that utilizes a specialized diffusion model to efficiently…

    Submitted 29 September, 2024; originally announced September 2024.

    Comments: 7 pages, 4 figures

  39. arXiv:2409.17525  [pdf]

    q-bio.NC cs.CL

    When A Man Says He Is Pregnant: ERP Evidence for A Rational Account of Speaker-contextualized Language Comprehension

    Authors: Hanlin Wu, Zhenguang G. Cai

    Abstract: Spoken language is often, if not always, understood in a context that includes the identities of speakers. For instance, we can easily make sense of an utterance such as "I'm going to have a manicure this weekend" or "The first time I got pregnant I had a hard time" when the utterance is spoken by a woman, but it would be harder to understand when it is spoken by a man. Previous event-related pote…

    Submitted 26 September, 2024; originally announced September 2024.

  40. arXiv:2409.16904  [pdf, other]

    cs.LG cs.AI

    Discriminative Anchor Learning for Efficient Multi-view Clustering

    Authors: Yalan Qin, Nan Pu, Hanzhou Wu, Nicu Sebe

    Abstract: Multi-view clustering aims to study the complementary information across views and discover the underlying structure. For solving the relatively high computational cost for the existing approaches, works based on anchor have been presented recently. Even with acceptable clustering performance, these methods tend to map the original representation from multiple views into a fixed shared graph based…

    Submitted 25 September, 2024; originally announced September 2024.

    Comments: This work has been accepted by TMM

  41. arXiv:2409.16784  [pdf, other]

    cs.RO cs.LG

    World Model-based Perception for Visual Legged Locomotion

    Authors: Hang Lai, Jiahang Cao, Jiafeng Xu, Hongtao Wu, Yunfeng Lin, Tao Kong, Yong Yu, Weinan Zhang

    Abstract: Legged locomotion over various terrains is challenging and requires precise perception of the robot and its surroundings from both proprioception and vision. However, learning directly from high-dimensional visual input is often data-inefficient and intricate. To address this issue, traditional methods attempt to learn a teacher policy with access to privileged information first and then learn a s…

    Submitted 25 September, 2024; originally announced September 2024.

    Comments: under review

  42. arXiv:2409.16295  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Efficient Training of Self-Supervised Speech Foundation Models on a Compute Budget

    Authors: Andy T. Liu, Yi-Cheng Lin, Haibin Wu, Stefan Winkler, Hung-yi Lee

    Abstract: Despite their impressive success, training foundation models remains computationally costly. This paper investigates how to efficiently train speech foundation models with self-supervised learning (SSL) under a limited compute budget. We examine critical factors in SSL that impact the budget, including model architecture, model size, and data size. Our goal is to make analytical steps toward under…

    Submitted 9 September, 2024; originally announced September 2024.

    Comments: To appear in SLT 2024

  43. arXiv:2409.15897  [pdf, ps, other]

    eess.AS cs.SD

    ESPnet-Codec: Comprehensive Training and Evaluation of Neural Codecs for Audio, Music, and Speech

    Authors: Jiatong Shi, Jinchuan Tian, Yihan Wu, Jee-weon Jung, Jia Qi Yip, Yoshiki Masuyama, William Chen, Yuning Wu, Yuxun Tang, Massa Baali, Dareen Alharhi, Dong Zhang, Ruifan Deng, Tejes Srivastava, Haibin Wu, Alexander H. Liu, Bhiksha Raj, Qin Jin, Ruihua Song, Shinji Watanabe

    Abstract: Neural codecs have become crucial to recent speech and audio generation research. In addition to signal compression capabilities, discrete codecs have also been found to enhance downstream training efficiency and compatibility with autoregressive language models. However, as extensive downstream applications are investigated, challenges have arisen in ensuring fair comparisons across diverse appli…

    Submitted 24 September, 2024; originally announced September 2024.

    Comments: Accepted by SLT

  44. arXiv:2409.15781  [pdf, other]

    cs.CV

    Training Data Attribution: Was Your Model Secretly Trained On Data Created By Mine?

    Authors: Likun Zhang, Hao Wu, Lingcui Zhang, Fengyuan Xu, Jin Cao, Fenghua Li, Ben Niu

    Abstract: The emergence of text-to-image models has recently sparked significant interest, but the attendant is a looming shadow of potential infringement by violating the user terms. Specifically, an adversary may exploit data created by a commercial model to train their own without proper authorization. To address such risk, it is crucial to investigate the attribution of a suspicious model's training dat…

    Submitted 24 September, 2024; originally announced September 2024.

  45. arXiv:2409.15259  [pdf, other]

    cs.CV cs.AI

    S$^2$AG-Vid: Enhancing Multi-Motion Alignment in Video Diffusion Models via Spatial and Syntactic Attention-Based Guidance

    Authors: Yuanhang Li, Qi Mao, Lan Chen, Zhen Fang, Lei Tian, Xinyan Xiao, Libiao Jin, Hua Wu

    Abstract: Recent advancements in text-to-video (T2V) generation using diffusion models have garnered significant attention. However, existing T2V models primarily focus on simple scenes featuring a single object performing a single motion. Challenges arise in scenarios involving multiple objects with distinct motions, often leading to incorrect video-text alignment between subjects and their corresponding m…

    Submitted 23 September, 2024; originally announced September 2024.

  46. arXiv:2409.14836  [pdf, other]

    cs.CL cs.AI cs.LG

    Orthogonal Finetuning for Direct Preference Optimization

    Authors: Chenxu Yang, Ruipeng Jia, Naibin Gu, Zheng Lin, Siyuan Chen, Chao Pang, Weichong Yin, Yu Sun, Hua Wu, Weiping Wang

    Abstract: DPO is an effective preference optimization algorithm. However, the DPO-tuned models tend to overfit on the dispreferred samples, manifested as overly long generations lacking diversity. While recent regularization approaches have endeavored to alleviate this issue by modifying the objective function, they achieved that at the cost of alignment performance degradation. In this paper, we innovative…

    Submitted 23 September, 2024; v1 submitted 23 September, 2024; originally announced September 2024.

  47. arXiv:2409.14264  [pdf, ps, other]

    math.NT cs.CR cs.IT

    The Differential and Boomerang Properties of a Class of Binomials

    Authors: Sihem Mesnager, Huawei Wu

    Abstract: Let $q$ be an odd prime power with $q\equiv 3\ ({\rm{mod}}\ 4)$. In this paper, we study the differential and boomerang properties of the function $F_{2,u}(x)=x^2\big(1+uη(x)\big)$ over $\mathbb{F}_{q}$, where $u\in\mathbb{F}_{q}^*$ and $η$ is the quadratic character of $\mathbb{F}_{q}$. We determine the differential uniformity of $F_{2,u}$ for any $u\in\mathbb{F}_{q}^*$ and determine the differen…

    Submitted 25 September, 2024; v1 submitted 21 September, 2024; originally announced September 2024.

  48. arXiv:2409.14085  [pdf, other]

    eess.AS cs.SD

    Codec-SUPERB @ SLT 2024: A lightweight benchmark for neural audio codec models

    Authors: Haibin Wu, Xuanjun Chen, Yi-Cheng Lin, Kaiwei Chang, Jiawei Du, Ke-Han Lu, Alexander H. Liu, Ho-Lam Chung, Yuan-Kuei Wu, Dongchao Yang, Songxiang Liu, Yi-Chiao Wu, Xu Tan, James Glass, Shinji Watanabe, Hung-yi Lee

    Abstract: Neural audio codec models are becoming increasingly important as they serve as tokenizers for audio, enabling efficient transmission or facilitating speech language modeling. The ideal neural audio codec should maintain content, paralinguistics, speaker characteristics, and audio information even at low bitrates. Recently, numerous advanced neural codec models have been proposed. However, codec mo…

    Submitted 21 September, 2024; originally announced September 2024.

  49. arXiv:2409.13621  [pdf, other]

    cs.CL cs.AI cs.IR

    Advancing Event Causality Identification via Heuristic Semantic Dependency Inquiry Network

    Authors: Haoran Li, Qiang Gao, Hongmei Wu, Li Huang

    Abstract: Event Causality Identification (ECI) focuses on extracting causal relations between events in texts. Existing methods for ECI primarily rely on causal features and external knowledge. However, these approaches fall short in two dimensions: (1) causal features between events in a text often lack explicit clues, and (2) external knowledge may introduce bias, while specific problems require tailored…

    Submitted 2 October, 2024; v1 submitted 20 September, 2024; originally announced September 2024.

    Comments: EMNLP 2024 camera-ready version. Code is released at https://github.com/hrlics/SemDI

  50. arXiv:2409.13321  [pdf, other]

    cs.LG cs.AI cs.CL cs.CV

    SLaVA-CXR: Small Language and Vision Assistant for Chest X-ray Report Automation

    Authors: Jinge Wu, Yunsoo Kim, Daqian Shi, David Cliffton, Fenglin Liu, Honghan Wu

    Abstract: Inspired by the success of large language models (LLMs), there is growing research interest in developing LLMs in the medical domain to assist clinicians. However, for hospitals, using closed-source commercial LLMs involves privacy issues, and developing open-source public LLMs requires large-scale computational resources, which are usually limited, especially in resource-efficient regions and low…

    Submitted 20 September, 2024; originally announced September 2024.