Showing 1–50 of 705 results for author: Yang, Q

  1. arXiv:2410.13298  [pdf, other]

    cs.CL cs.AI

    Advancing Large Language Model Attribution through Self-Improving

    Authors: Lei Huang, Xiaocheng Feng, Weitao Ma, Liang Zhao, Yuchun Fan, Weihong Zhong, Dongliang Xu, Qing Yang, Hongtao Liu, Bing Qin

    Abstract: Teaching large language models (LLMs) to generate text with citations to evidence sources can mitigate hallucinations and enhance verifiability in information-seeking systems. However, improving this capability requires high-quality attribution data, which is costly and labor-intensive. Inspired by recent advances in self-improvement that enhance LLMs without manual annotation, we present START, a…

    Submitted 17 October, 2024; originally announced October 2024.

    Comments: Accepted by EMNLP 2024 Main Conference

  2. arXiv:2410.13259  [pdf, other]

    cs.CL

    From Babbling to Fluency: Evaluating the Evolution of Language Models in Terms of Human Language Acquisition

    Authors: Qiyuan Yang, Pengda Wang, Luke D. Plonsky, Frederick L. Oswald, Hanjie Chen

    Abstract: We examine the language capabilities of language models (LMs) from the critical perspective of human language acquisition. Building on classical language development theories, we propose a three-stage framework to assess the abilities of LMs, ranging from preliminary word understanding to complex grammar and complex logical reasoning. Using this framework, we evaluate the generative capacities of…

    Submitted 17 October, 2024; originally announced October 2024.

  3. arXiv:2410.12938  [pdf, other]

    cs.LG physics.ao-ph

    Multi-modal graph neural networks for localized off-grid weather forecasting

    Authors: Qidong Yang, Jonathan Giezendanner, Daniel Salles Civitarese, Johannes Jakubik, Eric Schmitt, Anirban Chandra, Jeremy Vila, Detlef Hohl, Chris Hill, Campbell Watson, Sherrie Wang

    Abstract: Urgent applications like wildfire management and renewable energy generation require precise, localized weather forecasts near the Earth's surface. However, weather forecast products from machine learning or numerical weather models are currently generated on a global regular grid, on which a naive interpolation cannot accurately reflect fine-grained weather patterns close to the ground. In this w…

    Submitted 16 October, 2024; originally announced October 2024.

  4. arXiv:2410.12530  [pdf, other]

    cs.DC cs.LG

    Disentangling data distribution for Federated Learning

    Authors: Xinyuan Zhao, Hanlin Gu, Lixin Fan, Qiang Yang, Yuxing Han

    Abstract: Federated Learning (FL) facilitates collaborative training of a global model whose performance is boosted by private data owned by distributed clients, without compromising data privacy. Yet the wide applicability of FL is hindered by entanglement of data distributions across different clients. This paper demonstrates for the first time that by disentangling data distributions FL can in principle…

    Submitted 16 October, 2024; originally announced October 2024.

  5. arXiv:2410.10696  [pdf, other]

    cs.CV cs.GR

    TALK-Act: Enhance Textural-Awareness for 2D Speaking Avatar Reenactment with Diffusion Model

    Authors: Jiazhi Guan, Quanwei Yang, Kaisiyuan Wang, Hang Zhou, Shengyi He, Zhiliang Xu, Haocheng Feng, Errui Ding, Jingdong Wang, Hongtao Xie, Youjian Zhao, Ziwei Liu

    Abstract: Recently, 2D speaking avatars have increasingly participated in everyday scenarios due to the fast development of facial animation techniques. However, most existing works neglect the explicit control of human bodies. In this paper, we propose to drive not only the faces but also the torso and gesture movements of a speaking figure. Inspired by recent advances in diffusion models, we propose the M…

    Submitted 14 October, 2024; originally announced October 2024.

    Comments: Accepted to SIGGRAPH Asia 2024 (conference track). Project page: https://guanjz20.github.io/projects/TALK-Act

  6. arXiv:2410.10481  [pdf, other]

    cs.LG cs.AI cs.CR

    Model-Based Differentially Private Knowledge Transfer for Large Language Models

    Authors: Zhaomin Wu, Jizhou Guo, Junyi Hou, Bingsheng He, Lixin Fan, Qiang Yang

    Abstract: As large language models (LLMs) become increasingly prevalent in web services, effectively leveraging domain-specific knowledge while ensuring privacy has become critical. Existing methods, such as retrieval-augmented generation (RAG) and differentially private data synthesis, often compromise either the utility of domain knowledge or the privacy of sensitive data, limiting their applicability in…

    Submitted 14 October, 2024; originally announced October 2024.

  7. arXiv:2410.08889  [pdf, other]

    cs.CV

    Exploiting Memory-aware Q-distribution Prediction for Nuclear Fusion via Modern Hopfield Network

    Authors: Qingchuan Ma, Shiao Wang, Tong Zheng, Xiaodong Dai, Yifeng Wang, Qingquan Yang, Xiao Wang

    Abstract: This study addresses the critical challenge of predicting the Q-distribution in the long-term stable nuclear fusion task, a key component for advancing clean energy solutions. We introduce an innovative deep learning framework that employs Modern Hopfield Networks to incorporate associative memory from historical shots. Utilizing a newly compiled dataset, we demonstrate the effectiveness of our approa…

    Submitted 11 October, 2024; originally announced October 2024.

  8. arXiv:2410.08879  [pdf, other]

    cs.CV

    Multi-modal Fusion based Q-distribution Prediction for Controlled Nuclear Fusion

    Authors: Shiao Wang, Yifeng Wang, Qingchuan Ma, Xiao Wang, Ning Yan, Qingquan Yang, Guosheng Xu, Jin Tang

    Abstract: Q-distribution prediction is a crucial research direction in controlled nuclear fusion, with deep learning emerging as a key approach to solving prediction challenges. In this paper, we leverage deep learning techniques to tackle the complexities of Q-distribution prediction. Specifically, we explore multimodal fusion methods in computer vision, integrating 2D line image data with the original 1D…

    Submitted 11 October, 2024; originally announced October 2024.

  9. arXiv:2410.08739  [pdf, other]

    cs.CV eess.SY

    MMLF: Multi-modal Multi-class Late Fusion for Object Detection with Uncertainty Estimation

    Authors: Qihang Yang, Yang Zhao, Hong Cheng

    Abstract: Autonomous driving necessitates advanced object detection techniques that integrate information from multiple modalities to overcome the limitations associated with single-modal approaches. The challenges of aligning diverse data in early fusion and the complexities, along with overfitting issues introduced by deep fusion, underscore the efficacy of late fusion at the decision level. Late fusion e…

    Submitted 11 October, 2024; originally announced October 2024.

  10. arXiv:2410.06490  [pdf, other]

    cs.LG cs.AI

    FedL2G: Learning to Guide Local Training in Heterogeneous Federated Learning

    Authors: Jianqing Zhang, Yang Liu, Yang Hua, Jian Cao, Qiang Yang

    Abstract: Data and model heterogeneity are two core issues in Heterogeneous Federated Learning (HtFL). In scenarios with heterogeneous model architectures, aggregating model parameters becomes infeasible, leading to the use of prototypes (i.e., class representative feature vectors) for aggregation and guidance. However, they still experience a mismatch between the extra guiding objective and the client's or…

    Submitted 8 October, 2024; originally announced October 2024.

  11. arXiv:2410.04785  [pdf, other]

    eess.AS cs.SD

    Towards Ultra-Low-Power Neuromorphic Speech Enhancement with Spiking-FullSubNet

    Authors: Xiang Hao, Chenxiang Ma, Qu Yang, Jibin Wu, Kay Chen Tan

    Abstract: Speech enhancement is critical for improving speech intelligibility and quality in various audio devices. In recent years, deep learning-based methods have significantly improved speech enhancement performance, but they often come with a high computational cost, which is prohibitive for a large number of edge devices, such as headsets and hearing aids. This work proposes an ultra-low-power speech…

    Submitted 7 October, 2024; originally announced October 2024.

    Comments: under review

  12. arXiv:2410.04087  [pdf, other]

    cs.CL cs.AI

    GlobeSumm: A Challenging Benchmark Towards Unifying Multi-lingual, Cross-lingual and Multi-document News Summarization

    Authors: Yangfan Ye, Xiachong Feng, Xiaocheng Feng, Weitao Ma, Libo Qin, Dongliang Xu, Qing Yang, Hongtao Liu, Bing Qin

    Abstract: News summarization in today's global scene can be daunting with its flood of multilingual content and varied viewpoints from different sources. However, current studies often neglect such real-world scenarios as they tend to focus solely on either single-language or single-document tasks. To bridge this gap, we aim to unify Multi-lingual, Cross-lingual and Multi-document Summarization into a novel…

    Submitted 5 October, 2024; originally announced October 2024.

    Comments: EMNLP 2024 main conference, long paper

  13. arXiv:2410.01490  [pdf, other]

    cs.CL

    Extending Context Window of Large Language Models from a Distributional Perspective

    Authors: Yingsheng Wu, Yuxuan Gu, Xiaocheng Feng, Weihong Zhong, Dongliang Xu, Qing Yang, Hongtao Liu, Bing Qin

    Abstract: Scaling the rotary position embedding (RoPE) has become a common method for extending the context window of RoPE-based large language models (LLMs). However, existing scaling methods often rely on empirical approaches and lack a profound understanding of the internal distribution within RoPE, resulting in suboptimal performance in extending the context window length. In this paper, we propose to o…

    Submitted 3 October, 2024; v1 submitted 2 October, 2024; originally announced October 2024.

    Comments: 14 pages, 8 figures, Accepted to EMNLP 2024

  14. The Future of HCI-Policy Collaboration

    Authors: Qian Yang, Richmond Y Wong, Steven J Jackson, Sabine Junginger, Margaret D Hagan, Thomas Gilbert, John Zimmerman

    Abstract: Policies significantly shape computation's societal impact, a crucial HCI concern. However, challenges persist when HCI professionals attempt to integrate policy into their work or affect policy outcomes. Prior research considered these challenges at the "border" of HCI and policy. This paper asks: What if HCI considers policy integral to its intellectual concerns, placing system-people-policy i…

    Submitted 29 September, 2024; originally announced September 2024.

    Comments: Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems (CHI '24)

  15. arXiv:2409.19622  [pdf, other]

    cs.CR

    Programming on Bitcoin: A Survey of Layer 1 and Layer 2 Technologies in Bitcoin Ecosystem

    Authors: Guofu Liao, Taotao Wang, Qing Yang, Yihan Xia, Long Shi, Xiang Zhao, Xiaoxiao Wu, Shengli Zhang, Anthony Chan, Richard Yuen

    Abstract: This paper surveys innovative protocols that enhance the programming functionality of the Bitcoin blockchain, a key part of the "Bitcoin Ecosystem." Bitcoin utilizes the Unspent Transaction Output (UTXO) model and a stack-based script language for efficient peer-to-peer payments, but it faces limitations in programming capability and throughput. The 2021 Taproot upgrade introduced the Schnorr sign…

    Submitted 29 September, 2024; originally announced September 2024.

  16. arXiv:2409.17509  [pdf, other]

    cs.CR

    BioZero: An Efficient and Privacy-Preserving Decentralized Biometric Authentication Protocol on Open Blockchain

    Authors: Junhao Lai, Taotao Wang, Shengli Zhang, Qing Yang, Soung Chang Liew

    Abstract: Digital identity plays a vital role in enabling secure access to resources and services in the digital world. Traditional identity authentication methods, such as password-based and biometric authentications, have limitations in terms of security, privacy, and scalability. Decentralized authentication approaches leveraging blockchain technology have emerged as a promising solution. However, existi…

    Submitted 25 September, 2024; originally announced September 2024.

    Comments: 14 pages, 3 figures

  17. arXiv:2409.16312  [pdf, other]

    q-bio.QM cs.AI eess.SP

    SEE: Semantically Aligned EEG-to-Text Translation

    Authors: Yitian Tao, Yan Liang, Luoyu Wang, Yongqing Li, Qing Yang, Han Zhang

    Abstract: Decoding neurophysiological signals into language is of great research interest within brain-computer interface (BCI) applications. Electroencephalography (EEG), known for its non-invasiveness, ease of use, and cost-effectiveness, has been a popular method in this field. However, current EEG-to-Text decoding approaches face challenges due to the huge domain gap between EEG recordings and raw texts…

    Submitted 14 September, 2024; originally announced September 2024.

    Comments: 4 pages

  18. arXiv:2409.14385  [pdf, other]

    cs.CV

    Prior Knowledge Distillation Network for Face Super-Resolution

    Authors: Qiu Yang, Xiao Sun, Xin-yu Li, Feng-Qi Cui, Yu-Tong Guo, Shuang-Zhen Hu, Ping Luo, Si-Ying Li

    Abstract: The purpose of face super-resolution (FSR) is to reconstruct high-resolution (HR) face images from low-resolution (LR) inputs. With the continuous advancement of deep learning technologies, contemporary prior-guided FSR methods initially estimate facial priors and then use this information to assist in the super-resolution reconstruction process. However, ensuring the accuracy of prior estimation…

    Submitted 22 September, 2024; originally announced September 2024.

  19. arXiv:2409.13199  [pdf, other]

    cs.CL

    CFSP: An Efficient Structured Pruning Framework for LLMs with Coarse-to-Fine Activation Information

    Authors: Yuxin Wang, Minghua Ma, Zekun Wang, Jingchang Chen, Huiming Fan, Liping Shan, Qing Yang, Dongliang Xu, Ming Liu, Bing Qin

    Abstract: The colossal parameters and computational overhead of Large Language Models (LLMs) challenge their real-world applications. Network pruning, which targets unstructured or structured sparsity by removing redundant parameters, has recently been explored for LLM acceleration. Existing LLM pruning works focus on unstructured pruning, which typically requires special hardware support for a practical sp…

    Submitted 20 September, 2024; originally announced September 2024.

    Comments: Work in progress

  20. arXiv:2409.13191  [pdf]

    cs.CL cs.AI cs.CE cs.LG

    An adapted large language model facilitates multiple medical tasks in diabetes care

    Authors: Lai Wei, Zhen Ying, Muyang He, Yutong Chen, Qian Yang, Yanzhe Hong, Jiaping Lu, Xiaoying Li, Weiran Huang, Ying Chen

    Abstract: Diabetes is a chronic disease that poses a significant global health burden, and optimizing diabetes management requires multi-stakeholder collaboration. Large language models (LLMs) have shown promise in various healthcare scenarios, but their effectiveness across a diverse range of diabetes tasks remains unproven. In this study, we introduced a framework to train and validate diabetes-specific L…

    Submitted 19 September, 2024; originally announced September 2024.

  21. arXiv:2409.11764  [pdf, other]

    cs.RO cs.AI

    One Map to Find Them All: Real-time Open-Vocabulary Mapping for Zero-shot Multi-Object Navigation

    Authors: Finn Lukas Busch, Timon Homberger, Jesús Ortega-Peimbert, Quantao Yang, Olov Andersson

    Abstract: The capability to efficiently search for objects in complex environments is fundamental for many real-world robot applications. Recent advances in open-vocabulary vision models have resulted in semantically-informed object navigation methods that allow a robot to search for an arbitrary object without prior training. However, these zero-shot methods have so far treated the environment as unknown f…

    Submitted 18 September, 2024; originally announced September 2024.

  22. arXiv:2409.06135  [pdf, other]

    cs.SD cs.CV cs.MM eess.AS

    Draw an Audio: Leveraging Multi-Instruction for Video-to-Audio Synthesis

    Authors: Qi Yang, Binjie Mao, Zili Wang, Xing Nie, Pengfei Gao, Ying Guo, Cheng Zhen, Pengfei Yan, Shiming Xiang

    Abstract: Foley is a term commonly used in filmmaking, referring to the addition of daily sound effects to silent films or videos to enhance the auditory experience. Video-to-Audio (V2A), as a particular type of automatic foley task, presents inherent challenges related to audio-visual synchronization. These challenges encompass maintaining the content consistency between the input video and the generated a…

    Submitted 9 September, 2024; originally announced September 2024.

    Comments: 14 pages, 11 figures

  23. arXiv:2409.05871  [pdf, other]

    cs.RO cs.LG

    Multi-feature Compensatory Motion Analysis for Reaching Motions Over a Discretely Sampled Workspace

    Authors: Qihan Yang, Yuri Gloumakov, Adam J. Spiers

    Abstract: The absence of functional arm joints, such as the wrist, in upper extremity prostheses leads to compensatory motions in the users' daily activities. Compensatory motions have been previously studied for varying task protocols and evaluation metrics. However, the movement targets' spatial locations in previous protocols were not standardised and were incomparable between studies, and the evaluation metr…

    Submitted 23 August, 2024; originally announced September 2024.

    Comments: 7 pages, 12 figures. Accepted by IEEE RAS EMBS 10th International Conference on Biomedical Robotics and Biomechatronics (BioRob 2024)

  24. arXiv:2409.03319  [pdf, other]

    cs.ET

    Semantic Communication for Efficient Point Cloud Transmission

    Authors: Shangzhuo Xie, Qianqian Yang, Yuyi Sun, Tianxiao Han, Zhaohui Yang, Zhiguo Shi

    Abstract: As three-dimensional acquisition technologies like LiDAR cameras advance, the need for efficient transmission of 3D point clouds is becoming increasingly important. In this paper, we present a novel semantic communication (SemCom) approach for efficient 3D point cloud transmission. Different from existing methods that rely on downsampling and feature extraction for compression, our approach utiliz…

    Submitted 5 September, 2024; originally announced September 2024.

  25. arXiv:2409.02702  [pdf, other]

    cs.SI cs.AI

    Incorporating Like-Minded Peers to Overcome Friend Data Sparsity in Session-Based Social Recommendations

    Authors: Chunyan An, Yunhan Li, Qiang Yang, Winston K. G. Seah, Zhixu Li, Conghao Yang

    Abstract: Session-based Social Recommendation (SSR) leverages social relationships within online networks to enhance the performance of Session-based Recommendation (SR). However, existing SSR algorithms often encounter the challenge of "friend data sparsity". Moreover, significant discrepancies can exist between the purchase preferences of social network friends and those of the target user, reducing the i…

    Submitted 6 September, 2024; v1 submitted 4 September, 2024; originally announced September 2024.

  26. Multi-Sources Fusion Learning for Multi-Points NLOS Localization in OFDM System

    Authors: Bohao Wang, Zitao Shuai, Chongwen Huang, Qianqian Yang, Zhaohui Yang, Richeng Jin, Ahmed Al Hammadi, Zhaoyang Zhang, Chau Yuen, Mérouane Debbah

    Abstract: Accurate localization of mobile terminals is a pivotal aspect of integrated sensing and communication systems. Traditional fingerprint-based localization methods, which infer coordinates from channel information within pre-set rectangular areas, often face challenges due to the heterogeneous distribution of fingerprints inherent in non-line-of-sight (NLOS) scenarios, particularly within orthogonal…

    Submitted 4 September, 2024; originally announced September 2024.

    Comments: 12 pages, 14 figures, accepted by IEEE Journal of Selected Topics in Signal Processing (JSTSP). arXiv admin note: substantial text overlap with arXiv:2401.12538

  27. arXiv:2409.01128  [pdf, other]

    cs.LG cs.CV

    Diffusion-Driven Data Replay: A Novel Approach to Combat Forgetting in Federated Class Continual Learning

    Authors: Jinglin Liang, Jin Zhong, Hanlin Gu, Zhongqi Lu, Xingxing Tang, Gang Dai, Shuangping Huang, Lixin Fan, Qiang Yang

    Abstract: Federated Class Continual Learning (FCCL) merges the challenges of distributed client learning with the need for seamless adaptation to new classes without forgetting old ones. The key challenge in FCCL is catastrophic forgetting, an issue that has been explored to some extent in Continual Learning (CL). However, due to privacy preservation requirements, some conventional methods, such as experien…

    Submitted 3 September, 2024; v1 submitted 2 September, 2024; originally announced September 2024.

    Comments: Accepted by ECCV 2024 Oral

  28. arXiv:2408.16532  [pdf, other]

    eess.AS cs.LG cs.MM cs.SD eess.SP

    WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling

    Authors: Shengpeng Ji, Ziyue Jiang, Xize Cheng, Yifu Chen, Minghui Fang, Jialong Zuo, Qian Yang, Ruiqi Li, Ziang Zhang, Xiaoda Yang, Rongjie Huang, Yidi Jiang, Qian Chen, Siqi Zheng, Wen Wang, Zhou Zhao

    Abstract: Language models have been effectively applied to modeling natural signals, such as images, video, speech, and audio. A crucial component of these models is the codec tokenizer, which compresses high-dimensional natural signals into lower-dimensional discrete tokens. In this paper, we introduce WavTokenizer, which offers several advantages over previous SOTA acoustic codec models in the audio domai…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: Work in progress. arXiv admin note: text overlap with arXiv:2402.12208

  29. arXiv:2408.15428  [pdf, other]

    cs.CV

    HEAD: A Bandwidth-Efficient Cooperative Perception Approach for Heterogeneous Connected and Autonomous Vehicles

    Authors: Deyuan Qu, Qi Chen, Yongqi Zhu, Yihao Zhu, Sergei S. Avedisov, Song Fu, Qing Yang

    Abstract: In cooperative perception studies, there is often a trade-off between communication bandwidth and perception performance. While current feature fusion solutions are known for their excellent object detection performance, transmitting the entire sets of intermediate feature maps requires substantial bandwidth. Furthermore, these fusion approaches are typically limited to vehicles that use identical…

    Submitted 27 August, 2024; originally announced August 2024.

    Comments: Accepted by ECCV 2024 Workshop

  30. arXiv:2408.14518  [pdf, other]

    cs.RO cs.LG

    A Survey on Reinforcement Learning Applications in SLAM

    Authors: Mohammad Dehghani Tezerjani, Mohammad Khoshnazar, Mohammadhamed Tangestanizadeh, Qing Yang

    Abstract: The emergence of mobile robotics, particularly in the automotive industry, introduces a promising era of enriched user experiences and adept handling of complex navigation challenges. The realization of these advancements necessitates a focused technological effort and the successful execution of numerous intricate tasks, particularly in the critical domain of Simultaneous Localization and Mapping…

    Submitted 25 August, 2024; originally announced August 2024.

  31. arXiv:2408.12672  [pdf]

    cs.DC cs.CV

    Research on Improved U-net Based Remote Sensing Image Segmentation Algorithm

    Authors: Qiming Yang, Zixin Wang, Shinan Liu, Zizheng Li

    Abstract: In recent years, although the U-Net network has made significant progress in the field of image segmentation, it still faces performance bottlenecks in remote sensing image segmentation. In this paper, we innovatively propose to introduce the SimAM and CBAM attention mechanisms into U-Net, and the experimental results show that after adding SimAM and CBAM modules alone, the model improves 17.41% and 12.23% i…

    Submitted 22 August, 2024; originally announced August 2024.

  32. arXiv:2408.11446  [pdf, other]

    cs.ET

    Green Probabilistic Semantic Communication over Wireless Networks

    Authors: Ruopeng Xu, Zhaohui Yang, Yijie Mao, Chongwen Huang, Qianqian Yang, Lexi Xu, Wei Xu, Zhaoyang Zhang

    Abstract: In this paper, we propose a multi-user green semantic communication system facilitated by a probabilistic knowledge graph (PKG). By integrating probability into the knowledge graph, we enable probabilistic semantic communication (PSC) and represent semantic information accordingly. On this basis, a semantic compression model designed for multi-user downlink task-oriented communication is introduce…

    Submitted 21 August, 2024; originally announced August 2024.

  33. arXiv:2408.10714  [pdf, other]

    cs.NE

    Physics-Driven AI Correction in Laser Absorption Sensing Quantification

    Authors: Ruiyuan Kang, Panos Liatsis, Meixia Geng, Qingjie Yang

    Abstract: Laser absorption spectroscopy (LAS) quantification is a popular tool used in measuring temperature and concentration of gases. It has low error tolerance, whereas current ML-based solutions cannot guarantee the reliability of their measurements. In this work, we propose a new framework, SPEC, to address this issue. In addition to the conventional ML estimator-based estimation mode, SPEC also includes a Physic…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: 13 pages

    MSC Class: 68T05; ACM Class: I.2.1

  34. arXiv:2408.10046  [pdf, other]

    cs.LG cs.CV

    Exploiting Fine-Grained Prototype Distribution for Boosting Unsupervised Class Incremental Learning

    Authors: Jiaming Liu, Hongyuan Liu, Zhili Qin, Wei Han, Yulu Fan, Qinli Yang, Junming Shao

    Abstract: The dynamic nature of open-world scenarios has attracted more attention to class incremental learning (CIL). However, existing CIL methods typically presume the availability of complete ground-truth labels throughout the training process, an assumption rarely met in practical applications. Consequently, this paper explores a more challenging problem of unsupervised class incremental learning (UCIL…

    Submitted 19 August, 2024; originally announced August 2024.

  35. arXiv:2408.09768  [pdf, other]

    cs.AI

    MalLight: Influence-Aware Coordinated Traffic Signal Control for Traffic Signal Malfunctions

    Authors: Qinchen Yang, Zejun Xie, Hua Wei, Desheng Zhang, Yu Yang

    Abstract: Urban traffic is subject to disruptions that cause extended waiting time and safety issues at signalized intersections. While numerous studies have addressed the issue of intelligent traffic systems in the context of various disturbances, traffic signal malfunction, a common real-world occurrence with significant repercussions, has received comparatively limited attention. The primary objective of…

    Submitted 12 September, 2024; v1 submitted 19 August, 2024; originally announced August 2024.

    Comments: Paper accepted to CIKM24 Full Research track

  36. arXiv:2408.09530  [pdf, other]

    cs.AI

    PA-LLaVA: A Large Language-Vision Assistant for Human Pathology Image Understanding

    Authors: Dawei Dai, Yuanhui Zhang, Long Xu, Qianlan Yang, Xiaojing Shen, Shuyin Xia, Guoyin Wang

    Abstract: Previous advancements in pathology image understanding primarily involved developing models tailored to specific tasks. Recent studies have demonstrated that large vision-language models can enhance the performance of various downstream tasks in medical image understanding. In this study, we developed a domain-specific large language-vision assistant (PA-LLaVA) for pathology image understand…

    Submitted 18 August, 2024; originally announced August 2024.

    Comments: 8 pages, 4 figs

  37. arXiv:2408.08696  [pdf, other]

    cs.CL cs.LG

    Turning Trash into Treasure: Accelerating Inference of Large Language Models with Token Recycling

    Authors: Xianzhen Luo, Yixuan Wang, Qingfu Zhu, Zhiming Zhang, Xuanyu Zhang, Qing Yang, Dongliang Xu, Wanxiang Che

    Abstract: The rapid growth in the parameters of large language models (LLMs) has made inference latency a fundamental bottleneck, limiting broader application of LLMs. Speculative decoding represents a lossless approach to accelerate inference through a guess-and-verify paradigm, leveraging the parallel capabilities of modern hardware. Some speculative decoding methods rely on additional structures to guess…

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: under review

  38. arXiv:2408.08527  [pdf, other]

    cs.CV cs.AI

    Focus on Focus: Focus-oriented Representation Learning and Multi-view Cross-modal Alignment for Glioma Grading

    Authors: Li Pan, Yupei Zhang, Qiushi Yang, Tan Li, Xiaohan Xing, Maximus C. F. Yeung, Zhen Chen

    Abstract: Recently, multimodal deep learning, which integrates histopathology slides and molecular biomarkers, has achieved a promising performance in glioma grading. Despite great progress, due to the intra-modality complexity and inter-modality heterogeneity, existing studies suffer from inadequate histopathology representation learning and inefficient molecular-pathology knowledge alignment. These two is…

    Submitted 16 August, 2024; originally announced August 2024.

  39. arXiv:2408.07500  [pdf, other]

    cs.CV

    Cross-Platform Video Person ReID: A New Benchmark Dataset and Adaptation Approach

    Authors: Shizhou Zhang, Wenlong Luo, De Cheng, Qingchun Yang, Lingyan Ran, Yinghui Xing, Yanning Zhang

    Abstract: In this paper, we construct a large-scale benchmark dataset for Ground-to-Aerial Video-based person Re-Identification, named G2A-VReID, which comprises 185,907 images and 5,576 tracklets, featuring 2,788 distinct identities. To our knowledge, this is the first dataset for video ReID under Ground-to-Aerial scenarios. G2A-VReID dataset has the following characteristics: 1) Drastic view changes; 2) L…

    Submitted 2 September, 2024; v1 submitted 14 August, 2024; originally announced August 2024.

    Comments: Published at ECCV 2024

  40. arXiv:2408.04777  [pdf]

    eess.IV cs.CV

    Deep Learning-based Unsupervised Domain Adaptation via a Unified Model for Prostate Lesion Detection Using Multisite Bi-parametric MRI Datasets

    Authors: Hao Li, Han Liu, Heinrich von Busch, Robert Grimm, Henkjan Huisman, Angela Tong, David Winkel, Tobias Penzkofer, Ivan Shabunin, Moon Hyung Choi, Qingsong Yang, Dieter Szolar, Steven Shea, Fergus Coakley, Mukesh Harisinghani, Ipek Oguz, Dorin Comaniciu, Ali Kamen, Bin Lou

    Abstract: Our hypothesis is that UDA using diffusion-weighted images, generated with a unified model, offers a promising and reliable strategy for enhancing the performance of supervised learning models in multi-site prostate lesion detection, especially when various b-values are present. This retrospective study included data from 5,150 patients (14,191 samples) collected across nine different imaging cent…

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: Accepted at Radiology: Artificial Intelligence. Journal reference and external DOI will be added once published

    Journal ref: Radiology: Artificial Intelligence 2024;6(5):e230521

  41. arXiv:2408.04499  [pdf, other]

    cs.LG

    Knowledge-Aided Semantic Communication Leveraging Probabilistic Graphical Modeling

    Authors: Haowen Wan, Qianqian Yang, Jiancheng Tang, Zhiguo Shi

    Abstract: In this paper, we propose a semantic communication approach based on probabilistic graphical model (PGM). The proposed approach involves constructing a PGM from a training dataset, which is then shared as common knowledge between the transmitter and receiver. We evaluate the importance of various semantic features and present a PGM-based compression algorithm designed to eliminate predictable port…

    Submitted 8 August, 2024; originally announced August 2024.

  42. arXiv:2408.04168

    cs.AI

    Perceive, Reflect, and Plan: Designing LLM Agent for Goal-Directed City Navigation without Instructions

    Authors: Qingbin Zeng, Qinglong Yang, Shunan Dong, Heming Du, Liang Zheng, Fengli Xu, Yong Li

    Abstract: This paper considers a scenario in city navigation: an AI agent is provided with language descriptions of the goal location with respect to some well-known landmarks. By only observing the surrounding scene, including recognizing landmarks and road network connections, the agent has to make decisions to navigate to the goal location without instructions. This problem is very challenging, because it req…

    Submitted 17 October, 2024; v1 submitted 7 August, 2024; originally announced August 2024.

  43. AdapMTL: Adaptive Pruning Framework for Multitask Learning Model

    Authors: Mingcan Xiang, Steven Jiaxun Tang, Qizheng Yang, Hui Guan, Tongping Liu

    Abstract: In the domain of multimedia and multimodal processing, the efficient handling of diverse data streams such as images, video, and sensor data is paramount. Model compression and multitask learning (MTL) are crucial in this field, offering the potential to address the resource-intensive demands of processing and interpreting multiple forms of media simultaneously. However, effectively compressing a…

    Submitted 7 August, 2024; originally announced August 2024.

    Comments: 13 pages, 9 figures, Published at ACM Multimedia (ACM MM) 2024

  44. arXiv:2408.02907

    cs.CL

    Leveraging Inter-Chunk Interactions for Enhanced Retrieval in Large Language Model-Based Question Answering

    Authors: Tiezheng Guo, Chen Wang, Yanyi Liu, Jiawei Tang, Pan Li, Sai Xu, Qingwen Yang, Xianlin Gao, Zhi Li, Yingyou Wen

    Abstract: Retrieving external knowledge and prompting large language models with relevant information is an effective paradigm to enhance the performance of question-answering tasks. Previous research typically handles paragraphs from external documents in isolation, resulting in a lack of context and ambiguous references, particularly in multi-document and complex tasks. To overcome these challenges, we pr…

    Submitted 5 August, 2024; originally announced August 2024.

  45. arXiv:2408.01791

    cs.NI

    Implementing NAT Hole Punching with QUIC

    Authors: Jinyu Liang, Wei Xu, Taotao Wang, Qing Yang, Shengli Zhang

    Abstract: The widespread adoption of Network Address Translation (NAT) technology has led to a significant number of network end nodes being located in private networks behind NAT devices, impeding direct communication between these nodes. To solve this problem, a technique known as "hole punching" has been devised for NAT traversal to facilitate peer-to-peer communication among end nodes located in distinc…

    Submitted 3 August, 2024; originally announced August 2024.

    Comments: The paper has been accepted for oral presentation at the VTC2024-Fall Conference

  46. arXiv:2408.01708

    cs.CV

    AVESFormer: Efficient Transformer Design for Real-Time Audio-Visual Segmentation

    Authors: Zili Wang, Qi Yang, Linsu Shi, Jiazhong Yu, Qinghua Liang, Fei Li, Shiming Xiang

    Abstract: Recently, transformer-based models have demonstrated remarkable performance on audio-visual segmentation (AVS) tasks. However, their expensive computational cost makes real-time inference impractical. By characterizing attention maps of the network, we identify two key obstacles in AVS models: 1) attention dissipation, corresponding to the over-concentrated attention weights by Softmax within rest…

    Submitted 3 August, 2024; originally announced August 2024.

  47. arXiv:2408.00381

    cs.IT eess.SY

    Statistical AoI Guarantee Optimization for Supporting xURLLC in ISAC-enabled V2I Networks

    Authors: Yanxi Zhang, Mingwu Yao, Qinghai Yang, Dongqi Yan, Xu Zhang, Xu Bao, Muyu Mei

    Abstract: This paper addresses the critical challenge of supporting next-generation ultra-reliable and low-latency communication (xURLLC) within integrated sensing and communication (ISAC)-enabled vehicle-to-infrastructure (V2I) networks. We incorporate channel evaluation and retransmission mechanisms for real-time reliability enhancement. Using stochastic network calculus (SNC), we establish a theoretical…

    Submitted 1 August, 2024; originally announced August 2024.

  48. arXiv:2407.16341

    cs.CV

    Motion Capture from Inertial and Vision Sensors

    Authors: Xiaodong Chen, Wu Liu, Qian Bao, Xinchen Liu, Quanwei Yang, Ruoli Dai, Tao Mei

    Abstract: Human motion capture is the foundation for many computer vision and graphics tasks. While industrial motion capture systems with complex camera arrays or expensive wearable sensors have been widely adopted in movie and game production, consumer-affordable and easy-to-use solutions for personal applications are still far from mature. To utilize a mixture of a monocular camera and very few inertial…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: 17 pages, 9 figures

  49. arXiv:2407.15488

    cs.CV

    DiffX: Guide Your Layout to Cross-Modal Generative Modeling

    Authors: Zeyu Wang, Jingyu Lin, Yifei Qian, Yi Huang, Shicen Tian, Bosong Chai, Juncan Deng, Qu Yang, Lan Du, Cunjian Chen, Kejie Huang

    Abstract: Diffusion models have made significant strides in language-driven and layout-driven image generation. However, most diffusion models are limited to visible RGB image generation. In fact, human perception of the world is enriched by diverse viewpoints, such as chromatic contrast, thermal illumination, and depth information. In this paper, we introduce a novel diffusion model for general layout-guid…

    Submitted 20 October, 2024; v1 submitted 22 July, 2024; originally announced July 2024.

  50. arXiv:2407.15435

    cs.CV

    Enhancement of 3D Gaussian Splatting using Raw Mesh for Photorealistic Recreation of Architectures

    Authors: Ruizhe Wang, Chunliang Hua, Tomakayev Shingys, Mengyuan Niu, Qingxin Yang, Lizhong Gao, Yi Zheng, Junyan Yang, Qiao Wang

    Abstract: The photorealistic reconstruction and rendering of architectural scenes have extensive applications in industries such as film, games, and transportation. They also play an important role in urban planning, architectural design, and city promotion, especially in protecting historical and cultural relics. 3D Gaussian Splatting, owing to its better performance than NeRF, has become a mainstream t…

    Submitted 25 September, 2024; v1 submitted 22 July, 2024; originally announced July 2024.