Showing 1–50 of 261 results for author: Sun, P

  1. arXiv:2410.10676  [pdf, other]

    cs.SD cs.CV eess.AS

    Both Ears Wide Open: Towards Language-Driven Spatial Audio Generation

    Authors: Peiwen Sun, Sitong Cheng, Xiangtai Li, Zhen Ye, Huadai Liu, Honggang Zhang, Wei Xue, Yike Guo

    Abstract: Recently, diffusion models have achieved great success in mono-channel audio generation. However, when it comes to stereo audio generation, the soundscapes often have a complex scene of multiple objects and directions. Controlling stereo audio with spatial contexts remains challenging due to high data costs and unstable generative models. To the best of our knowledge, this work represents the firs…

    Submitted 14 October, 2024; originally announced October 2024.

  2. arXiv:2410.10291  [pdf, other]

    cs.CL cs.AI cs.MM

    Evaluating Semantic Variation in Text-to-Image Synthesis: A Causal Perspective

    Authors: Xiangru Zhu, Penglei Sun, Yaoxian Song, Yanghua Xiao, Zhixu Li, Chengyu Wang, Jun Huang, Bei Yang, Xiaoxiao Xu

    Abstract: Accurate interpretation and visualization of human instructions are crucial for text-to-image (T2I) synthesis. However, current models struggle to capture semantic variations from word order changes, and existing evaluations, relying on indirect metrics like text-image similarity, fail to reliably assess these challenges. This often obscures poor performance on complex or uncommon linguistic patte…

    Submitted 18 October, 2024; v1 submitted 14 October, 2024; originally announced October 2024.

    Comments: The only change in the current version update is the replacement of the template with a more precise one

  3. arXiv:2410.09347  [pdf, other]

    cs.CV cs.LG eess.IV

    Toward Guidance-Free AR Visual Generation via Condition Contrastive Alignment

    Authors: Huayu Chen, Hang Su, Peize Sun, Jun Zhu

    Abstract: Classifier-Free Guidance (CFG) is a critical technique for enhancing the sample quality of visual generative models. However, in autoregressive (AR) multi-modal generation, CFG introduces design inconsistencies between language and visual content, contradicting the design philosophy of unifying different modalities for visual AR. Motivated by language model alignment methods, we propose \textit{Co…

    Submitted 11 October, 2024; originally announced October 2024.

  4. arXiv:2410.05759  [pdf, other]

    cs.NE

    3D UAV Trajectory Planning for IoT Data Collection via Matrix-Based Evolutionary Computation

    Authors: Pei-Fa Sun, Yujae Song, Kang-Yu Gao, Yu-Kai Wang, Changjun Zhou, Sang-Woon Jeon, Jun Zhang

    Abstract: UAVs are increasingly becoming vital tools in various wireless communication applications including internet of things (IoT) and sensor networks, thanks to their rapid and agile non-terrestrial mobility. Despite recent research, planning three-dimensional (3D) UAV trajectories over a continuous temporal-spatial domain remains challenging due to the need to solve computationally intensive optimizat…

    Submitted 8 October, 2024; originally announced October 2024.

  5. arXiv:2410.02705  [pdf, other]

    cs.CV

    ControlAR: Controllable Image Generation with Autoregressive Models

    Authors: Zongming Li, Tianheng Cheng, Shoufa Chen, Peize Sun, Haocheng Shen, Longjin Ran, Xiaoxin Chen, Wenyu Liu, Xinggang Wang

    Abstract: Autoregressive (AR) models have reformulated image generation as next-token prediction, demonstrating remarkable potential and emerging as strong competitors to diffusion models. However, control-to-image generation, akin to ControlNet, remains largely unexplored within AR models. Although a natural approach, inspired by advancements in Large Language Models, is to tokenize control images into tok…

    Submitted 3 October, 2024; originally announced October 2024.

    Comments: Preprint. Work in progress

  6. arXiv:2409.20034  [pdf, other]

    cs.CV

    Camera Calibration using a Collimator System

    Authors: Shunkun Liang, Banglei Guan, Zhenbao Yu, Pengju Sun, Yang Shang

    Abstract: Camera calibration is a crucial step in photogrammetry and 3D vision applications. In practical scenarios with a long working distance to cover a wide area, target-based calibration methods become complicated and inflexible due to site limitations. This paper introduces a novel camera calibration method using a collimator system, which can provide a reliable and controllable calibration environmen…

    Submitted 30 September, 2024; originally announced September 2024.

    Comments: Accepted by ECCV2024 (oral presentation)

  7. arXiv:2409.18084  [pdf, other]

    cs.RO cs.AI

    GSON: A Group-based Social Navigation Framework with Large Multimodal Model

    Authors: Shangyi Luo, Ji Zhu, Peng Sun, Yuhong Deng, Cunjun Yu, Anxing Xiao, Xueqian Wang

    Abstract: As the number of service robots and autonomous vehicles in human-centered environments grows, their requirements go beyond simply navigating to a destination. They must also take into account dynamic social contexts and ensure respect and comfort for others in shared spaces, which poses significant challenges for perception and planning. In this paper, we present a group-based social navigation fr…

    Submitted 26 September, 2024; originally announced September 2024.

  8. arXiv:2409.03200  [pdf, other]

    cs.CV

    Active Fake: DeepFake Camouflage

    Authors: Pu Sun, Honggang Qi, Yuezun Li

    Abstract: DeepFake technology has gained significant attention due to its ability to manipulate facial attributes with high realism, raising serious societal concerns. Face-Swap DeepFake, which fabricates behaviors by swapping original faces with synthesized ones, is the most harmful among these techniques. Existing forensic methods, primarily based on Deep Neural Networks (DNNs), effectively expose these ma…

    Submitted 16 October, 2024; v1 submitted 4 September, 2024; originally announced September 2024.

  9. arXiv:2408.17175  [pdf, other]

    eess.AS cs.AI cs.CL cs.SD

    Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model

    Authors: Zhen Ye, Peiwen Sun, Jiahe Lei, Hongzhan Lin, Xu Tan, Zheqi Dai, Qiuqiang Kong, Jianyi Chen, Jiahao Pan, Qifeng Liu, Yike Guo, Wei Xue

    Abstract: Recent advancements in audio generation have been significantly propelled by the capabilities of Large Language Models (LLMs). Existing research on audio LLMs has primarily focused on enhancing the architecture and scale of audio language models, as well as leveraging larger datasets, and generally, acoustic codecs, such as EnCodec, are used for audio tokenization. However, these codecs were or…

    Submitted 19 September, 2024; v1 submitted 30 August, 2024; originally announced August 2024.

  10. arXiv:2408.10006  [pdf, other]

    cs.LG

    Unlocking the Power of LSTM for Long Term Time Series Forecasting

    Authors: Yaxuan Kong, Zepu Wang, Yuqi Nie, Tian Zhou, Stefan Zohren, Yuxuan Liang, Peng Sun, Qingsong Wen

    Abstract: Traditional recurrent neural network architectures, such as long short-term memory neural networks (LSTM), have historically held a prominent role in time series forecasting (TSF) tasks. While the recently introduced sLSTM for Natural Language Processing (NLP) introduces exponential gating and memory mixing that are beneficial for long term sequential learning, its potential short memory issue is…

    Submitted 19 August, 2024; originally announced August 2024.

  11. arXiv:2408.09790  [pdf, other]

    cs.LG

    Structure-enhanced Contrastive Learning for Graph Clustering

    Authors: Xunlian Wu, Jingqi Hu, Anqi Zhang, Yining Quan, Qiguang Miao, Peng Gang Sun

    Abstract: Graph clustering is a crucial task in network analysis with widespread applications, focusing on partitioning nodes into distinct groups with stronger intra-group connections than inter-group ones. Recently, contrastive learning has achieved significant progress in graph clustering. However, most methods suffer from the following issues: 1) an over-reliance on meticulously designed data augmentati…

    Submitted 19 August, 2024; originally announced August 2024.

  12. arXiv:2408.07385  [pdf, other]

    cs.IT eess.SP

    Iterative Equalization of CPM With Unitary Approximate Message Passing

    Authors: Zilong Liu, Yi Song, Qinghua Guo, Peng Sun, Kexian Gong, Zhongyong Wang

    Abstract: Continuous phase modulation (CPM) has extensive applications in wireless communications due to its high spectral and power efficiency. However, its nonlinear characteristics pose significant challenges for detection in frequency selective fading channels. This paper proposes an iterative receiver tailored for the detection of CPM signals over frequency selective fading channels. This design levera…

    Submitted 14 August, 2024; originally announced August 2024.

  13. arXiv:2408.06400  [pdf, other]

    physics.ao-ph cs.LG

    MetMamba: Regional Weather Forecasting with Spatial-Temporal Mamba Model

    Authors: Haoyu Qin, Yungang Chen, Qianchuan Jiang, Pengchao Sun, Xiancai Ye, Chao Lin

    Abstract: Deep Learning based Weather Prediction (DLWP) models have been improving rapidly over the last few years, surpassing state of the art numerical weather forecasts by significant margins. While much of the optimization effort is focused on training curriculum to extend forecast range in the global context, two aspects remain less explored: limited area modeling and better backbones for weather fore…

    Submitted 14 August, 2024; v1 submitted 12 August, 2024; originally announced August 2024.

    Comments: Typo and grammar; Minor elaboration and clarifications; Use full organization name in the author section

  14. arXiv:2408.04193  [pdf, other]

    cs.LG cs.AI

    Uncertainty-Aware Crime Prediction With Spatial Temporal Multivariate Graph Neural Networks

    Authors: Zepu Wang, Xiaobo Ma, Huajie Yang, Weimin Lvu, Peng Sun, Sharath Chandra Guntuku

    Abstract: Crime forecasting is a critical component of urban analysis and essential for stabilizing society today. Unlike other time series forecasting problems, crime incidents are sparse, particularly in small regions and within specific time periods. Traditional spatial-temporal deep learning models often struggle with this sparsity, as they typically cannot effectively handle the non-Gaussian nature of…

    Submitted 7 August, 2024; originally announced August 2024.

  15. arXiv:2407.20018  [pdf, other]

    cs.DC

    Efficient Training of Large Language Models on Distributed Infrastructures: A Survey

    Authors: Jiangfei Duan, Shuo Zhang, Zerui Wang, Lijuan Jiang, Wenwen Qu, Qinghao Hu, Guoteng Wang, Qizhen Weng, Hang Yan, Xingcheng Zhang, Xipeng Qiu, Dahua Lin, Yonggang Wen, Xin Jin, Tianwei Zhang, Peng Sun

    Abstract: Large Language Models (LLMs) like GPT and LLaMA are revolutionizing the AI industry with their sophisticated capabilities. Training these models requires vast GPU clusters and significant computing time, posing major challenges in terms of scalability, efficiency, and reliability. This survey explores recent advancements in training systems for LLMs, including innovations in training infrastructur…

    Submitted 29 July, 2024; originally announced July 2024.

  16. arXiv:2407.17398  [pdf, other]

    cs.CV

    3D Question Answering for City Scene Understanding

    Authors: Penglei Sun, Yaoxian Song, Xiang Liu, Xiaofei Yang, Qiang Wang, Tiefeng Li, Yang Yang, Xiaowen Chu

    Abstract: 3D multimodal question answering (MQA) plays a crucial role in scene understanding by enabling intelligent agents to comprehend their surroundings in 3D environments. While existing research has primarily focused on indoor household tasks and outdoor roadside autonomous driving tasks, there has been limited exploration of city-level scene understanding tasks. Furthermore, existing research faces c…

    Submitted 24 July, 2024; originally announced July 2024.

  17. arXiv:2407.16638  [pdf, other]

    cs.CV

    Unveiling and Mitigating Bias in Audio Visual Segmentation

    Authors: Peiwen Sun, Honggang Zhang, Di Hu

    Abstract: Community researchers have developed a range of advanced audio-visual segmentation models aimed at improving the quality of sounding objects' masks. While masks created by these models may initially appear plausible, they occasionally exhibit anomalies with incorrect grounding logic. We attribute this to real-world inherent preferences and distributions as a simpler signal for learning than the co…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: Accepted by ACM MM 24 (ORAL)

  18. arXiv:2407.14106  [pdf, other]

    cs.DC cs.AI cs.LG

    TorchGT: A Holistic System for Large-scale Graph Transformer Training

    Authors: Meng Zhang, Jie Sun, Qinghao Hu, Peng Sun, Zeke Wang, Yonggang Wen, Tianwei Zhang

    Abstract: Graph Transformer is a new architecture that surpasses GNNs in graph learning. Despite inspiring algorithmic advances, their practical adoption is still limited, particularly on real-world graphs involving up to millions of nodes. We observe that existing graph transformers fail on large-scale graphs mainly due to heavy computation, limited scalability and inferior model quality. Motivated…

    Submitted 19 July, 2024; originally announced July 2024.

    Comments: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2024

  19. arXiv:2407.11820  [pdf, other]

    cs.CV cs.AI

    Stepping Stones: A Progressive Training Strategy for Audio-Visual Semantic Segmentation

    Authors: Juncheng Ma, Peiwen Sun, Yaoting Wang, Di Hu

    Abstract: Audio-Visual Segmentation (AVS) aims to achieve pixel-level localization of sound sources in videos, while Audio-Visual Semantic Segmentation (AVSS), as an extension of AVS, further pursues semantic understanding of audio-visual scenes. However, since the AVSS task requires the establishment of audio-visual correspondence and semantic understanding simultaneously, we observe that previous methods…

    Submitted 12 September, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

    Comments: ECCV2024 poster. Project url: https://gewu-lab.github.io/stepping_stones

  20. arXiv:2407.10957  [pdf, other]

    cs.CV cs.AI

    Ref-AVS: Refer and Segment Objects in Audio-Visual Scenes

    Authors: Yaoting Wang, Peiwen Sun, Dongzhan Zhou, Guangyao Li, Honggang Zhang, Di Hu

    Abstract: Traditional reference segmentation tasks have predominantly focused on silent visual scenes, neglecting the integral role of multimodal perception and interaction in human experiences. In this work, we introduce a novel task called Reference Audio-Visual Segmentation (Ref-AVS), which seeks to segment objects within the visual domain based on expressions containing multimodal cues. Such expressions…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV2024

  21. arXiv:2407.10947  [pdf, other]

    cs.CV

    Can Textual Semantics Mitigate Sounding Object Segmentation Preference?

    Authors: Yaoting Wang, Peiwen Sun, Yuanchao Li, Honggang Zhang, Di Hu

    Abstract: The Audio-Visual Segmentation (AVS) task aims to segment sounding objects in the visual space using audio cues. However, in this work, it is recognized that previous AVS methods show a heavy reliance on detrimental segmentation preferences related to audible objects, rather than precise audio guidance. We argue that the primary reason is that audio lacks robust semantics compared to vision, especi…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV2024

  22. arXiv:2407.07577  [pdf, other]

    cs.CV cs.AI

    IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model

    Authors: Yatai Ji, Shilong Zhang, Jie Wu, Peize Sun, Weifeng Chen, Xuefeng Xiao, Sidi Yang, Yujiu Yang, Ping Luo

    Abstract: The rapid advancement of Large Vision-Language models (LVLMs) has demonstrated a spectrum of emergent capabilities. Nevertheless, current models only focus on the visual content of a single scenario, while their ability to associate instances across different scenes has not yet been explored, which is essential for understanding complex visual content, such as movies with multiple characters and i…

    Submitted 10 July, 2024; originally announced July 2024.

  23. arXiv:2407.03320  [pdf, other]

    cs.CV cs.CL

    InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

    Authors: Pan Zhang, Xiaoyi Dong, Yuhang Zang, Yuhang Cao, Rui Qian, Lin Chen, Qipeng Guo, Haodong Duan, Bin Wang, Linke Ouyang, Songyang Zhang, Wenwei Zhang, Yining Li, Yang Gao, Peng Sun, Xinyue Zhang, Wei Li, Jingwen Li, Wenhai Wang, Hang Yan, Conghui He, Xingcheng Zhang, Kai Chen, Jifeng Dai, Yu Qiao, et al. (2 additional authors not shown)

    Abstract: We present InternLM-XComposer-2.5 (IXC-2.5), a versatile large-vision language model that supports long-contextual input and output. IXC-2.5 excels in various text-image comprehension and composition applications, achieving GPT-4V level capabilities with merely a 7B LLM backend. Trained with 24K interleaved image-text contexts, it can seamlessly extend to 96K long contexts via RoPE extrapolation. Th…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: Technical Report. https://github.com/InternLM/InternLM-XComposer

  24. arXiv:2407.02846  [pdf, other]

    cs.CV

    Multi-Task Domain Adaptation for Language Grounding with 3D Objects

    Authors: Penglei Sun, Yaoxian Song, Xinglin Pan, Peijie Dong, Xiaofei Yang, Qiang Wang, Zhixu Li, Tiefeng Li, Xiaowen Chu

    Abstract: Existing work on object-level language grounding with 3D objects mostly focuses on improving performance by utilizing off-the-shelf pre-trained models to capture features, such as viewpoint selection or geometric priors. However, it has failed to consider exploring the cross-modal representation of language-vision alignment in the cross-domain field. To address this problem, we propose a…

    Submitted 5 July, 2024; v1 submitted 3 July, 2024; originally announced July 2024.

  25. arXiv:2406.18485  [pdf, other]

    cs.DC

    LoongTrain: Efficient Training of Long-Sequence LLMs with Head-Context Parallelism

    Authors: Diandian Gu, Peng Sun, Qinghao Hu, Ting Huang, Xun Chen, Yingtong Xiong, Guoteng Wang, Qiaoling Chen, Shangchun Zhao, Jiarui Fang, Yonggang Wen, Tianwei Zhang, Xin Jin, Xuanzhe Liu

    Abstract: Efficiently training LLMs with long sequences is important yet challenged by the massive computation and memory requirements. Sequence parallelism has been proposed to tackle these problems, but existing methods suffer from scalability or efficiency issues. We propose LoongTrain, a novel system to efficiently train LLMs with long sequences at scale. The core of LoongTrain is the 2D-Attention mecha…

    Submitted 26 June, 2024; originally announced June 2024.

  26. arXiv:2406.06525  [pdf, other]

    cs.CV

    Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation

    Authors: Peize Sun, Yi Jiang, Shoufa Chen, Shilong Zhang, Bingyue Peng, Ping Luo, Zehuan Yuan

    Abstract: We introduce LlamaGen, a new family of image generation models that apply the original "next-token prediction" paradigm of large language models to the visual generation domain. It is an affirmative answer to whether vanilla autoregressive models, e.g., Llama, without inductive biases on visual signals can achieve state-of-the-art image generation performance if scaled properly. We reexamine design spa…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: Codes and models: https://github.com/FoundationVision/LlamaGen

  27. arXiv:2406.03248  [pdf, other]

    cs.IR cs.CL

    Large Language Models as Evaluators for Recommendation Explanations

    Authors: Xiaoyu Zhang, Yishan Li, Jiayin Wang, Bowen Sun, Weizhi Ma, Peijie Sun, Min Zhang

    Abstract: The explainability of recommender systems has attracted significant attention in academia and industry. Many efforts have been made for explainable recommendations, yet evaluating the quality of the explanations remains a challenging and unresolved issue. In recent years, leveraging LLMs as evaluators presents a promising avenue in Natural Language Processing tasks (e.g., sentiment classification,…

    Submitted 6 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

  28. arXiv:2405.20718  [pdf, other]

    cs.IR cs.AI

    Popularity-Aware Alignment and Contrast for Mitigating Popularity Bias

    Authors: Miaomiao Cai, Lei Chen, Yifan Wang, Haoyue Bai, Peijie Sun, Le Wu, Min Zhang, Meng Wang

    Abstract: Collaborative Filtering (CF) typically suffers from the significant challenge of popularity bias due to the uneven distribution of items in real-world datasets. This bias leads to a significant accuracy gap between popular and unpopular items. It not only hinders accurate user preference understanding but also exacerbates the Matthew effect in recommendation systems. To alleviate popularity bias,…

    Submitted 11 June, 2024; v1 submitted 31 May, 2024; originally announced May 2024.

    Comments: Accepted by KDD 2024

  29. arXiv:2405.18058  [pdf, other]

    cs.IR

    ReChorus2.0: A Modular and Task-Flexible Recommendation Library

    Authors: Jiayu Li, Hanyu Li, Zhiyu He, Weizhi Ma, Peijie Sun, Min Zhang, Shaoping Ma

    Abstract: With the applications of recommendation systems rapidly expanding, an increasing number of studies have focused on every aspect of recommender systems with different data inputs, models, and task settings. Therefore, a flexible library is needed to help researchers implement the experimental strategies they require. Existing open libraries for recommendation scenarios have enabled reproducing vari…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: 10 pages, 3 figures. Under review

  30. arXiv:2405.14736  [pdf, other]

    cs.CV cs.LG

    GIFT: Unlocking Full Potential of Labels in Distilled Dataset at Near-zero Cost

    Authors: Xinyi Shang, Peng Sun, Tao Lin

    Abstract: Recent advancements in dataset distillation have demonstrated the significant benefits of employing soft labels generated by pre-trained teacher models. In this paper, we introduce a novel perspective by emphasizing the full utilization of labels. We first conduct a comprehensive comparison of various loss functions for soft label utilization in dataset distillation, revealing that the model train…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: https://github.com/LINs-lab/GIFT

  31. arXiv:2405.14669  [pdf, other]

    cs.LG cs.AI

    Efficiency for Free: Ideal Data Are Transportable Representations

    Authors: Peng Sun, Yi Jiang, Tao Lin

    Abstract: Data, the seminal opportunity and challenge in modern machine learning, currently constrains the scalability of representation learning and impedes the pace of model evolution. Existing paradigms tackle the issue of learning efficiency over massive datasets from the perspective of self-supervised learning and dataset distillation independently, while neglecting the untapped potential of accelerati…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: Code: https://github.com/LINs-lab/ReLA

  32. arXiv:2405.11272  [pdf, other]

    cs.IR cs.AI

    Double Correction Framework for Denoising Recommendation

    Authors: Zhuangzhuang He, Yifan Wang, Yonghui Yang, Peijie Sun, Le Wu, Haoyue Bai, Jinqi Gong, Richang Hong, Min Zhang

    Abstract: Owing to its availability and generality in online services, implicit feedback is more commonly used in recommender systems. However, implicit feedback usually presents noisy samples in real-world recommendation scenarios (such as misclicks or non-preferential behaviors), which will affect precise user preference learning. To overcome the noisy samples problem, a popular solution is based on dropping no…

    Submitted 27 May, 2024; v1 submitted 18 May, 2024; originally announced May 2024.

    Comments: Accepted by KDD 2024

  33. arXiv:2405.10347  [pdf, other]

    cs.CV cs.AI cs.CY

    Networking Systems for Video Anomaly Detection: A Tutorial and Survey

    Authors: Jing Liu, Yang Liu, Jieyu Lin, Jielin Li, Peng Sun, Bo Hu, Liang Song, Azzedine Boukerche, Victor C. M. Leung

    Abstract: The increasing prevalence of surveillance cameras in smart cities, coupled with the surge of online video applications, has heightened concerns regarding public security and privacy protection, which propelled automated Video Anomaly Detection (VAD) into a fundamental research task within the Artificial Intelligence (AI) community. With the advancements in deep learning and edge computing, VAD has…

    Submitted 15 May, 2024; originally announced May 2024.

    Comments: Submitted to ACM Computing Surveys, under review. For more information and supplementary material, please see https://github.com/fdjingliu/NSVAD

  34. arXiv:2405.09276  [pdf, other]

    cs.LG cs.AI cs.DC

    Dual-Segment Clustering Strategy for Federated Learning in Heterogeneous Environments

    Authors: Pengcheng Sun, Erwu Liu, Wei Ni, Kanglei Yu, Rui Wang, Abbas Jamalipour

    Abstract: Federated learning (FL) is a distributed machine learning paradigm with high efficiency and low communication load, transmitting only the parameters or gradients of the network. However, the non-independent and identically distributed (Non-IID) data characteristic has a negative impact on this paradigm. Furthermore, the heterogeneity of communication quality will significantly affect the accuracy of param…

    Submitted 15 May, 2024; originally announced May 2024.

  35. arXiv:2405.02811  [pdf, other]

    cs.CV

    PVTransformer: Point-to-Voxel Transformer for Scalable 3D Object Detection

    Authors: Zhaoqi Leng, Pei Sun, Tong He, Dragomir Anguelov, Mingxing Tan

    Abstract: 3D object detectors for point clouds often rely on a pooling-based PointNet to encode sparse points into grid-like voxels or pillars. In this paper, we identify that the common PointNet design introduces an information bottleneck that limits 3D object detection accuracy and scalability. To address this limitation, we propose PVTransformer: a transformer-based point-to-voxel architecture for 3D det…

    Submitted 5 May, 2024; originally announced May 2024.

  36. arXiv:2405.02357  [pdf, other]

    cs.LG

    Large Language Models for Mobility in Transportation Systems: A Survey on Forecasting Tasks

    Authors: Zijian Zhang, Yujie Sun, Zepu Wang, Yuqi Nie, Xiaobo Ma, Peng Sun, Ruolin Li

    Abstract: Mobility analysis is a crucial element in the research area of transportation systems. Forecasting traffic information offers a viable solution to address the conflict between increasing transportation demands and the limitations of transportation infrastructure. Predicting human travel is significant in aiding various transportation and urban management tasks, such as taxi dispatch and urban plan…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: 9 pages

  37. arXiv:2405.02320  [pdf, other]

    cs.IT cs.AI

    A SER-based Device Selection Mechanism in Multi-bits Quantization Federated Learning

    Authors: Pengcheng Sun, Erwu Liu, Rui Wang

    Abstract: The quality of wireless communication directly affects the performance of federated learning (FL), so this paper analyzes the influence of wireless communication on FL through the symbol error rate (SER). In an FL system, non-orthogonal multiple access (NOMA) can be used as the basic communication framework to reduce the communication congestion and interference caused by multiple users, which takes a…

    Submitted 20 April, 2024; originally announced May 2024.

  38. arXiv:2404.16831  [pdf, other]

    cs.CV

    The Third Monocular Depth Estimation Challenge

    Authors: Jaime Spencer, Fabio Tosi, Matteo Poggi, Ripudaman Singh Arora, Chris Russell, Simon Hadfield, Richard Bowden, GuangYuan Zhou, ZhengXin Li, Qiang Rao, YiPing Bao, Xiao Liu, Dohyeong Kim, Jinseong Kim, Myunghyun Kim, Mykola Lavreniuk, Rui Li, Qing Mao, Jiang Wu, Yu Zhu, Jinqiu Sun, Yanning Zhang, Suraj Patni, Aradhye Agarwal, Chetan Arora, et al. (16 additional authors not shown)

    Abstract: This paper discusses the results of the third edition of the Monocular Depth Estimation Challenge (MDEC). The challenge focuses on zero-shot generalization to the challenging SYNS-Patches dataset, featuring complex scenes in natural and indoor settings. As with the previous edition, methods can use any form of supervision, i.e. supervised or self-supervised. The challenge received a total of 19 su…

    Submitted 27 April, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: To appear in CVPRW2024

  39. arXiv:2404.14700  [pdf, other]

    eess.AS cs.AI cs.CL cs.LG cs.SD

    FlashSpeech: Efficient Zero-Shot Speech Synthesis

    Authors: Zhen Ye, Zeqian Ju, Haohe Liu, Xu Tan, Jianyi Chen, Yiwen Lu, Peiwen Sun, Jiahao Pan, Weizhen Bian, Shulin He, Qifeng Liu, Yike Guo, Wei Xue

    Abstract: Recent progress in large-scale zero-shot speech synthesis has been significantly advanced by language models and diffusion models. However, the generation process of both methods is slow and computationally intensive. Efficient speech synthesis using a lower computing budget to achieve quality on par with previous work remains a significant challenge. In this paper, we present FlashSpeech, a large…

    Submitted 24 April, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: Efficient zero-shot speech synthesis

  40. arXiv:2404.13940  [pdf, other]

    cs.CL

    A User-Centric Multi-Intent Benchmark for Evaluating Large Language Models

    Authors: Jiayin Wang, Fengran Mo, Weizhi Ma, Peijie Sun, Min Zhang, Jian-Yun Nie

    Abstract: Large language models (LLMs) are essential tools that users employ across various scenarios, so evaluating their performance and guiding users in selecting the suitable service is important. Although many benchmarks exist, they mainly focus on specific predefined model abilities, such as world knowledge, reasoning, etc. Based on these ability scores, it is hard for users to determine which LLM bes…

    Submitted 20 September, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

  41. arXiv:2404.11903  [pdf, other]

    cs.CV

    Simultaneous Detection and Interaction Reasoning for Object-Centric Action Recognition

    Authors: Xunsong Li, Pengzhan Sun, Yangcen Liu, Lixin Duan, Wen Li

    Abstract: The interactions between humans and objects are important for recognizing object-centric actions. Existing methods usually adopt a two-stage pipeline, where object proposals are first detected using a pretrained detector, and are then fed to an action recognition model for extracting video features and learning the object relations for action recognition. However, since the action prior is unknown…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: 12 pages, 5 figures, submitted to IEEE Transactions on Multimedia

  42. arXiv:2404.11051  [pdf]

    cs.CV

    WPS-Dataset: A benchmark for wood plate segmentation in bark removal processing

    Authors: Rijun Wang, Guanghao Zhang, Fulong Liang, Bo Wang, Xiangwei Mou, Yesheng Chen, Peng Sun, Canjin Wang

    Abstract: Using deep learning methods is a promising approach to improving bark removal efficiency and enhancing the quality of wood products. However, the lack of publicly available datasets for wood plate segmentation in bark removal processing poses challenges for researchers in this field. To address this issue, a benchmark for wood plate segmentation in bark removal processing named WPS-dataset is prop…

    Submitted 25 April, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

    Report number: b06d7e0b-306f-476a-a72d-59a8793ac232 | v.1.2

  43. arXiv:2404.09526  [pdf, other]

    cs.DC cs.LG

    LoongServe: Efficiently Serving Long-context Large Language Models with Elastic Sequence Parallelism

    Authors: Bingyang Wu, Shengyu Liu, Yinmin Zhong, Peng Sun, Xuanzhe Liu, Xin Jin

    Abstract: The context window of large language models (LLMs) is rapidly increasing, leading to a huge variance in resource usage between different requests as well as between different phases of the same request. Restricted by static parallelism strategies, existing LLM serving systems cannot efficiently utilize the underlying resources to serve variable-length requests in different phases. To address this…

    Submitted 15 April, 2024; originally announced April 2024.

  44. Collaborative-Enhanced Prediction of Spending on Newly Downloaded Mobile Games under Consumption Uncertainty

    Authors: Peijie Sun, Yifan Wang, Min Zhang, Chuhan Wu, Yan Fang, Hong Zhu, Yuan Fang, Meng Wang

    Abstract: With the surge in mobile gaming, accurately predicting user spending on newly downloaded games has become paramount for maximizing revenue. However, the inherently unpredictable nature of user behavior poses significant challenges in this endeavor. To address this, we propose a robust model training and evaluation framework aimed at standardizing spending data to mitigate label variance and extrem…

    Submitted 12 April, 2024; originally announced April 2024.

    Comments: 10 pages, 6 figures, WWW 2024 Industry Track, with three accept, two weak accept scores

  45. arXiv:2404.05403  [pdf, other]

    cs.CR cs.AI

    SoK: Gradient Leakage in Federated Learning

    Authors: Jiacheng Du, Jiahui Hu, Zhibo Wang, Peng Sun, Neil Zhenqiang Gong, Kui Ren

    Abstract: Federated learning (FL) enables collaborative model training among multiple clients without raw data exposure. However, recent studies have shown that clients' private training data can be reconstructed from the gradients they share in FL, known as gradient inversion attacks (GIAs). While GIAs have demonstrated effectiveness under ideal settings and auxiliary assumptions, their actual effic…

    Submitted 8 April, 2024; originally announced April 2024.

  46. arXiv:2404.01008  [pdf, other]

    cs.IR

    EEG-SVRec: An EEG Dataset with User Multidimensional Affective Engagement Labels in Short Video Recommendation

    Authors: Shaorun Zhang, Zhiyu He, Ziyi Ye, Peijie Sun, Qingyao Ai, Min Zhang, Yiqun Liu

    Abstract: In recent years, short video platforms have gained widespread popularity, making the quality of video recommendations crucial for retaining users. Existing recommendation systems primarily rely on behavioral data, which faces limitations when inferring user preferences due to issues such as data sparsity and noise from accidental interactions or personal habits. To address these challenges and pro…

    Submitted 1 April, 2024; originally announced April 2024.

  47. arXiv:2404.00774  [pdf, other]

    cs.LG

    SOAR: Improved Indexing for Approximate Nearest Neighbor Search

    Authors: Philip Sun, David Simcha, Dave Dopson, Ruiqi Guo, Sanjiv Kumar

    Abstract: This paper introduces SOAR: Spilling with Orthogonality-Amplified Residuals, a novel data indexing technique for approximate nearest neighbor (ANN) search. SOAR extends upon previous approaches to ANN search, such as spill trees, that utilize multiple redundant representations while partitioning the data to reduce the probability of missing a nearest neighbor during search. Rather than training an…

    Submitted 31 March, 2024; originally announced April 2024.

    Journal ref: Advances in Neural Information Processing Systems 36 (2023) 3189-3204

  48. arXiv:2403.20296  [pdf, other]

    cs.IR

    Aiming at the Target: Filter Collaborative Information for Cross-Domain Recommendation

    Authors: Hanyu Li, Weizhi Ma, Peijie Sun, Jiayu Li, Cunxiang Yin, Yancheng He, Guoqiang Xu, Min Zhang, Shaoping Ma

    Abstract: Cross-domain recommender (CDR) systems aim to enhance the performance of the target domain by utilizing data from other related domains. However, irrelevant information from the source domain may instead degrade target domain performance, which is known as the negative transfer problem. There have been some attempts to address this problem, mostly by designing adaptive representations for overlapp…

    Submitted 29 March, 2024; originally announced March 2024.

    Comments: Accepted by SIGIR 2024

  49. arXiv:2403.18348  [pdf, other]

    cs.IR

    Sequential Recommendation with Latent Relations based on Large Language Model

    Authors: Shenghao Yang, Weizhi Ma, Peijie Sun, Qingyao Ai, Yiqun Liu, Mingchen Cai, Min Zhang

    Abstract: Sequential recommender systems predict items that may interest users by modeling their preferences based on historical interactions. Traditional sequential recommendation methods rely on capturing implicit collaborative filtering signals among items. Recent relation-aware sequential recommendation models have achieved promising performance by explicitly incorporating item relations into the modeli…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: Accepted by SIGIR 2024

  50. arXiv:2403.18325  [pdf, other]

    cs.IR

    Common Sense Enhanced Knowledge-based Recommendation with Large Language Model

    Authors: Shenghao Yang, Weizhi Ma, Peijie Sun, Min Zhang, Qingyao Ai, Yiqun Liu, Mingchen Cai

    Abstract: Knowledge-based recommendation models effectively alleviate the data sparsity issue leveraging the side information in the knowledge graph, and have achieved considerable performance. Nevertheless, the knowledge graphs used in previous work, namely metadata-based knowledge graphs, are usually constructed based on the attributes of items and co-occurring relations (e.g., also buy), in which the for…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: Accepted by DASFAA 2024