Showing 1–50 of 597 results for author: Yan, S

  1. arXiv:2410.15682  [pdf, other]

    cs.CV cs.RO

    RANSAC Back to SOTA: A Two-stage Consensus Filtering for Real-time 3D Registration

    Authors: Pengcheng Shi, Shaocheng Yan, Yilin Xiao, Xinyi Liu, Yongjun Zhang, Jiayuan Li

    Abstract: Correspondence-based point cloud registration (PCR) plays a key role in robotics and computer vision. However, challenges like sensor noise, object occlusions, and descriptor limitations inevitably result in numerous outliers. The RANSAC family is the most popular outlier removal solution. However, the requisite iterations escalate exponentially with the outlier ratio, rendering it far inferior to ex…

    Submitted 21 October, 2024; originally announced October 2024.

    Comments: 8 pages, 8 figures

  2. arXiv:2410.15038  [pdf, other]

    cs.CV cs.AI

    A General-Purpose Multimodal Foundation Model for Dermatology

    Authors: Siyuan Yan, Zhen Yu, Clare Primiero, Cristina Vico-Alonso, Zhonghua Wang, Litao Yang, Philipp Tschandl, Ming Hu, Gin Tan, Vincent Tang, Aik Beng Ng, David Powell, Paul Bonnington, Simon See, Monika Janda, Victoria Mar, Harald Kittler, H. Peter Soyer, Zongyuan Ge

    Abstract: Diagnosing and treating skin diseases require advanced visual skills across multiple domains and the ability to synthesize information from various imaging modalities. Current deep learning models, while effective at specific tasks such as diagnosing skin cancer from dermoscopic images, fall short in addressing the complex, multimodal demands of clinical practice. Here, we introduce PanDerm, a mul…

    Submitted 19 October, 2024; originally announced October 2024.

    Comments: 56 pages; Technical report

  3. arXiv:2410.14400  [pdf, other]

    cs.CV

    Variable Aperture Bokeh Rendering via Customized Focal Plane Guidance

    Authors: Kang Chen, Shijun Yan, Aiwen Jiang, Han Li, Zhifeng Wang

    Abstract: Bokeh rendering is one of the most popular techniques in photography. It can make photographs visually appealing, forcing users to focus their attention on a particular area of the image. However, achieving a satisfactory bokeh effect usually presents a significant challenge, since mobile cameras with restricted optical systems are constrained, while expensive high-end DSLR lens with large aperture should…

    Submitted 18 October, 2024; originally announced October 2024.

  4. arXiv:2410.13613  [pdf, other]

    cs.CV cs.GR

    MEGA: Memory-Efficient 4D Gaussian Splatting for Dynamic Scenes

    Authors: Xinjie Zhang, Zhening Liu, Yifan Zhang, Xingtong Ge, Dailan He, Tongda Xu, Yan Wang, Zehong Lin, Shuicheng Yan, Jun Zhang

    Abstract: 4D Gaussian Splatting (4DGS) has recently emerged as a promising technique for capturing complex dynamic 3D scenes with high fidelity. It utilizes a 4D Gaussian representation and a GPU-friendly rasterizer, enabling rapid rendering speeds. Despite its advantages, 4DGS faces significant challenges, notably the requirement of millions of 4D Gaussians, each with extensive associated attributes, leadi…

    Submitted 17 October, 2024; originally announced October 2024.

  5. arXiv:2410.12269  [pdf, other]

    cs.CV

    LoD-Loc: Aerial Visual Localization using LoD 3D Map with Neural Wireframe Alignment

    Authors: Juelin Zhu, Shen Yan, Long Wang, Shengyue Zhang, Yu Liu, Maojun Zhang

    Abstract: We propose a new method named LoD-Loc for visual localization in the air. Unlike existing localization algorithms, LoD-Loc does not rely on complex 3D representations and can estimate the pose of an Unmanned Aerial Vehicle (UAV) using a Level-of-Detail (LoD) 3D map. LoD-Loc mainly achieves this goal by aligning the wireframe derived from the LoD projected model with that predicted by the neural ne…

    Submitted 16 October, 2024; originally announced October 2024.

    Comments: Accepted by NeurIPS 2024; for Project page, see https://victorzoo.github.io/LoD-Loc.github.io/

  6. arXiv:2410.11842  [pdf, other]

    cs.CV cs.AI cs.LG

    MoH: Multi-Head Attention as Mixture-of-Head Attention

    Authors: Peng Jin, Bo Zhu, Li Yuan, Shuicheng Yan

    Abstract: In this work, we upgrade the multi-head attention mechanism, the core of the Transformer model, to improve efficiency while maintaining or surpassing the previous accuracy level. We show that multi-head attention can be expressed in the summation form. Drawing on the insight that not all attention heads hold equal significance, we propose Mixture-of-Head attention (MoH), a new architecture that tr…

    Submitted 15 October, 2024; originally announced October 2024.

    Comments: 23 pages, code: https://github.com/SkyworkAI/MoH

  7. arXiv:2410.11402  [pdf, other]

    cs.RO

    M2Diffuser: Diffusion-based Trajectory Optimization for Mobile Manipulation in 3D Scenes

    Authors: Sixu Yan, Zeyu Zhang, Muzhi Han, Zaijin Wang, Qi Xie, Zhitian Li, Zhehan Li, Hangxin Liu, Xinggang Wang, Song-Chun Zhu

    Abstract: Recent advances in diffusion models have opened new avenues for research into embodied AI agents and robotics. Despite significant achievements in complex robotic locomotion and skills, mobile manipulation, a capability that requires the coordination of navigation and manipulation, remains a challenge for generative AI techniques. This is primarily due to the high-dimensional action space, extended…

    Submitted 15 October, 2024; originally announced October 2024.

  8. arXiv:2410.11181  [pdf, other]

    eess.AS cs.AI cs.LG cs.SD

    DARNet: Dual Attention Refinement Network with Spatiotemporal Construction for Auditory Attention Detection

    Authors: Sheng Yan, Cunhang Fan, Hongyu Zhang, Xiaoke Yang, Jianhua Tao, Zhao Lv

    Abstract: At a cocktail party, humans exhibit an impressive ability to direct their attention. The auditory attention detection (AAD) approach seeks to identify the attended speaker by analyzing brain signals, such as EEG signals. However, current AAD algorithms overlook the spatial distribution information within EEG signals and lack the ability to capture long-range latent dependencies, limiting the model…

    Submitted 14 October, 2024; originally announced October 2024.

  9. arXiv:2410.09008  [pdf, other]

    cs.CL

    SuperCorrect: Supervising and Correcting Language Models with Error-Driven Insights

    Authors: Ling Yang, Zhaochen Yu, Tianjun Zhang, Minkai Xu, Joseph E. Gonzalez, Bin Cui, Shuicheng Yan

    Abstract: Large language models (LLMs) like GPT-4, PaLM, and LLaMA have shown significant improvements in various reasoning tasks. However, smaller models such as Llama-3-8B and DeepSeekMath-Base still struggle with complex mathematical reasoning because they fail to effectively identify and correct reasoning errors. Recent reflection-based methods aim to address these issues by enabling self-reflection and…

    Submitted 11 October, 2024; originally announced October 2024.

    Comments: Project: https://github.com/YangLing0818/SuperCorrect-llm

  10. arXiv:2410.08261  [pdf, other]

    cs.CV

    Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis

    Authors: Jinbin Bai, Tian Ye, Wei Chow, Enxin Song, Qing-Guo Chen, Xiangtai Li, Zhen Dong, Lei Zhu, Shuicheng Yan

    Abstract: Diffusion models, such as Stable Diffusion, have made significant strides in visual generation, yet their paradigm remains fundamentally different from autoregressive language models, complicating the development of unified language-vision models. Recent efforts like LlamaGen have attempted autoregressive image generation using discrete VQVAE tokens, but the large number of tokens involved renders…

    Submitted 10 October, 2024; originally announced October 2024.

  11. arXiv:2410.08190  [pdf, other]

    cs.CV cs.CR cs.GR cs.LG

    Poison-splat: Computation Cost Attack on 3D Gaussian Splatting

    Authors: Jiahao Lu, Yifan Zhang, Qiuhong Shen, Xinchao Wang, Shuicheng Yan

    Abstract: 3D Gaussian splatting (3DGS), known for its groundbreaking performance and efficiency, has become a dominant 3D representation and brought progress to many 3D vision tasks. However, in this work, we reveal a significant security vulnerability that has been largely overlooked in 3DGS: the computation cost of training 3DGS could be maliciously tampered with by poisoning the input data. By developing an a…

    Submitted 10 October, 2024; originally announced October 2024.

    Comments: Our code is available at https://github.com/jiahaolu97/poison-splat

  12. arXiv:2410.07348  [pdf, other]

    cs.LG cs.AI

    MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts

    Authors: Peng Jin, Bo Zhu, Li Yuan, Shuicheng Yan

    Abstract: In this work, we aim to simultaneously enhance the effectiveness and efficiency of Mixture-of-Experts (MoE) methods. To achieve this, we propose MoE++, a general and heterogeneous MoE framework that integrates both Feed-Forward Network (FFN) and zero-computation experts. Specifically, we introduce three types of zero-computation experts: the zero expert, copy expert, and constant expert, which cor…

    Submitted 9 October, 2024; originally announced October 2024.

    Comments: 23 pages, Code: https://github.com/SkyworkAI/MoE-plus-plus

  13. arXiv:2410.06678  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG

    M3Bench: Benchmarking Whole-body Motion Generation for Mobile Manipulation in 3D Scenes

    Authors: Zeyu Zhang, Sixu Yan, Muzhi Han, Zaijin Wang, Xinggang Wang, Song-Chun Zhu, Hangxin Liu

    Abstract: We propose M^3Bench, a new benchmark of whole-body motion generation for mobile manipulation tasks. Given a 3D scene context, M^3Bench requires an embodied agent to understand its configuration, environmental constraints and task objectives, then generate coordinated whole-body motion trajectories for object rearrangement tasks. M^3Bench features 30k object rearrangement tasks across 119 diverse s…

    Submitted 14 October, 2024; v1 submitted 9 October, 2024; originally announced October 2024.

    Comments: Code and data set will be released after acceptance

  14. arXiv:2410.02010  [pdf, other]

    eess.IV cs.CV

    MONICA: Benchmarking on Long-tailed Medical Image Classification

    Authors: Lie Ju, Siyuan Yan, Yukun Zhou, Yang Nan, Xiaodan Xing, Peibo Duan, Zongyuan Ge

    Abstract: Long-tailed learning is considered to be an extremely challenging problem in data imbalance learning. It aims to train well-generalized models from a large number of images that follow a long-tailed class distribution. In the medical field, many diagnostic imaging exams such as dermoscopy and chest radiography yield a long-tailed distribution of complex clinical findings. Recently, long-tailed lea…

    Submitted 2 October, 2024; originally announced October 2024.

  15. arXiv:2409.20441  [pdf, other]

    cs.CL

    Instance-adaptive Zero-shot Chain-of-Thought Prompting

    Authors: Xiaosong Yuan, Chen Shen, Shaotian Yan, Xiaofeng Zhang, Liang Xie, Wenxiao Wang, Renchu Guan, Ying Wang, Jieping Ye

    Abstract: Zero-shot Chain-of-Thought (CoT) prompting emerges as a simple and effective strategy for enhancing the performance of large language models (LLMs) in real-world reasoning tasks. Nonetheless, the efficacy of a singular, task-level prompt uniformly applied across all instances is inherently limited, since one prompt cannot be a good partner for all; a more appropriate approach should consid…

    Submitted 1 October, 2024; v1 submitted 30 September, 2024; originally announced September 2024.

    Comments: 13 pages, 6 figures

  16. arXiv:2409.19962  [pdf, other]

    cs.CE

    Two-Stage Optimization for Efficient V2G Coordination in Distribution Power System

    Authors: Pengchao Tian, Siqi Yan, Bikang Pan, Ye Shi

    Abstract: With the growing popularity of electric vehicles (EVs), maintaining power grid stability has become a significant challenge. To address this issue, EV scheduling control strategies have been developed to manage vehicle-to-grid (V2G) in coordination with the optimal power flow. In existing studies, such coordination optimization is formulated as a mixed-integer nonlinear programming (MINP), which i…

    Submitted 30 September, 2024; originally announced September 2024.

  17. arXiv:2409.17564  [pdf, other]

    cs.CV

    General Compression Framework for Efficient Transformer Object Tracking

    Authors: Lingyi Hong, Jinglun Li, Xinyu Zhou, Shilin Yan, Pinxue Guo, Kaixun Jiang, Zhaoyu Chen, Shuyong Gao, Wei Zhang, Hong Lu, Wenqiang Zhang

    Abstract: Transformer-based trackers have established a dominant role in the field of visual object tracking. While these trackers exhibit promising performance, their deployment on resource-constrained devices remains challenging due to inefficiencies. To improve the inference efficiency and reduce the computation cost, prior approaches have aimed to either design lightweight trackers or distill knowledge…

    Submitted 26 September, 2024; originally announced September 2024.

  18. arXiv:2409.17534  [pdf, other]

    cs.AI

    Just Say What You Want: Only-prompting Self-rewarding Online Preference Optimization

    Authors: Ruijie Xu, Zhihan Liu, Yongfei Liu, Shipeng Yan, Zhaoran Wang, Zhi Zhang, Xuming He

    Abstract: We address the challenge of online Reinforcement Learning from Human Feedback (RLHF) with a focus on self-rewarding alignment methods. In online RLHF, obtaining feedback requires interaction with the environment, which can be costly when using additional reward models or the GPT-4 API. Current self-rewarding approaches rely heavily on the discriminator's judgment capabilities, which are effective…

    Submitted 14 October, 2024; v1 submitted 26 September, 2024; originally announced September 2024.

  19. arXiv:2409.15196  [pdf, other]

    cs.CV cs.AI

    HOTVCOM: Generating Buzzworthy Comments for Videos

    Authors: Yuyan Chen, Yiwen Qian, Songzhou Yan, Jiyuan Jia, Zhixu Li, Yanghua Xiao, Xiaobo Li, Ming Yang, Qingpei Guo

    Abstract: In the era of social media video platforms, popular "hot-comments" play a crucial role in attracting user impressions of short-form videos, making them vital for marketing and branding purposes. However, existing research predominantly focuses on generating descriptive comments or "danmaku" in English, offering immediate reactions to specific video moments. Addressing this gap, our study introd…

    Submitted 23 September, 2024; originally announced September 2024.

    Comments: Accepted to ACL 2024 (Findings)

  20. arXiv:2409.14762  [pdf, other]

    cs.CL cs.AI

    Do Large Language Models have Problem-Solving Capability under Incomplete Information Scenarios?

    Authors: Yuyan Chen, Tianhao Yu, Yueze Li, Songzhou Yan, Sijia Liu, Jiaqing Liang, Yanghua Xiao

    Abstract: The evaluation of the problem-solving capability under incomplete information scenarios of Large Language Models (LLMs) is increasingly important, encompassing capabilities such as questioning, knowledge search, error detection, and path planning. Current research mainly focuses on LLMs' problem-solving capability such as "Twenty Questions". However, these kinds of games do not require recognizing…

    Submitted 23 September, 2024; originally announced September 2024.

    Comments: Accepted to ACL 2024 (Findings)

  21. arXiv:2409.13359  [pdf, other]

    cs.CL cs.AI

    EmotionQueen: A Benchmark for Evaluating Empathy of Large Language Models

    Authors: Yuyan Chen, Hao Wang, Songzhou Yan, Sijia Liu, Yueze Li, Yi Zhao, Yanghua Xiao

    Abstract: Emotional intelligence in large language models (LLMs) is of great importance in Natural Language Processing. However, previous research mainly focuses on basic sentiment analysis tasks, such as emotion recognition, which is not enough to evaluate LLMs' overall emotional intelligence. Therefore, this paper presents a novel framework named EmotionQueen for evaluating the emotional intelligence of…

    Submitted 20 September, 2024; originally announced September 2024.

    Comments: Accepted to ACL 2024 (Findings)

  22. arXiv:2409.12532  [pdf, other]

    cs.CV

    Denoising Reuse: Exploiting Inter-frame Motion Consistency for Efficient Video Latent Generation

    Authors: Chenyu Wang, Shuo Yan, Yixuan Chen, Yujiang Wang, Mingzhi Dong, Xiaochen Yang, Dongsheng Li, Robert P. Dick, Qin Lv, Fan Yang, Tun Lu, Ning Gu, Li Shang

    Abstract: Video generation using diffusion-based models is constrained by high computational costs due to the frame-wise iterative diffusion process. This work presents a Diffusion Reuse MOtion (Dr. Mo) network to accelerate latent video generation. Our key discovery is that coarse-grained noises in earlier denoising steps have demonstrated high motion consistency across consecutive video frames. Following…

    Submitted 19 September, 2024; originally announced September 2024.

  23. arXiv:2409.10593  [pdf, other]

    cs.LG cs.AI cs.CL

    CSKV: Training-Efficient Channel Shrinking for KV Cache in Long-Context Scenarios

    Authors: Luning Wang, Shiyao Li, Xuefei Ning, Zhihang Yuan, Shengen Yan, Guohao Dai, Yu Wang

    Abstract: Large Language Models (LLMs) have been widely adopted to process long-context tasks. However, the large memory overhead of the key-value (KV) cache poses significant challenges in long-context scenarios. Existing training-free KV cache compression methods typically focus on quantization and token pruning, which have compression limits, and excessive sparsity can lead to severe performance degradat…

    Submitted 18 October, 2024; v1 submitted 16 September, 2024; originally announced September 2024.

    Comments: 4th NeurIPS Efficient Natural Language and Speech Processing Workshop (ENLSP-IV 2024)

  24. arXiv:2409.09427  [pdf, other]

    cs.MM

    Prototypical Prompting for Text-to-image Person Re-identification

    Authors: Shuanglin Yan, Jun Liu, Neng Dong, Liyan Zhang, Jinhui Tang

    Abstract: In this paper, we study the problem of Text-to-Image Person Re-identification (TIReID), which aims to find images of the same identity described by a text sentence from a pool of candidate images. Benefiting from Vision-Language Pre-training, such as CLIP (Contrastive Language-Image Pretraining), the TIReID techniques have achieved remarkable progress recently. However, most existing methods only…

    Submitted 14 September, 2024; originally announced September 2024.

    Comments: Accepted by ACM MM2024

  25. arXiv:2409.07775  [pdf, other]

    cs.AI cs.CR

    A Spatiotemporal Stealthy Backdoor Attack against Cooperative Multi-Agent Deep Reinforcement Learning

    Authors: Yinbo Yu, Saihao Yan, Jiajia Liu

    Abstract: Recent studies have shown that cooperative multi-agent deep reinforcement learning (c-MADRL) is under the threat of backdoor attacks. Once a backdoor trigger is observed, it will perform abnormal actions leading to failures or malicious goals. However, existing proposed backdoors suffer from several issues, e.g., fixed visual trigger patterns lack stealthiness, the backdoor is trained or activated…

    Submitted 12 September, 2024; originally announced September 2024.

    Comments: 6 pages, IEEE Globecom 2024

  26. arXiv:2409.04777  [pdf, other]

    cs.LG math.OC

    Optimization Hyper-parameter Laws for Large Language Models

    Authors: Xingyu Xie, Kuangyu Ding, Shuicheng Yan, Kim-Chuan Toh, Tianwen Wei

    Abstract: Large Language Models have driven significant AI advancements, yet their training is resource-intensive and highly sensitive to hyper-parameter selection. While scaling laws provide valuable guidance on model size and data requirements, they fall short in choosing dynamic hyper-parameters, such as learning-rate (LR) schedules, that evolve during training. To bridge this gap, we present Optimizatio…

    Submitted 7 September, 2024; originally announced September 2024.

  27. arXiv:2408.15777  [pdf, other]

    cs.CV

    A Survey on Facial Expression Recognition of Static and Dynamic Emotions

    Authors: Yan Wang, Shaoqi Yan, Yang Liu, Wei Song, Jing Liu, Yang Chang, Xinji Mai, Xiping Hu, Wenqiang Zhang, Zhongxue Gan

    Abstract: Facial expression recognition (FER) aims to analyze emotional states from static images and dynamic sequences, which is pivotal in enhancing anthropomorphic communication among humans, robots, and digital avatars by leveraging AI technologies. As the FER field evolves from controlled laboratory environments to more complex in-the-wild scenarios, advanced methods have been rapidly developed and new…

    Submitted 28 August, 2024; originally announced August 2024.

  28. arXiv:2408.13574  [pdf, other]

    cs.CV

    PointDGMamba: Domain Generalization of Point Cloud Classification via Generalized State Space Model

    Authors: Hao Yang, Qianyu Zhou, Haijia Sun, Xiangtai Li, Fengqi Liu, Xuequan Lu, Lizhuang Ma, Shuicheng Yan

    Abstract: Domain Generalization (DG) has been recently explored to improve the generalizability of point cloud classification (PCC) models toward unseen domains. However, they often suffer from limited receptive fields or quadratic complexity due to the use of convolutional neural networks or vision Transformers. In this paper, we present the first work that studies the generalizability of state space models…

    Submitted 24 August, 2024; originally announced August 2024.

  29. arXiv:2408.12003  [pdf]

    cs.CL

    RAG-Optimized Tibetan Tourism LLMs: Enhancing Accuracy and Personalization

    Authors: Jinhu Qi, Shuai Yan, Yibo Zhang, Wentao Zhang, Rong Jin, Yuwei Hu, Ke Wang

    Abstract: With the development of the modern social economy, tourism has become an important way to meet people's spiritual needs, bringing development opportunities to the tourism industry. However, existing large language models (LLMs) face challenges in personalized recommendation capabilities and the generation of content that can sometimes produce hallucinations. This study proposes an optimization sch…

    Submitted 21 August, 2024; originally announced August 2024.

    Comments: Accepted by AIPR 2024

    ACM Class: I.2.7

  30. arXiv:2408.10947  [pdf, other]

    cs.AI cs.CL cs.CY

    Dr.Academy: A Benchmark for Evaluating Questioning Capability in Education for Large Language Models

    Authors: Yuyan Chen, Chenwei Wu, Songzhou Yan, Panjun Liu, Haoyu Zhou, Yanghua Xiao

    Abstract: Teachers are important to imparting knowledge and guiding learners, and the role of large language models (LLMs) as potential educators is emerging as an important area of study. Recognizing LLMs' capability to generate educational content can lead to advances in automated and personalized learning. While LLMs have been tested for their comprehension and problem-solving skills, their capability in…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: Accepted to ACL 2024

  31. arXiv:2408.10455  [pdf, other]

    cs.AI

    IDEA: Enhancing the Rule Learning Ability of Large Language Model Agent through Induction, Deduction, and Abduction

    Authors: Kaiyu He, Mian Zhang, Shuo Yan, Peilin Wu, Zhiyu Zoey Chen

    Abstract: While large language models (LLMs) have been thoroughly evaluated for deductive and inductive reasoning, their proficiency in abductive reasoning and holistic rule learning in interactive environments remains less explored. We introduce RULEARN, a novel benchmark specifically designed to assess the rule-learning abilities of LLM agents in interactive settings. In RULEARN, agents strategically inte…

    Submitted 2 October, 2024; v1 submitted 19 August, 2024; originally announced August 2024.

  32. arXiv:2408.05159  [pdf, other]

    cs.CV

    EasyInv: Toward Fast and Better DDIM Inversion

    Authors: Ziyue Zhang, Mingbao Lin, Shuicheng Yan, Rongrong Ji

    Abstract: This paper introduces EasyInv, an easy yet novel approach that significantly advances the field of DDIM Inversion by addressing the inherent inefficiencies and performance limitations of traditional iterative optimization methods. At the core of our EasyInv is a refined strategy for approximating inversion noise, which is pivotal for enhancing the accuracy and reliability of the inversion process.…

    Submitted 13 August, 2024; v1 submitted 9 August, 2024; originally announced August 2024.

    Comments: 9 pages not including reference

  33. arXiv:2408.03221  [pdf, other]

    cs.NI eess.SP

    DRL-Assisted Dynamic QoT-Aware Service Provisioning in Multi-Band Elastic Optical Networks

    Authors: Yiran Teng, Carlos Natalino, Farhad Arpanaei, Alfonso Sánchez-Macián, Paolo Monti, Shuangyi Yan, Dimitra Simeonidou

    Abstract: We propose a DRL-assisted approach for service provisioning in multi-band elastic optical networks. Our simulation environment uses an accurate QoT estimator based on the GN/EGN model. Results show that the proposed approach reduces request blocking by 50% compared with heuristics from the literature.

    Submitted 6 August, 2024; originally announced August 2024.

    Comments: This paper has been accepted by the 50th European Conference on Optical Communications (ECOC 2024)

  34. Adversarial Safety-Critical Scenario Generation using Naturalistic Human Driving Priors

    Authors: Kunkun Hao, Yonggang Luo, Wen Cui, Yuqiao Bai, Jucheng Yang, Songyang Yan, Yuxi Pan, Zijiang Yang

    Abstract: Evaluating the decision-making system is indispensable in developing autonomous vehicles, while realistic and challenging safety-critical test scenarios play a crucial role. Obtaining these scenarios is non-trivial, thanks to the long-tailed distribution, sparsity, and rarity in real-world data sets. To tackle this problem, in this paper, we introduce a natural adversarial scenario generation solu…

    Submitted 6 August, 2024; v1 submitted 6 August, 2024; originally announced August 2024.

    Comments: Published in IEEE Transactions on Intelligent Vehicles, 2023

    Journal ref: IEEE Transactions on Intelligent Vehicles (2023)

  35. arXiv:2408.01044  [pdf, other]

    cs.CV

    Boosting Gaze Object Prediction via Pixel-level Supervision from Vision Foundation Model

    Authors: Yang Jin, Lei Zhang, Shi Yan, Bin Fan, Binglu Wang

    Abstract: Gaze object prediction (GOP) aims to predict the category and location of the object that a human is looking at. Previous methods utilized box-level supervision to identify the object that a person is looking at, but struggled with semantic ambiguity, i.e., a single box may contain several items since objects are close together. The Vision foundation model (VFM) has improved in object segmentation u…

    Submitted 2 August, 2024; originally announced August 2024.

    Comments: Accepted by ECCV2024

  36. arXiv:2407.20585  [pdf, other]

    cs.NI eess.SP

    A UAV-Enabled Time-Sensitive Data Collection Scheme for Grassland Monitoring Edge Networks

    Authors: Dongbin Jiao, Zihao Wang, Wen Fan, Weibo Yang, Peng Yang, Zhanhuan Shang, Shi Yan

    Abstract: Grassland monitoring is essential for the sustainable development of grassland resources. Traditional Internet of Things (IoT) devices generate critical ecological data, making data loss unacceptable, but the harsh environment complicates data collection. Unmanned Aerial Vehicle (UAV) and mobile edge computing (MEC) offer efficient data collection solutions, enhancing performance on resource-limit…

    Submitted 10 August, 2024; v1 submitted 30 July, 2024; originally announced July 2024.

  37. arXiv:2407.17152  [pdf, other]

    cs.CV cs.AI

    XMeCap: Meme Caption Generation with Sub-Image Adaptability

    Authors: Yuyan Chen, Songzhou Yan, Zhihong Zhu, Zhixu Li, Yanghua Xiao

    Abstract: Humor, deeply rooted in societal meanings and cultural details, poses a unique challenge for machines. While advances have been made in natural language processing, real-world humor often thrives in a multi-modal context, encapsulated distinctively by memes. This paper places a particular emphasis on the impact of multi-images on meme captioning. After that, we introduce the XMeCap framewo…

    Submitted 20 September, 2024; v1 submitted 24 July, 2024; originally announced July 2024.

    Comments: Accepted to MM 2024

  38. arXiv:2407.16406  [pdf, other]

    cs.CV cs.LG

    Hi-EF: Benchmarking Emotion Forecasting in Human-interaction

    Authors: Haoran Wang, Xinji Mai, Zeng Tao, Yan Wang, Jiawen Yu, Ziheng Zhou, Xuan Tong, Shaoqi Yan, Qing Zhao, Shuyong Gao, Wenqiang Zhang

    Abstract: Affective Forecasting, a research direction in psychology that predicts individuals' future emotions, is often constrained by numerous external factors like social influence and temporal distance. To address this, we transform Affective Forecasting into a Deep Learning problem by designing an Emotion Forecasting paradigm based on two-party interactions. We propose a novel Emotion Forecasting (EF) t…

    Submitted 23 July, 2024; originally announced July 2024.

  39. arXiv:2407.15590  [pdf, other]

    cs.CV

    All rivers run into the sea: Unified Modality Brain-like Emotional Central Mechanism

    Authors: Xinji Mai, Junxiong Lin, Haoran Wang, Zeng Tao, Yan Wang, Shaoqi Yan, Xuan Tong, Jiawen Yu, Boyang Wang, Ziheng Zhou, Qing Zhao, Shuyong Gao, Wenqiang Zhang

    Abstract: In the field of affective computing, fully leveraging information from a variety of sensory modalities is essential for the comprehensive understanding and processing of human emotions. Inspired by the process through which the human brain handles emotions and the theory of cross-modal plasticity, we propose UMBEnet, a brain-like unified modal affective processing network. The primary design of UM…

    Submitted 22 July, 2024; originally announced July 2024.

  40. arXiv:2407.14710  [pdf, other]

    cs.LG cs.CR

    Universally Harmonizing Differential Privacy Mechanisms for Federated Learning: Boosting Accuracy and Convergence

    Authors: Shuya Feng, Meisam Mohammady, Hanbin Hong, Shenao Yan, Ashish Kundu, Binghui Wang, Yuan Hong

    Abstract: Differentially private federated learning (DP-FL) is a promising technique for collaborative model training while ensuring provable privacy for clients. However, optimizing the tradeoff between privacy and accuracy remains a critical challenge. To the best of our knowledge, we propose the first DP-FL framework (namely UDP-FL), which universally harmonizes any randomization mechanism (e.g., an optimal one…

    Submitted 23 July, 2024; v1 submitted 19 July, 2024; originally announced July 2024.

  41. arXiv:2407.13561  [pdf, other]

    cs.CL

    Research on Tibetan Tourism Viewpoints information generation system based on LLM

    Authors: Jinhu Qi, Shuai Yan, Wentao Zhang, Yibo Zhang, Zirui Liu, Ke Wang

    Abstract: Tibet, ensconced within China's territorial expanse, is distinguished by its labyrinthine and heterogeneous topography, a testament to its profound historical heritage, and the cradle of a unique religious ethos. The very essence of these attributes, however, has impeded the advancement of Tibet's tourism service infrastructure, rendering existing smart tourism services inadequate for the region's…

    Submitted 18 July, 2024; originally announced July 2024.

    Journal ref: ICWOC 2024

  42. arXiv:2407.13431  [pdf, other]

    cs.LG cs.AI

    Improving Out-of-Distribution Generalization of Trajectory Prediction for Autonomous Driving via Polynomial Representations

    Authors: Yue Yao, Shengchao Yan, Daniel Goehring, Wolfram Burgard, Joerg Reichardt

    Abstract: Robustness against Out-of-Distribution (OoD) samples is a key performance indicator of a trajectory prediction model. However, the development and ranking of state-of-the-art (SotA) models are driven by their In-Distribution (ID) performance on individual competition datasets. We present an OoD testing protocol that homogenizes datasets and prediction tasks across two large-scale motion datasets.…

    Submitted 26 August, 2024; v1 submitted 18 July, 2024; originally announced July 2024.

  43. arXiv:2407.11325  [pdf, other]

    cs.CV

    VISA: Reasoning Video Object Segmentation via Large Language Models

    Authors: Cilin Yan, Haochen Wang, Shilin Yan, Xiaolong Jiang, Yao Hu, Guoliang Kang, Weidi Xie, Efstratios Gavves

    Abstract: Existing Video Object Segmentation (VOS) relies on explicit user instructions, such as categories, masks, or short phrases, restricting their ability to perform complex video segmentation requiring reasoning with world knowledge. In this paper, we introduce a new task, Reasoning Video Object Segmentation (ReasonVOS). This task aims to generate a sequence of segmentation masks in response to implic…

    Submitted 15 July, 2024; originally announced July 2024.

  44. arXiv:2407.11096  [pdf, other]

    cs.LG cs.AI

    Static and multivariate-temporal attentive fusion transformer for readmission risk prediction

    Authors: Zhe Sun, Runzhi Li, Jing Wang, Gang Chen, Siyu Yan, Lihong Ma

    Abstract: Background: Accurate short-term readmission prediction of ICU patients is significant in improving the efficiency of resource assignment by assisting physicians in making discharge decisions. Clinically, both individual static and multivariate temporal data collected from ICU monitors play critical roles in short-term readmission prediction. Informative static and multivariate temporal feat…

    Submitted 14 July, 2024; originally announced July 2024.

  45. arXiv:2407.09862  [pdf, other]

    cs.CV

    ML-SemReg: Boosting Point Cloud Registration with Multi-level Semantic Consistency

    Authors: Shaocheng Yan, Pengcheng Shi, Jiayuan Li

    Abstract: Recent advances in point cloud registration mostly leverage geometric information. Although these methods have yielded promising results, they still struggle with problems of low overlap, thus limiting their practical usage. In this paper, we propose ML-SemReg, a plug-and-play point cloud registration framework that fully exploits semantic information. Our key insight is that mismatches can be cat…

    Submitted 13 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV2024

  46. arXiv:2407.08348  [pdf, other]

    cs.AI cs.CL cs.LG

    Skywork-Math: Data Scaling Laws for Mathematical Reasoning in Large Language Models -- The Story Goes On

    Authors: Liang Zeng, Liangjun Zhong, Liang Zhao, Tianwen Wei, Liu Yang, Jujie He, Cheng Cheng, Rui Hu, Yang Liu, Shuicheng Yan, Han Fang, Yahui Zhou

    Abstract: In this paper, we investigate the underlying factors that potentially enhance the mathematical reasoning capabilities of large language models (LLMs). We argue that the data scaling law for math reasoning capabilities in modern LLMs is far from being saturated, highlighting how the model's quality improves with increases in data quantity. To support this claim, we introduce the Skywork-Math model…

    Submitted 17 July, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

  47. arXiv:2407.05021  [pdf, other]

    cs.CV

    Incremental Multiview Point Cloud Registration

    Authors: Xiaoya Cheng, Yu Liu, Maojun Zhang, Shen Yan

    Abstract: In this paper, we present a novel approach for multiview point cloud registration. Unlike previous work, which typically employs a global scheme for multiview registration, we propose to adopt an incremental pipeline to progressively align scans into a canonical coordinate system. Specifically, drawing inspiration from image-based 3D reconstruction, our approach first builds a sparse sc…

    Submitted 6 July, 2024; originally announced July 2024.

  48. arXiv:2407.00945  [pdf, other]

    cs.LG

    Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs

    Authors: Enshu Liu, Junyi Zhu, Zinan Lin, Xuefei Ning, Matthew B. Blaschko, Shengen Yan, Guohao Dai, Huazhong Yang, Yu Wang

    Abstract: The rapid advancement of large language models (LLMs) has led to architectures with billions to trillions of parameters, posing significant deployment challenges due to their substantial demands on memory, processing power, and energy consumption. Sparse Mixture-of-Experts (SMoE) architectures have emerged as a solution, activating only a subset of parameters per token, thereby achieving faster in…

    Submitted 30 June, 2024; originally announced July 2024.

  49. arXiv:2407.00904  [pdf, other]

    cs.CE

    Background-aware Multi-source Fusion Financial Trend Forecasting Mechanism

    Authors: Fengting Mo, Shanshan Yan, Yinhao Xiao

    Abstract: Stock prices, as an economic indicator, reflect changes in economic development and market conditions. Traditional stock price prediction models often only consider time-series data and are limited by the mechanisms of the models themselves. Some deep learning models have high computational costs, depend on a large amount of high-quality data, and have poor interpretability, making it difficult to…

    Submitted 30 June, 2024; originally announced July 2024.

  50. arXiv:2407.00497  [pdf, other]

    cs.CL

    LLMs-as-Instructors: Learning from Errors Toward Automating Model Improvement

    Authors: Jiahao Ying, Mingbao Lin, Yixin Cao, Wei Tang, Bo Wang, Qianru Sun, Xuanjing Huang, Shuicheng Yan

    Abstract: This paper introduces the innovative "LLMs-as-Instructors" framework, which leverages advanced Large Language Models (LLMs) to autonomously enhance the training of smaller target models. Inspired by the theory of "Learning from Errors", this framework employs an instructor LLM to meticulously analyze the specific errors within a target model, facilitating targeted and efficient training cycles…

    Submitted 29 June, 2024; originally announced July 2024.