Showing 1–50 of 721 results for author: Liang, J

  1. arXiv:2410.15205  [pdf, other]

    cs.MA

    DTPPO: Dual-Transformer Encoder-based Proximal Policy Optimization for Multi-UAV Navigation in Unseen Complex Environments

    Authors: Anning Wei, Jintao Liang, Kaiyuan Lin, Ziyue Li, Rui Zhao

    Abstract: Existing multi-agent deep reinforcement learning (MADRL) methods for multi-UAV navigation face challenges in generalization, particularly when applied to unseen complex environments. To address these limitations, we propose a Dual-Transformer Encoder-based Proximal Policy Optimization (DTPPO) method. DTPPO enhances multi-UAV collaboration through a Spatial Transformer, which models inter-agent dyn…

    Submitted 19 October, 2024; originally announced October 2024.

  2. arXiv:2410.15057  [pdf, other]

    stat.ML cs.LG stat.ME

    Asymptotic Time-Uniform Inference for Parameters in Averaged Stochastic Approximation

    Authors: Chuhan Xie, Kaicheng Jin, Jiadong Liang, Zhihua Zhang

    Abstract: We study time-uniform statistical inference for parameters in stochastic approximation (SA), which encompasses a bunch of applications in optimization and machine learning. To that end, we analyze the almost-sure convergence rates of the averaged iterates to a scaled sum of Gaussians in both linear and nonlinear SA problems. We then construct three types of asymptotic confidence sequences that are…

    Submitted 19 October, 2024; originally announced October 2024.

    Comments: 35 pages, 4 figures

  3. arXiv:2410.15019  [pdf, other]

    cs.CL

    A Survey of Ontology Expansion for Conversational Understanding

    Authors: Jinggui Liang, Yuxia Wu, Yuan Fang, Hao Fei, Lizi Liao

    Abstract: In the rapidly evolving field of conversational AI, Ontology Expansion (OnExp) is crucial for enhancing the adaptability and robustness of conversational agents. Traditional models rely on static, predefined ontologies, limiting their ability to handle new and unforeseen user needs. This survey paper provides a comprehensive review of the state-of-the-art techniques in OnExp for conversational und…

    Submitted 19 October, 2024; originally announced October 2024.

    Comments: Accepted by EMNLP 2024; code and data are available at https://github.com/liangjinggui/Ontology-Expansion

  4. arXiv:2410.13720  [pdf, other]

    cs.CV cs.AI cs.LG eess.IV

    Movie Gen: A Cast of Media Foundation Models

    Authors: Adam Polyak, Amit Zohar, Andrew Brown, Andros Tjandra, Animesh Sinha, Ann Lee, Apoorv Vyas, Bowen Shi, Chih-Yao Ma, Ching-Yao Chuang, David Yan, Dhruv Choudhary, Dingkang Wang, Geet Sethi, Guan Pang, Haoyu Ma, Ishan Misra, Ji Hou, Jialiang Wang, Kiran Jagadeesh, Kunpeng Li, Luxin Zhang, Mannat Singh, Mary Williamson, Matt Le, et al. (63 additional authors not shown)

    Abstract: We present Movie Gen, a cast of foundation models that generates high-quality, 1080p HD videos with different aspect ratios and synchronized audio. We also show additional capabilities such as precise instruction-based video editing and generation of personalized videos based on a user's image. Our models set a new state-of-the-art on multiple tasks: text-to-video synthesis, video personalization,…

    Submitted 17 October, 2024; originally announced October 2024.

  5. arXiv:2410.13413  [pdf, other]

    cs.CL cs.AI

    Think Thrice Before You Act: Progressive Thought Refinement in Large Language Models

    Authors: Chengyu Du, Jinyi Han, Yizhou Ying, Aili Chen, Qianyu He, Haokun Zhao, Sirui Xia, Haoran Guo, Jiaqing Liang, Zulong Chen, Liangyue Li, Yanghua Xiao

    Abstract: Recent advancements in large language models (LLMs) have demonstrated that progressive refinement, rather than providing a single answer, results in more accurate and thoughtful outputs. However, existing methods often rely heavily on supervision signals to evaluate previous responses, making it difficult to assess output quality in more open-ended scenarios effectively. Additionally, these method…

    Submitted 17 October, 2024; originally announced October 2024.

    Comments: 10 pages, 4 figures

  6. arXiv:2410.12800  [pdf]

    cs.CY

    Reproducibility Needs Reshape Scientific Data Governance

    Authors: Paul Meijer, Yousef Aggoune, Madeline Ambrose, Aldan Beaubien, James Harvey, Nicole Howard, Neelima Inala, Ed Johnson, Autumn Kelsey, Melissa Kinsey, Jessica Liang, Paul Mariz, Stark Pister, Sathya Subramanian, Vitalii Tereshchenko, Anne Vetto

    Abstract: Scientific data governance should prioritize maximizing the utility of data throughout the research lifecycle. Research software systems that enable analysis reproducibility inform data governance policies and assist administrators in setting clear guidelines for data reuse, data retention, and the management of scientific computing needs. Proactive analysis reproducibility and data governance are…

    Submitted 29 September, 2024; originally announced October 2024.

  7. arXiv:2410.11019  [pdf, other]

    cs.CV

    ET-Former: Efficient Triplane Deformable Attention for 3D Semantic Scene Completion From Monocular Camera

    Authors: Jing Liang, He Yin, Xuewei Qi, Jong Jin Park, Min Sun, Rajasimman Madhivanan, Dinesh Manocha

    Abstract: We introduce ET-Former, a novel end-to-end algorithm for semantic scene completion using a single monocular camera. Our approach generates a semantic occupancy map from single RGB observation while simultaneously providing uncertainty estimates for semantic predictions. By designing a triplane-based deformable attention mechanism, our approach improves geometric understanding of the scene than oth…

    Submitted 14 October, 2024; originally announced October 2024.

  8. arXiv:2410.10626  [pdf, other]

    cs.CL

    Efficiently Democratizing Medical LLMs for 50 Languages via a Mixture of Language Family Experts

    Authors: Guorui Zheng, Xidong Wang, Juhao Liang, Nuo Chen, Yuping Zheng, Benyou Wang

    Abstract: Adapting medical Large Language Models to local languages can reduce barriers to accessing healthcare services, but data scarcity remains a significant challenge, particularly for low-resource languages. To address this, we first construct a high-quality medical dataset and conduct analysis to ensure its quality. In order to leverage the generalization capability of multilingual LLMs to efficientl…

    Submitted 14 October, 2024; originally announced October 2024.

  9. arXiv:2410.09738  [pdf]

    cs.SE

    Can Large Language Models Generate Geospatial Code?

    Authors: Shuyang Hou, Zhangxiao Shen, Jianyuan Liang, Anqi Zhao, Zhipeng Gui, Rui Li, Huayi Wu

    Abstract: With the growing demand for spatiotemporal data processing and geospatial modeling, automating geospatial code generation has become essential for productivity. Large language models (LLMs) show promise in code generation but face challenges like domain-specific knowledge gaps and "coding hallucinations." This paper introduces GeoCode-Eval (GCE), a framework for assessing LLMs' ability to generate…

    Submitted 17 October, 2024; v1 submitted 13 October, 2024; originally announced October 2024.

  10. arXiv:2410.06519  [pdf, other]

    cs.CL

    SEGMENT+: Long Text Processing with Short-Context Language Models

    Authors: Wei Shi, Shuang Li, Kerun Yu, Jinglei Chen, Zujie Liang, Xinhui Wu, Yuxi Qian, Feng Wei, Bo Zheng, Jiaqing Liang, Jiangjie Chen, Yanghua Xiao

    Abstract: There is a growing interest in expanding the input capacity of language models (LMs) across various domains. However, simply increasing the context window does not guarantee robust performance across diverse long-input processing tasks, such as understanding extensive documents and extracting detailed information from lengthy and noisy data. In response, we introduce SEGMENT+, a general framework…

    Submitted 8 October, 2024; originally announced October 2024.

    Comments: EMNLP 2024

  11. arXiv:2410.02220  [pdf, other]

    cs.CR cs.AI

    Buckle Up: Robustifying LLMs at Every Customization Stage via Data Curation

    Authors: Xiaoqun Liu, Jiacheng Liang, Luoxi Tang, Chenyu You, Muchao Ye, Zhaohan Xi

    Abstract: Large language models (LLMs) are extensively adapted for downstream applications through a process known as "customization," with fine-tuning being a common method for integrating domain-specific expertise. However, recent studies have revealed a vulnerability that tuning LLMs with malicious samples can compromise their robustness and amplify harmful content, an attack known as "jailbreaking." To…

    Submitted 4 October, 2024; v1 submitted 3 October, 2024; originally announced October 2024.

  12. arXiv:2410.00350  [pdf, other]

    cs.CV cs.AI

    Efficient Training of Large Vision Models via Advanced Automated Progressive Learning

    Authors: Changlin Li, Jiawei Zhang, Sihao Lin, Zongxin Yang, Junwei Liang, Xiaodan Liang, Xiaojun Chang

    Abstract: The rapid advancements in Large Vision Models (LVMs), such as Vision Transformers (ViTs) and diffusion models, have led to an increasing demand for computational resources, resulting in substantial financial and environmental costs. This growing challenge highlights the necessity of developing efficient training methods for LVMs. Progressive learning, a training strategy in which model capacity gr…

    Submitted 6 September, 2024; originally announced October 2024.

    Comments: Code: https://github.com/changlin31/AutoProg-Zero. arXiv admin note: substantial text overlap with arXiv:2203.14509

  13. arXiv:2409.19526  [pdf, other]

    cs.CR cs.AI cs.CV cs.LG

    Efficient Backdoor Defense in Multimodal Contrastive Learning: A Token-Level Unlearning Method for Mitigating Threats

    Authors: Kuanrong Liu, Siyuan Liang, Jiawei Liang, Pengwen Dai, Xiaochun Cao

    Abstract: Multimodal contrastive learning uses various data modalities to create high-quality features, but its reliance on extensive data sources on the Internet makes it vulnerable to backdoor attacks. These attacks insert malicious behaviors during training, which are activated by specific triggers during inference, posing significant security risks. Despite existing countermeasures through fine-tuning t…

    Submitted 28 September, 2024; originally announced September 2024.

  14. arXiv:2409.19362  [pdf, other]

    cs.CV cs.AI

    1st Place Solution of Multiview Egocentric Hand Tracking Challenge ECCV2024

    Authors: Minqiang Zou, Zhi Lv, Riqiang Jin, Tian Zhan, Mochen Yu, Yao Tang, Jiajun Liang

    Abstract: Multi-view egocentric hand tracking is a challenging task and plays a critical role in VR interaction. In this report, we present a method that uses multi-view input images and camera extrinsic parameters to estimate both hand shape and pose. To reduce overfitting to the camera layout, we apply crop jittering and extrinsic parameter noise augmentation. Additionally, we propose an offline neural sm…

    Submitted 8 October, 2024; v1 submitted 28 September, 2024; originally announced September 2024.

    Comments: Accepted in ECCV2024 workshop

  15. arXiv:2409.18578  [pdf, other]

    cs.LG cs.AI

    An Enhanced Federated Prototype Learning Method under Domain Shift

    Authors: Liang Kuang, Kuangpu Guo, Jian Liang, Jianguo Zhang

    Abstract: Federated Learning (FL) allows collaborative machine learning training without sharing private data. Numerous studies have shown that one significant factor affecting the performance of federated learning models is the heterogeneity of data across different clients, especially when the data is sampled from various domains. A recent paper introduces variance-aware dual-level prototype clustering an…

    Submitted 27 September, 2024; originally announced September 2024.

    Comments: 8 pages, 6 figures

  16. arXiv:2409.17805  [pdf, other]

    cs.CV

    Cascade Prompt Learning for Vision-Language Model Adaptation

    Authors: Ge Wu, Xin Zhang, Zheng Li, Zhaowei Chen, Jiajun Liang, Jian Yang, Xiang Li

    Abstract: Prompt learning has surfaced as an effective approach to enhance the performance of Vision-Language Models (VLMs) like CLIP when applied to downstream tasks. However, current learnable prompt tokens are primarily used for the single phase of adapting to tasks (i.e., adapting prompt), easily leading to overfitting risks. In this work, we propose a novel Cascade Prompt Learning CasPL framework to en…

    Submitted 26 September, 2024; originally announced September 2024.

    Comments: ECCV2024

  17. arXiv:2409.16484  [pdf, other]

    cs.RO

    BehAV: Behavioral Rule Guided Autonomy Using VLMs for Robot Navigation in Outdoor Scenes

    Authors: Kasun Weerakoon, Mohamed Elnoor, Gershom Seneviratne, Vignesh Rajagopal, Senthil Hariharan Arul, Jing Liang, Mohamed Khalid M Jaffar, Dinesh Manocha

    Abstract: We present BehAV, a novel approach for autonomous robot navigation in outdoor scenes guided by human instructions and leveraging Vision Language Models (VLMs). Our method interprets human commands using a Large Language Model (LLM) and categorizes the instructions into navigation and behavioral guidelines. Navigation guidelines consist of directional commands (e.g., "move forward until") and assoc…

    Submitted 2 October, 2024; v1 submitted 24 September, 2024; originally announced September 2024.

  18. arXiv:2409.15968  [pdf, other]

    cs.CV

    Adversarial Backdoor Defense in CLIP

    Authors: Junhao Kuang, Siyuan Liang, Jiawei Liang, Kuanrong Liu, Xiaochun Cao

    Abstract: Multimodal contrastive pretraining, exemplified by models like CLIP, has been found to be vulnerable to backdoor attacks. While current backdoor defense methods primarily employ conventional data augmentation to create augmented samples aimed at feature alignment, these methods fail to capture the distinct features of backdoor samples, resulting in suboptimal defense performance. Observations reve…

    Submitted 24 September, 2024; originally announced September 2024.

  19. arXiv:2409.15657  [pdf, other]

    cs.AI cs.CL cs.LG

    M$^2$PT: Multimodal Prompt Tuning for Zero-shot Instruction Learning

    Authors: Taowen Wang, Yiyang Liu, James Chenhao Liang, Junhan Zhao, Yiming Cui, Yuning Mao, Shaoliang Nie, Jiahao Liu, Fuli Feng, Zenglin Xu, Cheng Han, Lifu Huang, Qifan Wang, Dongfang Liu

    Abstract: Multimodal Large Language Models (MLLMs) demonstrate remarkable performance across a wide range of domains, with increasing emphasis on enhancing their zero-shot generalization capabilities for unseen tasks across various modalities. Instruction tuning has emerged as an effective strategy for achieving zero-shot generalization by finetuning pretrained models on diverse multimodal tasks. As the sca…

    Submitted 27 September, 2024; v1 submitted 23 September, 2024; originally announced September 2024.

    Comments: EMNLP 2024

  20. arXiv:2409.14820  [pdf, other]

    cs.CL cs.AI

    Past Meets Present: Creating Historical Analogy with Large Language Models

    Authors: Nianqi Li, Siyu Yuan, Jiangjie Chen, Jiaqing Liang, Feng Wei, Zujie Liang, Deqing Yang, Yanghua Xiao

    Abstract: Historical analogies, which compare known past events with contemporary but unfamiliar events, are important abilities that help people make decisions and understand the world. However, research in applied history suggests that people have difficulty finding appropriate analogies. And previous studies in the AI community have also overlooked historical analogies. To fill this gap, in this paper, w…

    Submitted 23 September, 2024; originally announced September 2024.

  21. arXiv:2409.14762  [pdf, other]

    cs.CL cs.AI

    Do Large Language Models have Problem-Solving Capability under Incomplete Information Scenarios?

    Authors: Yuyan Chen, Tianhao Yu, Yueze Li, Songzhou Yan, Sijia Liu, Jiaqing Liang, Yanghua Xiao

    Abstract: The evaluation of the problem-solving capability under incomplete information scenarios of Large Language Models (LLMs) is increasingly important, encompassing capabilities such as questioning, knowledge search, error detection, and path planning. Current research mainly focuses on LLMs' problem-solving capability such as "Twenty Questions". However, these kinds of games do not require recognizing…

    Submitted 23 September, 2024; originally announced September 2024.

    Comments: Accepted to ACL 2024 (Findings)

  22. arXiv:2409.14262  [pdf, other]

    cs.RO

    GND: Global Navigation Dataset with Multi-Modal Perception and Multi-Category Traversability in Outdoor Campus Environments

    Authors: Jing Liang, Dibyendu Das, Daeun Song, Md Nahid Hasan Shuvo, Mohammad Durrani, Karthik Taranath, Ivan Penskiy, Dinesh Manocha, Xuesu Xiao

    Abstract: Navigating large-scale outdoor environments requires complex reasoning in terms of geometric structures, environmental semantics, and terrain characteristics, which are typically captured by onboard sensors such as LiDAR and cameras. While current mobile robots can navigate such environments using pre-defined, high-precision maps based on hand-crafted rules catered for the specific environment, th…

    Submitted 26 September, 2024; v1 submitted 21 September, 2024; originally announced September 2024.

  23. arXiv:2409.13244  [pdf, other]

    cs.RO cs.AI

    From Cognition to Precognition: A Future-Aware Framework for Social Navigation

    Authors: Zeying Gong, Tianshuai Hu, Ronghe Qiu, Junwei Liang

    Abstract: To navigate safely and efficiently in crowded spaces, robots should not only perceive the current state of the environment but also anticipate future human movements. In this paper, we propose a reinforcement learning architecture, namely Falcon, to tackle socially-aware navigation by explicitly predicting human trajectories and penalizing actions that block future human paths. To facilitate reali…

    Submitted 20 September, 2024; originally announced September 2024.

    Comments: Social Navigation; Trajectory Prediction; Auxiliary Tasks

  24. arXiv:2409.12447  [pdf, other]

    cs.SE cs.AI cs.HC

    Prompts Are Programs Too! Understanding How Developers Build Software Containing Prompts

    Authors: Jenny T. Liang, Melissa Lin, Nikitha Rao, Brad A. Myers

    Abstract: The introduction of generative pre-trained models, like GPT-4, has introduced a phenomenon known as prompt engineering, whereby model users repeatedly write and revise prompts while trying to achieve a task. Using these AI models for intelligent features in software applications requires using APIs that are controlled through developer-written prompts. These prompts have powered AI experiences in p…

    Submitted 18 September, 2024; originally announced September 2024.

  25. arXiv:2409.11884  [pdf, other]

    cs.LG

    Recent Advances in OOD Detection: Problems and Approaches

    Authors: Shuo Lu, Yingsheng Wang, Lijun Sheng, Aihua Zheng, Lingxiao He, Jian Liang

    Abstract: Out-of-distribution (OOD) detection aims to detect test samples outside the training category space, which is an essential component in building reliable machine learning systems. Existing reviews on OOD detection primarily focus on method taxonomy, surveying the field by categorizing various approaches. However, many recent works concentrate on non-traditional OOD detection scenarios, such as tes…

    Submitted 21 September, 2024; v1 submitted 18 September, 2024; originally announced September 2024.

    Comments: First Submitted in May 2024

  26. arXiv:2409.11653  [pdf, other]

    cs.LG cs.CV

    Enhancing Semi-Supervised Learning via Representative and Diverse Sample Selection

    Authors: Qian Shao, Jiangrui Kang, Qiyuan Chen, Zepeng Li, Hongxia Xu, Yiwen Cao, Jiajuan Liang, Jian Wu

    Abstract: Semi-Supervised Learning (SSL) has become a preferred paradigm in many deep learning tasks, which reduces the need for human labor. Previous studies primarily focus on effectively utilising the labelled and unlabeled data to improve performance. However, we observe that how to select samples for labelling also significantly impacts performance, particularly under extremely low-budget settings. The…

    Submitted 17 September, 2024; originally announced September 2024.

    Comments: Under Review

  27. arXiv:2409.09647  [pdf, other]

    cs.SD cs.AI eess.AS

    Self-supervised Learning for Acoustic Few-Shot Classification

    Authors: Jingyong Liang, Bernd Meyer, Issac Ning Lee, Thanh-Toan Do

    Abstract: Labelled data are limited and self-supervised learning is one of the most important approaches for reducing labelling requirements. While it has been extensively explored in the image domain, it has so far not received the same amount of attention in the acoustic domain. Yet, reducing labelling is a key requirement for many acoustic applications. Specifically in bioacoustics, there are rarely suffi…

    Submitted 15 September, 2024; originally announced September 2024.

  28. arXiv:2409.07901  [pdf, other]

    cs.MM

    Bridging Discrete and Continuous: A Multimodal Strategy for Complex Emotion Detection

    Authors: Jiehui Jia, Huan Zhang, Jinhua Liang

    Abstract: In the domain of human-computer interaction, accurately recognizing and interpreting human emotions is crucial yet challenging due to the complexity and subtlety of emotional expressions. This study explores the potential for detecting a rich and flexible range of emotions through a multimodal approach which integrates facial expressions, voice tones, and transcript from video clips. We propose a…

    Submitted 12 September, 2024; originally announced September 2024.

  29. arXiv:2409.07827  [pdf, other]

    cs.SD cs.CV cs.MM eess.AS

    Bridging Paintings and Music -- Exploring Emotion based Music Generation through Paintings

    Authors: Tanisha Hisariya, Huan Zhang, Jinhua Liang

    Abstract: Rapid advancements in artificial intelligence have significantly enhanced generative tasks involving music and images, employing both unimodal and multimodal approaches. This research develops a model capable of generating music that resonates with the emotions depicted in visual arts, integrating emotion labeling, image captioning, and language models to transform visual inputs into musical compo…

    Submitted 12 September, 2024; originally announced September 2024.

  30. arXiv:2409.06979  [pdf, ps, other]

    cs.IT quant-ph

    A High-Performance List Decoding Algorithm for Surface Codes with Erroneous Syndrome

    Authors: Jifan Liang, Qianfan Wang, Lvzhou Li, Xiao Ma

    Abstract: Quantum error-correcting codes (QECCs) are necessary for fault-tolerant quantum computation. Surface codes are a class of topological QECCs that have attracted significant attention due to their exceptional error-correcting capabilities and easy implementation. In the decoding process of surface codes, the syndromes are crucial for error correction, though they are not always correctly measured. M…

    Submitted 10 September, 2024; originally announced September 2024.

    Comments: 17 pages, 10 figures

  31. arXiv:2409.06166  [pdf, other]

    cs.CV

    Revisiting Prompt Pretraining of Vision-Language Models

    Authors: Zhenyuan Chen, Lingfeng Yang, Shuo Chen, Zhaowei Chen, Jiajun Liang, Xiang Li

    Abstract: Prompt learning is an effective method to customize Vision-Language Models (VLMs) for various downstream tasks, involving tuning very few parameters of input prompt tokens. Recently, prompt pretraining in large-scale dataset (e.g., ImageNet-21K) has played a crucial role in prompt learning for universal visual discrimination. However, we revisit and observe that the limited learnable prompts could…

    Submitted 9 September, 2024; originally announced September 2024.

  32. arXiv:2409.05681  [pdf, other]

    cs.CV

    SX-Stitch: An Efficient VMS-UNet Based Framework for Intraoperative Scoliosis X-Ray Image Stitching

    Authors: Yi Li, Heting Gao, Mingde He, Jinqian Liang, Jason Gu, Wei Liu

    Abstract: In scoliosis surgery, the limited field of view of the C-arm X-ray machine restricts the surgeons' holistic analysis of spinal structures. This paper presents an end-to-end efficient and robust intraoperative X-ray image stitching method for scoliosis surgery, named SX-Stitch. The method is divided into two stages: segmentation and stitching. In the segmentation stage, we propose a medical image seg…

    Submitted 9 September, 2024; originally announced September 2024.

  33. arXiv:2409.04730  [pdf, other]

    cs.RO

    IR2: Implicit Rendezvous for Robotic Exploration Teams under Sparse Intermittent Connectivity

    Authors: Derek Ming Siang Tan, Yixiao Ma, Jingsong Liang, Yi Cheng Chng, Yuhong Cao, Guillaume Sartoretti

    Abstract: Information sharing is critical in time-sensitive and realistic multi-robot exploration, especially for smaller robotic teams in large-scale environments where connectivity may be sparse and intermittent. Existing methods often overlook such communication constraints by assuming unrealistic global connectivity. Other works account for communication constraints (by maintaining close proximity or li…

    Submitted 7 September, 2024; originally announced September 2024.

    Comments: © 20XX IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

  34. arXiv:2409.01151  [pdf, other]

    cs.CV cs.LG

    Understanding Multimodal Hallucination with Parameter-Free Representation Alignment

    Authors: Yueqian Wang, Jianxin Liang, Yuxuan Wang, Huishuai Zhang, Dongyan Zhao

    Abstract: Hallucination is a common issue in Multimodal Large Language Models (MLLMs), yet the underlying principles remain poorly understood. In this paper, we investigate which components of MLLMs contribute to object hallucinations. To analyze image representations while completely avoiding the influence of all other factors other than the image representation itself, we propose a parametric-free represe…

    Submitted 2 September, 2024; originally announced September 2024.

  35. arXiv:2409.01128  [pdf, other]

    cs.LG cs.CV

    Diffusion-Driven Data Replay: A Novel Approach to Combat Forgetting in Federated Class Continual Learning

    Authors: Jinglin Liang, Jin Zhong, Hanlin Gu, Zhongqi Lu, Xingxing Tang, Gang Dai, Shuangping Huang, Lixin Fan, Qiang Yang

    Abstract: Federated Class Continual Learning (FCCL) merges the challenges of distributed client learning with the need for seamless adaptation to new classes without forgetting old ones. The key challenge in FCCL is catastrophic forgetting, an issue that has been explored to some extent in Continual Learning (CL). However, due to privacy preservation requirements, some conventional methods, such as experien…

    Submitted 3 September, 2024; v1 submitted 2 September, 2024; originally announced September 2024.

    Comments: Accepted by ECCV 2024 Oral

  36. arXiv:2408.14975  [pdf, other]

    cs.CV

    MegActor-$Σ$: Unlocking Flexible Mixed-Modal Control in Portrait Animation with Diffusion Transformer

    Authors: Shurong Yang, Huadong Li, Juhao Wu, Minhao Jing, Linze Li, Renhe Ji, Jiajun Liang, Haoqiang Fan, Jin Wang

    Abstract: Diffusion models have demonstrated superior performance in the field of portrait animation. However, current approaches relied on either visual or audio modality to control character movements, failing to exploit the potential of mixed-modal control. This challenge arises from the difficulty in balancing the weak control strength of audio modality and the strong control strength of visual modality…

    Submitted 27 August, 2024; originally announced August 2024.

  37. arXiv:2408.14744  [pdf, other]

    cs.CV cs.AI

    RSTeller: Scaling Up Visual Language Modeling in Remote Sensing with Rich Linguistic Semantics from Openly Available Data and Large Language Models

    Authors: Junyao Ge, Yang Zheng, Kaitai Guo, Jimin Liang

    Abstract: Abundant, well-annotated multimodal data in remote sensing are pivotal for aligning complex visual remote sensing (RS) scenes with human language, enabling the development of specialized vision language models across diverse RS interpretation tasks. However, annotating RS images with rich linguistic semantics at scale demands expertise in RS and substantial human labor, making it costly and often…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: Submitted to ISPRS

    ACM Class: I.4.8; I.2.10

  38. arXiv:2408.12769  [pdf, other]

    cs.CV cs.NI

    Enhancing Vehicle Environmental Awareness via Federated Learning and Automatic Labeling

    Authors: Chih-Yu Lin, Jin-Wei Liang

    Abstract: Vehicle environmental awareness is a crucial issue in improving road safety. Through a variety of sensors and vehicle-to-vehicle communication, vehicles can collect a wealth of data. However, to make these data useful, sensor data must be integrated effectively. This paper focuses on the integration of image data and vehicle-to-vehicle communication data. More specifically, our goal is to identify…

    Submitted 22 August, 2024; originally announced August 2024.

  39. arXiv:2408.11557  [pdf, other]

    cs.IR

    A Quick, trustworthy spectral knowledge Q&A system leveraging retrieval-augmented generation on LLM

    Authors: Jiheng Liang, Ziru Yu, Zujie Xie, Xiangyang Yu

    Abstract: Large Language Model (LLM) has demonstrated significant success in a range of natural language processing (NLP) tasks within general domain. The emergence of LLM has introduced innovative methodologies across diverse fields, including the natural sciences. Researchers aim to implement automated, concurrent process driven by LLM to supplant conventional manual, repetitive and labor-intensive work.…

    Submitted 11 October, 2024; v1 submitted 21 August, 2024; originally announced August 2024.

    Comments: 16 pages, 10 figures, 3 tables

  40. arXiv:2408.10602  [pdf, other]

    cs.CV cs.AI

    MV-MOS: Multi-View Feature Fusion for 3D Moving Object Segmentation

    Authors: Jintao Cheng, Xingming Chen, Jinxin Liang, Xiaoyu Tang, Xieyuanli Chen, Dachuan Li

    Abstract: Effectively summarizing dense 3D point cloud data and extracting motion information of moving objects (moving object segmentation, MOS) is crucial to autonomous driving and robotics applications. How to effectively utilize motion and semantic features and avoid information loss during 3D-to-2D projection is still a key challenge. In this paper, we propose a novel multi-view MOS model (MV-MOS) by f…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: 7 pages, 4 figures

  41. arXiv:2408.09955  [pdf, other]

    cs.MA

    MegaAgent: A Practical Framework for Autonomous Cooperation in Large-Scale LLM Agent Systems

    Authors: Qian Wang, Tianyu Wang, Qinbin Li, Jingsheng Liang, Bingsheng He

    Abstract: With the emergence of large language models (LLMs), LLM-powered multi-agent systems (LLM-MA systems) have been proposed to tackle real-world tasks. However, their agents mostly follow predefined Standard Operating Procedures (SOPs) that remain unchanged across the whole interaction, lacking autonomy and scalability. Additionally, current solutions often overlook the necessity for effective agent c…

    Submitted 20 August, 2024; v1 submitted 19 August, 2024; originally announced August 2024.

  42. arXiv:2408.09103  [pdf]

    cs.CE

    Provide Proactive Reproducible Analysis Transparency with Every Publication

    Authors: Paul Meijer, Nicole Howard, Jessica Liang, Autumn Kelsey, Sathya Subramanian, Ed Johnson, Paul Mariz, James Harvey, Madeline Ambrose, Vitalii Tereshchenko, Aldan Beaubien, Neelima Inala, Yousef Aggoune, Stark Pister, Anne Vetto, Melissa Kinsey, Tom Bumol, Ananda Goldrath, Xiaojun Li, Troy Torgerson, Peter Skene, Lauren Okada, Christian La France, Zach Thomson, Lucas Graybuck

    Abstract: The high incidence of irreproducible research has led to urgent appeals for transparency and equitable practices in open science. For the scientific disciplines that rely on computationally intensive analyses of large data sets, a granular understanding of the analysis methodology is an essential component of reproducibility. This paper discusses the guiding principles of a computational reproduci…

    Submitted 17 August, 2024; originally announced August 2024.

  43. arXiv:2408.05713  [pdf, other]

    cs.CV

    SSL: A Self-similarity Loss for Improving Generative Image Super-resolution

    Authors: Du Chen, Zhengqiang Zhang, Jie Liang, Lei Zhang

    Abstract: Generative adversarial networks (GAN) and generative diffusion models (DM) have been widely used in real-world image super-resolution (Real-ISR) to enhance the image perceptual quality. However, these generative models are prone to generating visual artifacts and false image structures, resulting in unnatural Real-ISR results. Based on the fact that natural images exhibit high self-similarities, i…

    Submitted 18 August, 2024; v1 submitted 11 August, 2024; originally announced August 2024.

    Comments: Accepted by ACM MM 2024

  44. arXiv:2408.03768  [pdf, other]

    cs.RO

    HDPlanner: Advancing Autonomous Deployments in Unknown Environments through Hierarchical Decision Networks

    Authors: Jingsong Liang, Yuhong Cao, Yixiao Ma, Hanqi Zhao, Guillaume Sartoretti

    Abstract: In this paper, we introduce HDPlanner, a deep reinforcement learning (DRL) based framework designed to tackle two core and challenging tasks for mobile robots: autonomous exploration and navigation, where the robot must optimize its trajectory adaptively to achieve the task objective through continuous interactions in unknown environments. Specifically, HDPlanner relies on novel hierarchical atten…

    Submitted 7 August, 2024; originally announced August 2024.

    Comments: Submitted to RA-L

  45. arXiv:2408.02693  [pdf, other]

    physics.comp-ph cs.AI

    Diff-PIC: Revolutionizing Particle-In-Cell Nuclear Fusion Simulation with Diffusion Models

    Authors: Chuan Liu, Chunshu Wu, Shihui Cao, Mingkai Chen, James Chenhao Liang, Ang Li, Michael Huang, Chuang Ren, Dongfang Liu, Ying Nian Wu, Tong Geng

    Abstract: The rapid development of AI highlights the pressing need for sustainable energy, a critical global challenge for decades. Nuclear fusion, generally seen as an ultimate solution, has been the focus of intensive research for nearly a century, with investments reaching hundreds of billions of dollars. Recent advancements in Inertial Confinement Fusion have drawn significant attention to fusion resear…

    Submitted 5 October, 2024; v1 submitted 3 August, 2024; originally announced August 2024.

  46. arXiv:2408.02454  [pdf, other]

    cs.RO

    TGS: Trajectory Generation and Selection using Vision Language Models in Mapless Outdoor Environments

    Authors: Daeun Song, Jing Liang, Xuesu Xiao, Dinesh Manocha

    Abstract: We present a multi-modal trajectory generation and selection algorithm for real-world mapless outdoor navigation in challenging scenarios with unstructured off-road features like buildings, grass, and curbs. Our goal is to compute suitable trajectories that (1) satisfy the environment-specific traversability constraints and (2) generate human-like paths while navigating in crosswalks, sidewalks, e…

    Submitted 7 August, 2024; v1 submitted 5 August, 2024; originally announced August 2024.

  47. arXiv:2408.01791  [pdf]

    cs.NI

    Implementing NAT Hole Punching with QUIC

    Authors: Jinyu Liang, Wei Xu, Taotao Wang, Qing Yang, Shengli Zhang

    Abstract: The widespread adoption of Network Address Translation (NAT) technology has led to a significant number of network end nodes being located in private networks behind NAT devices, impeding direct communication between these nodes. To solve this problem, a technique known as "hole punching" has been devised for NAT traversal to facilitate peer-to-peer communication among end nodes located in distinc…

    Submitted 3 August, 2024; originally announced August 2024.

    Comments: The paper has been accepted for oral presentation at the VTC2024-Fall Conference

  48. arXiv:2407.20203  [pdf, other]

    cs.RO

    Privileged Reinforcement and Communication Learning for Distributed, Bandwidth-limited Multi-robot Exploration

    Authors: Yixiao Ma, Jingsong Liang, Yuhong Cao, Derek Ming Siang Tan, Guillaume Sartoretti

    Abstract: Communication bandwidth is an important consideration in multi-robot exploration, where information exchange among robots is critical. While existing methods typically aim to reduce communication throughput, they either require significant computation or significantly compromise exploration efficiency. In this work, we propose a deep reinforcement learning framework based on communication and priv…

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: Accepted by DARS2024

  49. arXiv:2407.18242  [pdf, other]

    cs.LG cs.AI cs.CL

    LoRA-Pro: Are Low-Rank Adapters Properly Optimized?

    Authors: Zhengbo Wang, Jian Liang, Ran He, Zilei Wang, Tieniu Tan

    Abstract: Low-rank adaptation, also known as LoRA, has emerged as a prominent method for parameter-efficient fine-tuning of foundation models. Despite its computational efficiency, LoRA still yields inferior performance compared to full fine-tuning. In this paper, we first uncover a fundamental connection between the optimization processes of LoRA and full fine-tuning: using LoRA for optimization is mathema…

    Submitted 15 October, 2024; v1 submitted 25 July, 2024; originally announced July 2024.

  50. arXiv:2407.17457  [pdf, other]

    cs.CV cs.RO

    CSCPR: Cross-Source-Context Indoor RGB-D Place Recognition

    Authors: Jing Liang, Zhuo Deng, Zheming Zhou, Min Sun, Omid Ghasemalizadeh, Cheng-Hao Kuo, Arnie Sen, Dinesh Manocha

    Abstract: We present a new algorithm, Cross-Source-Context Place Recognition (CSCPR), for RGB-D indoor place recognition that integrates global retrieval and reranking into a single end-to-end model. Unlike prior approaches that primarily focus on the RGB domain, CSCPR is designed to handle the RGB-D data. We extend the Context-of-Clusters (CoCs) for handling noisy colorized point clouds and introduce two n…

    Submitted 24 July, 2024; originally announced July 2024.