Showing 1–50 of 372 results for author: Sha, L

  1. arXiv:2410.11617  [pdf, other]

    cs.LG cs.AI cs.CV

    M$^{2}$M: Learning controllable Multi of experts and multi-scale operators are the Partial Differential Equations need

    Authors: Aoming Liang, Zhaoyang Mu, Pengxiao Lin, Cong Wang, Mingming Ge, Ling Shao, Dixia Fan, Hao Tang

    Abstract: Learning the evolutionary dynamics of Partial Differential Equations (PDEs) is critical in understanding dynamic systems, yet current methods insufficiently learn their representations. This is largely due to the multi-scale nature of the solution, where certain regions exhibit rapid oscillations while others evolve more slowly. This paper introduces a framework of multi-scale and multi-expert (M…

    Submitted 1 October, 2024; originally announced October 2024.

    Comments: 30 pages, 16 figures

  2. arXiv:2410.10700  [pdf, other]

    cs.CL cs.AI

    Derail Yourself: Multi-turn LLM Jailbreak Attack through Self-discovered Clues

    Authors: Qibing Ren, Hao Li, Dongrui Liu, Zhanxu Xie, Xiaoya Lu, Yu Qiao, Lei Sha, Junchi Yan, Lizhuang Ma, Jing Shao

    Abstract: This study exposes the safety vulnerabilities of Large Language Models (LLMs) in multi-turn interactions, where malicious users can obscure harmful intents across several queries. We introduce ActorAttack, a novel multi-turn attack method inspired by actor-network theory, which models a network of semantically linked actors as attack clues to generate diverse and effective attack paths toward harm…

    Submitted 14 October, 2024; originally announced October 2024.

  3. arXiv:2410.09962  [pdf, other]

    cs.CV

    LongHalQA: Long-Context Hallucination Evaluation for MultiModal Large Language Models

    Authors: Han Qiu, Jiaxing Huang, Peng Gao, Qin Qi, Xiaoqin Zhang, Ling Shao, Shijian Lu

    Abstract: Hallucination, a phenomenon where multimodal large language models (MLLMs) tend to generate textual responses that are plausible but unaligned with the image, has become one major hurdle in various MLLM-related applications. Several benchmarks have been created to gauge the hallucination levels of MLLMs, by either raising discriminative questions about the existence of objects or introducing LLM e…

    Submitted 15 October, 2024; v1 submitted 13 October, 2024; originally announced October 2024.

  4. arXiv:2410.09804  [pdf, other]

    cs.CR cs.AI cs.CL cs.LG cs.NE

    BlackDAN: A Black-Box Multi-Objective Approach for Effective and Contextual Jailbreaking of Large Language Models

    Authors: Xinyuan Wang, Victor Shea-Jay Huang, Renmiao Chen, Hao Wang, Chengwei Pan, Lei Sha, Minlie Huang

    Abstract: While large language models (LLMs) exhibit remarkable capabilities across various tasks, they encounter potential security risks such as jailbreak attacks, which exploit vulnerabilities to bypass security measures and generate harmful outputs. Existing jailbreak strategies mainly focus on maximizing attack success rate (ASR), frequently neglecting other critical factors, including the relevance of…

    Submitted 18 October, 2024; v1 submitted 13 October, 2024; originally announced October 2024.

  5. arXiv:2410.07985  [pdf, other]

    cs.CL

    Omni-MATH: A Universal Olympiad Level Mathematic Benchmark For Large Language Models

    Authors: Bofei Gao, Feifan Song, Zhe Yang, Zefan Cai, Yibo Miao, Qingxiu Dong, Lei Li, Chenghao Ma, Liang Chen, Runxin Xu, Zhengyang Tang, Benyou Wang, Daoguang Zan, Shanghaoran Quan, Ge Zhang, Lei Sha, Yichang Zhang, Xuancheng Ren, Tianyu Liu, Baobao Chang

    Abstract: Recent advancements in large language models (LLMs) have led to significant breakthroughs in mathematical reasoning capabilities. However, existing benchmarks like GSM8K or MATH are now being solved with high accuracy (e.g., OpenAI o1 achieves 94.8% on MATH dataset), indicating their inadequacy for truly challenging these models. To bridge this gap, we propose a comprehensive and challenging bench…

    Submitted 10 October, 2024; v1 submitted 10 October, 2024; originally announced October 2024.

    Comments: 26 Pages, 17 Figures

  6. arXiv:2410.03421  [pdf, other]

    cs.CL cs.AI

    One2set + Large Language Model: Best Partners for Keyphrase Generation

    Authors: Liangying Shao, Liang Zhang, Minlong Peng, Guoqi Ma, Hao Yue, Mingming Sun, Jinsong Su

    Abstract: Keyphrase generation (KPG) aims to automatically generate a collection of phrases representing the core concepts of a given document. The dominant paradigms in KPG include one2seq and one2set. Recently, there has been increasing interest in applying large language models (LLMs) to KPG. Our preliminary experiments reveal that it is challenging for a single model to excel in both recall and precisio…

    Submitted 20 October, 2024; v1 submitted 4 October, 2024; originally announced October 2024.

    Comments: Accepted by EMNLP 2024 Main Conference

  7. arXiv:2410.02237  [pdf, other]

    cs.CV

    Key-Grid: Unsupervised 3D Keypoints Detection using Grid Heatmap Features

    Authors: Chengkai Hou, Zhengrong Xue, Bingyang Zhou, Jinghan Ke, Lin Shao, Huazhe Xu

    Abstract: Detecting 3D keypoints with semantic consistency is widely used in many scenarios such as pose estimation, shape registration and robotics. Currently, most unsupervised 3D keypoint detection methods focus on the rigid-body objects. However, when faced with deformable objects, the keypoints they identify do not preserve semantic consistency well. In this paper, we introduce an innovative unsupervis…

    Submitted 16 October, 2024; v1 submitted 3 October, 2024; originally announced October 2024.

  8. arXiv:2410.01702  [pdf, other]

    cs.RO

    D(R, O) Grasp: A Unified Representation of Robot and Object Interaction for Cross-Embodiment Dexterous Grasping

    Authors: Zhenyu Wei, Zhixuan Xu, Jingxiang Guo, Yiwen Hou, Chongkai Gao, Zhehao Cai, Jiayu Luo, Lin Shao

    Abstract: Dexterous grasping is a fundamental yet challenging skill in robotic manipulation, requiring precise interaction between robotic hands and objects. In this paper, we present D(R,O) Grasp, a novel framework that models the interaction between the robotic hand in its grasping pose and the object, enabling broad generalization across various robot hands and object geometries. Our model takes the robo…

    Submitted 8 October, 2024; v1 submitted 2 October, 2024; originally announced October 2024.

  9. arXiv:2409.17725  [pdf, other]

    cs.RO

    Stable Object Placement Under Geometric Uncertainty via Differentiable Contact Dynamics

    Authors: Linfeng Li, Gang Yang, Lin Shao, David Hsu

    Abstract: From serving a cup of coffee to carefully rearranging delicate items, stable object placement is a crucial skill for future robots. This skill is challenging due to the required accuracy, which is difficult to achieve under geometric uncertainty. We leverage differentiable contact dynamics to develop a principled method for stable object placement under geometric uncertainty. We estimate the geome…

    Submitted 26 September, 2024; originally announced September 2024.

  10. arXiv:2409.05898  [pdf, other]

    cs.LG cs.AI cs.RO

    Simplex-enabled Safe Continual Learning Machine

    Authors: Hongpeng Cao, Yanbing Mao, Yihao Cai, Lui Sha, Marco Caccamo

    Abstract: This paper proposes the SeC-Learning Machine: Simplex-enabled safe continual learning for safety-critical autonomous systems. The SeC-learning machine is built on Simplex logic (that is, "using simplicity to control complexity") and physics-regulated deep reinforcement learning (Phy-DRL). The SeC-learning machine thus constitutes HP (high performance)-Student, HA (high assurance)-Teacher, and Co…

    Submitted 5 October, 2024; v1 submitted 5 September, 2024; originally announced September 2024.

  11. arXiv:2409.03788  [pdf, other]

    cs.CR cs.AI cs.CL cs.LG

    HSF: Defending against Jailbreak Attacks with Hidden State Filtering

    Authors: Cheng Qian, Hainan Zhang, Lei Sha, Zhiming Zheng

    Abstract: With the growing deployment of LLMs in daily applications like chatbots and content generation, efforts to ensure outputs align with human values and avoid harmful content have intensified. However, increasingly sophisticated jailbreak attacks threaten this alignment, aiming to induce unsafe outputs. Current defense efforts either focus on prompt rewriting or detection, which are limited in effect…

    Submitted 31 August, 2024; originally announced September 2024.

    Comments: 13 pages

  12. arXiv:2409.02795  [pdf, other]

    cs.CL

    Towards a Unified View of Preference Learning for Large Language Models: A Survey

    Authors: Bofei Gao, Feifan Song, Yibo Miao, Zefan Cai, Zhe Yang, Liang Chen, Helan Hu, Runxin Xu, Qingxiu Dong, Ce Zheng, Wen Xiao, Ge Zhang, Daoguang Zan, Keming Lu, Bowen Yu, Dayiheng Liu, Zeyu Cui, Jian Yang, Lei Sha, Houfeng Wang, Zhifang Sui, Peiyi Wang, Tianyu Liu, Baobao Chang

    Abstract: Large Language Models (LLMs) exhibit remarkably powerful capabilities. One of the crucial factors to achieve success is aligning the LLM's output with human preferences. This alignment process often requires only a small amount of data to efficiently enhance the LLM's performance. While effective, research in this area spans multiple domains, and the methods involved are relatively complex to unde…

    Submitted 9 September, 2024; v1 submitted 4 September, 2024; originally announced September 2024.

    Comments: 23 pages, 6 figures

  13. arXiv:2409.02689  [pdf]

    physics.app-ph cs.ET

    Frequency-domain Parallel Computing Using Single On-Chip Nonlinear Acoustic-wave Device

    Authors: Jun Ji, Zichen Xi, Bernadeta R. Srijanto, Ivan I. Kravchenko, Ming Jin, Wenjie Xiong, Linbo Shao

    Abstract: Multiply-accumulation (MAC) is a crucial computing operation in signal processing, numerical simulations, and machine learning. This work presents a scalable, programmable, frequency-domain parallel computing leveraging gigahertz (GHz)-frequency acoustic-wave nonlinearities. By encoding data in the frequency domain, a single nonlinear acoustic-wave device can perform a billion arithmetic operation…

    Submitted 4 September, 2024; originally announced September 2024.

  14. arXiv:2407.14198  [pdf]

    cs.CV eess.IV

    Double-Shot 3D Shape Measurement with a Dual-Branch Network

    Authors: Mingyang Lei, Jingfan Fan, Long Shao, Hong Song, Deqiang Xiao, Danni Ai, Tianyu Fu, Ying Gu, Jian Yang

    Abstract: The structured light (SL)-based 3D measurement techniques with deep learning have been widely studied, among which speckle projection profilometry (SPP) and fringe projection profilometry (FPP) are two popular methods. However, they generally use a single projection pattern for reconstruction, resulting in fringe order ambiguity or poor reconstruction accuracy. To alleviate these problems, we prop…

    Submitted 19 July, 2024; originally announced July 2024.

  15. arXiv:2407.04737   

    eess.SP cs.AI

    Hierarchical Decoupling Capacitor Optimization for Power Distribution Network of 2.5D ICs with Co-Analysis of Frequency and Time Domains Based on Deep Reinforcement Learning

    Authors: Yuanyuan Duan, Haiyang Feng, Zhiping Yu, Hanming Wu, Leilai Shao, Xiaolei Zhu

    Abstract: With the growing need for higher memory bandwidth and computation density, 2.5D design, which involves integrating multiple chiplets onto an interposer, emerges as a promising solution. However, this integration introduces significant challenges due to increasing data rates and a large number of I/Os, necessitating advanced optimization of the power distribution networks (PDNs) both on-chip and on…

    Submitted 26 September, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

    Comments: The data needs to be experimentally revalidated, and the experimental details require further optimization

  16. arXiv:2407.03245  [pdf, other]

    cs.RO cs.AI eess.SY

    TieBot: Learning to Knot a Tie from Visual Demonstration through a Real-to-Sim-to-Real Approach

    Authors: Weikun Peng, Jun Lv, Yuwei Zeng, Haonan Chen, Siheng Zhao, Jichen Sun, Cewu Lu, Lin Shao

    Abstract: The tie-knotting task is highly challenging due to the tie's high deformation and long-horizon manipulation actions. This work presents TieBot, a Real-to-Sim-to-Real learning from visual demonstration system for the robots to learn to knot a tie. We introduce the Hierarchical Feature Matching approach to estimate a sequence of tie's meshes from the demonstration video. With these estimated meshes…

    Submitted 19 October, 2024; v1 submitted 3 July, 2024; originally announced July 2024.

    Comments: Accepted by CoRL 2024 as Oral presentation, camera-ready version

  17. arXiv:2406.11354  [pdf, other]

    cs.CL cs.AI cs.CV

    Preserving Knowledge in Large Language Model with Model-Agnostic Self-Decompression

    Authors: Zilun Zhang, Yutao Sun, Tiancheng Zhao, Leigang Sha, Ruochen Xu, Kyusong Lee, Jianwei Yin

    Abstract: Humans can retain old knowledge while learning new information, but Large Language Models (LLMs) often suffer from catastrophic forgetting when post-pretrained or supervised fine-tuned (SFT) on domain-specific data. Moreover, for Multimodal Large Language Models (MLLMs) which are composed of the LLM base and visual projector (e.g. LLaVA), a significant decline in performance on language benchmarks…

    Submitted 19 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

  18. arXiv:2406.00954  [pdf, other]

    cs.CL cs.AI

    Annotation Guidelines-Based Knowledge Augmentation: Towards Enhancing Large Language Models for Educational Text Classification

    Authors: Shiqi Liu, Sannyuya Liu, Lele Sha, Zijie Zeng, Dragan Gasevic, Zhi Liu

    Abstract: Various machine learning approaches have gained significant popularity for the automated classification of educational text to identify indicators of learning engagement -- i.e. learning engagement classification (LEC). LEC can offer comprehensive insights into human learning processes, attracting significant interest from diverse research communities, including Natural Language Processing (NLP),…

    Submitted 2 June, 2024; originally announced June 2024.

    Comments: The manuscript has been submitted for peer review to the IEEE Transactions on Learning Technologies

  19. arXiv:2405.18111  [pdf, other]

    cs.CL

    ATM: Adversarial Tuning Multi-agent System Makes a Robust Retrieval-Augmented Generator

    Authors: Junda Zhu, Lingyong Yan, Haibo Shi, Dawei Yin, Lei Sha

    Abstract: Large language models (LLMs) are proven to benefit a lot from retrieval-augmented generation (RAG) in alleviating hallucinations confronted with knowledge-intensive questions. RAG adopts information retrieval techniques to inject external knowledge from semantic-relevant documents as input contexts. However, since today's Internet is flooded with noisy and fabricated content, it is inevi…

    Submitted 8 October, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

    Comments: 18 pages

  20. arXiv:2405.12669  [pdf, other]

    cs.CL

    A Survey on Multi-modal Machine Translation: Tasks, Methods and Challenges

    Authors: Huangjun Shen, Liangying Shao, Wenbo Li, Zhibin Lan, Zhanyu Liu, Jinsong Su

    Abstract: In recent years, multi-modal machine translation has attracted significant interest in both academia and industry due to its superior performance. It takes both textual and visual modalities as inputs, leveraging visual context to tackle the ambiguities in source texts. In this paper, we begin by offering an exhaustive overview of 99 prior works, comprehensively summarizing representative studies…

    Submitted 22 May, 2024; v1 submitted 21 May, 2024; originally announced May 2024.

  21. arXiv:2405.07696  [pdf, other]

    cs.CV

    MonoMAE: Enhancing Monocular 3D Detection through Depth-Aware Masked Autoencoders

    Authors: Xueying Jiang, Sheng Jin, Xiaoqin Zhang, Ling Shao, Shijian Lu

    Abstract: Monocular 3D object detection aims for precise 3D localization and identification of objects from a single-view image. Despite its recent progress, it often struggles while handling pervasive object occlusions that tend to complicate and degrade the prediction of object dimensions, depths, and orientations. We design MonoMAE, a monocular 3D detector inspired by Masked Autoencoders that addresses t…

    Submitted 15 October, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

    Comments: NeurIPS 2024

  22. arXiv:2405.07162  [pdf, other]

    cs.RO cs.AI

    Learning Reward for Robot Skills Using Large Language Models via Self-Alignment

    Authors: Yuwei Zeng, Yao Mu, Lin Shao

    Abstract: Learning reward functions remains the bottleneck to equip a robot with a broad repertoire of skills. Large Language Models (LLM) contain valuable task-related knowledge that can potentially aid in the learning of reward functions. However, the proposed reward function can be imprecise and thus ineffective, so it needs to be further grounded with environment information. We proposed a method to lear…

    Submitted 15 May, 2024; v1 submitted 12 May, 2024; originally announced May 2024.

    Comments: ICML 2024

  23. arXiv:2405.06964  [pdf, other]

    cs.RO cs.AI

    ManiFoundation Model for General-Purpose Robotic Manipulation of Contact Synthesis with Arbitrary Objects and Robots

    Authors: Zhixuan Xu, Chongkai Gao, Zixuan Liu, Gang Yang, Chenrui Tie, Haozhuo Zheng, Haoyu Zhou, Weikun Peng, Debang Wang, Tianrun Hu, Tianyi Chen, Zhouliang Yu, Lin Shao

    Abstract: To substantially enhance robot intelligence, there is a pressing need to develop a large model that enables general-purpose robots to proficiently undertake a broad spectrum of manipulation tasks, akin to the versatile task-planning ability exhibited by LLMs. The vast diversity in objects, robots, and manipulation tasks presents huge challenges. Our work introduces a comprehensive framework to dev…

    Submitted 25 September, 2024; v1 submitted 11 May, 2024; originally announced May 2024.

  24. arXiv:2405.01066  [pdf, other]

    cs.CV cs.AI cs.HC

    HandS3C: 3D Hand Mesh Reconstruction with State Space Spatial Channel Attention from RGB images

    Authors: Zixun Jiao, Xihan Wang, Zhaoqiang Xia, Lianhe Shao, Quanli Gao

    Abstract: Reconstructing the hand mesh from one single RGB image is a challenging task because hands are often occluded by other objects. Most previous works attempt to exploit additional information and adopt attention mechanisms to improve 3D reconstruction performance, but this also increases computational complexity. To achieve a performance-reserving architecture with high comput…

    Submitted 14 May, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

    Comments: 12 pages, 6 figures

  25. arXiv:2404.12879  [pdf, other]

    cs.CL

    Unlocking Multi-View Insights in Knowledge-Dense Retrieval-Augmented Generation

    Authors: Guanhua Chen, Wenhan Yu, Lei Sha

    Abstract: While Retrieval-Augmented Generation (RAG) plays a crucial role in the application of Large Language Models (LLMs), existing retrieval methods in knowledge-dense domains like law and medicine still suffer from a lack of multi-perspective views, which are essential for improving interpretability and reliability. Previous research on multi-view retrieval often focused solely on different semantic fo…

    Submitted 19 April, 2024; originally announced April 2024.

  26. arXiv:2404.06939   

    cs.ET cs.AI

    Fast System Technology Co-Optimization Framework for Emerging Technology Based on Graph Neural Networks

    Authors: Tianliang Ma, Guangxi Fan, Xuguang Sun, Zhihui Deng, Kainlu Low, Leilai Shao

    Abstract: This paper proposes a fast system technology co-optimization (STCO) framework that optimizes power, performance, and area (PPA) for next-generation IC design, addressing the challenges and opportunities presented by novel materials and device architectures. We focus on accelerating the technology level of STCO using AI techniques, by employing graph neural network (GNN)-based approaches for both T…

    Submitted 25 July, 2024; v1 submitted 10 April, 2024; originally announced April 2024.

    Comments: We found some errors in Figure 3, and we need some time to re-run the experiments. Therefore, we want to withdraw our article now

  27. arXiv:2404.04943  [pdf]

    cs.LG cs.AI cs.AR

    Chiplet Placement Order Exploration Based on Learning to Rank with Graph Representation

    Authors: Zhihui Deng, Yuanyuan Duan, Leilai Shao, Xiaolei Zhu

    Abstract: Chiplet-based systems, integrating various silicon dies manufactured at different integrated circuit technology nodes on a carrier interposer, have garnered significant attention in recent years due to their cost-effectiveness and competitive performance. The widespread adoption of reinforcement learning as a sequential placement method has introduced a new challenge in determining the optimal pla…

    Submitted 7 April, 2024; originally announced April 2024.

    Comments: 6 pages, 8 figures and 6 tables, accepted by the Conference ISEDA

  28. arXiv:2404.03179  [pdf, other]

    cs.CV cs.MM cs.SD eess.AS

    UniAV: Unified Audio-Visual Perception for Multi-Task Video Event Localization

    Authors: Tiantian Geng, Teng Wang, Yanfu Zhang, Jinming Duan, Weili Guan, Feng Zheng, Ling Shao

    Abstract: Video localization tasks aim to temporally locate specific instances in videos, including temporal action localization (TAL), sound event detection (SED) and audio-visual event localization (AVEL). Existing methods over-specialize on each task, overlooking the fact that these instances often occur in the same video to form the complete video content. In this work, we present UniAV, a Unified Audio…

    Submitted 11 August, 2024; v1 submitted 3 April, 2024; originally announced April 2024.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  29. arXiv:2404.01780  [pdf, other]

    astro-ph.IM astro-ph.GA cs.CV

    CSST Strong Lensing Preparation: a Framework for Detecting Strong Lenses in the Multi-color Imaging Survey by the China Survey Space Telescope (CSST)

    Authors: Xu Li, Ruiqi Sun, Jiameng Lv, Peng Jia, Nan Li, Chengliang Wei, Zou Hu, Xinzhong Er, Yun Chen, Zhang Ban, Yuedong Fang, Qi Guo, Dezi Liu, Guoliang Li, Lin Lin, Ming Li, Ran Li, Xiaobo Li, Yu Luo, Xianmin Meng, Jundan Nie, Zhaoxiang Qi, Yisheng Qiu, Li Shao, Hao Tian, et al. (7 additional authors not shown)

    Abstract: Strong gravitational lensing is a powerful tool for investigating dark matter and dark energy properties. With the advent of large-scale sky surveys, we can discover strong lensing systems on an unprecedented scale, which requires efficient tools to extract them from billions of astronomical objects. The existing mainstream lens-finding tools are based on machine learning algorithms and applied to…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: The paper is accepted by the AJ. The complete code could be downloaded with DOI of: 10.12149/101393. Comments are welcome

  30. arXiv:2403.19460  [pdf, other]

    cs.RO cs.AI

    RiEMann: Near Real-Time SE(3)-Equivariant Robot Manipulation without Point Cloud Segmentation

    Authors: Chongkai Gao, Zhengrong Xue, Shuying Deng, Tianhai Liang, Siqi Yang, Lin Shao, Huazhe Xu

    Abstract: We present RiEMann, an end-to-end near Real-time SE(3)-Equivariant Robot Manipulation imitation learning framework from scene point cloud input. Compared to previous methods that rely on descriptor field matching, RiEMann directly predicts the target poses of objects for manipulation without any object segmentation. RiEMann learns a manipulation task from scratch with 5 to 10 demonstrations, gener…

    Submitted 3 October, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

  31. arXiv:2403.19346  [pdf, other]

    cs.CL

    Large Language Models Are Unconscious of Unreasonability in Math Problems

    Authors: Jingyuan Ma, Damai Dai, Lei Sha, Zhifang Sui

    Abstract: Large language models (LLMs) demonstrate substantial capabilities in solving math problems. However, they tend to produce hallucinations when given questions containing unreasonable errors. In this paper, we study the behavior of LLMs when faced with unreasonable math problems and further explore their potential to address these problems. We construct the Unreasonable Math Problem (UMP) benchmark…

    Submitted 1 October, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

    Comments: 11 pages, 3 figures

  32. arXiv:2403.07807  [pdf, other]

    cs.CV

    StyleGaussian: Instant 3D Style Transfer with Gaussian Splatting

    Authors: Kunhao Liu, Fangneng Zhan, Muyu Xu, Christian Theobalt, Ling Shao, Shijian Lu

    Abstract: We introduce StyleGaussian, a novel 3D style transfer technique that allows instant transfer of any image's style to a 3D scene at 10 frames per second (fps). Leveraging 3D Gaussian Splatting (3DGS), StyleGaussian achieves style transfer without compromising its real-time rendering ability and multi-view consistency. It achieves instant style transfer with three steps: embedding, transfer, and dec…

    Submitted 12 March, 2024; originally announced March 2024.

  33. arXiv:2403.06444  [pdf, other]

    cs.CV

    Latent Semantic Consensus For Deterministic Geometric Model Fitting

    Authors: Guobao Xiao, Jun Yu, Jiayi Ma, Deng-Ping Fan, Ling Shao

    Abstract: Estimating reliable geometric model parameters from the data with severe outliers is a fundamental and important task in computer vision. This paper attempts to sample high-quality subsets and select model instances to estimate parameters in the multi-structural data. To address this, we propose an effective method called Latent Semantic Consensus (LSC). The principle of LSC is to preserve the lat…

    Submitted 11 March, 2024; originally announced March 2024.

  34. arXiv:2403.03506  [pdf, other]

    cs.CL cs.AI

    Detecting AI-Generated Sentences in Human-AI Collaborative Hybrid Texts: Challenges, Strategies, and Insights

    Authors: Zijie Zeng, Shiqi Liu, Lele Sha, Zhuang Li, Kaixun Yang, Sannyuya Liu, Dragan Gašević, Guanliang Chen

    Abstract: This study explores the challenge of sentence-level AI-generated text detection within human-AI collaborative hybrid texts. Existing studies of AI-generated text detection for hybrid texts often rely on synthetic datasets. These typically involve hybrid texts with a limited number of boundaries. We contend that studies of detecting AI-generated content within hybrid texts should cover different ty…

    Submitted 23 May, 2024; v1 submitted 6 March, 2024; originally announced March 2024.

    Comments: Camera-Ready version of our IJCAI 2024 accepted paper (Special Track: AI and Social Good)

  35. arXiv:2402.16444  [pdf, other]

    cs.CL

    ShieldLM: Empowering LLMs as Aligned, Customizable and Explainable Safety Detectors

    Authors: Zhexin Zhang, Yida Lu, Jingyuan Ma, Di Zhang, Rui Li, Pei Ke, Hao Sun, Lei Sha, Zhifang Sui, Hongning Wang, Minlie Huang

    Abstract: The safety of Large Language Models (LLMs) has gained increasing attention in recent years, but there is still no comprehensive approach for detecting safety issues within LLMs' responses in an aligned, customizable and explainable manner. In this paper, we propose ShieldLM, an LLM-based safety detector, which aligns with general human safety standards, supports customizable detection rules, and…

    Submitted 26 February, 2024; originally announced February 2024.

    Comments: 17 pages

  36. arXiv:2402.16006  [pdf, other]

    cs.CL

    ASETF: A Novel Method for Jailbreak Attack on LLMs through Translate Suffix Embeddings

    Authors: Hao Wang, Hao Li, Minlie Huang, Lei Sha

    Abstract: The safety defense methods of large language models (LLMs) remain limited because the dangerous prompts are manually curated to just a few known attack types, which fails to keep pace with emerging varieties. Recent studies found that attaching suffixes to harmful instructions can hack the defense of LLMs and lead to dangerous outputs. However, similar to traditional text adversarial attacks, this app…

    Submitted 3 June, 2024; v1 submitted 25 February, 2024; originally announced February 2024.

  37. arXiv:2402.05176  [pdf, other]

    astro-ph.IM astro-ph.EP astro-ph.SR cs.LG

    cecilia: A Machine Learning-Based Pipeline for Measuring Metal Abundances of Helium-rich Polluted White Dwarfs

    Authors: M. Badenas-Agusti, J. Viaña, A. Vanderburg, S. Blouin, P. Dufour, S. Xu, L. Sha

    Abstract: Over the past several decades, conventional spectral analysis techniques of polluted white dwarfs have become powerful tools to learn about the geology and chemistry of extrasolar bodies. Despite their proven capabilities and extensive legacy of scientific discoveries, these techniques are however still limited by their manual, time-intensive, and iterative nature. As a result, they are susceptibl…

    Submitted 7 February, 2024; originally announced February 2024.

    Comments: 28 pages, 16 figures, 5 tables. Accepted for publication in MNRAS

  38. arXiv:2402.04160  [pdf, other]

    cs.CL

    Harnessing the Plug-and-Play Controller by Prompting

    Authors: Hao Wang, Lei Sha

    Abstract: Controllable text generation is a growing field within natural language generation (NLG) that focuses on producing text that meets specific constraints in real-world applications. Previous approaches, such as plug-and-play controllers (PPCs), aimed to steer the properties of generated text in a flexible manner. However, these methods often compromised the integrity of the language model's decoding…

    Submitted 6 February, 2024; originally announced February 2024.

    Comments: The Third Version of the Generation, Evaluation & Metrics (GEM) Workshop in EMNLP 2023

  39. arXiv:2402.03631  [pdf, other]

    cs.CV

    CAT-SAM: Conditional Tuning for Few-Shot Adaptation of Segment Anything Model

    Authors: Aoran Xiao, Weihao Xuan, Heli Qi, Yun Xing, Ruijie Ren, Xiaoqin Zhang, Ling Shao, Shijian Lu

    Abstract: The recent Segment Anything Model (SAM) has demonstrated remarkable zero-shot capability and flexible geometric prompting in general image segmentation. However, SAM often struggles when handling various unconventional images, such as aerial, medical, and non-RGB images. This paper presents CAT-SAM, a ConditionAl Tuning network that adapts SAM toward various unconventional target tasks with just f…

    Submitted 15 July, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

    Comments: ECCV 2024

  40. arXiv:2402.02968  [pdf, other]

    cs.CV cs.LG

    Delving into Multi-modal Multi-task Foundation Models for Road Scene Understanding: From Learning Paradigm Perspectives

    Authors: Sheng Luo, Wei Chen, Wanxin Tian, Rui Liu, Luanxuan Hou, Xiubao Zhang, Haifeng Shen, Ruiqi Wu, Shuyi Geng, Yi Zhou, Ling Shao, Yi Yang, Bojun Gao, Qun Li, Guobin Wu

    Abstract: Foundation models have indeed made a profound impact on various fields, emerging as pivotal components that significantly shape the capabilities of intelligent systems. In the context of intelligent vehicles, leveraging the power of foundation models has proven to be transformative, offering notable advancements in visual understanding. Equipped with multi-modal and multi-task learning capabilitie…

    Submitted 26 May, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

    Comments: Accepted to IEEE Transactions on Intelligent Vehicles (T-IV). 24 pages, 9 figures, 1 table

  41. DA-BEV: Unsupervised Domain Adaptation for Bird's Eye View Perception

    Authors: Kai Jiang, Jiaxing Huang, Weiying Xie, Yunsong Li, Ling Shao, Shijian Lu

    Abstract: Camera-only Bird's Eye View (BEV) has demonstrated great potential in environment perception in a 3D space. However, most existing studies were conducted under a supervised setup which cannot scale well while handling various new data. Unsupervised domain adaptive BEV, which enables effective learning from various unlabelled target data, is far under-explored. In this work, we design DA-BEV, the first dom…

    Submitted 13 August, 2024; v1 submitted 12 January, 2024; originally announced January 2024.

  42. arXiv:2401.07721  [pdf, other]

    cs.CV

    Graph Transformer GANs with Graph Masked Modeling for Architectural Layout Generation

    Authors: Hao Tang, Ling Shao, Nicu Sebe, Luc Van Gool

    Abstract: We present a novel graph Transformer generative adversarial network (GTGAN) to learn effective graph node relations in an end-to-end fashion for challenging graph-constrained architectural layout generation tasks. The proposed graph-Transformer-based generator includes a novel graph Transformer encoder that combines graph convolutions and self-attentions in a Transformer to model both local and gl…

    Submitted 15 January, 2024; originally announced January 2024.

    Comments: Accepted to TPAMI, an extended version of a paper published in CVPR 2023. arXiv admin note: substantial text overlap with arXiv:2303.08225

  43. arXiv:2401.06969  [pdf, other]

    cs.CV

    Domain Adaptation for Large-Vocabulary Object Detectors

    Authors: Kai Jiang, Jiaxing Huang, Weiying Xie, Jie Lei, Yunsong Li, Ling Shao, Shijian Lu

    Abstract: Large-vocabulary object detectors (LVDs) aim to detect objects of many categories, which learn super objectness features and can locate objects accurately while applied to various downstream data. However, LVDs often struggle in recognizing the located objects due to domain discrepancy in data distribution and object vocabulary. At the other end, recent vision-language foundation models such as CL…

    Submitted 10 May, 2024; v1 submitted 12 January, 2024; originally announced January 2024.

  44. arXiv:2312.16895  [pdf, other]

    cs.LG cs.AR

    RLPlanner: Reinforcement Learning based Floorplanning for Chiplets with Fast Thermal Analysis

    Authors: Yuanyuan Duan, Xingchen Liu, Zhiping Yu, Hanming Wu, Leilai Shao, Xiaolei Zhu

    Abstract: Chiplet-based systems have gained significant attention in recent years due to their low cost and competitive performance. As the complexity and compactness of a chiplet-based system increase, careful consideration must be given to microbump assignments, interconnect delays, and thermal limitations during the floorplanning stage. This paper introduces RLPlanner, an efficient early-stage floorplann…

    Submitted 16 January, 2024; v1 submitted 28 December, 2023; originally announced December 2023.

  45. arXiv:2312.12784  [pdf, other]

    cs.LG

    Fast Cell Library Characterization for Design Technology Co-Optimization Based on Graph Neural Networks

    Authors: Tianliang Ma, Guangxi Fan, Zhihui Deng, Xuguang Sun, Kainlu Low, Leilai Shao

    Abstract: Design technology co-optimization (DTCO) plays a critical role in achieving optimal power, performance, and area (PPA) for advanced semiconductor process development. Cell library characterization is essential in DTCO flow, but traditional methods are time-consuming and costly. To overcome these challenges, we propose a graph neural network (GNN)-based machine learning model for rapid and accurate…

    Submitted 19 March, 2024; v1 submitted 20 December, 2023; originally announced December 2023.

  46. arXiv:2312.06454  [pdf, other]

    eess.IV cs.CV cs.LG

    Point Transformer with Federated Learning for Predicting Breast Cancer HER2 Status from Hematoxylin and Eosin-Stained Whole Slide Images

    Authors: Bao Li, Zhenyu Liu, Lizhi Shao, Bensheng Qiu, Hong Bu, Jie Tian

    Abstract: Directly predicting human epidermal growth factor receptor 2 (HER2) status from widely available hematoxylin and eosin (HE)-stained whole slide images (WSIs) can reduce technical costs and expedite treatment selection. Accurately predicting HER2 requires large collections of multi-site WSIs. Federated learning enables collaborative training of these WSIs without gigabyte-size WSIs transportation a…

    Submitted 27 February, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

  47. arXiv:2312.06164  [pdf, other]

    cs.CV

    ReshapeIT: Reliable Shape Interaction with Implicit Template for Anatomical Structure Reconstruction

    Authors: Minghui Zhang, Hao Zheng, Yawen Huang, Ling Shao, Yun Gu

    Abstract: Shape modeling of volumetric medical images is crucial for quantitative analysis and surgical planning in computer-aided diagnosis. To alleviate the burden of expert clinicians, reconstructed shapes are typically obtained from deep learning models, such as Convolutional Neural Networks (CNNs) or transformer-based architectures, followed by the marching cube algorithm. However, automatic shape reco…

    Submitted 30 September, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

  48. arXiv:2312.03297  [pdf, other]

    cs.RO cs.AI cs.GR

    SoftMAC: Differentiable Soft Body Simulation with Forecast-based Contact Model and Two-way Coupling with Articulated Rigid Bodies and Clothes

    Authors: Min Liu, Gang Yang, Siyuan Luo, Lin Shao

    Abstract: Differentiable physics simulation provides an avenue to tackle previously intractable challenges through gradient-based optimization, thereby greatly improving the efficiency of solving robotics-related problems. To apply differentiable simulation in diverse robotic manipulation scenarios, a key challenge is to integrate various materials in a unified framework. We present SoftMAC, a differentiabl…

    Submitted 26 July, 2024; v1 submitted 6 December, 2023; originally announced December 2023.

    Comments: Accepted to IROS 2024

  49. Synergistic Perception and Control Simplex for Verifiable Safe Vertical Landing

    Authors: Ayoosh Bansal, Yang Zhao, James Zhu, Sheng Cheng, Yuliang Gu, Hyung-Jin Yoon, Hunmin Kim, Naira Hovakimyan, Lui Sha

    Abstract: Perception, Planning, and Control form the essential components of autonomy in advanced air mobility. This work advances the holistic integration of these components to enhance the performance and robustness of the complete cyber-physical system. We adapt Perception Simplex, a system for verifiable collision avoidance amidst obstacle detection faults, to the vertical landing maneuver for autonomou…

    Submitted 5 December, 2023; originally announced December 2023.

    Comments: To appear in AIAA SciTech 2024

    ACM Class: C.3; C.4; J.7

    Journal ref: AIAA SCITECH 2024 Forum, p. 1167

  50. arXiv:2312.00277  [pdf, other]

    cs.LG cs.CL

    Text Attribute Control via Closed-Loop Disentanglement

    Authors: Lei Sha, Thomas Lukasiewicz

    Abstract: Changing an attribute of a text without changing the content usually requires to first disentangle the text into irrelevant attributes and content representations. After that, in the inference phase, the representation of one attribute is tuned to a different value, expecting that the corresponding attribute of the text can also be changed accordingly. The usual way of disentanglement is to add so…

    Submitted 30 November, 2023; originally announced December 2023.

    Comments: accepted by TACL 2023