Showing 1–50 of 305 results for author: Liang, S

  1. arXiv:2410.15621  [pdf, other]

    cs.PF

    DRIM-ANN: An Approximate Nearest Neighbor Search Engine based on Commercial DRAM-PIMs

    Authors: Mingkai Chen, Tianhua Han, Cheng Liu, Shengwen Liang, Kuai Yu, Lei Dai, Ziming Yuan, Ying Wang, Lei Zhang, Huawei Li, Xiaowei Li

    Abstract: Approximate Nearest Neighbor Search (ANNS), which enables efficient semantic similarity search in large datasets, has become a fundamental component of critical applications such as information retrieval and retrieval-augmented generation (RAG). However, ANNS is a well-known I/O-intensive algorithm with a low compute-to-I/O ratio, often requiring massive storage due to the large volume of high-dim…

    Submitted 20 October, 2024; originally announced October 2024.

  2. arXiv:2410.11182  [pdf, other]

    cs.LG cs.AI cs.CR

    Archilles' Heel in Semi-open LLMs: Hiding Bottom against Recovery Attacks

    Authors: Hanbo Huang, Yihan Li, Bowen Jiang, Lin Liu, Ruoyu Sun, Zhuotao Liu, Shiyu Liang

    Abstract: Closed-source large language models deliver strong performance but have limited downstream customizability. Semi-open models, combining both closed-source and public layers, were introduced to improve customizability. However, parameters in the closed-source layers are found vulnerable to recovery attacks. In this paper, we explore the design of semi-open models with fewer closed-source layers, ai…

    Submitted 14 October, 2024; originally announced October 2024.

    Comments: 10 pages for main content of the paper

  3. arXiv:2410.10160  [pdf, other]

    cs.CV

    Will the Inclusion of Generated Data Amplify Bias Across Generations in Future Image Classification Models?

    Authors: Zeliang Zhang, Xin Liang, Mingqian Feng, Susan Liang, Chenliang Xu

    Abstract: As the demand for high-quality training data escalates, researchers have increasingly turned to generative models to create synthetic data, addressing data scarcity and enabling continuous model improvement. However, reliance on self-generated data introduces a critical question: Will this practice amplify bias in future models? While most research has focused on overall performance, the impact on…

    Submitted 14 October, 2024; originally announced October 2024.

    Comments: 15 pages, 7 figures

  4. arXiv:2410.08970  [pdf, other]

    cs.CL cs.AI

    NoVo: Norm Voting off Hallucinations with Attention Heads in Large Language Models

    Authors: Zheng Yi Ho, Siyuan Liang, Sen Zhang, Yibing Zhan, Dacheng Tao

    Abstract: Hallucinations in Large Language Models (LLMs) remain a major obstacle, particularly in high-stakes applications where factual accuracy is critical. While representation editing and reading methods have made strides in reducing hallucinations, their heavy reliance on specialised tools and training on in-domain samples makes them difficult to scale and prone to overfitting. This limits their accur…

    Submitted 11 October, 2024; originally announced October 2024.

  5. arXiv:2410.07463  [pdf, other]

    cs.CV

    Language-Guided Joint Audio-Visual Editing via One-Shot Adaptation

    Authors: Susan Liang, Chao Huang, Yapeng Tian, Anurag Kumar, Chenliang Xu

    Abstract: In this paper, we introduce a novel task called language-guided joint audio-visual editing. Given an audio and image pair of a sounding event, this task aims at generating new audio-visual content by editing the given sounding event conditioned on the language guidance. For instance, we can alter the background environment of a sounding object while keeping its appearance unchanged, or we can add…

    Submitted 9 October, 2024; originally announced October 2024.

    Comments: ACCV 2024

  6. arXiv:2410.06883  [pdf, other]

    cs.LG cs.AI

    Degree Distribution based Spiking Graph Networks for Domain Adaptation

    Authors: Yingxu Wang, Siwei Liu, Mengzhu Wang, Shangsong Liang, Nan Yin

    Abstract: Spiking Graph Networks (SGNs) have garnered significant attention from both researchers and industry due to their ability to address energy consumption challenges in graph classification. However, SGNs are only effective for in-distribution data and cannot tackle out-of-distribution data. In this paper, we first propose the domain adaptation problem in SGNs, and introduce a novel framework named…

    Submitted 9 October, 2024; v1 submitted 9 October, 2024; originally announced October 2024.

  7. arXiv:2410.04884  [pdf, other]

    cs.CV cs.AI

    Patch is Enough: Naturalistic Adversarial Patch against Vision-Language Pre-training Models

    Authors: Dehong Kong, Siyuan Liang, Xiaopeng Zhu, Yuansheng Zhong, Wenqi Ren

    Abstract: Visual language pre-training (VLP) models have demonstrated significant success across various domains, yet they remain vulnerable to adversarial attacks. Addressing these adversarial vulnerabilities is crucial for enhancing security in multimodal learning. Traditionally, adversarial methods targeting VLP models involve simultaneously perturbing images and text. However, this approach faces notabl…

    Submitted 7 October, 2024; originally announced October 2024.

    Comments: accepted by Visual Intelligence

  8. arXiv:2410.04479  [pdf, other]

    eess.IV cs.CV cs.LG

    SITCOM: Step-wise Triple-Consistent Diffusion Sampling for Inverse Problems

    Authors: Ismail Alkhouri, Shijun Liang, Cheng-Han Huang, Jimmy Dai, Qing Qu, Saiprasad Ravishankar, Rongrong Wang

    Abstract: Diffusion models (DMs) are a class of generative models that allow sampling from a distribution learned over a training set. When applied to solving inverse imaging problems (IPs), the reverse sampling steps of DMs are typically modified to approximately sample from a measurement-conditioned distribution in the image space. However, these modifications may be unsuitable for certain settings (such…

    Submitted 6 October, 2024; originally announced October 2024.

  9. arXiv:2410.03988  [pdf, other]

    stat.ML cs.LG

    Implicit Bias of Mirror Descent for Shallow Neural Networks in Univariate Regression

    Authors: Shuang Liang, Guido Montúfar

    Abstract: We examine the implicit bias of mirror flow in univariate least squares error regression with wide and shallow neural networks. For a broad class of potential functions, we show that mirror flow exhibits lazy training and has the same implicit bias as ordinary gradient flow when the network width tends to infinity. For ReLU networks, we characterize this bias through a variational problem in funct…

    Submitted 4 October, 2024; originally announced October 2024.

  10. arXiv:2410.01495  [pdf, other]

    cs.HC

    Open-vocabulary Multimodal Emotion Recognition: Dataset, Metric, and Benchmark

    Authors: Zheng Lian, Haiyang Sun, Licai Sun, Lan Chen, Haoyu Chen, Hao Gu, Zhuofan Wen, Shun Chen, Siyuan Zhang, Hailiang Yao, Mingyu Xu, Kang Chen, Bin Liu, Rui Liu, Shan Liang, Ya Li, Jiangyan Yi, Jianhua Tao

    Abstract: Multimodal Emotion Recognition (MER) is an important research topic. This paper advocates for a transformative paradigm in MER. The rationale behind our work is that current approaches often rely on a limited set of basic emotion labels, which do not adequately represent the rich spectrum of human emotions. These traditional and overly simplistic emotion categories fail to capture the inherent com…

    Submitted 2 October, 2024; originally announced October 2024.

  11. arXiv:2410.00835  [pdf, other]

    math.NA cs.LG

    Solving High-Dimensional Partial Integral Differential Equations: The Finite Expression Method

    Authors: Gareth Hardwick, Senwei Liang, Haizhao Yang

    Abstract: In this paper, we introduce a new finite expression method (FEX) to solve high-dimensional partial integro-differential equations (PIDEs). This approach builds upon the original FEX and its inherent advantages with new advances: 1) A novel method of parameter grouping is proposed to reduce the number of coefficients in high-dimensional function approximation; 2) A Taylor series approximation metho…

    Submitted 1 October, 2024; originally announced October 2024.

    Comments: 18 pages, 10 figures

  12. arXiv:2409.20034  [pdf, other]

    cs.CV

    Camera Calibration using a Collimator System

    Authors: Shunkun Liang, Banglei Guan, Zhenbao Yu, Pengju Sun, Yang Shang

    Abstract: Camera calibration is a crucial step in photogrammetry and 3D vision applications. In practical scenarios with a long working distance to cover a wide area, target-based calibration methods become complicated and inflexible due to site limitations. This paper introduces a novel camera calibration method using a collimator system, which can provide a reliable and controllable calibration environmen…

    Submitted 30 September, 2024; originally announced September 2024.

    Comments: Accepted by ECCV2024 (oral presentation)

  13. arXiv:2409.19650  [pdf, other]

    cs.CV cs.AI

    Grounding 3D Scene Affordance From Egocentric Interactions

    Authors: Cuiyu Liu, Wei Zhai, Yuhang Yang, Hongchen Luo, Sen Liang, Yang Cao, Zheng-Jun Zha

    Abstract: Grounding 3D scene affordance aims to locate interactive regions in 3D environments, which is crucial for embodied agents to interact intelligently with their surroundings. Most existing approaches achieve this by mapping semantics to 3D instances based on static geometric structure and visual appearance. This passive strategy limits the agent's ability to actively perceive and engage with the env…

    Submitted 29 September, 2024; originally announced September 2024.

  14. arXiv:2409.19526  [pdf, other]

    cs.CR cs.AI cs.CV cs.LG

    Efficient Backdoor Defense in Multimodal Contrastive Learning: A Token-Level Unlearning Method for Mitigating Threats

    Authors: Kuanrong Liu, Siyuan Liang, Jiawei Liang, Pengwen Dai, Xiaochun Cao

    Abstract: Multimodal contrastive learning uses various data modalities to create high-quality features, but its reliance on extensive data sources on the Internet makes it vulnerable to backdoor attacks. These attacks insert malicious behaviors during training, which are activated by specific triggers during inference, posing significant security risks. Despite existing countermeasures through fine-tuning t…

    Submitted 28 September, 2024; originally announced September 2024.

  15. arXiv:2409.19499  [pdf, other]

    cs.RO

    Fast-UMI: A Scalable and Hardware-Independent Universal Manipulation Interface

    Authors: Ziniu Wu, Tianyu Wang, Zhaxizhuoma, Chuyue Guan, Zhongjie Jia, Shuai Liang, Haoming Song, Delin Qu, Dong Wang, Zhigang Wang, Nieqing Cao, Yan Ding, Bin Zhao, Xuelong Li

    Abstract: Collecting real-world manipulation trajectory data involving robotic arms is essential for developing general-purpose action policies in robotic manipulation, yet such data remains scarce. Existing methods face limitations such as high costs, labor intensity, hardware dependencies, and complex setup requirements involving SLAM algorithms. In this work, we introduce Fast-UMI, an interface-mediated…

    Submitted 28 September, 2024; originally announced September 2024.

  16. arXiv:2409.18486  [pdf, other]

    cs.CL

    Evaluation of OpenAI o1: Opportunities and Challenges of AGI

    Authors: Tianyang Zhong, Zhengliang Liu, Yi Pan, Yutong Zhang, Yifan Zhou, Shizhe Liang, Zihao Wu, Yanjun Lyu, Peng Shu, Xiaowei Yu, Chao Cao, Hanqi Jiang, Hanxu Chen, Yiwei Li, Junhao Chen, Huawen Hu, Yihen Liu, Huaqin Zhao, Shaochen Xu, Haixing Dai, Lin Zhao, Ruidong Zhang, Wei Zhao, Zhenyuan Yang, Jingyuan Chen, et al. (53 additional authors not shown)

    Abstract: This comprehensive study evaluates the performance of OpenAI's o1-preview large language model across a diverse array of complex reasoning tasks, spanning multiple domains, including computer science, mathematics, natural sciences, medicine, linguistics, and social sciences. Through rigorous testing, o1-preview demonstrated remarkable capabilities, often achieving human-level or superior performan…

    Submitted 27 September, 2024; originally announced September 2024.

  17. arXiv:2409.17601  [pdf, other]

    cs.CV cs.AI

    TA-Cleaner: A Fine-grained Text Alignment Backdoor Defense Strategy for Multimodal Contrastive Learning

    Authors: Yuan Xun, Siyuan Liang, Xiaojun Jia, Xinwei Liu, Xiaochun Cao

    Abstract: Pre-trained large models for multimodal contrastive learning, such as CLIP, have been widely recognized in the industry as highly susceptible to data-poisoned backdoor attacks. This poses significant risks to downstream model training. In response to such potential threats, finetuning offers a simpler and more efficient defense choice compared to retraining large models with augmented data. In the…

    Submitted 7 October, 2024; v1 submitted 26 September, 2024; originally announced September 2024.

  18. arXiv:2409.16057  [pdf, other]

    cs.CV cs.AI

    Towards Robust Object Detection: Identifying and Removing Backdoors via Module Inconsistency Analysis

    Authors: Xianda Zhang, Siyuan Liang

    Abstract: Object detection models, widely used in security-critical applications, are vulnerable to backdoor attacks that cause targeted misclassifications when triggered by specific patterns. Existing backdoor defense techniques, primarily designed for simpler models like image classifiers, often fail to effectively detect and remove backdoors in object detectors. We propose a backdoor defense framework ta…

    Submitted 30 September, 2024; v1 submitted 24 September, 2024; originally announced September 2024.

  19. arXiv:2409.15968  [pdf, other]

    cs.CV

    Adversarial Backdoor Defense in CLIP

    Authors: Junhao Kuang, Siyuan Liang, Jiawei Liang, Kuanrong Liu, Xiaochun Cao

    Abstract: Multimodal contrastive pretraining, exemplified by models like CLIP, has been found to be vulnerable to backdoor attacks. While current backdoor defense methods primarily employ conventional data augmentation to create augmented samples aimed at feature alignment, these methods fail to capture the distinct features of backdoor samples, resulting in suboptimal defense performance. Observations reve…

    Submitted 24 September, 2024; originally announced September 2024.

  20. arXiv:2409.15654  [pdf, other]

    cs.AR

    Cambricon-LLM: A Chiplet-Based Hybrid Architecture for On-Device Inference of 70B LLM

    Authors: Zhongkai Yu, Shengwen Liang, Tianyun Ma, Yunke Cai, Ziyuan Nan, Di Huang, Xinkai Song, Yifan Hao, Jie Zhang, Tian Zhi, Yongwei Zhao, Zidong Du, Xing Hu, Qi Guo, Tianshi Chen

    Abstract: Deploying advanced large language models on edge devices, such as smartphones and robotics, is a growing trend that enhances user data privacy and network connectivity resilience while preserving intelligent capabilities. However, such a task exhibits single-batch computing with incredibly low arithmetic intensity, which poses the significant challenges of huge memory footprint and bandwidth deman…

    Submitted 23 September, 2024; originally announced September 2024.

    Comments: 15 pages, 16 figures

    Journal ref: MICRO 2024

  21. arXiv:2409.14908  [pdf]

    cs.RO cs.AI

    KARMA: Augmenting Embodied AI Agents with Long-and-short Term Memory Systems

    Authors: Zixuan Wang, Bo Yu, Junzhe Zhao, Wenhao Sun, Sai Hou, Shuai Liang, Xing Hu, Yinhe Han, Yiming Gan

    Abstract: Embodied AI agents responsible for executing interconnected, long-sequence household tasks often face difficulties with in-context memory, leading to inefficiencies and errors in task execution. To address this issue, we introduce KARMA, an innovative memory system that integrates long-term and short-term memory modules, enhancing large language models (LLMs) for planning in embodied agents throug…

    Submitted 23 September, 2024; originally announced September 2024.

  22. arXiv:2409.14201  [pdf, other]

    cs.CV

    LATTE: Improving Latex Recognition for Tables and Formulae with Iterative Refinement

    Authors: Nan Jiang, Shanchao Liang, Chengxiao Wang, Jiannan Wang, Lin Tan

    Abstract: Portable Document Format (PDF) files are dominantly used for storing and disseminating scientific research, legal documents, and tax information. LaTeX is a popular application for creating PDF documents. Despite its advantages, LaTeX is not WYSIWYG -- what you see is what you get, i.e., the LaTeX source and rendered PDF images look drastically different, especially for formulae and tables. This ga…

    Submitted 21 September, 2024; originally announced September 2024.

  23. arXiv:2409.14122  [pdf, other]

    cs.CR cs.LG

    Efficient and Effective Model Extraction

    Authors: Hongyu Zhu, Wentao Hu, Sichu Liang, Fangqi Li, Wenwen Wang, Shilin Wang

    Abstract: Model extraction aims to create a functionally similar copy from a machine learning as a service (MLaaS) API with minimal overhead, typically for illicit profit or as a precursor to further attacks, posing a significant threat to the MLaaS ecosystem. However, recent studies have shown that model extraction is highly inefficient, particularly when the target task distribution is unavailable. In suc…

    Submitted 24 September, 2024; v1 submitted 21 September, 2024; originally announced September 2024.

  24. arXiv:2409.11689  [pdf, other]

    cs.CV cs.AI

    GUNet: A Graph Convolutional Network United Diffusion Model for Stable and Diversity Pose Generation

    Authors: Shuowen Liang, Sisi Li, Qingyun Wang, Cen Zhang, Kaiquan Zhu, Tian Yang

    Abstract: Pose skeleton images are an important reference in pose-controllable image generation. In order to enrich the source of skeleton images, recent works have investigated the generation of pose skeletons based on natural language. These methods are based on GANs. However, it remains challenging to perform diverse, structurally correct and aesthetically pleasing human pose skeleton generation with var…

    Submitted 18 September, 2024; originally announced September 2024.

  25. arXiv:2409.10643  [pdf, other]

    cs.CR cs.LG

    CaBaGe: Data-Free Model Extraction using ClAss BAlanced Generator Ensemble

    Authors: Jonathan Rosenthal, Shanchao Liang, Kevin Zhang, Lin Tan

    Abstract: Machine Learning as a Service (MLaaS) is often provided as a pay-per-query, black-box system to clients. Such a black-box approach not only hinders open replication, validation, and interpretation of model results, but also makes it harder for white-hat researchers to identify vulnerabilities in the MLaaS systems. Model extraction is a promising technique to address these challenges by reverse-eng…

    Submitted 16 September, 2024; originally announced September 2024.

  26. arXiv:2409.08349  [pdf, other]

    physics.soc-ph cs.IT cs.SI

    Scientific and technological knowledge grows linearly over time

    Authors: Huquan Kang, Luoyi Fu, Russell J. Funk, Xinbing Wang, Jiaxin Ding, Shiyu Liang, Jianghao Wang, Lei Zhou, Chenghu Zhou

    Abstract: The past few centuries have witnessed a dramatic growth in scientific and technological knowledge. However, the nature of that growth - whether exponential or otherwise - remains controversial, perhaps partly due to the lack of quantitative characterizations. We evaluated knowledge as a collective thinking structure, using citation networks as a representation, by examining extensive datasets that…

    Submitted 12 September, 2024; originally announced September 2024.

  27. arXiv:2409.07321  [pdf, other]

    cs.CV cs.AI

    Module-wise Adaptive Adversarial Training for End-to-end Autonomous Driving

    Authors: Tianyuan Zhang, Lu Wang, Jiaqi Kang, Xinwei Zhang, Siyuan Liang, Yuwei Chen, Aishan Liu, Xianglong Liu

    Abstract: Recent advances in deep learning have markedly improved autonomous driving (AD) models, particularly end-to-end systems that integrate perception, prediction, and planning stages, achieving state-of-the-art performance. However, these models remain vulnerable to adversarial attacks, where human-imperceptible perturbations can disrupt decision-making processes. While adversarial training is an effe…

    Submitted 11 September, 2024; originally announced September 2024.

    Comments: 14 pages

  28. arXiv:2409.05379  [pdf, other]

    cs.CV cs.AI cs.GR

    PersonaTalk: Bring Attention to Your Persona in Visual Dubbing

    Authors: Longhao Zhang, Shuang Liang, Zhipeng Ge, Tianshu Hu

    Abstract: For audio-driven visual dubbing, it remains a considerable challenge to uphold and highlight the speaker's persona while synthesizing accurate lip synchronization. Existing methods fall short of capturing the speaker's unique speaking style or preserving facial details. In this paper, we present PersonaTalk, an attention-based two-stage framework, including geometry construction and face rendering, for hi…

    Submitted 9 September, 2024; originally announced September 2024.

    Comments: Accepted at SIGGRAPH Asia 2024 (Conference Track)

  29. arXiv:2409.04992  [pdf, other]

    cs.AR cs.CL

    InstInfer: In-Storage Attention Offloading for Cost-Effective Long-Context LLM Inference

    Authors: Xiurui Pan, Endian Li, Qiao Li, Shengwen Liang, Yizhou Shan, Ke Zhou, Yingwei Luo, Xiaolin Wang, Jie Zhang

    Abstract: The widespread adoption of Large Language Models (LLMs) marks a significant milestone in generative AI. Nevertheless, the increasing context length and batch size in offline LLM inference escalate the memory requirement of the key-value (KV) cache, which imposes a huge burden on the GPU VRAM, especially for resource-constrained scenarios (e.g., edge computing and personal devices). Several cost-effective so…

    Submitted 8 September, 2024; originally announced September 2024.

  30. arXiv:2409.04693  [pdf, other]

    cs.AI

    MuAP: Multi-step Adaptive Prompt Learning for Vision-Language Model with Missing Modality

    Authors: Ruiting Dai, Yuqiao Tan, Lisi Mo, Tao He, Ke Qin, Shuang Liang

    Abstract: Recently, prompt learning has garnered considerable attention for its success in various Vision-Language (VL) tasks. However, existing prompt-based models are primarily focused on studying prompt generation and prompt strategies with complete modality settings, which does not accurately reflect real-world scenarios where partial modality information may be missing. In this paper, we present the fi…

    Submitted 6 September, 2024; originally announced September 2024.

  31. arXiv:2408.14487  [pdf, other]

    cs.AI cs.LG cs.SC q-bio.MN

    Active learning of digenic functions with boolean matrix logic programming

    Authors: Lun Ai, Stephen H. Muggleton, Shi-shun Liang, Geoff S. Baldwin

    Abstract: We apply logic-based machine learning techniques to facilitate cellular engineering and drive biological discovery, based on comprehensive databases of metabolic processes called genome-scale metabolic network models (GEMs). Predicted host behaviours are not always correctly described by GEMs. Learning the intricate genetic interactions within GEMs presents computational and empirical challenges.…

    Submitted 28 September, 2024; v1 submitted 19 August, 2024; originally announced August 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2405.06724

  32. arXiv:2408.03246  [pdf, other]

    cs.CL

    Making Long-Context Language Models Better Multi-Hop Reasoners

    Authors: Yanyang Li, Shuo Liang, Michael R. Lyu, Liwei Wang

    Abstract: Recent advancements in long-context modeling have enhanced language models (LMs) for complex tasks across multiple NLP applications. Despite this progress, we find that these models struggle with multi-hop reasoning and exhibit decreased performance in the presence of noisy contexts. In this paper, we introduce Reasoning with Attributions, a novel approach that prompts LMs to supply attributions f…

    Submitted 6 August, 2024; originally announced August 2024.

    Comments: ACL 2024 Main Conference Camera Ready; Dataset, model, and code are available at https://github.com/LaVi-Lab/LongContextReasoner

  33. arXiv:2408.02882  [pdf, other]

    cs.AI cs.CR cs.LG

    Compromising Embodied Agents with Contextual Backdoor Attacks

    Authors: Aishan Liu, Yuguang Zhou, Xianglong Liu, Tianyuan Zhang, Siyuan Liang, Jiakai Wang, Yanjun Pu, Tianlin Li, Junqi Zhang, Wenbo Zhou, Qing Guo, Dacheng Tao

    Abstract: Large language models (LLMs) have transformed the development of embodied intelligence. By providing a few contextual demonstrations, developers can utilize the extensive internal knowledge of LLMs to effortlessly translate complex tasks described in abstract language into sequences of code snippets, which will serve as the execution logic for embodied agents. However, this paper uncovers a signif…

    Submitted 5 August, 2024; originally announced August 2024.

  34. arXiv:2408.00144  [pdf, other]

    cs.CL cs.AI

    Distributed In-Context Learning under Non-IID Among Clients

    Authors: Siqi Liang, Sumyeong Ahn, Jiayu Zhou

    Abstract: Advancements in large language models (LLMs) have shown their effectiveness in multiple complicated natural language reasoning tasks. A key challenge remains in adapting these models efficiently to new or unfamiliar tasks. In-context learning (ICL) provides a promising solution for few-shot adaptation by retrieving a set of data points relevant to a query, called in-context examples (ICE), from a…

    Submitted 31 July, 2024; originally announced August 2024.

    Comments: 12 pages

    ACM Class: I.2.7

  35. arXiv:2407.16307  [pdf, other]

    cs.MM cs.CR

    Multimodal Unlearnable Examples: Protecting Data against Multimodal Contrastive Learning

    Authors: Xinwei Liu, Xiaojun Jia, Yuan Xun, Siyuan Liang, Xiaochun Cao

    Abstract: Multimodal contrastive learning (MCL) has shown remarkable advances in zero-shot classification by learning from millions of image-caption pairs crawled from the Internet. However, this reliance poses privacy risks, as hackers may exploit image-text data for model training without authorization, potentially including personal and privacy-sensitive information. Recent works propose generating unlearnable…

    Submitted 26 July, 2024; v1 submitted 23 July, 2024; originally announced July 2024.

    Comments: ACM MM2024

  36. arXiv:2407.15349  [pdf, other]

    cs.CV

    RoadPainter: Points Are Ideal Navigators for Topology transformER

    Authors: Zhongxing Ma, Shuang Liang, Yongkun Wen, Weixin Lu, Guowei Wan

    Abstract: Topology reasoning aims to provide a precise understanding of road scenes, enabling autonomous systems to identify safe and efficient routes. In this paper, we present RoadPainter, an innovative approach for detecting and reasoning the topology of lane centerlines using multi-view images. The core concept behind RoadPainter is to extract a set of points from each centerline mask to improve the acc…

    Submitted 21 July, 2024; originally announced July 2024.

    Comments: 17 pages, 5 figures, Accepted by ECCV 2024

  37. arXiv:2407.12575  [pdf, other]

    cs.AR

    Graphitron: A Domain Specific Language for FPGA-based Graph Processing Accelerator Generation

    Authors: Xinmiao Zhang, Zheng Feng, Shengwen Liang, Xinyu Chen, Cheng Liu, Huawei Li, Xiaowei Li

    Abstract: FPGA-based graph processing accelerators, enabling extensive customization, have demonstrated significant energy efficiency over general computing engines like CPUs and GPUs. Nonetheless, customizing accelerators to diverse graph processing algorithms with distinct computational patterns remains challenging and error-prone for high-level application users. To this end, template-based approaches ha…

    Submitted 17 July, 2024; originally announced July 2024.

  38. arXiv:2407.12274  [pdf, other]

    cs.CV

    MDPE: A Multimodal Deception Dataset with Personality and Emotional Characteristics

    Authors: Cong Cai, Shan Liang, Xuefei Liu, Kang Zhu, Zhengqi Wen, Jianhua Tao, Heng Xie, Jizhou Cui, Yiming Ma, Zhenhua Cheng, Hanzhe Xu, Ruibo Fu, Bin Liu, Yongwei Li

    Abstract: Deception detection has garnered increasing attention in recent years due to the significant growth of digital media and heightened ethical and security concerns. It has been extensively studied using multimodal methods, including video, audio, and text. In addition, individual differences in deception production and detection are believed to play a crucial role. Although some studies have utilized…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Code and data are available; Submitted to NeurIPS 2024 Datasets and Benchmarks Track

  39. arXiv:2407.12168  [pdf, other]

    cs.LG math.DS physics.ao-ph

    A Scalable Real-Time Data Assimilation Framework for Predicting Turbulent Atmosphere Dynamics

    Authors: Junqi Yin, Siming Liang, Siyan Liu, Feng Bao, Hristo G. Chipilski, Dan Lu, Guannan Zhang

    Abstract: The weather and climate domains are undergoing a significant transformation thanks to advances in AI-based foundation models such as FourCastNet, GraphCast, ClimaX and Pangu-Weather. While these models show considerable potential, they are not ready yet for operational use in weather forecasting or climate prediction. This is due to the lack of a data assimilation method as part of their workflow…

    Submitted 16 July, 2024; originally announced July 2024.

  40. arXiv:2407.08473  [pdf, other]

    cs.AR cs.AI

    Natural language is not enough: Benchmarking multi-modal generative AI for Verilog generation

    Authors: Kaiyan Chang, Zhirong Chen, Yunhao Zhou, Wenlong Zhu, kun wang, Haobo Xu, Cangyuan Li, Mengdi Wang, Shengwen Liang, Huawei Li, Yinhe Han, Ying Wang

    Abstract: Natural language interfaces have exhibited considerable potential in the automation of Verilog generation derived from high-level specifications through the utilization of large language models, garnering significant attention. Nevertheless, this paper elucidates that visual representations contribute essential contextual information critical to design intent for hardware architectures possessing…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: Accepted by ICCAD 2024

  41. arXiv:2407.08239  [pdf, other]

    cs.SD cs.LG eess.AS

    An Unsupervised Domain Adaptation Method for Locating Manipulated Region in partially fake Audio

    Authors: Siding Zeng, Jiangyan Yi, Jianhua Tao, Yujie Chen, Shan Liang, Yong Ren, Xiaohui Zhang

    Abstract: When the task of locating manipulation regions in partially-fake audio (PFA) involves cross-domain datasets, the performance of deep learning models drops significantly due to the shift between the source and target domains. To address this issue, existing approaches often employ data augmentation before training. However, they overlook the characteristics in target domain that are absent in sourc…

    Submitted 11 July, 2024; originally announced July 2024.

  42. arXiv:2407.02922  [pdf, other]

    cs.IT

    Fair Resource Allocation for Probabilistic Semantic Communication in IIoT

    Authors: Siyun Liang, Zhouxiang Zhao, Chen Zhu, Zhaohui Yang, Yinchao Yang, Mohammad Shikh-Bahaei, Zhaoyang Zhang

    Abstract: In this paper, the problem of minimum rate maximization for probabilistic semantic communication (PSCom) in industrial Internet of Things (IIoT) is investigated. In the considered model, users employ semantic information extraction techniques to compress the original data before sending it to the base station (BS). During this semantic compression process, knowledge graphs are employed to represen…

    Submitted 8 July, 2024; v1 submitted 3 July, 2024; originally announced July 2024.

  43. arXiv:2407.00600  [pdf, other]

    cs.CV cs.AI

    GenderBias-VL: Benchmarking Gender Bias in Vision Language Models via Counterfactual Probing

    Authors: Yisong Xiao, Aishan Liu, QianJia Cheng, Zhenfei Yin, Siyuan Liang, Jiapeng Li, Jing Shao, Xianglong Liu, Dacheng Tao

    Abstract: Large Vision-Language Models (LVLMs) have been widely adopted in various applications; however, they exhibit significant gender biases. Existing benchmarks primarily evaluate gender bias at the demographic group level, neglecting individual fairness, which emphasizes equal treatment of similar individuals. This research gap limits the detection of discriminatory behaviors, as individual fairness o…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: 9 pages, 4 figures

  44. Improving the Expressiveness of $K$-hop Message-Passing GNNs by Injecting Contextualized Substructure Information

    Authors: Tianjun Yao, Yiongxu Wang, Kun Zhang, Shangsong Liang

    Abstract: Graph neural networks (GNNs) have become the de facto standard for representational learning in graphs, and have achieved state-of-the-art performance in many graph-related tasks; however, it has been shown that the expressive power of standard GNNs is at most equivalent to that of the 1-dimensional Weisfeiler-Lehman (1-WL) test. Recently, there has been a line of work aiming to enhance the expressive…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: 13 pages, published in Research track of KDD2023

    ACM Class: I.2.6

  45. arXiv:2406.18844  [pdf, other]

    cs.CV

    Revisiting Backdoor Attacks against Large Vision-Language Models

    Authors: Siyuan Liang, Jiawei Liang, Tianyu Pang, Chao Du, Aishan Liu, Ee-Chien Chang, Xiaochun Cao

    Abstract: Instruction tuning enhances large vision-language models (LVLMs) but raises security risks through potential backdoor attacks due to their openness. Previous backdoor studies focus on enclosed scenarios with consistent training and testing instructions, neglecting the practical domain gaps that could affect attack effectiveness. This paper empirically examines the generalizability of backdoor atta…

    Submitted 1 July, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

    Comments: 24 pages, 8 figures

  46. arXiv:2406.14434  [pdf, other]

    cs.CL

    Towards Truthful Multilingual Large Language Models: Benchmarking and Alignment Strategies

    Authors: Weihao Liu, Ning Wu, Wenbiao Ding, Shining Liang, Ming Gong, Dongmei Zhang

    Abstract: In the era of large language models (LLMs), building multilingual large language models (MLLMs) that can serve users worldwide holds great significance. However, existing research seldom focuses on the truthfulness of MLLMs. Meanwhile, contemporary multilingual aligning technologies struggle to balance massive languages and often exhibit serious truthfulness gaps across different languages, especi…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 15 pages

  47. arXiv:2406.12072  [pdf, other]

    cs.AI cs.LG

    DTGB: A Comprehensive Benchmark for Dynamic Text-Attributed Graphs

    Authors: Jiasheng Zhang, Jialin Chen, Menglin Yang, Aosong Feng, Shuang Liang, Jie Shao, Rex Ying

    Abstract: Dynamic text-attributed graphs (DyTAGs) are prevalent in various real-world scenarios, where each node and edge are associated with text descriptions, and both the graph structure and text descriptions evolve over time. Despite their broad applicability, there is a notable scarcity of benchmark datasets tailored to DyTAGs, which hinders the potential advancement in many research fields. To address…

    Submitted 18 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: 28 pages, 13 figures

  48. arXiv:2406.04031  [pdf, other]

    cs.CV cs.CR

    Jailbreak Vision Language Models via Bi-Modal Adversarial Prompt

    Authors: Zonghao Ying, Aishan Liu, Tianyuan Zhang, Zhengmin Yu, Siyuan Liang, Xianglong Liu, Dacheng Tao

    Abstract: In the realm of large vision language models (LVLMs), jailbreak attacks serve as a red-teaming approach to bypass guardrails and uncover safety implications. Existing jailbreaks predominantly focus on the visual modality, perturbing solely visual inputs in the prompt for attacks. However, they fall short when confronted with aligned models that fuse visual and textual features simultaneously for g…

    Submitted 1 July, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

  49. arXiv:2406.00934  [pdf, other]

    cs.CV

    LanEvil: Benchmarking the Robustness of Lane Detection to Environmental Illusions

    Authors: Tianyuan Zhang, Lu Wang, Hainan Li, Yisong Xiao, Siyuan Liang, Aishan Liu, Xianglong Liu, Dacheng Tao

    Abstract: Lane detection (LD) is an essential component of autonomous driving systems, providing fundamental functionalities like adaptive cruise control and automated lane centering. Existing LD benchmarks primarily focus on evaluating common cases, neglecting the robustness of LD models against environmental illusions such as shadows and tire marks on the road. This research gap poses significant safety c…

    Submitted 16 July, 2024; v1 submitted 2 June, 2024; originally announced June 2024.

    Comments: Accepted by ACM MM 2024

  50. arXiv:2406.00629  [pdf, other]

    cs.CV

    Correlation Matching Transformation Transformers for UHD Image Restoration

    Authors: Cong Wang, Jinshan Pan, Wei Wang, Gang Fu, Siyuan Liang, Mengzhu Wang, Xiao-Ming Wu, Jun Liu

    Abstract: This paper proposes UHDformer, a general Transformer for Ultra-High-Definition (UHD) image restoration. UHDformer contains two learning spaces: (a) learning in high-resolution space and (b) learning in low-resolution space. The former learns multi-level high-resolution features and fuses low-high features and reconstructs the residual images, while the latter explores more representative features…

    Submitted 2 June, 2024; originally announced June 2024.

    Comments: AAAI-24; Source codes, datasets, visual results, and pre-trained models are: https://github.com/supersupercong/UHDformer