
Showing 1–50 of 1,285 results for author: Cao, Y

  1. arXiv:2410.16268  [pdf, other]

    cs.CV

    SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree

    Authors: Shuangrui Ding, Rui Qian, Xiaoyi Dong, Pan Zhang, Yuhang Zang, Yuhang Cao, Yuwei Guo, Dahua Lin, Jiaqi Wang

    Abstract: The Segment Anything Model 2 (SAM 2) has emerged as a powerful foundation model for object segmentation in both images and videos, paving the way for various downstream video applications. The crucial design of SAM 2 for video segmentation is its memory module, which prompts object-aware memories from previous frames for current frame prediction. However, its greedy-selection memory design suffers…

    Submitted 21 October, 2024; originally announced October 2024.

    Comments: Project page: https://mark12ding.github.io/project/SAM2Long/

  2. arXiv:2410.16204  [pdf]

    cs.LG cs.CL

    Systematic Review: Text Processing Algorithms in Machine Learning and Deep Learning for Mental Health Detection on Social Media

    Authors: Yuchen Cao, Jianglai Dai, Zhongyan Wang, Yeyubei Zhang, Xiaorui Shen, Yunchong Liu, Yexin Tian

    Abstract: The global rise in depression necessitates innovative detection methods for early intervention. Social media provides a unique opportunity to identify depression through user-generated posts. This systematic review evaluates machine learning (ML) models for depression detection on social media, focusing on biases and methodological challenges throughout the ML lifecycle. A search of PubMed, IEEE X…

    Submitted 21 October, 2024; originally announced October 2024.

  3. arXiv:2410.16184  [pdf, other]

    cs.CL

    RM-Bench: Benchmarking Reward Models of Language Models with Subtlety and Style

    Authors: Yantao Liu, Zijun Yao, Rui Min, Yixin Cao, Lei Hou, Juanzi Li

    Abstract: Reward models are critical in techniques like Reinforcement Learning from Human Feedback (RLHF) and Inference Scaling Laws, where they guide language model alignment and select optimal responses. Despite their importance, existing reward model benchmarks often evaluate models by asking them to distinguish between responses generated by models of varying power. However, this approach fails to asses…

    Submitted 21 October, 2024; originally announced October 2024.

  4. arXiv:2410.16121  [pdf, other]

    cs.LG cs.CR

    Extracting Spatiotemporal Data from Gradients with Large Language Models

    Authors: Lele Zheng, Yang Cao, Renhe Jiang, Kenjiro Taura, Yulong Shen, Sheng Li, Masatoshi Yoshikawa

    Abstract: Recent works show that sensitive user data can be reconstructed from gradient updates, breaking the key privacy promise of federated learning. While success was demonstrated primarily on image data, these methods do not directly transfer to other domains, such as spatiotemporal data. To understand privacy risks in spatiotemporal federated learning, we first propose Spatiotemporal Gradient Inversio…

    Submitted 21 October, 2024; originally announced October 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2407.08529

  5. arXiv:2410.15811  [pdf, other]

    cs.CV

    Data-Efficient CLIP-Powered Dual-Branch Networks for Source-Free Unsupervised Domain Adaptation

    Authors: Yongguang Li, Yueqi Cao, Jindong Li, Qi Wang, Shengsheng Wang

    Abstract: Source-Free Unsupervised Domain Adaptation (SF-UDA) aims to transfer a model's performance from a labeled source domain to an unlabeled target domain without direct access to source samples, addressing data privacy issues. However, most existing SF-UDA approaches assume the availability of abundant source domain samples, which is often impractical due to the high cost of data annotation. In this p…

    Submitted 21 October, 2024; originally announced October 2024.

  6. arXiv:2410.15392  [pdf, other]

    cs.CV

    EF-3DGS: Event-Aided Free-Trajectory 3D Gaussian Splatting

    Authors: Bohao Liao, Wei Zhai, Zengyu Wan, Tianzhu Zhang, Yang Cao, Zheng-Jun Zha

    Abstract: Scene reconstruction from casually captured videos has wide applications in real-world scenarios. With recent advancements in differentiable rendering techniques, several methods have attempted to simultaneously optimize scene representations (NeRF or 3DGS) and camera poses. Despite recent progress, existing methods relying on traditional camera input tend to fail in high-speed (or equivalently lo…

    Submitted 20 October, 2024; originally announced October 2024.

    Comments: Project Page: https://lbh666.github.io/ef-3dgs/

  7. arXiv:2410.14769  [pdf, other]

    eess.IV cs.CV

    Medical AI for Early Detection of Lung Cancer: A Survey

    Authors: Guohui Cai, Ying Cai, Zeyu Zhang, Yuanzhouhan Cao, Lin Wu, Daji Ergu, Zhinbin Liao, Yang Zhao

    Abstract: Lung cancer remains one of the leading causes of morbidity and mortality worldwide, making early diagnosis critical for improving therapeutic outcomes and patient prognosis. Computer-aided diagnosis (CAD) systems, which analyze CT images, have proven effective in detecting and classifying pulmonary nodules, significantly enhancing the detection rate of early-stage lung cancer. Although traditional…

    Submitted 18 October, 2024; originally announced October 2024.

  8. arXiv:2410.14331  [pdf, other]

    cs.HC cs.IR

    ChartifyText: Automated Chart Generation from Data-Involved Texts via LLM

    Authors: Songheng Zhang, Lei Wang, Toby Jia-Jun Li, Qiaomu Shen, Yixin Cao, Yong Wang

    Abstract: Text documents with numerical values involved are widely used in various applications such as scientific research, economy, public health and journalism. However, it is difficult for readers to quickly interpret such data-involved texts and gain deep insights. To fill this research gap, this work aims to automatically generate charts to accurately convey the underlying data and ideas to readers, w…

    Submitted 18 October, 2024; originally announced October 2024.

  9. arXiv:2410.13221  [pdf, other]

    eess.AS cs.SD

    Investigating Effective Speaker Property Privacy Protection in Federated Learning for Speech Emotion Recognition

    Authors: Chao Tan, Sheng Li, Yang Cao, Zhao Ren, Tanja Schultz

    Abstract: Federated Learning (FL) is a privacy-preserving approach that allows servers to aggregate distributed models transmitted from local clients rather than training on user data. More recently, FL has been applied to Speech Emotion Recognition (SER) for secure human-computer interaction applications. Recent research has found that FL is still vulnerable to inference attacks. To this end, this paper fo…

    Submitted 17 October, 2024; originally announced October 2024.

  10. arXiv:2410.12928  [pdf, other]

    cs.CV

    DreamCraft3D++: Efficient Hierarchical 3D Generation with Multi-Plane Reconstruction Model

    Authors: Jingxiang Sun, Cheng Peng, Ruizhi Shao, Yuan-Chen Guo, Xiaochen Zhao, Yangguang Li, Yanpei Cao, Bo Zhang, Yebin Liu

    Abstract: We introduce DreamCraft3D++, an extension of DreamCraft3D that enables efficient high-quality generation of complex 3D assets. DreamCraft3D++ inherits the multi-stage generation process of DreamCraft3D, but replaces the time-consuming geometry sculpting optimization with a feed-forward multi-plane based reconstruction model, speeding up the process by 1000x. For texture refinement, we propose a tr…

    Submitted 16 October, 2024; originally announced October 2024.

    Comments: Project Page: https://dreamcraft3dplus.github.io/

  11. arXiv:2410.12592  [pdf, other]

    cs.CV cs.LG

    Cocoon: Robust Multi-Modal Perception with Uncertainty-Aware Sensor Fusion

    Authors: Minkyoung Cho, Yulong Cao, Jiachen Sun, Qingzhao Zhang, Marco Pavone, Jeong Joon Park, Heng Yang, Z. Morley Mao

    Abstract: An important paradigm in 3D object detection is the use of multiple modalities to enhance accuracy in both normal and challenging conditions, particularly for long-tail scenarios. To address this, recent studies have explored two directions of adaptive approaches: MoE-based adaptive fusion, which struggles with uncertainties arising from distinct object configurations, and late fusion for output-l…

    Submitted 16 October, 2024; originally announced October 2024.

    Comments: 23 pages

  12. arXiv:2410.12324  [pdf, other]

    cs.RO cs.CV

    PAPL-SLAM: Principal Axis-Anchored Monocular Point-Line SLAM

    Authors: Guanghao Li, Yu Cao, Qi Chen, Yifan Yang, Jian Pu

    Abstract: In point-line SLAM systems, the utilization of line structural information and the optimization of lines are two significant problems. The former is usually addressed through structural regularities, while the latter typically involves using minimal parameter representations of lines in optimization. However, separating these two steps leads to the loss of constraint information to each other. We…

    Submitted 18 October, 2024; v1 submitted 16 October, 2024; originally announced October 2024.

    Comments: 8 pages, 4 figures

  13. arXiv:2410.12302  [pdf, other]

    cs.IT cs.AI cs.LG

    Two Birds with One Stone: Multi-Task Semantic Communications Systems over Relay Channel

    Authors: Yujie Cao, Tong Wu, Zhiyong Chen, Yin Xu, Meixia Tao, Wenjun Zhang

    Abstract: In this paper, we propose a novel multi-task, multi-link relay semantic communications (MTML-RSC) scheme that enables the destination node to simultaneously perform image reconstruction and classification with one transmission from the source node. In the MTML-RSC scheme, the source node broadcasts a signal using semantic communications, and the relay node forwards the signal to the destination. W…

    Submitted 16 October, 2024; originally announced October 2024.

    Comments: submitted to IEEE WCNC

  14. arXiv:2410.11829  [pdf, other]

    cs.CV

    MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding

    Authors: Yue Cao, Yangzhou Liu, Zhe Chen, Guangchen Shi, Wenhai Wang, Danhuai Zhao, Tong Lu

    Abstract: Despite significant advancements in Multimodal Large Language Models (MLLMs) for understanding complex human intentions through cross-modal interactions, capturing intricate image details remains challenging. Previous methods integrating multiple vision encoders to enhance visual detail introduce redundancy and computational overhead. We observe that most MLLMs utilize only the last-layer feature…

    Submitted 15 October, 2024; originally announced October 2024.

    Comments: 11 pages, 6 figures, technical report

  15. arXiv:2410.11363  [pdf, other]

    cs.CV

    Visual-Geometric Collaborative Guidance for Affordance Learning

    Authors: Hongchen Luo, Wei Zhai, Jiao Wang, Yang Cao, Zheng-Jun Zha

    Abstract: Perceiving potential "action possibilities" (i.e., affordance) regions of images and learning interactive functionalities of objects from human demonstration is a challenging task due to the diversity of human-object interactions. Prevailing affordance learning algorithms often adopt the label assignment paradigm and presume that there is a unique relationship between functional region and afford…

    Submitted 15 October, 2024; originally announced October 2024.

  16. arXiv:2410.10798  [pdf, other]

    cs.CV

    MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling

    Authors: Jian Yang, Dacheng Yin, Yizhou Zhou, Fengyun Rao, Wei Zhai, Yang Cao, Zheng-Jun Zha

    Abstract: Recent advancements in multi-modal large language models have propelled the development of joint probabilistic models capable of both image understanding and generation. However, we have identified that recent methods inevitably suffer from loss of image information during understanding task, due to either image discretization or diffusion denoising steps. To address this issue, we propose a novel…

    Submitted 15 October, 2024; v1 submitted 14 October, 2024; originally announced October 2024.

  17. arXiv:2410.08136  [pdf]

    cs.HC

    SoundScape: A Human-AI Co-Creation System Making Your Memories Heard

    Authors: Chongjun Zhong, Jiaxing Yu, Yingping Cao, Songruoyao Wu, Wenqi Wu, Kejun Zhang

    Abstract: Sound plays a significant role in human memory, yet it is often overlooked by mainstream life-recording methods. Most current UGC (User-Generated Content) creation tools emphasize visual content while lacking user-friendly sound design features. This paper introduces SoundScape, a human-AI co-creation system that allows users to easily create sound memories on mobile devices through innovative int…

    Submitted 10 October, 2024; originally announced October 2024.

  18. arXiv:2410.07893  [pdf, other]

    cs.CR

    Ormer: A Manipulation-resistant and Gas-efficient Blockchain Pricing Oracle for DeFi

    Authors: Dongbin Bai, Jiannong Cao, Yinfeng Cao, Long Wen

    Abstract: Blockchain oracle is a critical third-party web service for Decentralized Finance (DeFi) protocols. Oracles retrieve external information such as token prices from exchanges and feed them as trusted data sources into smart contracts, enabling core DeFi applications such as loaning protocols. Currently, arithmetic mean based time-weighted average price (TWAP) oracles are widely used in DeFi by aver…

    Submitted 10 October, 2024; originally announced October 2024.

  19. arXiv:2410.07860  [pdf, other]

    cs.CV

    BA-Net: Bridge Attention in Deep Neural Networks

    Authors: Ronghui Zhang, Runzong Zou, Yue Zhao, Zirui Zhang, Junzhou Chen, Yue Cao, Chuan Hu, Houbing Song

    Abstract: Attention mechanisms, particularly channel attention, have become highly influential in numerous computer vision tasks. Despite their effectiveness, many existing methods primarily focus on optimizing performance through complex attention modules applied at individual convolutional layers, often overlooking the synergistic interactions that can occur across multiple layers. In response to this gap…

    Submitted 10 October, 2024; v1 submitted 10 October, 2024; originally announced October 2024.

  20. arXiv:2410.07738  [pdf, other]

    cs.LG cs.AI

    Enhancing Federated Domain Adaptation with Multi-Domain Prototype-Based Federated Fine-Tuning

    Authors: Jingyuan Zhang, Yiyang Duan, Shuaicheng Niu, Yang Cao, Wei Yang Bryan Lim

    Abstract: Federated Domain Adaptation (FDA) is a Federated Learning (FL) scenario where models are trained across multiple clients with unique data domains but a shared category space, without transmitting private data. The primary challenge in FDA is data heterogeneity, which causes significant divergences in gradient updates when using conventional averaging-based aggregation methods, reducing the efficac…

    Submitted 10 October, 2024; originally announced October 2024.

  21. arXiv:2410.07617  [pdf, other]

    cs.CV

    Prototype-based Optimal Transport for Out-of-Distribution Detection

    Authors: Ao Ke, Wenlong Chen, Chuanwen Feng, Yukun Cao, Xike Xie, S. Kevin Zhou, Lei Feng

    Abstract: Detecting Out-of-Distribution (OOD) inputs is crucial for improving the reliability of deep neural networks in the real-world deployment. In this paper, inspired by the inherent distribution shift between ID and OOD data, we propose a novel method that leverages optimal transport to measure the distribution discrepancy between test inputs and ID prototypes. The resulting transport costs are used t…

    Submitted 10 October, 2024; originally announced October 2024.

  22. arXiv:2410.07167  [pdf, other]

    cs.CV cs.CL

    Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate

    Authors: Qidong Huang, Xiaoyi Dong, Pan Zhang, Yuhang Zang, Yuhang Cao, Jiaqi Wang, Dahua Lin, Weiming Zhang, Nenghai Yu

    Abstract: We present the Modality Integration Rate (MIR), an effective, robust, and generalized metric to indicate the multi-modal pre-training quality of Large Vision Language Models (LVLMs). Large-scale pre-training plays a critical role in building capable LVLMs, while evaluating its training quality without the costly supervised fine-tuning stage is under-explored. Loss, perplexity, and in-context evalu…

    Submitted 16 October, 2024; v1 submitted 9 October, 2024; originally announced October 2024.

    Comments: Project page: https://github.com/shikiw/Modality-Integration-Rate

  23. arXiv:2410.07165  [pdf, other]

    cs.AI cs.LG

    Complex Logical Query Answering by Calibrating Knowledge Graph Completion Models

    Authors: Changyi Xiao, Yixin Cao

    Abstract: Complex logical query answering (CLQA) is a challenging task that involves finding answer entities for complex logical queries over incomplete knowledge graphs (KGs). Previous research has explored the use of pre-trained knowledge graph completion (KGC) models, which can predict the missing facts in KGs, to answer complex logical queries. However, KGC models are typically evaluated using ranking e…

    Submitted 30 September, 2024; originally announced October 2024.

  24. arXiv:2410.07164  [pdf, other]

    cs.CV

    AvatarGO: Zero-shot 4D Human-Object Interaction Generation and Animation

    Authors: Yukang Cao, Liang Pan, Kai Han, Kwan-Yee K. Wong, Ziwei Liu

    Abstract: Recent advancements in diffusion models have led to significant improvements in the generation and animation of 4D full-body human-object interactions (HOI). Nevertheless, existing methods primarily focus on SMPL-based motion generation, which is limited by the scarcity of realistic large-scale interaction data. This constraint affects their ability to create everyday HOI scenes. This paper addres…

    Submitted 9 October, 2024; originally announced October 2024.

    Comments: Project page: https://yukangcao.github.io/AvatarGO/

  25. arXiv:2410.06366  [pdf, other]

    cs.LG cs.AI

    Physics-Informed Regularization for Domain-Agnostic Dynamical System Modeling

    Authors: Zijie Huang, Wanjia Zhao, Jingdong Gao, Ziniu Hu, Xiao Luo, Yadi Cao, Yuanzhou Chen, Yizhou Sun, Wei Wang

    Abstract: Learning complex physical dynamics purely from data is challenging due to the intrinsic properties of systems to be satisfied. Incorporating physics-informed priors, such as in Hamiltonian Neural Networks (HNNs), achieves high-precision modeling for energy-conservative systems. However, real-world systems often deviate from strict energy conservation and follow different physical priors. To addres…

    Submitted 8 October, 2024; originally announced October 2024.

    Comments: Accepted to The Thirty-eighth Annual Conference on Neural Information Processing Systems (NeurIPS 2024)

  26. arXiv:2410.06241  [pdf, other]

    cs.CV

    BroadWay: Boost Your Text-to-Video Generation Model in a Training-free Way

    Authors: Jiazi Bu, Pengyang Ling, Pan Zhang, Tong Wu, Xiaoyi Dong, Yuhang Zang, Yuhang Cao, Dahua Lin, Jiaqi Wang

    Abstract: The text-to-video (T2V) generation models, offering convenient visual creation, have recently garnered increasing attention. Despite their substantial potential, the generated videos may present artifacts, including structural implausibility, temporal inconsistency, and a lack of motion, often resulting in near-static video. In this work, we have identified a correlation between the disparity of t…

    Submitted 16 October, 2024; v1 submitted 8 October, 2024; originally announced October 2024.

  27. arXiv:2410.05624  [pdf, other]

    cs.CV cs.LG

    Remote Sensing Image Segmentation Using Vision Mamba and Multi-Scale Multi-Frequency Feature Fusion

    Authors: Yice Cao, Chenchen Liu, Zhenhua Wu, Wenxin Yao, Liu Xiong, Jie Chen, Zhixiang Huang

    Abstract: As remote sensing imaging technology continues to advance and evolve, processing high-resolution and diversified satellite imagery to improve segmentation accuracy and enhance interpretation efficiency emerges as a pivotal area of investigation within the realm of remote sensing. Although segmentation algorithms based on CNNs and Transformers achieve significant progress in performance, balancing se…

    Submitted 7 October, 2024; originally announced October 2024.

  28. arXiv:2410.05582  [pdf, other]

    cs.RO

    Gen-Drive: Enhancing Diffusion Generative Driving Policies with Reward Modeling and Reinforcement Learning Fine-tuning

    Authors: Zhiyu Huang, Xinshuo Weng, Maximilian Igl, Yuxiao Chen, Yulong Cao, Boris Ivanovic, Marco Pavone, Chen Lv

    Abstract: Autonomous driving necessitates the ability to reason about future interactions between traffic agents and to make informed evaluations for planning. This paper introduces the Gen-Drive framework, which shifts from the traditional prediction and deterministic planning framework to a generation-then-evaluation planning paradigm. The framework employs a behavior diffusion model as a scene g…

    Submitted 7 October, 2024; originally announced October 2024.

  29. arXiv:2410.05259  [pdf, other]

    cs.CV

    GS-VTON: Controllable 3D Virtual Try-on with Gaussian Splatting

    Authors: Yukang Cao, Masoud Hadi, Liang Pan, Ziwei Liu

    Abstract: Diffusion-based 2D virtual try-on (VTON) techniques have recently demonstrated strong performance, while the development of 3D VTON has largely lagged behind. Despite recent advances in text-guided 3D scene editing, integrating 2D VTON into these pipelines to achieve vivid 3D VTON remains challenging. The reasons are twofold. First, text prompts cannot provide sufficient details in describing clot…

    Submitted 7 October, 2024; originally announced October 2024.

    Comments: 21 pages, 11 figures

  30. arXiv:2410.04787  [pdf, other]

    cs.GT math.OC

    A Differentially Private Energy Trading Mechanism Approaching Social Optimum

    Authors: Yuji Cao, Yue Chen

    Abstract: This paper proposes a differentially private energy trading mechanism for prosumers in peer-to-peer (P2P) markets, offering provable privacy guarantees while approaching the Nash equilibrium with nearly socially optimal efficiency. We first model the P2P energy trading as a (generalized) Nash game and prove the vulnerability of traditional distributed algorithms to privacy attacks through an adver…

    Submitted 21 October, 2024; v1 submitted 7 October, 2024; originally announced October 2024.

    Comments: 11 pages, 8 figures

  31. arXiv:2410.04784  [pdf, other]

    cs.CL

    Formality is Favored: Unraveling the Learning Preferences of Large Language Models on Data with Conflicting Knowledge

    Authors: Jiahuan Li, Yiqing Cao, Shujian Huang, Jiajun Chen

    Abstract: Having been trained on massive pretraining data, large language models have shown excellent performance on many knowledge-intensive tasks. However, pretraining data tends to contain misleading and even conflicting information, and it is intriguing to understand how LLMs handle these noisy data during training. In this study, we systematically analyze LLMs' learning preferences for data with confli…

    Submitted 7 October, 2024; originally announced October 2024.

    Comments: accepted by EMNLP 2024, main conference

  32. arXiv:2410.04225  [pdf, other]

    eess.IV cs.CV cs.MM

    AIM 2024 Challenge on Video Super-Resolution Quality Assessment: Methods and Results

    Authors: Ivan Molodetskikh, Artem Borisov, Dmitriy Vatolin, Radu Timofte, Jianzhao Liu, Tianwu Zhi, Yabin Zhang, Yang Li, Jingwen Xu, Yiting Liao, Qing Luo, Ao-Xiang Zhang, Peng Zhang, Haibo Lei, Linyan Jiang, Yaqing Li, Yuqin Cao, Wei Sun, Weixia Zhang, Yinan Sun, Ziheng Jia, Yuxin Zhu, Xiongkuo Min, Guangtao Zhai, Weihua Luo, et al. (2 additional authors not shown)

    Abstract: This paper presents the Video Super-Resolution (SR) Quality Assessment (QA) Challenge that was part of the Advances in Image Manipulation (AIM) workshop, held in conjunction with ECCV 2024. The task of this challenge was to develop an objective QA method for videos upscaled 2x and 4x by modern image- and video-SR algorithms. QA methods were evaluated by comparing their output with aggregate subjec…

    Submitted 5 October, 2024; originally announced October 2024.

    Comments: 18 pages, 7 figures

  33. arXiv:2410.03290  [pdf, other]

    cs.CV cs.AI

    Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models

    Authors: Haibo Wang, Zhiyang Xu, Yu Cheng, Shizhe Diao, Yufan Zhou, Yixin Cao, Qifan Wang, Weifeng Ge, Lifu Huang

    Abstract: Video Large Language Models (Video-LLMs) have demonstrated remarkable capabilities in coarse-grained video understanding, however, they struggle with fine-grained temporal grounding. In this paper, we introduce Grounded-VideoLLM, a novel Video-LLM adept at perceiving and reasoning over specific video moments in a fine-grained manner. We identify that current Video-LLMs have limitations for fine-gr…

    Submitted 4 October, 2024; originally announced October 2024.

  34. arXiv:2410.03122  [pdf, other]

    cs.CL cs.AI cs.LG

    RIPPLECOT: Amplifying Ripple Effect of Knowledge Editing in Language Models via Chain-of-Thought In-Context Learning

    Authors: Zihao Zhao, Yuchen Yang, Yijiang Li, Yinzhi Cao

    Abstract: The ripple effect poses a significant challenge in knowledge editing for large language models. Namely, when a single fact is edited, the model struggles to accurately update the related facts in a sequence, which is evaluated by multi-hop questions linked to a chain of related facts. Recent strategies have moved away from traditional parameter updates to more flexible, less computation-intensive…

    Submitted 3 October, 2024; originally announced October 2024.

    Comments: EMNLP findings

  35. arXiv:2410.02288  [pdf, other]

    cs.CV

    Computer-aided Colorization State-of-the-science: A Survey

    Authors: Yu Cao, Xin Duan, Xiangqiao Meng, P. Y. Mok, Ping Li, Tong-Yee Lee

    Abstract: This paper reviews published research in the field of computer-aided colorization technology. We argue that the colorization task originates from computer graphics, prospers by introducing computer vision, and tends to the fusion of vision and graphics, so we put forward our taxonomy and organize the whole paper chronologically. We extend the existing reconstruction-based colorization evaluation t…

    Submitted 3 October, 2024; originally announced October 2024.

  36. arXiv:2410.02240  [pdf, other]

    cs.CV cs.AI

    SCA: Highly Efficient Semantic-Consistent Unrestricted Adversarial Attack

    Authors: Zihao Pan, Weibin Wu, Yuhang Cao, Zibin Zheng

    Abstract: Deep neural network based systems deployed in sensitive environments are vulnerable to adversarial attacks. Unrestricted adversarial attacks typically manipulate the semantic content of an image (e.g., color or texture) to create adversarial examples that are both effective and photorealistic. Recent works have utilized the diffusion inversion process to map images into a latent space, where high-…

    Submitted 17 October, 2024; v1 submitted 3 October, 2024; originally announced October 2024.

  37. arXiv:2410.01647  [pdf, other]

    cs.CV

    3DGS-DET: Empower 3D Gaussian Splatting with Boundary Guidance and Box-Focused Sampling for 3D Object Detection

    Authors: Yang Cao, Yuanliang Jv, Dan Xu

    Abstract: Neural Radiance Fields (NeRF) are widely used for novel-view synthesis and have been adapted for 3D Object Detection (3DOD), offering a promising approach to 3DOD through view-synthesis representation. However, NeRF faces inherent limitations: (i) limited representational capacity for 3DOD due to its implicit nature, and (ii) slow rendering speeds. Recently, 3D Gaussian Splatting (3DGS) has emerge…

    Submitted 2 October, 2024; originally announced October 2024.

    Comments: Code Page: https://github.com/yangcaoai/3DGS-DET

  38. arXiv:2410.01353  [pdf, other]

    cs.SE cs.AI

    Codev-Bench: How Do LLMs Understand Developer-Centric Code Completion?

    Authors: Zhenyu Pan, Rongyu Cao, Yongchang Cao, Yingwei Ma, Binhua Li, Fei Huang, Han Liu, Yongbin Li

    Abstract: Code completion, a key downstream task in code generation, is one of the most frequent and impactful methods for enhancing developer productivity in software development. As intelligent completion tools evolve, we need a robust evaluation benchmark that enables meaningful comparisons between products and guides future advancements. However, existing benchmarks focus more on coarse-grained tasks wi…

    Submitted 16 October, 2024; v1 submitted 2 October, 2024; originally announced October 2024.

  39. arXiv:2410.00713  [pdf, other]

    cs.CV

    RAD: A Dataset and Benchmark for Real-Life Anomaly Detection with Robotic Observations

    Authors: Kaichen Zhou, Yang Cao, Teawhan Kim, Hao Zhao, Hao Dong, Kai Ming Ting, Ye Zhu

    Abstract: Recent advancements in industrial anomaly detection have been hindered by the lack of realistic datasets that accurately represent real-world conditions. Existing algorithms are often developed and evaluated using idealized datasets, which deviate significantly from real-life scenarios characterized by environmental noise and data corruption such as fluctuating lighting conditions, variable object…

    Submitted 1 October, 2024; originally announced October 2024.

  40. arXiv:2409.20146  [pdf, other]

    cs.CV

    VMAD: Visual-enhanced Multimodal Large Language Model for Zero-Shot Anomaly Detection

    Authors: Huilin Deng, Hongchen Luo, Wei Zhai, Yang Cao, Yu Kang

    Abstract: Zero-shot anomaly detection (ZSAD) recognizes and localizes anomalies in previously unseen objects by establishing feature mapping between textual prompts and inspection images, demonstrating excellent research value in flexible industrial manufacturing. However, existing ZSAD methods are limited by closed-world settings, struggling with unseen defects with predefined prompts. Recently, adapting Mul…

    Submitted 30 September, 2024; originally announced September 2024.

  41. arXiv:2409.19977  [pdf, other]

    cs.LG cs.AI

    Knowledge Graph Embedding by Normalizing Flows

    Authors: Changyi Xiao, Xiangnan He, Yixin Cao

    Abstract: A key to knowledge graph embedding (KGE) is to choose a proper representation space, e.g., point-wise Euclidean space and complex vector space. In this paper, we propose a unified perspective of embedding and introduce uncertainty into KGE from the view of group theory. Our model can incorporate existing models (i.e., generality), ensure the computation is tractable (i.e., efficiency) and enjoy th…

    Submitted 30 September, 2024; originally announced September 2024.

  42. arXiv:2409.19769  [pdf, other]

    cs.LG cs.AI eess.SY

    Adaptive Event-triggered Reinforcement Learning Control for Complex Nonlinear Systems

    Authors: Umer Siddique, Abhinav Sinha, Yongcan Cao

    Abstract: In this paper, we propose an adaptive event-triggered reinforcement learning control for continuous-time nonlinear systems, subject to bounded uncertainties, characterized by complex interactions. Specifically, the proposed method is capable of jointly learning both the control policy and the communication policy, thereby reducing the number of parameters and computational overhead when learning t…

    Submitted 29 September, 2024; originally announced September 2024.

  43. arXiv:2409.19650  [pdf, other]

    cs.CV cs.AI

    Grounding 3D Scene Affordance From Egocentric Interactions

    Authors: Cuiyu Liu, Wei Zhai, Yuhang Yang, Hongchen Luo, Sen Liang, Yang Cao, Zheng-Jun Zha

    Abstract: Grounding 3D scene affordance aims to locate interactive regions in 3D environments, which is crucial for embodied agents to interact intelligently with their surroundings. Most existing approaches achieve this by mapping semantics to 3D instances based on static geometric structure and visual appearance. This passive strategy limits the agent's ability to actively perceive and engage with the env…

    Submitted 29 September, 2024; originally announced September 2024.

  44. arXiv:2409.18685  [pdf, other]

    cs.LG stat.ML

    Understanding the Benefits of SimCLR Pre-Training in Two-Layer Convolutional Neural Networks

    Authors: Han Zhang, Yuan Cao

    Abstract: SimCLR is one of the most popular contrastive learning methods for vision tasks. It pre-trains deep neural networks based on a large amount of unlabeled data by teaching the model to distinguish between positive and negative pairs of augmented images. It is believed that SimCLR can pre-train a deep neural network to learn efficient representations that can lead to a better performance of future su…

    Submitted 27 September, 2024; originally announced September 2024.

    Comments: 65 pages, 4 figures

  45. arXiv:2409.17589  [pdf, other]

    cs.CV cs.AI

    Improving Fast Adversarial Training via Self-Knowledge Guidance

    Authors: Chengze Jiang, Junkai Wang, Minjing Dong, Jie Gui, Xinli Shi, Yuan Cao, Yuan Yan Tang, James Tin-Yau Kwok

    Abstract: Adversarial training has achieved remarkable advancements in defending against adversarial attacks. Among them, fast adversarial training (FAT) is gaining attention for its ability to achieve competitive robustness with fewer computing resources. Existing FAT methods typically employ a uniform strategy that optimizes all training data equally without considering the influence of different examples…

    Submitted 26 September, 2024; originally announced September 2024.

    Comments: 13 pages

  46. arXiv:2409.17313  [pdf, other]

    cs.CV cs.AI cs.CL

    Navigating the Nuances: A Fine-grained Evaluation of Vision-Language Navigation

    Authors: Zehao Wang, Minye Wu, Yixin Cao, Yubo Ma, Meiqi Chen, Tinne Tuytelaars

    Abstract: This study presents a novel evaluation framework for the Vision-Language Navigation (VLN) task. It aims to diagnose current models for various instruction categories at a finer-grained level. The framework is structured around the context-free grammar (CFG) of the task. The CFG serves as the basis for the problem decomposition and the core premise of the instruction categories design. We propose a…

    Submitted 25 September, 2024; originally announced September 2024.

    Comments: EMNLP 2024 Findings; project page: https://zehao-wang.github.io/navnuances

  47. NFTracer: Tracing NFT Impact Dynamics in Transaction-flow Substitutive Systems with Visual Analytics

    Authors: Yifan Cao, Qing Shi, Lue Shen, Kani Chen, Yang Wang, Wei Zeng, Huamin Qu

    Abstract: Impact dynamics are crucial for estimating the growth patterns of NFT projects by tracking the diffusion and decay of their relative appeal among stakeholders. Machine learning methods for impact dynamics analysis are incomprehensible and rigid in terms of their interpretability and transparency, whilst stakeholders require interactive tools for informed decision-making. Nevertheless, developing s…

    Submitted 24 September, 2024; originally announced September 2024.

    Comments: 25 pages, 13 figures, 3 tables, accepted by IEEE Transactions on Visualization and Computer Graphics (2024)

  48. arXiv:2409.15656  [pdf, other]

    cs.CR

    Identified-and-Targeted: The First Early Evidence of the Privacy-Invasive Use of Browser Fingerprinting for Online Tracking

    Authors: Zengrui Liu, Jimmy Dani, Shujiang Wu, Yinzhi Cao, Nitesh Saxena

    Abstract: While advertising has become commonplace in today's online interactions, there is a notable dearth of research investigating the extent to which browser fingerprinting is harnessed for user tracking and targeted advertising. Prior studies only measured whether fingerprinting-related scripts are being run on the websites but that in itself does not necessarily mean that fingerprinting is being used…

    Submitted 23 September, 2024; originally announced September 2024.

  49. arXiv:2409.14827  [pdf, other]

    cs.CV cs.HC cs.MM

    AIM 2024 Challenge on Video Saliency Prediction: Methods and Results

    Authors: Andrey Moskalenko, Alexey Bryncev, Dmitry Vatolin, Radu Timofte, Gen Zhan, Li Yang, Yunlong Tang, Yiting Liao, Jiongzhi Lin, Baitao Huang, Morteza Moradi, Mohammad Moradi, Francesco Rundo, Concetto Spampinato, Ali Borji, Simone Palazzo, Yuxin Zhu, Yinan Sun, Huiyu Duan, Yuqin Cao, Ziheng Jia, Qiang Hu, Xiongkuo Min, Guangtao Zhai, Hao Fang, et al. (8 additional authors not shown)

    Abstract: This paper reviews the Challenge on Video Saliency Prediction at AIM 2024. The goal of the participants was to develop a method for predicting accurate saliency maps for the provided set of video sequences. Saliency maps are widely exploited in various applications, including video compression, quality assessment, visual perception studies, the advertising industry, etc. For this competition, a pr…

    Submitted 23 September, 2024; originally announced September 2024.

    Comments: ECCVW 2024

    ACM Class: I.4.6; I.2.10

  50. arXiv:2409.14028  [pdf, other]

    eess.IV cs.CV

    MSDet: Receptive Field Enhanced Multiscale Detection for Tiny Pulmonary Nodule

    Authors: Guohui Cai, Ying Cai, Zeyu Zhang, Daji Ergu, Yuanzhouhan Cao, Binbin Hu, Zhibin Liao, Yang Zhao

    Abstract: Pulmonary nodules are critical indicators for the early diagnosis of lung cancer, making their detection essential for timely treatment. However, traditional CT imaging methods suffered from cumbersome procedures, low detection rates, and poor localization accuracy. The subtle differences between pulmonary nodules and surrounding tissues in complex lung CT images, combined with repeated downsampli…

    Submitted 21 September, 2024; originally announced September 2024.