Showing 1–50 of 225 results for author: Pang, Y

  1. arXiv:2409.19656  [pdf, other]

    cs.CL

    Multimodal Misinformation Detection by Learning from Synthetic Data with Multimodal LLMs

    Authors: Fengzhu Zeng, Wenqian Li, Wei Gao, Yan Pang

    Abstract: Detecting multimodal misinformation, especially in the form of image-text pairs, is crucial. Obtaining large-scale, high-quality real-world fact-checking datasets for training detectors is costly, leading researchers to use synthetic datasets generated by AI technologies. However, the generalizability of detectors trained on synthetic data to real-world scenarios remains unclear due to the distrib…

    Submitted 29 September, 2024; originally announced September 2024.

    Comments: EMNLP 2024 Findings

  2. arXiv:2409.18355  [pdf, other]

    cs.CV

    SinoSynth: A Physics-based Domain Randomization Approach for Generalizable CBCT Image Enhancement

    Authors: Yunkui Pang, Yilin Liu, Xu Chen, Pew-Thian Yap, Jun Lian

    Abstract: Cone Beam Computed Tomography (CBCT) finds diverse applications in medicine. Ensuring high image quality in CBCT scans is essential for accurate diagnosis and treatment delivery. Yet, the susceptibility of CBCT images to noise and artifacts undermines both their usefulness and reliability. Existing methods typically address CBCT artifacts through image-to-image translation approaches. These method…

    Submitted 26 September, 2024; originally announced September 2024.

    Comments: MICCAI 2024

  3. arXiv:2409.04779  [pdf, other]

    cs.LG math.NA

    Component Fourier Neural Operator for Singularly Perturbed Differential Equations

    Authors: Ye Li, Ting Du, Yiwen Pang, Zhongyi Huang

    Abstract: Solving Singularly Perturbed Differential Equations (SPDEs) poses computational challenges arising from the rapid transitions in their solutions within thin regions. The effectiveness of deep learning in addressing differential equations motivates us to employ these methods for solving SPDEs. In this manuscript, we introduce Component Fourier Neural Operator (ComFNO), an innovative operator learni…

    Submitted 7 September, 2024; originally announced September 2024.

  4. arXiv:2409.04018  [pdf, other]

    cs.CV

    Towards Energy-Efficiency by Navigating the Trilemma of Energy, Latency, and Accuracy

    Authors: Boyuan Tian, Yihan Pang, Muhammad Huzaifa, Shenlong Wang, Sarita Adve

    Abstract: Extended Reality (XR) enables immersive experiences through untethered headsets but suffers from stringent battery and resource constraints. Energy-efficient design is crucial to ensure both longevity and high performance in XR devices. However, latency and accuracy are often prioritized over energy, leading to a gap in achieving energy efficiency. This paper examines scene reconstruction, a key b…

    Submitted 6 September, 2024; originally announced September 2024.

    Comments: ISMAR 2024

  5. arXiv:2409.03209  [pdf, other]

    cs.CV

    iSeg: An Iterative Refinement-based Framework for Training-free Segmentation

    Authors: Lin Sun, Jiale Cao, Jin Xie, Fahad Shahbaz Khan, Yanwei Pang

    Abstract: Stable diffusion has demonstrated strong image synthesis ability given text descriptions, suggesting that it contains strong semantic clues for grouping objects. Researchers have explored employing stable diffusion for training-free segmentation. Most existing approaches refine the cross-attention map with the self-attention map once, demonstrating that the self-attention map contains useful semantic informa…

    Submitted 8 October, 2024; v1 submitted 4 September, 2024; originally announced September 2024.

    Comments: Project Page: https://linsun449.github.io/iSeg/ Code: https://github.com/linsun449/iseg.code

  6. arXiv:2409.01184  [pdf, other]

    cs.CV

    PitVis-2023 Challenge: Workflow Recognition in videos of Endoscopic Pituitary Surgery

    Authors: Adrito Das, Danyal Z. Khan, Dimitrios Psychogyios, Yitong Zhang, John G. Hanrahan, Francisco Vasconcelos, You Pang, Zhen Chen, Jinlin Wu, Xiaoyang Zou, Guoyan Zheng, Abdul Qayyum, Moona Mazher, Imran Razzak, Tianbin Li, Jin Ye, Junjun He, Szymon Płotka, Joanna Kaleta, Amine Yamlahi, Antoine Jund, Patrick Godau, Satoshi Kondo, Satoshi Kasai, Kousuke Hirasawa, et al. (7 additional authors not shown)

    Abstract: The field of computer vision applied to videos of minimally invasive surgery is ever-growing. Workflow recognition pertains to the automated recognition of various aspects of a surgery: including which surgical steps are performed; and which surgical instruments are used. This information can later be used to assist clinicians when learning the surgery; during live surgery; and when writing operat…

    Submitted 2 September, 2024; originally announced September 2024.

  7. arXiv:2408.15914  [pdf, other]

    cs.CV

    CoRe: Context-Regularized Text Embedding Learning for Text-to-Image Personalization

    Authors: Feize Wu, Yun Pang, Junyi Zhang, Lianyu Pang, Jian Yin, Baoquan Zhao, Qing Li, Xudong Mao

    Abstract: Recent advances in text-to-image personalization have enabled high-quality and controllable image synthesis for user-provided concepts. However, existing methods still struggle to balance identity preservation with text alignment. Our approach is based on the fact that generating prompt-aligned images requires a precise semantic understanding of the prompt, which involves accurately processing the…

    Submitted 28 August, 2024; originally announced August 2024.

  8. arXiv:2408.14762  [pdf, other]

    cs.LG cs.SI

    Explainable Hierarchical Urban Representation Learning for Commuting Flow Prediction

    Authors: Mingfei Cai, Yanbo Pang, Yoshihide Sekimoto

    Abstract: Commuting flow prediction is an essential task for municipal operations in the real world. Previous studies have revealed that it is feasible to estimate the commuting origin-destination (OD) demand within a city using multiple auxiliary data. However, most existing methods are not suitable to deal with a similar task at a large scale, namely within a prefecture or the whole nation, owing to the i…

    Submitted 29 September, 2024; v1 submitted 26 August, 2024; originally announced August 2024.

  9. arXiv:2408.11845  [pdf, other]

    cs.CL

    LLaMA based Punctuation Restoration With Forward Pass Only Decoding

    Authors: Yutong Pang, Debjyoti Paul, Kevin Jiang, Xuedong Zhang, Xin Lei

    Abstract: This paper introduces two advancements in the field of Large Language Model Annotation with a focus on punctuation restoration tasks. Our first contribution is the application of LLaMA for punctuation restoration, which demonstrates superior performance compared to the established benchmark. Despite its impressive quality, LLaMA faces challenges regarding inference speed and hallucinations. To a…

    Submitted 9 August, 2024; originally announced August 2024.

  10. arXiv:2408.08056  [pdf, other]

    cs.LG

    DATTA: Towards Diversity Adaptive Test-Time Adaptation in Dynamic Wild World

    Authors: Chuyang Ye, Dongyan Wei, Zhendong Liu, Yuanyi Pang, Yixi Lin, Jiarong Liao, Qinting Jiang, Xianghua Fu, Qing Li, Jingyan Jiang

    Abstract: Test-time adaptation (TTA) effectively addresses distribution shifts between training and testing data by adjusting models on test samples, which is crucial for improving model inference in real-world applications. However, traditional TTA methods typically follow a fixed pattern to address the dynamic data patterns (low-diversity or high-diversity patterns) often leading to performance degradatio…

    Submitted 15 August, 2024; originally announced August 2024.

    Comments: 16 pages, 2 figures

  11. arXiv:2408.06787  [pdf, other]

    cs.CL

    Unlock the Power of Frozen LLMs in Knowledge Graph Completion

    Authors: Bo Xue, Yi Xu, Yunchong Song, Yiming Pang, Yuyang Ren, Jiaxin Ding, Luoyi Fu, Xinbing Wang

    Abstract: Traditional knowledge graph completion (KGC) methods rely solely on structural information, struggling with the inherent sparsity of knowledge graphs (KGs). Large Language Models (LLMs) learn extensive knowledge from large corpora with powerful context modeling, making them promising for mitigating the limitations of previous methods. Directly fine-tuning LLMs offers great capability but comes at…

    Submitted 18 September, 2024; v1 submitted 13 August, 2024; originally announced August 2024.

  12. arXiv:2408.02666  [pdf, other]

    cs.CL cs.AI

    Self-Taught Evaluators

    Authors: Tianlu Wang, Ilia Kulikov, Olga Golovneva, Ping Yu, Weizhe Yuan, Jane Dwivedi-Yu, Richard Yuanzhe Pang, Maryam Fazel-Zarandi, Jason Weston, Xian Li

    Abstract: Model-based evaluation is at the heart of successful model development -- as a reward model for training, and as a replacement for human evaluation. To train such evaluators, the standard approach is to collect a large amount of human preference judgments over model responses, which is costly and the data becomes stale as models improve. In this work, we present an approach that aims to improve e…

    Submitted 8 August, 2024; v1 submitted 5 August, 2024; originally announced August 2024.

  13. arXiv:2407.19548  [pdf, other]

    cs.CV

    Cycle3D: High-quality and Consistent Image-to-3D Generation via Generation-Reconstruction Cycle

    Authors: Zhenyu Tang, Junwu Zhang, Xinhua Cheng, Wangbo Yu, Chaoran Feng, Yatian Pang, Bin Lin, Li Yuan

    Abstract: Recent 3D large reconstruction models typically employ a two-stage process: first generating multi-view images with a multi-view diffusion model, and then using a feed-forward model to reconstruct the images into 3D content. However, multi-view diffusion models often produce low-quality and inconsistent images, adversely affecting the quality of the final 3D reconstruction. To address this issue,…

    Submitted 28 July, 2024; originally announced July 2024.

    Comments: Project page: https://pku-yuangroup.github.io/Cycle3D/

  14. arXiv:2407.17120  [pdf, other]

    cs.LG cs.AI

    Parameter-Efficient Fine-Tuning for Continual Learning: A Neural Tangent Kernel Perspective

    Authors: Jingren Liu, Zhong Ji, YunLong Yu, Jiale Cao, Yanwei Pang, Jungong Han, Xuelong Li

    Abstract: Parameter-efficient fine-tuning for continual learning (PEFT-CL) has shown promise in adapting pre-trained models to sequential tasks while mitigating the catastrophic forgetting problem. However, understanding the mechanisms that dictate continual performance in this paradigm remains elusive. To tackle this complexity, we undertake a rigorous analysis of PEFT-CL dynamics to derive relevant metrics fo…

    Submitted 24 July, 2024; originally announced July 2024.

  15. arXiv:2407.12581  [pdf, other]

    cs.CR cs.AI cs.CV cs.CY

    Towards Understanding Unsafe Video Generation

    Authors: Yan Pang, Aiping Xiong, Yang Zhang, Tianhao Wang

    Abstract: Video generation models (VGMs) have demonstrated the capability to synthesize high-quality output. It is important to understand their potential to produce unsafe content, such as violent or terrifying videos. In this work, we provide a comprehensive understanding of unsafe video generation. First, to confirm the possibility that these models could indeed generate unsafe videos, we choose unsafe…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: 18 pages

  16. arXiv:2407.11503  [pdf, other]

    cs.CV

    Beyond Mask: Rethinking Guidance Types in Few-shot Segmentation

    Authors: Shijie Chang, Youwei Pang, Xiaoqi Zhao, Lihe Zhang, Huchuan Lu

    Abstract: Existing few-shot segmentation (FSS) methods mainly focus on prototype feature generation and the query-support matching mechanism. As a crucial prompt for generating prototype features, the pair of image-mask types in the support set has become the default setting. However, various types such as image, text, box, and mask all can provide valuable information regarding the objects in context, clas…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Preprint under review

  17. arXiv:2407.10987  [pdf, ps, other]

    cs.NI cs.AI eess.SP

    Adaptive Digital Twin and Communication-Efficient Federated Learning Network Slicing for 5G-enabled Internet of Things

    Authors: Daniel Ayepah-Mensah, Guolin Sun, Yu Pang, Wei Jiang

    Abstract: Network slicing enables industrial Internet of Things (IIoT) networks with multiservice and differentiated resource requirements to meet increasing demands through efficient use and management of network resources. Typically, the network slice orchestrator relies on demand forecasts for each slice to make informed decisions and maximize resource utilization. The new generation of Industry 4.0 has…

    Submitted 22 June, 2024; originally announced July 2024.

    Comments: 8 pages, 7 figures, conference

  18. arXiv:2406.13853  [pdf, other]

    cs.HC

    AltGeoViz: Facilitating Accessible Geovisualization

    Authors: Chu Li, Rock Yuren Pang, Ather Sharif, Arnavi Chheda-Kothary, Jeffrey Heer, Jon E. Froehlich

    Abstract: Geovisualizations are powerful tools for exploratory spatial analysis, enabling sighted users to discern patterns, trends, and relationships within geographic data. However, these visual tools have remained largely inaccessible to screen-reader users. We present AltGeoViz, a new system we designed to facilitate geovisualization exploration for these users. AltGeoViz dynamically generates alt-text…

    Submitted 21 June, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

  19. arXiv:2406.01987  [pdf, other]

    cs.CV

    Dealing with All-stage Missing Modality: Towards A Universal Model with Robust Reconstruction and Personalization

    Authors: Yunpeng Zhao, Cheng Chen, Qing You Pang, Quanzheng Li, Carol Tang, Beng-Ti Ang, Yueming Jin

    Abstract: Addressing missing modalities presents a critical challenge in multimodal learning. Current approaches focus on developing models that can handle modality-incomplete inputs during inference, assuming that the full set of modalities are available for all the data during training. This reliance on full-modality data for training limits the use of abundant modality-incomplete samples that are often e…

    Submitted 4 June, 2024; originally announced June 2024.

  20. arXiv:2405.17441  [pdf, other]

    cs.NI cs.AI cs.CL eess.SY

    When Large Language Models Meet Optical Networks: Paving the Way for Automation

    Authors: Danshi Wang, Yidi Wang, Xiaotian Jiang, Yao Zhang, Yue Pang, Min Zhang

    Abstract: Since the advent of GPT, large language models (LLMs) have brought about revolutionary advancements in all walks of life. As a superior natural language processing (NLP) technology, LLMs have consistently achieved state-of-the-art performance in numerous areas. However, LLMs are considered to be general-purpose models for NLP tasks, which may encounter challenges when applied to complex tasks in s…

    Submitted 24 June, 2024; v1 submitted 14 May, 2024; originally announced May 2024.

  21. arXiv:2405.17405  [pdf, other]

    cs.CV

    Human4DiT: 360-degree Human Video Generation with 4D Diffusion Transformer

    Authors: Ruizhi Shao, Youxin Pang, Zerong Zheng, Jingxiang Sun, Yebin Liu

    Abstract: We present a novel approach for generating 360-degree high-quality, spatio-temporally coherent human videos from a single image. Our framework combines the strengths of diffusion transformers for capturing global correlations across viewpoints and time, and CNNs for accurate condition injection. The core is a hierarchical 4D transformer architecture that factorizes self-attention across views, tim…

    Submitted 23 September, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

    Comments: Our project website is https://human4dit.github.io

  22. arXiv:2405.17247  [pdf, other]

    cs.LG

    An Introduction to Vision-Language Modeling

    Authors: Florian Bordes, Richard Yuanzhe Pang, Anurag Ajay, Alexander C. Li, Adrien Bardes, Suzanne Petryk, Oscar Mañas, Zhiqiu Lin, Anas Mahmoud, Bargav Jayaraman, Mark Ibrahim, Melissa Hall, Yunyang Xiong, Jonathan Lebensold, Candace Ross, Srihari Jayakumar, Chuan Guo, Diane Bouchacourt, Haider Al-Tahan, Karthik Padthe, Vasu Sharma, Hu Xu, Xiaoqing Ellen Tan, Megan Richards, Samuel Lavoie, et al. (16 additional authors not shown)

    Abstract: Following the recent popularity of Large Language Models (LLMs), several attempts have been made to extend them to the visual domain. From having a visual assistant that could guide us through unfamiliar environments to generative models that produce images using only a high-level text description, the vision-language model (VLM) applications will significantly impact our relationship with technol…

    Submitted 27 May, 2024; originally announced May 2024.

  23. arXiv:2405.15182  [pdf, other]

    cs.CR cs.AI

    RFLPA: A Robust Federated Learning Framework against Poisoning Attacks with Secure Aggregation

    Authors: Peihua Mai, Ran Yan, Yan Pang

    Abstract: Federated learning (FL) allows multiple devices to train a model collaboratively without sharing their data. Despite its benefits, FL is vulnerable to privacy leakage and poisoning attacks. To address the privacy concern, secure aggregation (SecAgg) is often used to obtain the aggregation of gradients on the server without inspecting individual user updates. Unfortunately, existing defense strategies a…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: 22 pages

    ACM Class: E.4

  24. Distributed Harmonization: Federated Clustered Batch Effect Adjustment and Generalization

    Authors: Bao Hoang, Yijiang Pang, Siqi Liang, Liang Zhan, Paul Thompson, Jiayu Zhou

    Abstract: Independent and identically distributed (i.i.d.) data is essential to many data analysis and modeling techniques. In the medical domain, collecting data from multiple sites or institutions is a common strategy that guarantees sufficient clinical diversity, determined by the decentralized nature of medical data. However, data from various sites are easily biased by the local environment or faciliti…

    Submitted 7 August, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

    Comments: 11 pages, 7 figures, accepted to KDD2024-ADS

  25. arXiv:2405.10523  [pdf, other]

    cs.CL

    Smart Expert System: Large Language Models as Text Classifiers

    Authors: Zhiqiang Wang, Yiran Pang, Yanbin Lin

    Abstract: Text classification is a fundamental task in Natural Language Processing (NLP), and the advent of Large Language Models (LLMs) has revolutionized the field. This paper introduces the Smart Expert System, a novel approach that leverages LLMs as text classifiers. The system simplifies the traditional text classification workflow, eliminating the need for extensive preprocessing and domain expertise.…

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: 11 pages, 3 figures, and 8 tables

  26. arXiv:2405.06783  [pdf, other]

    cs.HC cs.AI cs.CY

    BLIP: Facilitating the Exploration of Undesirable Consequences of Digital Technologies

    Authors: Rock Yuren Pang, Sebastin Santy, René Just, Katharina Reinecke

    Abstract: Digital technologies have positively transformed society, but they have also led to undesirable consequences not anticipated at the time of design or development. We posit that insights into past undesirable consequences can help researchers and practitioners gain awareness and anticipate potential adverse effects. To test this assumption, we introduce BLIP, a system that extracts real-world undes…

    Submitted 10 May, 2024; originally announced May 2024.

    Comments: To appear in the Proceedings of the CHI Conference on Human Factors in Computing Systems (CHI '24), May 11--16, 2024, Honolulu, HI, USA

  27. arXiv:2405.05493  [pdf, ps, other]

    cs.CL cs.AI

    Parameter-Efficient Fine-Tuning With Adapters

    Authors: Keyu Chen, Yuan Pang, Zi Yang

    Abstract: In the arena of language model fine-tuning, the traditional approaches, such as Domain-Adaptive Pretraining (DAPT) and Task-Adaptive Pretraining (TAPT), although effective, are computationally intensive. This research introduces a novel adaptation method that uses the UniPELT framework as a base and adds a PromptTuning Layer, which significantly reduces the number of trainable parameters while main…

    Submitted 8 May, 2024; originally announced May 2024.

  28. arXiv:2405.04376  [pdf, other]

    cs.LG

    Towards Stability of Parameter-free Optimization

    Authors: Yijiang Pang, Shuyang Yu, Bao Hoang, Jiayu Zhou

    Abstract: Hyperparameter tuning, particularly the selection of an appropriate learning rate in adaptive gradient training methods, remains a challenge. To tackle this challenge, in this paper, we propose a novel parameter-free optimizer, AdamG (Adam with the golden step size), designed to automatically adapt to diverse optimization problems without manual tuning. The core technique underlying \text…

    Submitted 27 May, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

  29. arXiv:2405.01353  [pdf, other]

    cs.CV

    Sparse multi-view hand-object reconstruction for unseen environments

    Authors: Yik Lung Pang, Changjae Oh, Andrea Cavallaro

    Abstract: Recent works in hand-object reconstruction mainly focus on the single-view and dense multi-view settings. On the one hand, single-view methods can leverage learned shape priors to generalise to unseen objects but are prone to inaccuracies due to occlusions. On the other hand, dense multi-view methods are very accurate but cannot easily adapt to unseen objects without further data collection. In co…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: Camera-ready version. Paper accepted to CVPRW 2024. 8 pages, 7 figures, 1 table

  30. arXiv:2405.01002  [pdf, other]

    cs.CV cs.LG

    Spider: A Unified Framework for Context-dependent Concept Segmentation

    Authors: Xiaoqi Zhao, Youwei Pang, Wei Ji, Baicheng Sheng, Jiaming Zuo, Lihe Zhang, Huchuan Lu

    Abstract: Different from the context-independent (CI) concepts such as human, car, and airplane, context-dependent (CD) concepts require higher visual understanding ability, such as camouflaged object and medical lesion. Despite the rapid advance of many CD understanding tasks in respective branches, the isolated evolution leads to their limited cross-domain generalisation and repetitive technique innovatio…

    Submitted 28 May, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

    Comments: Accepted by ICML 2024

  31. arXiv:2404.19733  [pdf, other]

    cs.CL cs.AI

    Iterative Reasoning Preference Optimization

    Authors: Richard Yuanzhe Pang, Weizhe Yuan, Kyunghyun Cho, He He, Sainbayar Sukhbaatar, Jason Weston

    Abstract: Iterative preference optimization methods have recently been shown to perform well for general instruction tuning tasks, but typically make little improvement on reasoning tasks (Yuan et al., 2024, Chen et al., 2024). In this work we develop an iterative approach that optimizes the preference between competing generated Chain-of-Thought (CoT) candidates by optimizing for winning vs. losing reasoni…

    Submitted 25 June, 2024; v1 submitted 30 April, 2024; originally announced April 2024.

  32. arXiv:2404.15802  [pdf, other]

    cs.CV cs.AI

    Raformer: Redundancy-Aware Transformer for Video Wire Inpainting

    Authors: Zhong Ji, Yimu Su, Yan Zhang, Jiacheng Hou, Yanwei Pang, Jungong Han

    Abstract: Video Wire Inpainting (VWI) is a prominent application in video inpainting, aimed at flawlessly removing wires in films or TV series, offering significant time and labor savings compared to manual frame-by-frame removal. However, wire removal poses greater challenges due to the wires being longer and slimmer than objects typically targeted in general video inpainting tasks, and often intersecting…

    Submitted 24 April, 2024; originally announced April 2024.

  33. arXiv:2404.09431  [pdf, other]

    cs.CV

    VFMM3D: Releasing the Potential of Image by Vision Foundation Model for Monocular 3D Object Detection

    Authors: Bonan Ding, Jin Xie, Jing Nie, Jiale Cao, Xuelong Li, Yanwei Pang

    Abstract: Due to its cost-effectiveness and widespread availability, monocular 3D object detection, which relies solely on a single camera during inference, holds significant importance across various applications, including autonomous driving and robotics. Nevertheless, directly predicting the coordinates of objects in 3D space from monocular images poses challenges. Therefore, an effective solution involv…

    Submitted 26 August, 2024; v1 submitted 14 April, 2024; originally announced April 2024.

    Comments: 11 pages, 4 figures

  34. arXiv:2404.07600  [pdf, other]

    cs.CV

    Implicit and Explicit Language Guidance for Diffusion-based Visual Perception

    Authors: Hefeng Wang, Jiale Cao, Jin Xie, Aiping Yang, Yanwei Pang

    Abstract: Text-to-image diffusion models have shown powerful ability on conditional image synthesis. With large-scale vision-language pre-training, diffusion models are able to generate high-quality images with rich texture and reasonable structure under different text prompts. However, it is an open problem to adapt the pre-trained diffusion model for visual perception. In this paper, we propose an implici…

    Submitted 15 August, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

    Comments: Accepted by IEEE TMM

  35. arXiv:2404.07445  [pdf, other]

    cs.CV

    Multi-view Aggregation Network for Dichotomous Image Segmentation

    Authors: Qian Yu, Xiaoqi Zhao, Youwei Pang, Lihe Zhang, Huchuan Lu

    Abstract: Dichotomous Image Segmentation (DIS) has recently emerged towards high-precision object segmentation from high-resolution natural images. When designing an effective DIS model, the main challenge is how to balance the semantic dispersion of high-resolution targets in the small receptive field and the loss of high-precision details in the large receptive field. Existing methods rely on tedious mu…

    Submitted 10 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR2024 as Highlight

  36. arXiv:2404.06787  [pdf, other]

    cs.LG cs.AI

    Private Wasserstein Distance with Random Noises

    Authors: Wenqian Li, Haozhi Wang, Zhe Huang, Yan Pang

    Abstract: Wasserstein distance is a principal measure of data divergence from a distributional standpoint. However, its application becomes challenging in the context of data privacy, where sharing raw data is restricted. Prior attempts have employed techniques like Differential Privacy or Federated optimization to approximate Wasserstein distance. Nevertheless, these approaches often lack accuracy and robu…

    Submitted 10 April, 2024; originally announced April 2024.

  37. arXiv:2403.15733  [pdf, other]

    cs.SI cs.CY

    Spatio-Temporal Graph Convolutional Network Combined Large Language Model: A Deep Learning Framework for Bike Demand Forecasting

    Authors: Peisen Li, Yizhe Pang, Junyu Ren

    Abstract: This study presents a new deep learning framework, combining Spatio-Temporal Graph Convolutional Network (STGCN) with a Large Language Model (LLM), for bike demand forecasting. Addressing challenges in transforming discrete datasets and integrating unstructured language data, the framework leverages LLMs to extract insights from Points of Interest (POI) text data. The proposed STGCN-L model demons…

    Submitted 23 March, 2024; originally announced March 2024.

    Comments: ISNN 2024

  38. arXiv:2403.12486  [pdf, other]

    cs.LG cs.AI

    NTK-Guided Few-Shot Class Incremental Learning

    Authors: Jingren Liu, Zhong Ji, Yanwei Pang, YunLong Yu

    Abstract: The proliferation of Few-Shot Class Incremental Learning (FSCIL) methodologies has highlighted the critical challenge of maintaining robust anti-amnesia capabilities in FSCIL learners. In this paper, we present a novel conceptualization of anti-amnesia in terms of mathematical generalization, leveraging the Neural Tangent Kernel (NTK) perspective. Our method focuses on two key aspects: ensuring op…

    Submitted 24 September, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

  39. arXiv:2403.12455  [pdf, other]

    cs.CV

    CLIP-VIS: Adapting CLIP for Open-Vocabulary Video Instance Segmentation

    Authors: Wenqi Zhu, Jiale Cao, Jin Xie, Shuangming Yang, Yanwei Pang

    Abstract: Open-vocabulary video instance segmentation strives to segment and track instances belonging to an open set of categories in a video. The vision-language model Contrastive Language-Image Pre-training (CLIP) has shown robust zero-shot classification ability in image-level open-vocabulary tasks. In this paper, we propose a simple encoder-decoder network, called CLIP-VIS, to adapt CLIP for open-voca…

    Submitted 8 October, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

    Comments: Accepted by IEEE TCSVT

  40. arXiv:2403.10127  [pdf, other]

    cs.CV

    TransLandSeg: A Transfer Learning Approach for Landslide Semantic Segmentation Based on Vision Foundation Model

    Authors: Changhong Hou, Junchuan Yu, Daqing Ge, Liu Yang, Laidian Xi, Yunxuan Pang, Yi Wen

    Abstract: Landslides are one of the most destructive natural disasters in the world, posing a serious threat to human life and safety. The development of foundation models has provided a new research paradigm for large-scale landslide detection. The Segment Anything Model (SAM) has garnered widespread attention in the field of image segmentation. However, our experiment found that SAM performed poorly in th…

    Submitted 15 March, 2024; originally announced March 2024.

  41. arXiv:2403.08902  [pdf, other]

    cs.CV

    Envision3D: One Image to 3D with Anchor Views Interpolation

    Authors: Yatian Pang, Tanghui Jia, Yujun Shi, Zhenyu Tang, Junwu Zhang, Xinhua Cheng, Xing Zhou, Francis E. H. Tay, Li Yuan

    Abstract: We present Envision3D, a novel method for efficiently generating high-quality 3D content from a single image. Recent methods that extract 3D content from multi-view images generated by diffusion models show great potential. However, it is still challenging for diffusion models to generate dense multi-view consistent images, which is crucial for the quality of 3D content extraction. To address this…

    Submitted 13 March, 2024; originally announced March 2024.

    Comments: GitHub repository: https://github.com/PKU-YuanGroup/Envision3D

  42. arXiv:2403.07888  [pdf, other]

    cs.CV cs.AI

    Cross-modality debiasing: using language to mitigate sub-population shifts in imaging

    Authors: Yijiang Pang, Bao Hoang, Jiayu Zhou

    Abstract: Sub-population shift is a specific type of domain shift that highlights changes in data distribution within specific sub-groups or populations between training and testing. Sub-population shift accounts for a significant source of algorithmic bias and calls for distributional robustness. Recent studies found inherent distributional robustness in multi-modality foundation models, such as the vision…

    Submitted 2 April, 2024; v1 submitted 2 February, 2024; originally announced March 2024.

  43. arXiv:2402.15810  [pdf, other]

    cs.DL cs.CL cs.LG

    OAG-Bench: A Human-Curated Benchmark for Academic Graph Mining

    Authors: Fanjin Zhang, Shijie Shi, Yifan Zhu, Bo Chen, Yukuo Cen, Jifan Yu, Yelin Chen, Lulu Wang, Qingfei Zhao, Yuqing Cheng, Tianyi Han, Yuwei An, Dan Zhang, Weng Lam Tam, Kun Cao, Yunhe Pang, Xinyu Guan, Huihui Yuan, Jian Song, Xiaoyan Li, Yuxiao Dong, Jie Tang

    Abstract: With the rapid proliferation of scientific literature, versatile academic knowledge services increasingly rely on comprehensive academic graph mining. Despite the availability of public academic graphs, benchmarks, and datasets, these resources often fall short in multi-aspect and fine-grained annotations, are constrained to specific task types and domains, or lack underlying real academic graphs.…

    Submitted 20 June, 2024; v1 submitted 24 February, 2024; originally announced February 2024.

    Comments: KDD'24, 9 pages, 5 appendix pages

    Journal ref: Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '24), August 25--29, 2024, Barcelona, Spain

  44. arXiv:2402.13126  [pdf, other]

    cs.CR cs.AI cs.CV cs.LG eess.IV

    VGMShield: Mitigating Misuse of Video Generative Models

    Authors: Yan Pang, Yang Zhang, Tianhao Wang

    Abstract: With the rapid advancement in video generation, people can conveniently utilize video generation models to create videos tailored to their specific desires. Nevertheless, there are also growing concerns about their potential misuse in creating and disseminating false information. In this work, we introduce VGMShield: a set of three straightforward but pioneering mitigations through the lifecycle…

    Submitted 20 February, 2024; originally announced February 2024.

    Comments: 17 pages, 10 figures

  45. arXiv:2402.02797  [pdf, other]

    cs.CV cs.LG

    Joint Attention-Guided Feature Fusion Network for Saliency Detection of Surface Defects

    Authors: Xiaoheng Jiang, Feng Yan, Yang Lu, Ke Wang, Shuai Guo, Tianzhu Zhang, Yanwei Pang, Jianwei Niu, Mingliang Xu

    Abstract: Surface defect inspection plays an important role in the process of industrial manufacture and production. Though Convolutional Neural Network (CNN) based defect inspection methods have made huge leaps, they still confront a lot of challenges such as defect scale variation, complex background, low contrast, and so on. To address these issues, we propose a joint attention-guided feature fusion netw…

    Submitted 5 February, 2024; originally announced February 2024.

  46. arXiv:2402.01621  [pdf, other]

    cs.LG

    Stochastic Two Points Method for Deep Model Zeroth-order Optimization

    Authors: Yijiang Pang, Jiayu Zhou

    Abstract: Large foundation models, such as large language models, have performed exceptionally well in various application scenarios. Building or fully fine-tuning such large models is usually prohibitive due to either hardware budget or lack of access to backpropagation. The zeroth-order methods offer a promising direction for tackling this challenge, where only forward passes are needed to update the mode…

    Submitted 27 May, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

  47. arXiv:2401.15947  [pdf, other]

    cs.CV

    MoE-LLaVA: Mixture of Experts for Large Vision-Language Models

    Authors: Bin Lin, Zhenyu Tang, Yang Ye, Jiaxi Cui, Bin Zhu, Peng Jin, Jinfa Huang, Junwu Zhang, Yatian Pang, Munan Ning, Li Yuan

    Abstract: Recent advances demonstrate that scaling Large Vision-Language Models (LVLMs) effectively improves downstream task performances. However, existing scaling methods enable all model parameters to be active for each token in the calculation, which brings massive training and inferring costs. In this work, we propose a simple yet effective training strategy MoE-Tuning for LVLMs. This strategy innovati…

    Submitted 6 July, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

    Comments: K = P + N represents the length of the output sequence in the formula (8)

  48. arXiv:2401.10020  [pdf, other]

    cs.CL cs.AI

    Self-Rewarding Language Models

    Authors: Weizhe Yuan, Richard Yuanzhe Pang, Kyunghyun Cho, Xian Li, Sainbayar Sukhbaatar, Jing Xu, Jason Weston

    Abstract: We posit that to achieve superhuman agents, future models require superhuman feedback in order to provide an adequate training signal. Current approaches commonly train reward models from human preferences, which may then be bottlenecked by human performance level, and secondly these separate frozen reward models cannot then learn to improve during LLM training. In this work, we study Self-Rewardi…

    Submitted 8 February, 2024; v1 submitted 18 January, 2024; originally announced January 2024.

  49. arXiv:2401.09176  [pdf]

    cs.LG

    ADCNet: a unified framework for predicting the activity of antibody-drug conjugates

    Authors: Liye Chen, Biaoshun Li, Yihao Chen, Mujie Lin, Shipeng Zhang, Chenxin Li, Yu Pang, Ling Wang

    Abstract: Antibody-drug conjugate (ADC) has revolutionized the field of cancer treatment in the era of precision medicine due to their ability to precisely target cancer cells and release highly effective drug. Nevertheless, the realization of rational design of ADC is very difficult because the relationship between their structures and activities is difficult to understand. In the present study, we introdu…

    Submitted 17 January, 2024; originally announced January 2024.

  50. arXiv:2401.00870  [pdf, other]

    cs.CR cs.AI

    ConfusionPrompt: Practical Private Inference for Online Large Language Models

    Authors: Peihua Mai, Ran Yan, Rui Ye, Youjia Yang, Yinchuan Li, Yan Pang

    Abstract: State-of-the-art large language models (LLMs) are commonly deployed as online services, necessitating users to transmit informative prompts to cloud servers, thus engendering substantial privacy concerns. In response, we present ConfusionPrompt, a novel private LLM inference framework designed to obfuscate the server by: (i) decomposing the prompt into sub-prompts, and (ii) generating pseudo promp…

    Submitted 24 May, 2024; v1 submitted 29 December, 2023; originally announced January 2024.

    Comments: 21 pages

    MSC Class: I.2.7