Showing 1–50 of 210 results for author: Gong, S

  1. arXiv:2410.14157  [pdf, other]

    cs.CL cs.LG

    Beyond Autoregression: Discrete Diffusion for Complex Reasoning and Planning

    Authors: Jiacheng Ye, Jiahui Gao, Shansan Gong, Lin Zheng, Xin Jiang, Zhenguo Li, Lingpeng Kong

    Abstract: Autoregressive language models, despite their impressive capabilities, struggle with complex reasoning and long-term planning tasks. We introduce discrete diffusion models as a novel solution to these challenges. Through the lens of subgoal imbalance, we demonstrate how diffusion models effectively learn difficult subgoals that elude autoregressive approaches. We propose Multi-granularity Diffusio…

    Submitted 17 October, 2024; originally announced October 2024.

  2. arXiv:2410.11473  [pdf, other]

    cs.CV

    InvSeg: Test-Time Prompt Inversion for Semantic Segmentation

    Authors: Jiayi Lin, Jiabo Huang, Jian Hu, Shaogang Gong

    Abstract: Visual-textual correlations in the attention maps derived from text-to-image diffusion models are proven beneficial to dense visual prediction tasks, e.g., semantic segmentation. However, a significant challenge arises due to the input distributional discrepancy between the context-rich sentences used for image generation and the isolated class names typically employed in semantic segmentation, hi…

    Submitted 15 October, 2024; originally announced October 2024.

  3. arXiv:2410.00376  [pdf, other]

    cs.IT eess.SP

    Frequency Diverse Array-enabled RIS-aided Integrated Sensing and Communication

    Authors: Hanyu Yang, Shiqi Gong, Heng Liu, Chengwen Xing, Nan Zhao, Dusit Niyato

    Abstract: Integrated sensing and communication (ISAC) has been envisioned as a prospective technology to enable ubiquitous sensing and communications in next-generation wireless networks. In contrast to existing works on reconfigurable intelligent surface (RIS) aided ISAC systems using conventional phased arrays (PAs), this paper investigates a frequency diverse array (FDA)-enabled RIS-aided ISAC system, wh…

    Submitted 30 September, 2024; originally announced October 2024.

    Comments: 36 pages, 9 figures

  4. arXiv:2409.13580  [pdf, other]

    cs.IT

    Lyapunov-guided Deep Reinforcement Learning for Semantic-aware AoI Minimization in UAV-assisted Wireless Networks

    Authors: Yusi Long, Shimin Gong, Sumei Sun, Gary Lee, Lanhua Li, Dusit Niyato

    Abstract: This paper investigates an unmanned aerial vehicle (UAV)-assisted semantic network where the ground users (GUs) periodically capture and upload the sensing information to a base station (BS) via UAVs' relaying. Both the GUs and the UAVs can extract semantic information from large-size raw data and transmit it to the BS for recovery. Smaller-size semantic information reduces latency and improves in…

    Submitted 20 September, 2024; originally announced September 2024.

    Comments: This paper has been submitted to IEEE TWC

  5. arXiv:2409.05024  [pdf, other]

    cs.CV

    Deep Self-Cleansing for Medical Image Segmentation with Noisy Labels

    Authors: Jiahua Dong, Yue Zhang, Qiuli Wang, Ruofeng Tong, Shihong Ying, Shaolin Gong, Xuanpu Zhang, Lanfen Lin, Yen-Wei Chen, S. Kevin Zhou

    Abstract: Medical image segmentation is crucial in the field of medical imaging, aiding in disease diagnosis and surgical planning. Most established segmentation methods rely on supervised deep learning, in which clean and precise labels are essential for supervision and significantly impact the performance of models. However, manually delineated labels often contain noise, such as missing labels and inaccu…

    Submitted 26 September, 2024; v1 submitted 8 September, 2024; originally announced September 2024.

    Comments: 31 pages, 7 figures

  6. arXiv:2409.01113  [pdf, other]

    cs.CV

    KMTalk: Speech-Driven 3D Facial Animation with Key Motion Embedding

    Authors: Zhihao Xu, Shengjie Gong, Jiapeng Tang, Lingyu Liang, Yining Huang, Haojie Li, Shuangping Huang

    Abstract: We present a novel approach for synthesizing 3D facial motions from audio sequences using key motion embeddings. Despite recent advancements in data-driven techniques, accurately mapping between audio signals and 3D facial meshes remains challenging. Direct regression of the entire sequence often leads to over-smoothed results due to the ill-posed nature of the problem. To this end, we propose a p…

    Submitted 2 September, 2024; originally announced September 2024.

    Comments: Accepted by ECCV 2024

  7. arXiv:2409.00966  [pdf, other]

    math.PR cs.DS cs.LG math.ST

    A computational transition for detecting correlated stochastic block models by low-degree polynomials

    Authors: Guanyi Chen, Jian Ding, Shuyang Gong, Zhangsong Li

    Abstract: Detection of correlation in a pair of random graphs is a fundamental statistical and computational problem that has been extensively studied in recent years. In this work, we consider a pair of correlated (sparse) stochastic block models $\mathcal{S}(n,\tfrac{\lambda}{n};k,\epsilon;s)$ that are subsampled from a common parent stochastic block model $\mathcal{S}(n,\tfrac{\lambda}{n};k,\epsilon)$ with $k=O(1)$ symmetric communiti…

    Submitted 2 September, 2024; originally announced September 2024.

    Comments: 75 pages, 2 figures

    MSC Class: Primary 68Q87; Secondary 62M20

  8. arXiv:2408.15205  [pdf, other]

    cs.CV

    Leveraging Hallucinations to Reduce Manual Prompt Dependency in Promptable Segmentation

    Authors: Jian Hu, Jiayi Lin, Junchi Yan, Shaogang Gong

    Abstract: Promptable segmentation typically requires instance-specific manual prompts to guide the segmentation of each desired object. To minimize such a need, task-generic promptable segmentation has been introduced, which employs a single task-generic prompt to segment various images of different objects in the same task. Current methods use Multimodal Large Language Models (MLLMs) to reason detailed ins…

    Submitted 27 August, 2024; originally announced August 2024.

    Comments: We propose using hallucinations as prior knowledge to extract and validate task-related information, which helps generate instance-specific prompts for reducing reliance on manual prompts in promptable segmentation

  9. arXiv:2408.12817  [pdf, other]

    cs.LG physics.chem-ph

    Data-Driven Parametrization of Molecular Mechanics Force Fields for Expansive Chemical Space Coverage

    Authors: Tianze Zheng, Ailun Wang, Xu Han, Yu Xia, Xingyuan Xu, Jiawei Zhan, Yu Liu, Yang Chen, Zhi Wang, Xiaojie Wu, Sheng Gong, Wen Yan

    Abstract: A force field is a critical component in molecular dynamics simulations for computational drug discovery. It must achieve high accuracy within the constraints of molecular mechanics' (MM) limited functional forms, which offers high computational efficiency. With the rapid expansion of synthetically accessible chemical space, traditional look-up table approaches face significant challenges. In this…

    Submitted 8 October, 2024; v1 submitted 22 August, 2024; originally announced August 2024.

    Comments: ByteFF, a machine learning parametrized MMFF. Code available at https://github.com/bytedance/byteff

  10. arXiv:2408.02586  [pdf, other]

    cs.IT eess.SP

    Massive MIMO-OTFS-Based Random Access for Cooperative LEO Satellite Constellations

    Authors: Boxiao Shen, Yongpeng Wu, Shiqi Gong, Heng Liu, Björn Ottersten, Wenjun Zhang

    Abstract: This paper investigates joint device identification, channel estimation, and symbol detection for cooperative multi-satellite-enhanced random access, where orthogonal time-frequency space modulation with the large antenna array is utilized to combat the dynamics of the terrestrial-satellite links (TSLs). We introduce the generalized complex exponential basis expansion model to parameterize TSLs, t…

    Submitted 5 August, 2024; originally announced August 2024.

    Comments: This paper has been accepted by IEEE Journal on Selected Areas in Communications

  11. arXiv:2407.16131  [pdf, other]

    cond-mat.mtrl-sci cs.LG physics.comp-ph

    Crystals with Transformers on Graphs, for Prediction of Unconventional Crystal Material Properties and the Benchmark

    Authors: Hongyi Wang, Ji Sun, Jinzhe Liang, Li Zhai, Zitian Tang, Zijian Li, Wei Zhai, Xusheng Wang, Weihao Gao, Sheng Gong, Bolong Huang, Hua Zhang

    Abstract: The ionic bonding across the lattice and ordered microscopic structures endow crystals with unique symmetry and determine their macroscopic properties. Unconventional crystals, in particular, exhibit non-traditional lattice structures or possess exotic physical properties, making them intriguing subjects for investigation. Therefore, to accurately predict the physical and chemical properties of cr…

    Submitted 22 July, 2024; originally announced July 2024.

  12. arXiv:2407.14544  [pdf, other]

    cs.DC

    Fast Iterative Graph Computing with Updated Neighbor States

    Authors: Yijie Zhou, Shufeng Gong, Feng Yao, Hanzhang Chen, Song Yu, Pengxi Liu, Yanfeng Zhang, Ge Yu, Jeffrey Xu Yu

    Abstract: Enhancing the efficiency of iterative computation on graphs has garnered considerable attention in both industry and academia. Nonetheless, the majority of efforts focus on expediting iterative computation by minimizing the running time per iteration step, ignoring the optimization of the number of iteration rounds, which is a crucial aspect of iterative computation. We experimentally verified the…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: 14 pages, 13 figures, 2 tables; accepted for publication in ICDE 2024

  13. arXiv:2407.13076  [pdf, other]

    cs.MA cs.NI eess.SP

    Matching-Driven Deep Reinforcement Learning for Energy-Efficient Transmission Parameter Allocation in Multi-Gateway LoRa Networks

    Authors: Ziqi Lin, Xu Zhang, Shimin Gong, Lanhua Li, Zhou Su, Bo Gu

    Abstract: Long-range (LoRa) communication technology, distinguished by its low power consumption and long communication range, is widely used in the Internet of Things. Nevertheless, the LoRa MAC layer adopts pure ALOHA for medium access control, which may suffer from severe packet collisions as the network scale expands, consequently reducing the system energy efficiency (EE). To address this issue, it is…

    Submitted 17 July, 2024; originally announced July 2024.

  14. arXiv:2407.12014  [pdf, other]

    cs.HC cs.CY

    Surprising Performances of Students with Autism in Classroom with NAO Robot

    Authors: Qin Yang, Huan Lu, Dandan Liang, Shengrong Gong, Huanghao Feng

    Abstract: Autism is a developmental disorder that manifests in early childhood and persists throughout life, profoundly affecting social behavior and hindering the acquisition of learning and social skills in those diagnosed. As technological advancements progress, an increasing array of technologies is being utilized to support the education of students with Autism Spectrum Disorder (ASD), aiming to improv…

    Submitted 26 June, 2024; originally announced July 2024.

  15. arXiv:2407.10753  [pdf, other]

    cs.CV

    OPEN: Object-wise Position Embedding for Multi-view 3D Object Detection

    Authors: Jinghua Hou, Tong Wang, Xiaoqing Ye, Zhe Liu, Shi Gong, Xiao Tan, Errui Ding, Jingdong Wang, Xiang Bai

    Abstract: Accurate depth information is crucial for enhancing the performance of multi-view 3D object detection. Despite the success of some existing multi-view 3D detectors utilizing pixel-wise depth supervision, they overlook two significant phenomena: 1) the depth supervision obtained from LiDAR points is usually distributed on the surface of the object, which is not so friendly to existing DETR-based 3D…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  16. arXiv:2407.07249  [pdf, other]

    cs.CV

    Few-Shot Image Generation by Conditional Relaxing Diffusion Inversion

    Authors: Yu Cao, Shaogang Gong

    Abstract: In the field of Few-Shot Image Generation (FSIG) using Deep Generative Models (DGMs), accurately estimating the distribution of target domain with minimal samples poses a significant challenge. This requires a method that can both capture the broad diversity and the true characteristics of the target domain distribution. We present Conditional Relaxing Diffusion Inversion (CRDI), an innovative `tr…

    Submitted 9 July, 2024; originally announced July 2024.

  17. arXiv:2407.05679  [pdf, other]

    cs.CV cs.AI

    BEVWorld: A Multimodal World Model for Autonomous Driving via Unified BEV Latent Space

    Authors: Yumeng Zhang, Shi Gong, Kaixin Xiong, Xiaoqing Ye, Xiao Tan, Fan Wang, Jizhou Huang, Hua Wu, Haifeng Wang

    Abstract: World models are receiving increasing attention in autonomous driving for their ability to predict potential future scenarios. In this paper, we present BEVWorld, a novel approach that tokenizes multimodal sensor inputs into a unified and compact Bird's Eye View (BEV) latent space for environment modeling. The world model consists of two parts: the multi-modal tokenizer and the latent BEV sequence…

    Submitted 18 July, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

    Comments: 10 pages

  18. arXiv:2407.05118  [pdf, other]

    cs.CV

    SHINE: Saliency-aware HIerarchical NEgative Ranking for Compositional Temporal Grounding

    Authors: Zixu Cheng, Yujiang Pu, Shaogang Gong, Parisa Kordjamshidi, Yu Kong

    Abstract: Temporal grounding, also known as video moment retrieval, aims at locating video segments corresponding to a given query sentence. The compositional nature of natural language enables the localization beyond predefined events, posing a certain challenge to the compositional generalizability of existing methods. Recent studies establish the correspondence between videos and queries through a decomp…

    Submitted 15 July, 2024; v1 submitted 6 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024

  19. arXiv:2407.03804  [pdf, other]

    cs.LG cs.NI

    Multi-Time Scale Service Caching and Pricing in MEC Systems with Dynamic Program Popularity

    Authors: Yiming Chen, Xingyuan Hu, Bo Gu, Shimin Gong, Zhou Su

    Abstract: In mobile edge computing systems, base stations (BSs) equipped with edge servers can provide computing services to users to reduce their task execution time. However, there is always a conflict of interest between the BS and users. The BS prices the service programs based on user demand to maximize its own profit, while the users determine their offloading strategies based on the prices to minimiz…

    Submitted 4 July, 2024; originally announced July 2024.

  20. arXiv:2406.17880  [pdf, other]

    cs.CV

    MLLM as Video Narrator: Mitigating Modality Imbalance in Video Moment Retrieval

    Authors: Weitong Cai, Jiabo Huang, Shaogang Gong, Hailin Jin, Yang Liu

    Abstract: Video Moment Retrieval (VMR) aims to localize a specific temporal segment within an untrimmed long video given a natural language query. Existing methods often suffer from inadequate training annotations, i.e., the sentence typically matches with a fraction of the prominent video content in the foreground with limited wording diversity. This intrinsic modality imbalance leaves a considerable porti…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: Under review

  21. arXiv:2406.16715  [pdf, other]

    cs.LG

    GC4NC: A Benchmark Framework for Graph Condensation on Node Classification with New Insights

    Authors: Shengbo Gong, Juntong Ni, Noveen Sachdeva, Carl Yang, Wei Jin

    Abstract: Graph condensation (GC) is an emerging technique designed to learn a significantly smaller graph that retains the essential information of the original graph. This condensed graph has shown promise in accelerating graph neural networks while preserving performance comparable to those achieved with the original, larger graphs. Additionally, this technique facilitates downstream applications like ne…

    Submitted 6 October, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

    Comments: 22 pages

  22. arXiv:2406.01791  [pdf, other]

    cs.CV

    Hybrid-Learning Video Moment Retrieval across Multi-Domain Labels

    Authors: Weitong Cai, Jiabo Huang, Shaogang Gong

    Abstract: Video moment retrieval (VMR) is to search for a visual temporal moment in an untrimmed raw video by a given text query description (sentence). Existing studies either start from collecting exhaustive frame-wise annotations on the temporal boundary of target moments (fully-supervised), or learn with only the video-level video-text pairing labels (weakly-supervised). The former is poor in generalisa…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: Accepted by BMVC2022

  23. arXiv:2405.19100  [pdf, other]

    cs.CV

    Enhancing Zero-Shot Facial Expression Recognition by LLM Knowledge Transfer

    Authors: Zengqun Zhao, Yu Cao, Shaogang Gong, Ioannis Patras

    Abstract: Current facial expression recognition (FER) models are often designed in a supervised learning manner and thus are constrained by the lack of large-scale facial expression images with high-quality annotations. Consequently, these models often fail to generalize well, performing poorly on unseen images in inference. Vision-language-based zero-shot models demonstrate a promising potential for addres…

    Submitted 18 June, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

    Comments: The code and pre-trained models are available at https://github.com/zengqunzhao/Exp-CLIP

  24. arXiv:2405.18725  [pdf, other]

    cs.LG cs.MA

    Can We Enhance the Quality of Mobile Crowdsensing Data Without Ground Truth?

    Authors: Jiajie Li, Bo Gu, Shimin Gong, Zhou Su, Mohsen Guizani

    Abstract: Mobile crowdsensing (MCS) has emerged as a prominent trend across various domains. However, ensuring the quality of the sensing data submitted by mobile users (MUs) remains a complex and challenging problem. To address this challenge, an advanced method is required to detect low-quality sensing data and identify malicious MUs that may disrupt the normal operations of an MCS system. Therefore, this…

    Submitted 28 May, 2024; originally announced May 2024.

  25. arXiv:2405.08278  [pdf, other]

    cs.CR cs.SI

    Facilitating Feature and Topology Lightweighting: An Ethereum Transaction Graph Compression Method for Malicious Account Detection

    Authors: Jiajun Zhou, Xuanze Chen, Shengbo Gong, Chenkai Hu, Chengxiang Jin, Shanqing Yu, Qi Xuan

    Abstract: Ethereum has become one of the primary global platforms for cryptocurrency, playing an important role in promoting the diversification of the financial ecosystem. However, the relative lag in regulation has led to a proliferation of malicious activities in Ethereum, posing a serious threat to fund security. Existing regulatory methods usually detect malicious accounts through feature engineering o…

    Submitted 1 July, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

    Comments: Accepted by International Conference on Blockchain and Trustworthy Systems 2024

  26. arXiv:2404.19449  [pdf, other]

    cs.IT

    AoI-aware Sensing Scheduling and Trajectory Optimization for Multi-UAV-assisted Wireless Backscatter Networks

    Authors: Yusi Long, Songhan Zhao, Shimin Gong, Bo Gu, Dusit Niyato, Xuemin Shen

    Abstract: This paper considers multiple unmanned aerial vehicles (UAVs) to assist sensing data transmissions from the ground users (GUs) to a remote base station (BS). Each UAV collects sensing data from the GUs and then forwards the sensing data to the remote BS. The GUs first backscatter their data to the UAVs and then all UAVs forward data to the BS by the nonorthogonal multiple access (NOMA) transmissio…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: This paper has been accepted by IEEE TVT

  27. arXiv:2404.07181  [pdf, other]

    cond-mat.mtrl-sci cs.LG physics.comp-ph

    BAMBOO: a predictive and transferable machine learning force field framework for liquid electrolyte development

    Authors: Sheng Gong, Yumin Zhang, Zhenliang Mu, Zhichen Pu, Hongyi Wang, Zhiao Yu, Mengyi Chen, Tianze Zheng, Zhi Wang, Lifei Chen, Xiaojie Wu, Shaochen Shi, Weihao Gao, Wen Yan, Liang Xiang

    Abstract: Despite the widespread applications of machine learning force field (MLFF) on solids and small molecules, there is a notable gap in applying MLFF to complex liquid electrolytes. In this work, we introduce BAMBOO (ByteDance AI Molecular Simulation Booster), a novel framework for molecular dynamics (MD) simulations, with a demonstration of its capabilities in the context of liquid electrolytes for l…

    Submitted 22 April, 2024; v1 submitted 10 April, 2024; originally announced April 2024.

  28. arXiv:2404.05192  [pdf, other]

    cs.LG

    ATFNet: Adaptive Time-Frequency Ensembled Network for Long-term Time Series Forecasting

    Authors: Hengyu Ye, Jiadong Chen, Shijin Gong, Fuxin Jiang, Tieying Zhang, Jianjun Chen, Xiaofeng Gao

    Abstract: The intricate nature of time series data analysis benefits greatly from the distinct advantages offered by time and frequency domain representations. While the time domain is superior in representing local dependencies, particularly in non-periodic series, the frequency domain excels in capturing global dependencies, making it ideal for series with evident periodic patterns. To capitalize on both…

    Submitted 8 April, 2024; originally announced April 2024.

  29. arXiv:2404.04647  [pdf, other]

    cs.CV

    Structured Gradient-based Interpretations via Norm-Regularized Adversarial Training

    Authors: Shizhan Gong, Qi Dou, Farzan Farnia

    Abstract: Gradient-based saliency maps have been widely used to explain the decisions of deep neural network classifiers. However, standard gradient-based interpretation maps, including the simple gradient and integrated gradient algorithms, often lack desired structures such as sparsity and connectedness in their application to real-world computer vision models. A frequently used approach to inducing spars…

    Submitted 6 April, 2024; originally announced April 2024.

    Comments: Accepted at CVPR 2024

  30. arXiv:2404.00598  [pdf, other]

    cs.IT eess.SP

    Robust Beamforming Design and Antenna Selection for Dynamic HRIS-aided MISO System

    Authors: Jintao Wang, Binggui Zhou, Chengzhi Ma, Shiqi Gong, Guanghua Yang, Shaodan Ma

    Abstract: In this paper, we propose a dynamic hybrid active-passive reconfigurable intelligent surface (HRIS) to enhance multiple-input-single-output (MISO) communications, leveraging the property of dynamically placing active elements. Specifically, considering the impact of hardware impairments (HWIs), we investigate channel-aware configurations of the receive antennas at the base station (BS) and the act…

    Submitted 8 October, 2024; v1 submitted 31 March, 2024; originally announced April 2024.

    Comments: 6 pages, 3 figures

  31. arXiv:2403.14943  [pdf, ps, other]

    cs.IT eess.SP

    Primary Rate Maximization in Movable Antennas Empowered Symbiotic Radio Communications

    Authors: Bin Lyu, Hao Liu, Wenqing Hong, Shimin Gong, Feng Tian

    Abstract: In this paper, we propose a movable antenna (MA) empowered scheme for symbiotic radio (SR) communication systems. Specifically, multiple antennas at the primary transmitter (PT) can be flexibly moved to favorable locations to boost the channel conditions of the primary and secondary transmissions. The primary transmission is achieved by the active transmission from the PT to the primary user (PU),…

    Submitted 22 March, 2024; originally announced March 2024.

    Comments: To appear in IEEE VTC-Spring 2024. 6 pages, 5 figures

  32. arXiv:2403.04326  [pdf, other]

    eess.SY cs.AI cs.LG

    Edge-based Parametric Digital Twins for Intelligent Building Indoor Climate Modeling

    Authors: Zhongjun Ni, Chi Zhang, Magnus Karlsson, Shaofang Gong

    Abstract: Digital transformation in the built environment generates vast data for developing data-driven models to optimize building operations. This study presents an integrated solution utilizing edge computing, digital twins, and deep learning to enhance the understanding of climate in buildings. Parametric digital twins, created using an ontology, ensure consistent data representation across diverse ser…

    Submitted 7 March, 2024; originally announced March 2024.

    Comments: 8 pages, 8 figures, accepted in the 20th IEEE International Conference on Factory Communication Systems

    MSC Class: 68T07; ACM Class: I.5.4

  33. arXiv:2402.17463  [pdf, other]

    cs.CL

    Training-Free Long-Context Scaling of Large Language Models

    Authors: Chenxin An, Fei Huang, Jun Zhang, Shansan Gong, Xipeng Qiu, Chang Zhou, Lingpeng Kong

    Abstract: The ability of Large Language Models (LLMs) to process and generate coherent text is markedly weakened when the number of input tokens exceeds their pretraining length. Given the expensive overhead of finetuning large-scale models with longer sequences, we propose Dual Chunk Attention (DCA), which enables Llama2 70B to support context windows of more than 100k tokens without continual training. By…

    Submitted 29 May, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

  34. arXiv:2402.15095  [pdf, ps, other]

    math.ST cs.DS cs.LG math.PR

    The Umeyama algorithm for matching correlated Gaussian geometric models in the low-dimensional regime

    Authors: Shuyang Gong, Zhangsong Li

    Abstract: Motivated by the problem of matching two correlated random geometric graphs, we study the problem of matching two Gaussian geometric models correlated through a latent node permutation. Specifically, given an unknown permutation $\pi^*$ on $\{1,\ldots,n\}$ and given $n$ i.i.d. pairs of correlated Gaussian vectors $\{X_{\pi^*(i)},Y_i\}$ in $\mathbb{R}^d$ with noise parameter $\sigma$, we consider two types…

    Submitted 22 February, 2024; originally announced February 2024.

    Comments: 31 pages

    MSC Class: 68Q87 (Primary); 62M15 (Secondary)

  35. arXiv:2402.13577  [pdf, other]

    cs.CL

    BBA: Bi-Modal Behavioral Alignment for Reasoning with Large Vision-Language Models

    Authors: Xueliang Zhao, Xinting Huang, Tingchen Fu, Qintong Li, Shansan Gong, Lemao Liu, Wei Bi, Lingpeng Kong

    Abstract: Multimodal reasoning stands as a pivotal capability for large vision-language models (LVLMs). The integration with Domain-Specific Languages (DSL), offering precise visual representations, equips these models with the opportunity to execute more accurate reasoning in complex and professional domains. However, the vanilla Chain-of-Thought (CoT) prompting method faces challenges in effectively lever…

    Submitted 21 February, 2024; originally announced February 2024.

    Comments: Preprint

  36. arXiv:2402.07754  [pdf, other]

    cs.CL cs.AI cs.LG

    Diffusion of Thoughts: Chain-of-Thought Reasoning in Diffusion Language Models

    Authors: Jiacheng Ye, Shansan Gong, Liheng Chen, Lin Zheng, Jiahui Gao, Han Shi, Chuan Wu, Xin Jiang, Zhenguo Li, Wei Bi, Lingpeng Kong

    Abstract: Recently, diffusion models have garnered significant interest in the field of text processing due to their many potential advantages compared to conventional autoregressive models. In this work, we propose Diffusion-of-Thought (DoT), a novel approach that integrates diffusion models with Chain-of-Thought, a well-established technique for improving the reasoning ability of autoregressive language m…

    Submitted 15 July, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

    Comments: Multiple updates (add boolean logic dataset, add DoT based on SEDD model and add detailed mathematical formulation in Appendix)

  37. arXiv:2402.03358  [pdf, other]

    cs.SI cs.AI cs.DS cs.LG

    A Comprehensive Survey on Graph Reduction: Sparsification, Coarsening, and Condensation

    Authors: Mohammad Hashemi, Shengbo Gong, Juntong Ni, Wenqi Fan, B. Aditya Prakash, Wei Jin

    Abstract: Many real-world datasets can be naturally represented as graphs, spanning a wide range of domains. However, the increasing complexity and size of graph datasets present significant challenges for analysis and computation. In response, graph reduction, or graph summarization, has gained prominence for simplifying large graphs while preserving essential properties. In this survey, we aim to provide…

    Submitted 29 June, 2024; v1 submitted 28 January, 2024; originally announced February 2024.

    Comments: Accepted by IJCAI 2024 (This ArXiv version is a long version of our IJCAI paper)

  38. arXiv:2402.02950  [pdf, other]

    cs.CR eess.SP

    Semantic Entropy Can Simultaneously Benefit Transmission Efficiency and Channel Security of Wireless Semantic Communications

    Authors: Yankai Rong, Guoshun Nan, Minwei Zhang, Sihan Chen, Songtao Wang, Xuefei Zhang, Nan Ma, Shixun Gong, Zhaohui Yang, Qimei Cui, Xiaofeng Tao, Tony Q. S. Quek

    Abstract: Recently proliferated deep learning-based semantic communications (DLSC) focus on how transmitted symbols efficiently convey a desired meaning to the destination. However, the sensitivity of neural models and the openness of wireless channels cause the DLSC system to be extremely fragile to various malicious attacks. This inspires us to ask a question: "Can we further exploit the advantages of tra…

    Submitted 6 February, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

    Comments: 13 pages, 12 figures

  39. arXiv:2402.02673  [pdf, other]

    cs.GT

    A Unified Framework of Multi-Stage Multi-Winner Voting: An Axiomatic Exploration

    Authors: Shengjie Gong, Lingxiao Huang, Shuangping Huang, Yuyi Wang, Zhiqi Wang, Tao Xiao, Xiang Yan, Chunxue Yang

    Abstract: Multi-winner voting plays a crucial role in selecting representative committees based on voter preferences. Previous research has predominantly focused on single-stage voting rules, which are susceptible to manipulation during preference collection. In order to mitigate manipulation and increase the cost associated with it, we propose the introduction of multiple stages in the voting procedure, le…

    Submitted 4 February, 2024; originally announced February 2024.

  40. arXiv:2402.02430  [pdf, other]

    cs.CV cs.LG

    Exploiting Low-level Representations for Ultra-Fast Road Segmentation

    Authors: Huan Zhou, Feng Xue, Yucong Li, Shi Gong, Yiqun Li, Yu Zhou

    Abstract: Achieving real-time and accuracy on embedded platforms has always been the pursuit of road segmentation methods. To this end, they have proposed many lightweight networks. However, they ignore the fact that roads are "stuff" (background or environmental elements) rather than "things" (specific identifiable objects), which inspires us to explore the feasibility of representing roads with low-level…

    Submitted 6 February, 2024; v1 submitted 4 February, 2024; originally announced February 2024.

    Comments: 11 pages, 7 figures, IEEE TITS

  41. arXiv:2401.13329  [pdf, other]

    cs.CV

    Generative Video Diffusion for Unseen Cross-Domain Video Moment Retrieval

    Authors: Dezhao Luo, Shaogang Gong, Jiabo Huang, Hailin Jin, Yang Liu

    Abstract: Video Moment Retrieval (VMR) requires precise modelling of fine-grained moment-text associations to capture intricate visual-language relationships. Due to the lack of a diverse and generalisable VMR dataset to facilitate learning scalable moment-text associations, existing methods resort to joint training on both source and target domain videos for cross-domain applications. Meanwhile, recent dev…

    Submitted 29 January, 2024; v1 submitted 24 January, 2024; originally announced January 2024.

  42. arXiv:2401.11205  [pdf, other]

    cs.IT eess.SP

    Joint Beamforming Optimization and Mode Selection for RDARS-aided MIMO Systems

    Authors: Jintao Wang, Chengzhi Ma, Shiqi Gong, Xi Yang, Shaodan Ma

    Abstract: Considering the appealing distribution gains of distributed antenna systems (DAS) and passive gains of reconfigurable intelligent surface (RIS), a flexible reconfigurable architecture called reconfigurable distributed antenna and reflecting surface (RDARS) is proposed. RDARS encompasses DAS and RIS as two special cases and maintains the advantages of distributed antennas while reducing the hardwar…

    Submitted 20 January, 2024; originally announced January 2024.

    Comments: 13 pages, 9 figures. This paper has been submitted to IEEE journal for possible publication

  43. arXiv:2312.08924  [pdf, other]

    cs.CV

    Training-free Zero-shot Composed Image Retrieval with Local Concept Reranking

    Authors: Shitong Sun, Fanghua Ye, Shaogang Gong

    Abstract: Composed image retrieval attempts to retrieve an image of interest from gallery images through a composed query of a reference image and its corresponding modified text. It has recently attracted attention due to the collaboration of information-rich images and concise language to precisely express the requirements of target images. Most current composed image retrieval methods follow a supervised…

    Submitted 24 March, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

    Comments: Under Review

  44. arXiv:2312.07374  [pdf, other]

    cs.CV

    Relax Image-Specific Prompt Requirement in SAM: A Single Generic Prompt for Segmenting Camouflaged Objects

    Authors: Jian Hu, Jiayi Lin, Weitong Cai, Shaogang Gong

    Abstract: Camouflaged object detection (COD) approaches heavily rely on pixel-level annotated datasets. Weakly-supervised COD (WSCOD) approaches use sparse annotations like scribbles or points to reduce annotation effort, but this can lead to decreased accuracy. The Segment Anything Model (SAM) shows remarkable segmentation ability with sparse prompts like points. However, manual prompt is not always feasib…

    Submitted 18 December, 2023; v1 submitted 12 December, 2023; originally announced December 2023.

    Comments: Accepted by AAAI2024

  45. arXiv:2311.14837  [pdf, other]

    cs.CV cs.IR

    Benchmarking Robustness of Text-Image Composed Retrieval

    Authors: Shitong Sun, Jindong Gu, Shaogang Gong

    Abstract: Text-image composed retrieval aims to retrieve the target image through the composed query, which is specified in the form of an image plus some text that describes desired modifications to the input image. It has recently attracted attention due to its ability to leverage both information-rich images and concise language to precisely express the requirements for target images. However, the robust…

    Submitted 30 November, 2023; v1 submitted 24 November, 2023; originally announced November 2023.

    Comments: Accepted by R0-FoMo: Workshop on Robustness of Few-shot and Zero-shot Learning in Foundation Models at NeurIPS 2023

  46. arXiv:2311.11574  [pdf, other]

    cs.IT

    A Framework on Complex Matrix Derivatives with Special Structure Constraints for Wireless Systems

    Authors: Xin Ju, Shiqi Gong, Nan Zhao, Chengwen Xing, Arumugam Nallanathan, Dusit Niyato

    Abstract: Matrix-variate optimization plays a central role in advanced wireless system designs. In this paper, we aim to explore optimal solutions of matrix variables under two special structure constraints using complex matrix derivatives, including diagonal structure constraints and constant modulus constraints, both of which are closely related to the state-of-the-art wireless applications. Specifically,…

    Submitted 20 November, 2023; originally announced November 2023.

  47. arXiv:2310.15913  [pdf, other]

    cs.CV

    Mitigate Domain Shift by Primary-Auxiliary Objectives Association for Generalizing Person ReID

    Authors: Qilei Li, Shaogang Gong

    Abstract: While deep learning has significantly improved ReID model accuracy under the independent and identical distribution (IID) assumption, it has also become clear that such models degrade notably when applied to an unseen novel domain due to unpredictable/unknown domain shift. Contemporary domain generalization (DG) ReID models struggle in learning domain-invariant representation solely through traini…

    Submitted 24 October, 2023; originally announced October 2023.

    Comments: Accepted to WACV2024

  48. arXiv:2310.05793  [pdf, other]

    cs.LG cs.CL

    DiffuSeq-v2: Bridging Discrete and Continuous Text Spaces for Accelerated Seq2Seq Diffusion Models

    Authors: Shansan Gong, Mukai Li, Jiangtao Feng, Zhiyong Wu, Lingpeng Kong

    Abstract: Diffusion models have gained prominence in generating high-quality sequences of text. Nevertheless, current approaches predominantly represent discrete text within a continuous diffusion space, which incurs substantial computational overhead during training and results in slower sampling speeds. In this paper, we introduce a soft absorbing state that facilitates the diffusion model in learning to…

    Submitted 16 October, 2023; v1 submitted 9 October, 2023; originally announced October 2023.

    Comments: EMNLP 2023 Findings Camera-ready

  49. arXiv:2310.00856  [pdf, other]

    cs.SI

    Multi-triplet Feature Augmentation for Ponzi Scheme Detection in Ethereum

    Authors: Chengxiang Jin, Jiajun Zhou, Shengbo Gong, Chenxuan Xie, Qi Xuan

    Abstract: Blockchain technology revolutionizes the Internet, but also poses increasing risks, particularly in cryptocurrency finance. On the Ethereum platform, Ponzi schemes, phishing scams, and a variety of other frauds emerge. Existing Ponzi scheme detection approaches based on heterogeneous transaction graph modeling leverages semantic information between node (account) pairs to establish connections, ov…

    Submitted 1 October, 2023; originally announced October 2023.

    Comments: Accepted by 2023 IEEE International Conference on Data Mining Workshops (ICDMW)

  50. arXiv:2309.08965  [pdf, other]

    cs.AI cs.LG cs.MA

    Multiagent Reinforcement Learning with an Attention Mechanism for Improving Energy Efficiency in LoRa Networks

    Authors: Xu Zhang, Ziqi Lin, Shimin Gong, Bo Gu, Dusit Niyato

    Abstract: Long Range (LoRa) wireless technology, characterized by low power consumption and a long communication range, is regarded as one of the enabling technologies for the Industrial Internet of Things (IIoT). However, as the network scale increases, the energy efficiency (EE) of LoRa networks decreases sharply due to severe packet collisions. To address this issue, it is essential to appropriately assi…

    Submitted 16 September, 2023; originally announced September 2023.

    Comments: 6 pages, 3 figures, This paper has been accepted for publication in IEEE Global Communications Conference (GLOBECOM) 2023