Showing 1–50 of 111 results for author: Kong, W

  1. arXiv:2410.07705  [pdf]

    cs.RO

    Lean Methodology for Garment Modernization

    Authors: Ray Wai Man Kong, Theodore Ho Tin Kong, Tianxu Huang

    Abstract: Lean Methodology for Garment Modernization. This article presents the lean methodology for modernizing garment manufacturing, focusing on lean thinking, lean practices, automation development, VSM, and CRP, and how to integrate them effectively. While isolated automation of specific operations can improve efficiency and reduce cycle time, it does not necessarily enhance overall garment output and…

    Submitted 10 October, 2024; v1 submitted 10 October, 2024; originally announced October 2024.

    Comments: 11 pages, 7 figures

  2. arXiv:2409.13985  [pdf, other]

    cs.RO

    LiDAR-based Quadrotor for Slope Inspection in Dense Vegetation

    Authors: Wenyi Liu, Yunfan Ren, Rui Guo, Vickie W. W. Kong, Anthony S. P. Hung, Fangcheng Zhu, Yixi Cai, Yuying Zou, Fu Zhang

    Abstract: This work presents a LiDAR-based quadrotor system for slope inspection in dense vegetation environments. Cities like Hong Kong are vulnerable to climate hazards, which often result in landslides. To mitigate the landslide risks, the Civil Engineering and Development Department (CEDD) has constructed steel flexible debris-resisting barriers on vulnerable natural catchments to protect residents. How…

    Submitted 20 September, 2024; originally announced September 2024.

    Comments: 36 pages

  3. arXiv:2408.11311  [pdf, other]

    cs.AR quant-ph

    HiMA: Hierarchical Quantum Microarchitecture for Qubit-Scaling and Quantum Process-Level Parallelism

    Authors: Qi Zhou, Zi-Hao Mei, Han-Qing Shi, Liang-Liang Guo, Xiao-Yan Yang, Yun-Jie Wang, Xiao-Fan Xu, Cheng Xue, Wei-Cheng Kong, Jun-Chao Wang, Yu-Chun Wu, Zhao-Yun Chen, Guo-Ping Guo

    Abstract: Quantum computing holds immense potential for addressing a myriad of intricate challenges, which is significantly amplified when scaled to thousands of qubits. However, a major challenge lies in developing an efficient and scalable quantum control system. To address this, we propose a novel Hierarchical MicroArchitecture (HiMA) designed to facilitate qubit scaling and exploit quantum process-level…

    Submitted 20 August, 2024; originally announced August 2024.

  4. arXiv:2408.09899  [pdf, other]

    cs.AI cs.CV cs.HC

    LCE: A Framework for Explainability of DNNs for Ultrasound Image Based on Concept Discovery

    Authors: Weiji Kong, Xun Gong, Juan Wang

    Abstract: Explaining the decisions of Deep Neural Networks (DNNs) for medical images has become increasingly important. Existing attribution methods have difficulty explaining the meaning of pixels while existing concept-based methods are limited by additional annotations or specific model structures that are difficult to apply to ultrasound images. In this paper, we propose the Lesion Concept Explainer (LC…

    Submitted 19 August, 2024; originally announced August 2024.

  5. arXiv:2408.09504  [pdf]

    cs.RO

    Design and Experimental Study of Vacuum Suction Grabbing Technology to Grasp Fabric Piece

    Authors: Ray Wai Man Kong, Mingyi Liu, Theodore Ho Tin Kong

    Abstract: Vacuum Suction Grabbing Technology. The primary objective of this study was to design the grabbing technique used to determine the vacuum suction gripper and its design parameters for the pocket welting operation in apparel manufacturing. It presents the application of vacuum suction in grabbing technology, a technique that has revolutionized the handling and manipulation to grasp the various fabr…

    Submitted 8 October, 2024; v1 submitted 18 August, 2024; originally announced August 2024.

    Comments: 9 Pages, 3 figures, 6 diagrams, 1 table

  6. arXiv:2408.07100  [pdf, other]

    cs.LG cs.AI

    Pattern-Matching Dynamic Memory Network for Dual-Mode Traffic Prediction

    Authors: Wenchao Weng, Mei Wu, Hanyu Jiang, Wanzeng Kong, Xiangjie Kong, Feng Xia

    Abstract: In recent years, deep learning has increasingly gained attention in the field of traffic prediction. Existing traffic prediction models often rely on GCNs or attention mechanisms with O(N^2) complexity to dynamically extract traffic node features, which lack efficiency and are not lightweight. Additionally, these models typically only utilize historical data for prediction, without considering the…

    Submitted 12 August, 2024; originally announced August 2024.

  7. arXiv:2408.00573  [pdf, ps, other]

    cs.LG

    Convergence Analysis of Natural Gradient Descent for Over-parameterized Physics-Informed Neural Networks

    Authors: Xianliang Xu, Ting Du, Wang Kong, Ye Li, Zhongyi Huang

    Abstract: First-order methods, such as gradient descent (GD) and stochastic gradient descent (SGD), have been proven effective in training neural networks. In the context of over-parameterization, there is a line of work demonstrating that randomly initialized (stochastic) gradient descent converges to a globally optimal solution at a linear convergence rate for the quadratic loss function. However, the lea…

    Submitted 6 August, 2024; v1 submitted 1 August, 2024; originally announced August 2024.

  8. arXiv:2407.21416  [pdf, other]

    cs.CV cs.RO

    VIPeR: Visual Incremental Place Recognition with Adaptive Mining and Lifelong Learning

    Authors: Yuhang Ming, Minyang Xu, Xingrui Yang, Weicai Ye, Weihan Wang, Yong Peng, Weichen Dai, Wanzeng Kong

    Abstract: Visual place recognition (VPR) is an essential component of many autonomous and augmented/virtual reality systems. It enables the systems to robustly localize themselves in large-scale environments. Existing VPR methods demonstrate attractive performance at the cost of heavy pre-training and limited generalizability. When deployed in unseen environments, these methods exhibit significant performan…

    Submitted 31 July, 2024; originally announced July 2024.

    Comments: 8 pages, 4 figures

  9. arXiv:2407.12108  [pdf, other]

    cs.LG cs.CL cs.CR

    Private prediction for large-scale synthetic text generation

    Authors: Kareem Amin, Alex Bie, Weiwei Kong, Alexey Kurakin, Natalia Ponomareva, Umar Syed, Andreas Terzis, Sergei Vassilvitskii

    Abstract: We present an approach for generating differentially private synthetic text using large language models (LLMs), via private prediction. In the private prediction framework, we only require the output synthetic data to satisfy differential privacy guarantees. This is in contrast to approaches that train a generative model on potentially sensitive user-supplied source data and seek to ensure the mod…

    Submitted 9 October, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

    Comments: 20 pages; updated figure + some new experiments from EMNLP 2024 findings camera-ready

  10. arXiv:2407.10374  [pdf, other]

    cs.CV cs.AI

    An Empirical Study of Mamba-based Pedestrian Attribute Recognition

    Authors: Xiao Wang, Weizhe Kong, Jiandong Jin, Shiao Wang, Ruichong Gao, Qingchuan Ma, Chenglong Li, Jin Tang

    Abstract: Current strong pedestrian attribute recognition models are developed based on Transformer networks, which are computationally heavy. Recently proposed models with linear complexity (e.g., Mamba) have garnered significant attention and have achieved a good balance between accuracy and computational cost across a variety of visual tasks. Relevant review articles also suggest that while these models…

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: In Peer Review

  11. arXiv:2407.05237  [pdf, ps, other]

    cs.LG cs.CR cs.DS math.OC stat.ML

    Privacy of the last iterate in cyclically-sampled DP-SGD on nonconvex composite losses

    Authors: Weiwei Kong, Mónica Ribero

    Abstract: Differentially private stochastic gradient descent (DP-SGD) refers to a family of optimization algorithms that provide a guaranteed level of differential privacy (DP) through DP accounting techniques. However, current accounting techniques make assumptions that diverge significantly from practical DP-SGD implementations. For example, they may assume the loss function is Lipschitz continuous and co…

    Submitted 6 July, 2024; originally announced July 2024.

    MSC Class: 65K10 (Primary); 60G15; 68P27 ACM Class: G.3; G.1.6

  12. arXiv:2407.02827  [pdf, other]

    cs.LG math.OC

    Convergence of Implicit Gradient Descent for Training Two-Layer Physics-Informed Neural Networks

    Authors: Xianliang Xu, Ting Du, Wang Kong, Ye Li, Zhongyi Huang

    Abstract: Optimization algorithms are crucial in training physics-informed neural networks (PINNs), as unsuitable methods may lead to poor solutions. Compared to the common gradient descent (GD) algorithm, implicit gradient descent (IGD) outperforms it in handling certain multi-scale problems. In this paper, we provide convergence analysis for the IGD in training over-parameterized two-layer PINNs. We first…

    Submitted 10 August, 2024; v1 submitted 3 July, 2024; originally announced July 2024.

  13. arXiv:2406.12282  [pdf, other]

    cs.LG

    SAGDFN: A Scalable Adaptive Graph Diffusion Forecasting Network for Multivariate Time Series Forecasting

    Authors: Yue Jiang, Xiucheng Li, Yile Chen, Shuai Liu, Weilong Kong, Antonis F. Lentzakis, Gao Cong

    Abstract: Time series forecasting is essential for our daily activities and precise modeling of the complex correlations and shared patterns among multiple time series is essential for improving forecasting performance. Spatial-Temporal Graph Neural Networks (STGNNs) are widely used in multivariate time series forecasting tasks and have achieved promising performance on multiple real-world datasets for thei…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: Accepted at ICDE 2024

  14. arXiv:2405.08311  [pdf, ps, other]

    cs.CL cs.AI

    A Decoupling and Aggregating Framework for Joint Extraction of Entities and Relations

    Authors: Yao Wang, Xin Liu, Weikun Kong, Hai-Tao Yu, Teeradaj Racharak, Kyoung-Sook Kim, Minh Le Nguyen

    Abstract: Named Entity Recognition and Relation Extraction are two crucial and challenging subtasks in the field of Information Extraction. Despite the successes achieved by the traditional approaches, fundamental research questions remain open. First, most recent studies use parameter sharing for a single subtask or shared features for both two subtasks, ignoring their semantic differences. Second, informa…

    Submitted 14 May, 2024; originally announced May 2024.

  15. arXiv:2405.06995  [pdf, other]

    cs.SD cs.CV cs.MM eess.AS

    Benchmarking Cross-Domain Audio-Visual Deception Detection

    Authors: Xiaobao Guo, Zitong Yu, Nithish Muthuchamy Selvaraj, Bingquan Shen, Adams Wai-Kin Kong, Alex C. Kot

    Abstract: Automated deception detection is crucial for assisting humans in accurately assessing truthfulness and identifying deceptive behavior. Conventional contact-based techniques, like polygraph devices, rely on physiological signals to determine the authenticity of an individual's statements. Nevertheless, recent developments in automated deception detection have demonstrated that multimodal features d…

    Submitted 5 October, 2024; v1 submitted 11 May, 2024; originally announced May 2024.

    Comments: 12 pages

  16. arXiv:2405.06361  [pdf, other]

    cs.LG

    Certified $\ell_2$ Attribution Robustness via Uniformly Smoothed Attributions

    Authors: Fan Wang, Adams Wai-Kin Kong

    Abstract: Model attribution is a popular tool to explain the rationales behind model predictions. However, recent work suggests that the attributions are vulnerable to minute perturbations, which can be added to input samples to fool the attributions while maintaining the prediction outputs. Although empirical studies have shown positive performance via adversarial training, an effective certified defense m…

    Submitted 10 May, 2024; originally announced May 2024.

  17. arXiv:2405.01825  [pdf, other]

    cs.CV

    Improving Concept Alignment in Vision-Language Concept Bottleneck Models

    Authors: Nithish Muthuchamy Selvaraj, Xiaobao Guo, Adams Wai-Kin Kong, Alex Kot

    Abstract: Concept Bottleneck Models (CBM) map images to human-interpretable concepts before making class predictions. Recent approaches automate CBM construction by prompting Large Language Models (LLMs) to generate text concepts and employing Vision Language Models (VLMs) to score these concepts for CBM training. However, it is desired to build CBMs with concepts defined by human experts rather than LLM-ge…

    Submitted 24 August, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

  18. arXiv:2404.15409  [pdf, ps, other]

    cs.LG cs.CR stat.ML

    Insufficient Statistics Perturbation: Stable Estimators for Private Least Squares

    Authors: Gavin Brown, Jonathan Hayase, Samuel Hopkins, Weihao Kong, Xiyang Liu, Sewoong Oh, Juan C. Perdomo, Adam Smith

    Abstract: We present a sample- and time-efficient differentially private algorithm for ordinary least squares, with error that depends linearly on the dimension and is independent of the condition number of $X^\top X$, where $X$ is the design matrix. All prior private algorithms for this task require either $d^{3/2}$ examples, error growing polynomially with the condition number, or exponential time. Our ne…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: 42 pages, 3 figures

  19. arXiv:2404.09516  [pdf, other]

    cs.LG cs.AI cs.CL cs.CV cs.MM

    State Space Model for New-Generation Network Alternative to Transformers: A Survey

    Authors: Xiao Wang, Shiao Wang, Yuhe Ding, Yuehang Li, Wentao Wu, Yao Rong, Weizhe Kong, Ju Huang, Shihao Li, Haoxiang Yang, Ziwen Wang, Bo Jiang, Chenglong Li, Yaowei Wang, Yonghong Tian, Jin Tang

    Abstract: In the post-deep learning era, the Transformer architecture has demonstrated its powerful performance across pre-trained big models and various downstream tasks. However, the enormous computational demands of this architecture have deterred many researchers. To further reduce the complexity of attention models, numerous efforts have been made to design more efficient methods. Among them, the State…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: The First review of State Space Model (SSM)/Mamba and their applications in artificial intelligence, 33 pages

  20. arXiv:2403.10214  [pdf, other]

    cs.CL

    Enhanced Coherence-Aware Network with Hierarchical Disentanglement for Aspect-Category Sentiment Analysis

    Authors: Jin Cui, Fumiyo Fukumoto, Xinfeng Wang, Yoshimi Suzuki, Jiyi Li, Noriko Tomuro, Wanzeng Kong

    Abstract: Aspect-category-based sentiment analysis (ACSA), which aims to identify aspect categories and predict their sentiments has been intensively studied due to its wide range of NLP applications. Most approaches mainly utilize intrasentential features. However, a review often includes multiple different aspect categories, and some of them do not explicitly appear in the review. Even in a sentence, ther…

    Submitted 15 March, 2024; originally announced March 2024.

    Comments: Accepted by LREC-COLING 2024

  21. arXiv:2403.10021  [pdf, other]

    cs.CR

    Time-Frequency Jointed Imperceptible Adversarial Attack to Brainprint Recognition with Deep Learning Models

    Authors: Hangjie Yi, Yuhang Ming, Dongjun Liu, Wanzeng Kong

    Abstract: EEG-based brainprint recognition with deep learning models has garnered much attention in biometric identification. Yet, studies have indicated vulnerability to adversarial attacks in deep learning models with EEG inputs. In this paper, we introduce a novel adversarial attack method that jointly attacks time-domain and frequency-domain EEG signals by employing wavelet transform. Different from mos…

    Submitted 30 June, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

    Comments: This work is accepted by ICME 2024

  22. arXiv:2403.06135  [pdf, other]

    cs.CV cs.AI cs.LG

    MACE: Mass Concept Erasure in Diffusion Models

    Authors: Shilin Lu, Zilan Wang, Leyang Li, Yanzhu Liu, Adams Wai-Kin Kong

    Abstract: The rapid expansion of large-scale text-to-image diffusion models has raised growing concerns regarding their potential misuse in creating harmful or misleading content. In this paper, we introduce MACE, a finetuning framework for the task of mass concept erasure. This task aims to prevent models from generating images that embody unwanted concepts when prompted. Existing concept erasure methods a…

    Submitted 10 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR 2024

  23. arXiv:2401.08189  [pdf, other]

    cs.AI cs.CL cs.LG

    PRewrite: Prompt Rewriting with Reinforcement Learning

    Authors: Weize Kong, Spurthi Amba Hombaiah, Mingyang Zhang, Qiaozhu Mei, Michael Bendersky

    Abstract: Prompt engineering is critical for the development of LLM-based applications. However, it is usually done manually in a "trial and error" fashion that can be time consuming, ineffective, and sub-optimal. Even for the prompts which seemingly work well, there is always a lingering question: can the prompts be made better with further modifications? To address these problems, we investigate automat…

    Submitted 10 June, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

  24. arXiv:2401.06954  [pdf, other]

    cs.CL

    Bridging the Preference Gap between Retrievers and LLMs

    Authors: Zixuan Ke, Weize Kong, Cheng Li, Mingyang Zhang, Qiaozhu Mei, Michael Bendersky

    Abstract: Large Language Models (LLMs) have demonstrated superior results across a wide range of tasks, and Retrieval-augmented Generation (RAG) is an effective way to enhance the performance by locating relevant information and placing it into the context window of the LLM. However, the relationship between retrievers and LLMs in a RAG is still under-investigated. Most existing work treats the retriever an…

    Submitted 20 February, 2024; v1 submitted 12 January, 2024; originally announced January 2024.

  25. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  26. arXiv:2312.09538  [pdf, other]

    cs.CV cs.RO

    AEGIS-Net: Attention-guided Multi-Level Feature Aggregation for Indoor Place Recognition

    Authors: Yuhang Ming, Jian Ma, Xingrui Yang, Weichen Dai, Yong Peng, Wanzeng Kong

    Abstract: We present AEGIS-Net, a novel indoor place recognition model that takes in RGB point clouds and generates global place descriptors by aggregating lower-level color, geometry features and higher-level implicit semantic features. However, rather than simple feature concatenation, self-attention modules are employed to select the most important local features that best describe an indoor place. Our A…

    Submitted 15 December, 2023; originally announced December 2023.

    Comments: Accepted by 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2024)

  27. arXiv:2311.16416  [pdf, other]

    cs.DS cs.LG stat.ML

    A Combinatorial Approach to Robust PCA

    Authors: Weihao Kong, Mingda Qiao, Rajat Sen

    Abstract: We study the problem of recovering Gaussian data under adversarial corruptions when the noises are low-rank and the corruptions are on the coordinate level. Concretely, we assume that the Gaussian noises lie in an unknown $k$-dimensional subspace $U \subseteq \mathbb{R}^d$, and $s$ randomly chosen coordinates of each data point fall into the control of an adversary. This setting models the scenari…

    Submitted 27 November, 2023; originally announced November 2023.

    Comments: To appear at ITCS 2024

  28. arXiv:2311.14580  [pdf, other]

    cs.CV

    Large Language Models as Automated Aligners for benchmarking Vision-Language Models

    Authors: Yuanfeng Ji, Chongjian Ge, Weikai Kong, Enze Xie, Zhengying Liu, Zhengguo Li, Ping Luo

    Abstract: With the advancements in Large Language Models (LLMs), Vision-Language Models (VLMs) have reached a new level of sophistication, showing notable competence in executing intricate cognition and reasoning tasks. However, existing evaluation benchmarks, primarily relying on rigid, hand-crafted datasets to measure task-specific performance, face significant limitations in assessing the alignment of th…

    Submitted 24 November, 2023; originally announced November 2023.

  29. arXiv:2311.14464  [pdf, other]

    cs.LG cs.CE physics.flu-dyn

    Finite Volume Features, Global Geometry Representations, and Residual Training for Deep Learning-based CFD Simulation

    Authors: Loh Sher En Jessica, Naheed Anjum Arafat, Wei Xian Lim, Wai Lee Chan, Adams Wai Kin Kong

    Abstract: Computational fluid dynamics (CFD) simulation is an irreplaceable modelling step in many engineering designs, but it is often computationally expensive. Some graph neural network (GNN)-based CFD methods have been proposed. However, the current methods inherit the weakness of traditional numerical simulators, as well as ignore the cell characteristics in the mesh used in the finite volume method, a…

    Submitted 24 November, 2023; originally announced November 2023.

  30. arXiv:2311.08362  [pdf, other]

    cs.LG stat.ML

    Transformers can optimally learn regression mixture models

    Authors: Reese Pathak, Rajat Sen, Weihao Kong, Abhimanyu Das

    Abstract: Mixture models arise in many regression problems, but most methods have seen limited adoption partly due to these algorithms' highly-tailored and model-specific nature. On the other hand, transformers are flexible, neural sequence models that present the intriguing possibility of providing general-purpose prediction methods, even in this mixture setting. In this work, we investigate the hypothesis…

    Submitted 14 November, 2023; originally announced November 2023.

    Comments: 24 pages, 9 figures

  31. arXiv:2311.05383  [pdf]

    cs.CV

    Improving Hand Recognition in Uncontrolled and Uncooperative Environments using Multiple Spatial Transformers and Loss Functions

    Authors: Wojciech Michal Matkowski, Xiaojie Li, Adams Wai Kin Kong

    Abstract: The prevalence of smartphone and consumer camera has led to more evidence in the form of digital images, which are mostly taken in uncontrolled and uncooperative environments. In these images, criminals likely hide or cover their faces while their hands are observable in some cases, creating a challenging use case for forensic investigation. Many existing hand-based recognition methods perform wel…

    Submitted 9 November, 2023; originally announced November 2023.

  32. arXiv:2310.12570  [pdf, other]

    eess.IV cs.CV cs.GR cs.LG

    DA-TransUNet: Integrating Spatial and Channel Dual Attention with Transformer U-Net for Medical Image Segmentation

    Authors: Guanqun Sun, Yizhi Pan, Weikun Kong, Zichang Xu, Jianhua Ma, Teeradaj Racharak, Le-Minh Nguyen, Junyi Xin

    Abstract: Accurate medical image segmentation is critical for disease quantification and treatment evaluation. While traditional Unet architectures and their transformer-integrated variants excel in automated segmentation tasks. However, they lack the ability to harness the intrinsic position and channel features of image. Existing models also struggle with parameter efficiency and computational complexity,…

    Submitted 14 November, 2023; v1 submitted 19 October, 2023; originally announced October 2023.

  33. arXiv:2310.10688  [pdf, other]

    cs.CL cs.AI cs.LG

    A decoder-only foundation model for time-series forecasting

    Authors: Abhimanyu Das, Weihao Kong, Rajat Sen, Yichen Zhou

    Abstract: Motivated by recent advances in large language models for Natural Language Processing (NLP), we design a time-series foundation model for forecasting whose out-of-the-box zero-shot performance on a variety of public datasets comes close to the accuracy of state-of-the-art supervised forecasting models for each individual dataset. Our model is based on pretraining a patched-decoder style attention…

    Submitted 17 April, 2024; v1 submitted 14 October, 2023; originally announced October 2023.

  34. arXiv:2310.05116  [pdf, other]

    cs.CL cs.IR

    Utilizing Contextual Clues and Role Correlations for Enhancing Document-level Event Argument Extraction

    Authors: Wanlong Liu, Dingyi Zeng, Li Zhou, Yichen Xiao, Weishan Kong, Malu Zhang, Shaohuan Cheng, Hongyang Zhao, Wenyu Chen

    Abstract: Document-level event argument extraction is a crucial yet challenging task within the field of information extraction. Current mainstream approaches primarily focus on the information interaction between event triggers and their arguments, facing two limitations: insufficient context interaction and the ignorance of event correlations. Here, we introduce a novel framework named CARLG (Contextual A…

    Submitted 3 April, 2024; v1 submitted 8 October, 2023; originally announced October 2023.

    Comments: pre-submission

  35. arXiv:2310.03104  [pdf, other]

    cs.LG cs.CR

    DP-SGD for non-decomposable objective functions

    Authors: William Kong, Andrés Muñoz Medina, Mónica Ribero

    Abstract: Unsupervised pre-training is a common step in developing computer vision models and large language models. In this setting, the absence of labels requires the use of similarity-based loss functions, such as contrastive loss, that favor minimizing the distance between similar inputs and maximizing the distance between distinct inputs. As privacy concerns mount, training these models using different…

    Submitted 4 October, 2023; originally announced October 2023.

  36. arXiv:2310.00296  [pdf, other]

    cs.CV

    QUIZ: An Arbitrary Volumetric Point Matching Method for Medical Image Registration

    Authors: Lin Liu, Xinxin Fan, Haoyang Liu, Chulong Zhang, Weibin Kong, Jingjing Dai, Yuming Jiang, Yaoqin Xie, Xiaokun Liang

    Abstract: Rigid pre-registration involving local-global matching or other large deformation scenarios is crucial. Current popular methods rely on unsupervised learning based on grayscale similarity, but under circumstances where different poses lead to varying tissue structures, or where image quality is poor, these methods tend to exhibit instability and inaccuracies. In this study, we propose a novel meth…

    Submitted 30 September, 2023; originally announced October 2023.

  37. Learning to Rewrite Prompts for Personalized Text Generation

    Authors: Cheng Li, Mingyang Zhang, Qiaozhu Mei, Weize Kong, Michael Bendersky

    Abstract: Facilitated by large language models (LLMs), personalized text generation has become a rapidly growing research direction. Most existing studies focus on designing specialized models for a particular domain, or they require fine-tuning the LLMs to generate personalized text. We consider a typical scenario in which the large language model, which generates personalized output, is frozen and can onl…

    Submitted 8 February, 2024; v1 submitted 29 September, 2023; originally announced October 2023.

    Comments: In Proceedings of the ACM Web Conference 2024 (WWW '24)

  38. arXiv:2309.01973  [pdf, other]

    cs.LG cs.AI cs.IT stat.ML

    Linear Regression using Heterogeneous Data Batches

    Authors: Ayush Jain, Rajat Sen, Weihao Kong, Abhimanyu Das, Alon Orlitsky

    Abstract: In many learning applications, data are collected from multiple sources, each providing a \emph{batch} of samples that by itself is insufficient to learn its input-output relationship. A common approach assumes that the sources fall in one of several unknown subgroups, each with an unknown input distribution and input-output relationship. We consider one of this setup's most fundamental and import…

    Submitted 5 September, 2023; originally announced September 2023.

  39. arXiv:2308.12531  [pdf, other]

    cs.CL

    CARE: Co-Attention Network for Joint Entity and Relation Extraction

    Authors: Wenjun Kong, Yamei Xia

    Abstract: Joint entity and relation extraction is the fundamental task of information extraction, consisting of two subtasks: named entity recognition and relation extraction. However, most existing joint extraction methods suffer from issues of feature confusion or inadequate interaction between the two subtasks. Addressing these challenges, in this work, we propose a Co-Attention network for joint entity…

    Submitted 27 March, 2024; v1 submitted 23 August, 2023; originally announced August 2023.

    Comments: Accepted by LREC-COLING 2024

  40. arXiv:2308.04001  [pdf, other]

    cs.GR

    Explicit Topology Optimization of Conforming Voronoi Foams

    Authors: Ming Li, Jingqiao Hu, Wei Chen, Weipeng Kong, Jin Huang

    Abstract: Topology optimization is able to maximally leverage the high DOFs and mechanical potentiality of porous foams but faces three fundamental challenges: conforming to free-form outer shapes, maintaining geometric connectivity between adjacent cells, and achieving high simulation accuracy. To resolve the issues, borrowing the concept from Voronoi tessellation, we propose to use the site (or seed) posi…

    Submitted 7 August, 2023; originally announced August 2023.

  41. arXiv:2307.12493  [pdf, other]

    cs.CV cs.AI

    TF-ICON: Diffusion-Based Training-Free Cross-Domain Image Composition

    Authors: Shilin Lu, Yanzhu Liu, Adams Wai-Kin Kong

    Abstract: Text-driven diffusion models have exhibited impressive generative capabilities, enabling various image editing tasks. In this paper, we propose TF-ICON, a novel Training-Free Image COmpositioN framework that harnesses the power of text-driven diffusion models for cross-domain image-guided composition. This task aims to seamlessly integrate user-provided objects into a specific visual context. Curr…

    Submitted 10 October, 2023; v1 submitted 23 July, 2023; originally announced July 2023.

    Comments: Accepted by ICCV 2023

  42. arXiv:2307.05608  [pdf, other]

    cs.CR

    DP-Auditorium: a Large Scale Library for Auditing Differential Privacy

    Authors: William Kong, Andrés Muñoz Medina, Mónica Ribero, Umar Syed

    Abstract: New regulations and increased awareness of data privacy have led to the deployment of new and more efficient differentially private mechanisms across public institutions and industries. Ensuring the correctness of these mechanisms is therefore crucial to ensure the proper protection of data. However, since differential privacy is a property of the mechanism itself, and not of an individual output,…

    Submitted 18 December, 2023; v1 submitted 10 July, 2023; originally announced July 2023.

  43. arXiv:2306.07096  [pdf, other]

    cs.CV

    Global and Local Semantic Completion Learning for Vision-Language Pre-training

    Authors: Rong-Cheng Tu, Yatai Ji, Jie Jiang, Weijie Kong, Chengfei Cai, Wenzhe Zhao, Hongfa Wang, Yujiu Yang, Wei Liu

    Abstract: Cross-modal alignment plays a crucial role in vision-language pre-training (VLP) models, enabling them to capture meaningful associations across different modalities. For this purpose, numerous masked modeling tasks have been proposed for VLP to further promote cross-modal interactions. The core idea of previous masked modeling tasks is to focus on reconstructing the masked tokens based on visible…

    Submitted 5 December, 2023; v1 submitted 12 June, 2023; originally announced June 2023.

    Comments: arXiv admin note: substantial text overlap with arXiv:2211.13437

  44. arXiv:2306.00676  [pdf, other]

    cs.CV

    Hyperspectral Target Detection Based on Low-Rank Background Subspace Learning and Graph Laplacian Regularization

    Authors: Dunbin Shen, Xiaorui Ma, Wenfeng Kong, Jiacheng Tian, Hongyu Wang

    Abstract: Hyperspectral target detection is good at finding dim and small objects based on spectral characteristics. However, existing representation-based methods are hindered by the problem of the unknown background dictionary and insufficient utilization of spatial information. To address these issues, this paper proposes an efficient optimizing approach based on low-rank representation (LRR) and graph L…

    Submitted 1 June, 2023; originally announced June 2023.

    Comments: 4 pages, 3 figures, 1 table

  45. arXiv:2305.17445  [pdf, other]

    cs.SE

    Synthesizing Speech Test Cases with Text-to-Speech? An Empirical Study on the False Alarms in Automated Speech Recognition Testing

    Authors: Julia Kaiwen Lau, Kelvin Kai Wen Kong, Julian Hao Yong, Per Hoong Tan, Zhou Yang, Zi Qian Yong, Joshua Chern Wey Low, Chun Yong Chong, Mei Kuan Lim, David Lo

    Abstract: Recent studies have proposed the use of Text-To-Speech (TTS) systems to automatically synthesise speech test cases on a scale and uncover a large number of failures in ASR systems. However, the failures uncovered by synthetic test cases may not reflect the actual performance of an ASR system when it transcribes human audio, which we refer to as false alarms. Given a failed test case synthesised fr…

    Submitted 18 July, 2023; v1 submitted 27 May, 2023; originally announced May 2023.

    Comments: 13 pages, Accepted at ISSTA2023

  46. arXiv:2304.08424  [pdf, other]

    stat.ML cs.LG

    Long-term Forecasting with TiDE: Time-series Dense Encoder

    Authors: Abhimanyu Das, Weihao Kong, Andrew Leach, Shaan Mathur, Rajat Sen, Rose Yu

    Abstract: Recent work has shown that simple linear models can outperform several Transformer based approaches in long term time-series forecasting. Motivated by this, we propose a Multi-layer Perceptron (MLP) based encoder-decoder model, Time-series Dense Encoder (TiDE), for long-term time-series forecasting that enjoys the simplicity and speed of linear models while also being able to handle covariates and…

    Submitted 4 April, 2024; v1 submitted 17 April, 2023; originally announced April 2023.

  47. arXiv:2303.12745  [pdf, other]

    cs.CV cs.AI

    Audio-Visual Deception Detection: DOLOS Dataset and Parameter-Efficient Crossmodal Learning

    Authors: Xiaobao Guo, Nithish Muthuchamy Selvaraj, Zitong Yu, Adams Wai-Kin Kong, Bingquan Shen, Alex Kot

    Abstract: Deception detection in conversations is a challenging yet important task, having pivotal applications in many fields such as credibility assessment in business, multimedia anti-frauds, and custom security. Despite this, deception detection research is hindered by the lack of high-quality deception datasets, as well as the difficulties of learning multimodal features effectively. To address this is…

    Submitted 3 August, 2023; v1 submitted 9 March, 2023; originally announced March 2023.

    Comments: 11 pages, 6 figures

  48. arXiv:2303.03131  [pdf, other]

    cs.CV cs.AI cs.MM

    Video Question Answering Using CLIP-Guided Visual-Text Attention

    Authors: Shuhong Ye, Weikai Kong, Chenglin Yao, Jianfeng Ren, Xudong Jiang

    Abstract: Cross-modal learning of video and text plays a key role in Video Question Answering (VideoQA). In this paper, we propose a visual-text attention mechanism to utilize the Contrastive Language-Image Pre-training (CLIP) trained on lots of general domain language-image pairs to guide the cross-modal learning for VideoQA. Specifically, we first extract video features using a TimeSformer and text featur…

    Submitted 8 March, 2023; v1 submitted 6 March, 2023; originally announced March 2023.

    Comments: Submitted to the 2023 IEEE International Conference on Image Processing (ICIP 2023)

    ACM Class: I.2.10

  49. arXiv:2303.03105  [pdf, other]

    cs.MM

    Confidence-based Event-centric Online Video Question Answering on a Newly Constructed ATBS Dataset

    Authors: Weikai Kong, Shuhong Ye, Chenglin Yao, Jianfeng Ren

    Abstract: Deep neural networks facilitate video question answering (VideoQA), but the real-world applications on video streams such as CCTV and live cast place higher demands on the solver. To address the challenges of VideoQA on long videos of unknown length, we define a new set of problems called Online Open-ended Video Question Answering (O^2VQA). It requires an online state-updating mechanism for the so…

    Submitted 7 March, 2023; v1 submitted 6 March, 2023; originally announced March 2023.

    Comments: Accepted for publication at the 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2023)

  50. arXiv:2303.00340  [pdf, other]

    cs.LG cs.CR cs.CV

    A Practical Upper Bound for the Worst-Case Attribution Deviations

    Authors: Fan Wang, Adams Wai-Kin Kong

    Abstract: Model attribution is a critical component of deep neural networks (DNNs) for its interpretability to complex models. Recent studies bring up attention to the security of attribution methods as they are vulnerable to attribution attacks that generate similar images with dramatically different attributions. Existing works have been investigating empirically improving the robustness of DNNs against t…

    Submitted 1 March, 2023; originally announced March 2023.