On the token distance modeling ability of higher RoPE attention dimension

Authors: Xiangyu Hong, Che Jiang, Biqing Qi, Fandong Meng, Mo Yu, Bowen Zhou, Jie Zhou

Abstract: Length extrapolation algorithms based on Rotary position embedding (RoPE) have shown promising results in extending the context length of language models. However, understanding how position embedding can capture longer-range contextual information remains elusive. Based on the intuition that different dimensions correspond to different frequencies of change in RoPE encoding, we conducted a dimension-level analysis to investigate the correlation between a hidden dimension of an attention head and its contribution to capturing long-distance dependencies. Using our correlation metric, we identified a particular type of attention head, which we named Positional Heads, from various length-extrapolated models. These heads exhibit a strong focus on long-range information interaction and play a pivotal role in long input processing, as evidenced by our ablation. We further demonstrate the correlation between the efficiency of length extrapolation and the extension of the high-dimensional attention allocation of these heads. The identification of Positional Heads provides insights for future research in long-text comprehension.

Submitted 21 October, 2024; v1 submitted 11 October, 2024; originally announced October 2024.

Comments: Accepted to EMNLP 2024 Findings

arXiv:2409.18339 [pdf, other]

AER-LLM: Ambiguity-aware Emotion Recognition Leveraging Large Language Models

Authors: Xin Hong, Yuan Gong, Vidhyasaharan Sethu, Ting Dang

Abstract: Recent advancements in Large Language Models (LLMs) have demonstrated great success in many Natural Language Processing (NLP) tasks. In addition to their cognitive intelligence, exploring their capabilities in emotional intelligence is also crucial, as it enables more natural and empathetic conversational AI. Recent studies have shown LLMs' capability in recognizing emotions, but they often focus on single emotion labels and overlook the complex and ambiguous nature of human emotions. This study is the first to address this gap by exploring the potential of LLMs in recognizing ambiguous emotions, leveraging their strong generalization capabilities and in-context learning. We design zero-shot and few-shot prompting and incorporate past dialogue as context information for ambiguous emotion recognition. Experiments conducted using three datasets indicate significant potential for LLMs in recognizing ambiguous emotions, and highlight the substantial benefits of including context information. Furthermore, our findings indicate that LLMs demonstrate a high degree of effectiveness in recognizing less ambiguous emotions and exhibit potential for identifying more ambiguous emotions, paralleling human perceptual capabilities.

Submitted 26 September, 2024; originally announced September 2024.

Comments: 5 pages, 4 figures

arXiv:2409.09391 [pdf, other]

Tran-GCN: A Transformer-Enhanced Graph Convolutional Network for Person Re-Identification in Monitoring Videos

Authors: Xiaobin Hong, Tarmizi Adam, Masitah Ghazali

Abstract: Person Re-Identification (Re-ID) has gained popularity in computer vision, enabling cross-camera pedestrian recognition. Although the development of deep learning has provided a robust technical foundation for person Re-ID research, most existing person Re-ID methods overlook the potential relationships among local person features, failing to adequately address the impact of pedestrian pose variations and local body parts occlusion. Therefore, we propose a Transformer-enhanced Graph Convolutional Network (Tran-GCN) model to improve Person Re-Identification performance in monitoring videos. The model comprises four key components: (1) A Pose Estimation Learning branch is utilized to estimate pedestrian pose information and inherent skeletal structure data, extracting pedestrian key point information; (2) A Transformer learning branch learns the global dependencies between fine-grained and semantically meaningful local person features; (3) A Convolution learning branch uses the basic ResNet architecture to extract the person's fine-grained local features; (4) A Graph Convolutional Module (GCM) integrates local feature information, global feature information, and body information for more effective person identification after fusion. Quantitative and qualitative analysis experiments conducted on three different datasets (Market-1501, DukeMTMC-ReID, and MSMT17) demonstrate that the Tran-GCN model can more accurately capture discriminative person features in monitoring videos, significantly improving identification accuracy.

Submitted 14 September, 2024; originally announced September 2024.

arXiv:2409.05385 [pdf, other]

Towards Building a Robust Knowledge Intensive Question Answering Model with Large Language Models

Authors: Xingyun Hong, Yan Shao, Zhilin Wang, Manni Duan, Jin Xiongnan

Abstract: The development of LLMs has greatly enhanced the intelligence and fluency of question answering, while the emergence of retrieval enhancement has enabled models to better utilize external information. However, the presence of noise and errors in retrieved information poses challenges to the robustness of LLMs. In this work, to evaluate the model's performance under multiple interferences, we first construct a dataset based on machine reading comprehension datasets simulating various scenarios, including critical information absence, noise, and conflicts. To address the issue of model accuracy decline caused by noisy external information, we propose a data augmentation-based fine-tuning method to enhance LLMs' robustness against noise. Additionally, a contrastive learning approach is utilized to preserve the model's discrimination capability of external information. We have conducted experiments on both existing LLMs and our approach; the results, evaluated by GPT-4, indicate that our proposed methods improve model robustness while strengthening the model's discrimination capability.

Submitted 17 September, 2024; v1 submitted 9 September, 2024; originally announced September 2024.

Comments: This paper has been accepted by NLPCC-2024

arXiv:2408.08412 [pdf, other]

Penny-Wise and Pound-Foolish in Deepfake Detection

Authors: Yabin Wang, Zhiwu Huang, Su Zhou, Adam Prugel-Bennett, Xiaopeng Hong

Abstract: The diffusion of deepfake technologies has sparked serious concerns about their potential misuse across various domains, prompting the urgent need for robust detection methods. Despite advances, many current approaches prioritize short-term gains at the expense of long-term effectiveness. This paper critiques the overly specialized approach of fine-tuning pre-trained models solely with a penny-wise objective on a single deepfake dataset, while disregarding the pound-wise balance for generalization and knowledge retention. To address this "Penny-Wise and Pound-Foolish" issue, we propose a novel learning framework (PoundNet) for generalization of deepfake detection on a pre-trained vision-language model. PoundNet incorporates a learnable prompt design and a balanced objective to preserve broad knowledge from upstream tasks (object classification) while enhancing generalization for downstream tasks (deepfake detection). We train PoundNet on a standard single deepfake dataset, following common practice in the literature. We then evaluate its performance across 10 public large-scale deepfake datasets with 5 main evaluation metrics, forming the largest benchmark test set for assessing the generalization ability of deepfake detection models, to our knowledge. The comprehensive benchmark evaluation demonstrates the proposed PoundNet is significantly less "Penny-Wise and Pound-Foolish", achieving a remarkable improvement of 19% in deepfake detection performance compared to state-of-the-art methods, while maintaining a strong performance of 63% on object classification tasks, where other deepfake detection models tend to be ineffective. Code and data are open-sourced at https://github.com/iamwangyabin/PoundNet.

Submitted 15 August, 2024; originally announced August 2024.

arXiv:2408.06647 [pdf, other]

Magnetic Field of the Quasar 1604+159 from Parsec to Kilo-parsec Scale

Authors: Xu-Zhi Hu, Xiaoyu Hong, Wei Zhao, Liang Chen, Wei-Yang Wang, Linhui Wu

Abstract: We present a multi-frequency polarimetric study for the quasar 1604+159. The source was observed at the $L$ band with the American Very Long Baseline Array (VLBA) and the $L$, $X$, and $U$ bands with the Very Large Array (VLA). These observations provide different resolutions from mas to arcsec, enabling us to probe the morphology and magnetic field from tens of parsec to hundreds of kilo-parsec scale. We detect a symmetrical Fanaroff-Riley-Class-I-like structure. The source has several lobes and bulges, forming a cocoon shape. The polarization is normal to the edges of the structure with high fractional polarization up to $\sim 60\%$. Two hotspots are observed at the eastern and western sides of the source, located symmetrically relative to the core. The flux density ratio ($>1.5$) between the two hotspots suggests the Doppler beaming effect exists at a large scale. The polarized emission in the hotspots also shows a symmetrical structure with an oblique direction from the jet direction. In general, the jet propagates in a collimating structure with several bends. Polarization is also detected perpendicular to the local jet from $\sim$100 mas to $\sim$ 1 arcsec. The jet shows strong polarized intensity and high fractional polarization at the bending edges. We discuss the possible origins of the observed structure and magnetic field.

Submitted 13 August, 2024; originally announced August 2024.

Comments: 17 pages, accepted for publication in ApJ

arXiv:2408.01980 [pdf, other]

Measurement Induced Magic Resources

Authors: Gongchu Li, Lei Chen, Si-Qi Zhang, Xu-Song Hong, Huaqing Xu, Yuancheng Liu, You Zhou, Geng Chen, Chuan-Feng Li, Alioscia Hamma, Guang-Can Guo

Abstract: Magic states and magic gates are crucial for achieving universal computation, but some important questions about how magic resources should be implemented to attain quantum advantage have remained unexplored, for instance, in the context of Measurement-based Quantum Computation (MQC) with only single-qubit measurements. This work bridges the gap between MQC and the resource theory of magic by introducing the concept of ``invested'' and ``potential'' magic resources. The former quantifies the magic cost associated with the MQC framework, serving both as a witness of magic resources and an upper bound for the realization of a desired unitary transformation. Potential magic resources represent the maximum achievable magic resource in a given graph structure defining the MQC. We utilize these concepts to analyze the magic resource requirements of the Quantum Fourier Transform (QFT) and provide a fresh perspective on the universality of MQC of different resource states, highlighting the crucial role of non-Pauli measurements for injecting magic. We demonstrate experimentally our theoretical predictions in a high-fidelity four-photon setup and demonstrate the efficiency of MQC in generating magic states, surpassing the limitations of conventional magic state injection methods. Our findings pave the way for future research exploring magic resource optimization and novel distillation schemes within the MQC framework, contributing to the advancement of fault-tolerant universal quantum computation.

Submitted 29 August, 2024; v1 submitted 4 August, 2024; originally announced August 2024.

Comments: 25 pages, 11 figures

arXiv:2407.19491 [pdf, other]

Multi-modal Crowd Counting via Modal Emulation

Authors: Chenhao Wang, Xiaopeng Hong, Zhiheng Ma, Yupeng Wei, Yabin Wang, Xiaopeng Fan

Abstract: Multi-modal crowd counting is a crucial task that uses multi-modal cues to estimate the number of people in crowded scenes. To overcome the gap between different modalities, we propose a modal emulation-based two-pass multi-modal crowd-counting framework that enables efficient modal emulation, alignment, and fusion. The framework consists of two key components: a \emph{multi-modal inference} pass and a \emph{cross-modal emulation} pass. The former utilizes a hybrid cross-modal attention module to extract global and local information and achieve efficient multi-modal fusion. The latter uses attention prompting to coordinate different modalities and enhance multi-modal alignment. We also introduce a modality alignment module that uses an efficient modal consistency loss to align the outputs of the two passes and bridge the semantic gap between modalities. Extensive experiments on both RGB-Thermal and RGB-Depth counting datasets demonstrate its superior performance compared to previous methods. Code available at https://github.com/Mr-Monday/Multi-modal-Crowd-Counting-via-Modal-Emulation.

Submitted 28 July, 2024; originally announced July 2024.

Comments: This is the preprint version of the paper to appear in BMVC 2024. Please cite the final published version. Code is available at https://github.com/Mr-Monday/Multi-modal-Crowd-Counting-via-Modal-Emulation

arXiv:2407.19078 [pdf, other]

Practical Marketplace Optimization at Uber Using Causally-Informed Machine Learning

Authors: Bobby Chen, Siyu Chen, Jason Dowlatabadi, Yu Xuan Hong, Vinayak Iyer, Uday Mantripragada, Rishabh Narang, Apoorv Pandey, Zijun Qin, Abrar Sheikh, Hongtao Sun, Jiaqi Sun, Matthew Walker, Kaichen Wei, Chen Xu, Jingnan Yang, Allen T. Zhang, Guoqing Zhang

Abstract: Budget allocation of marketplace levers, such as incentives for drivers and promotions for riders, has long been a technical and business challenge at Uber; understanding lever budget changes' impact and estimating cost efficiency to achieve predefined budgets is crucial, with the goal of optimal allocations that maximize business value; we introduce an end-to-end machine learning and optimization procedure to automate budget decision-making for cities, relying on feature store, model training and serving, optimizers, and backtesting; proposing a state-of-the-art deep learning (DL) estimator based on S-Learner and a novel tensor B-Spline regression model, we solve high-dimensional optimization with ADMM and primal-dual interior point convex optimization, substantially improving Uber's resource allocation efficiency.

Submitted 26 July, 2024; originally announced July 2024.

Comments: To be published in the 2nd Workshop on Causal Inference and Machine Learning in Practice, KDD 2024, August 25 to 29, 2024, Barcelona, Spain, 10 pages

MSC Class: 62J99

arXiv:2407.11086 [pdf, other]

Pre-training with Fractional Denoising to Enhance Molecular Property Prediction

Authors: Yuyan Ni, Shikun Feng, Xin Hong, Yuancheng Sun, Wei-Ying Ma, Zhi-Ming Ma, Qiwei Ye, Yanyan Lan

Abstract: Deep learning methods have been considered promising for accelerating molecular screening in drug discovery and material design. Due to the limited availability of labelled data, various self-supervised molecular pre-training methods have been presented. While many existing methods utilize common pre-training tasks in computer vision (CV) and natural language processing (NLP), they often overlook the fundamental physical principles governing molecules. In contrast, applying denoising in pre-training can be interpreted as an equivalent force learning, but the limited noise distribution introduces bias into the molecular distribution. To address this issue, we introduce a molecular pre-training framework called fractional denoising (Frad), which decouples noise design from the constraints imposed by force learning equivalence. In this way, the noise becomes customizable, allowing for incorporating chemical priors to significantly improve molecular distribution modeling. Experiments demonstrate that our framework consistently outperforms existing methods, establishing state-of-the-art results across force prediction, quantum chemical properties, and binding affinity tasks. The refined noise design enhances force accuracy and sampling coverage, which contribute to the creation of physically consistent molecular representations, ultimately leading to superior predictive performance.

Submitted 14 July, 2024; originally announced July 2024.

arXiv:2407.09367 [pdf, other]

Reshaping the Online Data Buffering and Organizing Mechanism for Continual Test-Time Adaptation

Authors: Zhilin Zhu, Xiaopeng Hong, Zhiheng Ma, Weijun Zhuang, Yaohui Ma, Yong Dai, Yaowei Wang

Abstract: Continual Test-Time Adaptation (CTTA) involves adapting a pre-trained source model to continually changing unsupervised target domains. In this paper, we systematically analyze the challenges of this task: online environment, unsupervised nature, and the risks of error accumulation and catastrophic forgetting under continual domain shifts. To address these challenges, we reshape the online data buffering and organizing mechanism for CTTA. We propose an uncertainty-aware buffering approach to identify and aggregate significant samples with high certainty from the unsupervised, single-pass data stream. Based on this, we propose a graph-based class relation preservation constraint to overcome catastrophic forgetting. Furthermore, a pseudo-target replay objective is used to mitigate error accumulation. Extensive experiments demonstrate the superiority of our method in both segmentation and classification CTTA tasks. Code is available at https://github.com/z1358/OBAO.

Submitted 18 July, 2024; v1 submitted 12 July, 2024; originally announced July 2024.

Comments: This is the preprint version of our paper and supplemental material to appear in ECCV 2024

arXiv:2407.07518 [pdf, other]

Multi-modal Crowd Counting via a Broker Modality

Authors: Haoliang Meng, Xiaopeng Hong, Chenhao Wang, Miao Shang, Wangmeng Zuo

Abstract: Multi-modal crowd counting involves estimating crowd density from both visual and thermal/depth images. This task is challenging due to the significant gap between these distinct modalities. In this paper, we propose a novel approach by introducing an auxiliary broker modality and on this basis frame the task as a triple-modal learning problem. We devise a fusion-based method to generate this broker modality, leveraging a non-diffusion, lightweight counterpart of modern denoising diffusion-based fusion models. Additionally, we identify and address the ghosting effect caused by direct cross-modal image fusion in multi-modal crowd counting. Through extensive experimental evaluations on popular multi-modal crowd-counting datasets, we demonstrate the effectiveness of our method, which introduces only 4 million additional parameters, yet achieves promising results. The code is available at https://github.com/HenryCilence/Broker-Modality-Crowd-Counting.

Submitted 10 July, 2024; originally announced July 2024.

Comments: This is the preprint version of the paper and supplemental material to appear in ECCV 2024. Please cite the final published version. Code is available at https://github.com/HenryCilence/Broker-Modality-Crowd-Counting

arXiv:2407.01310 [pdf, other]

Multi-State-Action Tokenisation in Decision Transformers for Multi-Discrete Action Spaces

Authors: Perusha Moodley, Pramod Kaushik, Dhillu Thambi, Mark Trovinger, Praveen Paruchuri, Xia Hong, Benjamin Rosman

Abstract: Decision Transformers, in their vanilla form, struggle to perform on image-based environments with multi-discrete action spaces. Although enhanced Decision Transformer architectures have been developed to improve performance, these methods have not specifically addressed this problem of multi-discrete action spaces which hampers existing Decision Transformer architectures from learning good representations. To mitigate this, we propose Multi-State Action Tokenisation (M-SAT), an approach for tokenising actions in multi-discrete action spaces that enhances the model's performance in such environments. Our approach involves two key changes: disentangling actions to the individual action level and tokenising the actions with auxiliary state information. These two key changes also improve individual action level interpretability and visibility within the attention layers. We demonstrate the performance gains of M-SAT on challenging ViZDoom environments with multi-discrete action spaces and image-based state spaces, including the Deadly Corridor and My Way Home scenarios, where M-SAT outperforms the baseline Decision Transformer without any additional data or heavy computational overheads. Additionally, we find that removing positional encoding does not adversely affect M-SAT's performance and, in some cases, even improves it.

Submitted 1 July, 2024; originally announced July 2024.

arXiv:2406.18159 [pdf, other]

Human-Aware 3D Scene Generation with Spatially-constrained Diffusion Models

Authors: Xiaolin Hong, Hongwei Yi, Fazhi He, Qiong Cao

Abstract: Generating 3D scenes from human motion sequences supports numerous applications, including virtual reality and architectural design. However, previous auto-regression-based human-aware 3D scene generation methods have struggled to accurately capture the joint distribution of multiple objects and input humans, often resulting in overlapping object generation in the same space. To address this limitation, we explore the potential of diffusion models that simultaneously consider all input humans and the floor plan to generate plausible 3D scenes. Our approach not only satisfies all input human interactions but also adheres to spatial constraints with the floor plan. Furthermore, we introduce two spatial collision guidance mechanisms: human-object collision avoidance and object-room boundary constraints. These mechanisms help avoid generating scenes that conflict with human motions while respecting layout constraints. To enhance the diversity and accuracy of human-guided scene generation, we have developed an automated pipeline that improves the variety and plausibility of human-object interactions in the existing 3D FRONT HUMAN dataset. Extensive experiments on both synthetic and real-world datasets demonstrate that our framework can generate more natural and plausible 3D scenes with precise human-scene interactions, while significantly reducing human-object collisions compared to previous state-of-the-art methods. Our code and data will be made publicly available upon publication of this work.

Submitted 20 August, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

arXiv:2406.15877 [pdf, other]

BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions

Authors: Terry Yue Zhuo, Minh Chien Vu, Jenny Chim, Han Hu, Wenhao Yu, Ratnadira Widyasari, Imam Nur Bani Yusuf, Haolan Zhan, Junda He, Indraneil Paul, Simon Brunner, Chen Gong, Thong Hoang, Armel Randy Zebaze, Xiaoheng Hong, Wen-Ding Li, Jean Kaddour, Ming Xu, Zhihan Zhang, Prateek Yadav, Naman Jain, Alex Gu, Zhoujun Cheng, Jiawei Liu, Qian Liu, et al. (8 additional authors not shown)

Abstract: Task automation has been greatly empowered by the recent advances in Large Language Models (LLMs) via Python code, with tasks ranging from software engineering development to general-purpose reasoning. While current benchmarks have shown that LLMs can solve tasks using programs like human developers, the majority of their evaluations are limited to short and self-contained algorithmic tasks or standalone function calls. Solving challenging and practical tasks requires the capability of utilizing diverse function calls as tools to efficiently implement functionalities like data analysis and web development. In addition, using multiple tools to solve a task needs compositional reasoning by accurately understanding complex instructions. Fulfilling both of these characteristics can pose a great challenge for LLMs. To assess how well LLMs can solve challenging and practical tasks via programs, we introduce BigCodeBench, a benchmark that challenges LLMs to invoke multiple function calls as tools from 139 libraries and 7 domains for 1,140 fine-grained tasks. To evaluate LLMs rigorously, each task encompasses 5.6 test cases with an average branch coverage of 99%. In addition, we propose a natural-language-oriented variant of BigCodeBench, BigCodeBench-Instruct, that automatically transforms the original docstrings into short instructions only with essential information. Our extensive evaluation of 60 LLMs shows that LLMs are not yet capable of following complex instructions to use function calls precisely, with scores up to 60%, significantly lower than the human performance of 97%. The results underscore the need for further advancements in this area.

Submitted 7 October, 2024; v1 submitted 22 June, 2024; originally announced June 2024.

Comments: 44 pages, 14 figures, 7 tables, built with love by the BigCode community :)

arXiv:2406.15062 [pdf, other]

Decoupled static and dynamical charge correlations in La$_{2-x}$Sr$_x$CuO$_4$

Authors: L. Martinelli, I. Biało, X. Hong, J. Oppliger, C. Lin, T. Schaller, J. Küspert, M. H. Fischer, T. Kurosawa, N. Momono, M. Oda, J. Choi, S. Agrestini, M. Garcia-Fernandez, Ke-Jin Zhou, Q. Wang, J. Chang

Abstract: The relation between charge order, its quantum fluctuations and optical phonon modes in cuprate superconductors remains an unsolved problem. The exploration of these excitations is however complicated by the presence of twinned domains. Here, we use uniaxial strain in combination with ultra-high-resolution Resonant Inelastic X-ray Scattering (RIXS) at the oxygen K- and copper L$_3$-edges to study the excitations stemming from the charge ordering wave vector in La$_{1.875}$Sr$_{0.125}$CuO$_4$. By detwinning stripe ordering, we demonstrate that the optical phonon anomalies do not show any stripe anisotropy. The low-energy charge excitations also retain an in-plane four-fold symmetry. As such, we find that both phonon and charge excitations are decoupled entirely from the strength of static charge ordering. The almost isotropic character of charge excitations remains a possible source for the strange metal properties found in the normal state of cuprate superconductors.

Submitted 15 July, 2024; v1 submitted 21 June, 2024; originally announced June 2024.

Comments: 9 pages, 4 figures

arXiv:2406.07487 [pdf, other]

GLAD: Towards Better Reconstruction with Global and Local Adaptive Diffusion Models for Unsupervised Anomaly Detection

Authors: Hang Yao, Ming Liu, Haolin Wang, Zhicun Yin, Zifei Yan, Xiaopeng Hong, Wangmeng Zuo

Abstract: Diffusion models have shown superior performance on unsupervised anomaly detection tasks. Since they are trained with normal data only, diffusion models tend to reconstruct normal counterparts of test images with certain noises added. However, these methods treat all potential anomalies equally, which may cause two main problems. From the global perspective, the difficulty of reconstructing images with different anomalies is uneven. Therefore, instead of utilizing the same setting for all samples, we propose to predict a particular denoising step for each sample by evaluating the difference between image contents and the priors extracted from diffusion models. From the local perspective, reconstructing abnormal regions differs from normal areas even in the same image. Theoretically, the diffusion model predicts a noise for each step, typically following a standard Gaussian distribution. However, due to the difference between the anomaly and its potential normal counterpart, the predicted noise in abnormal regions will inevitably deviate from the standard Gaussian distribution. To this end, we propose introducing synthetic abnormal samples in training to encourage the diffusion models to break through the limitation of standard Gaussian distribution, and a spatial-adaptive feature fusion scheme is utilized during inference.
With the above modifications, we propose a global and local adaptive diffusion model (abbreviated to GLAD) for unsupervised anomaly detection, which introduces appealing flexibility and achieves anomaly-free reconstruction while retaining as much normal information as possible. Extensive experiments are conducted on three commonly used anomaly detection datasets (MVTec-AD, MPDD, and VisA) and a printed circuit board dataset (PCB-Bank) we integrated, showing the effectiveness of the proposed method.

Submitted 9 September, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

Comments: Accepted by ECCV 2024, code and models: https://github.com/hyao1/GLAD. Due to the limitation "The abstract field cannot be longer than 1,920 characters", the abstract here is shorter than that in the PDF file

arXiv:2406.00334 [pdf, other]

Image Captioning via Dynamic Path Customization

Authors: Yiwei Ma, Jiayi Ji, Xiaoshuai Sun, Yiyi Zhou, Xiaopeng Hong, Yongjian Wu, Rongrong Ji

Abstract: This paper explores a novel dynamic network for vision and language tasks, where the inferring structure is customized on the fly for different inputs. Most previous state-of-the-art approaches are static and hand-crafted networks, which not only heavily rely on expert knowledge, but also ignore the semantic diversity of input samples, therefore resulting in suboptimal performance. To address these issues, we propose a novel Dynamic Transformer Network (DTNet) for image captioning, which dynamically assigns customized paths to different samples, leading to discriminative yet accurate captions. Specifically, to build a rich routing space and improve routing efficiency, we introduce five types of basic cells and group them into two separate routing spaces according to their operating domains, i.e., spatial and channel. Then, we design a Spatial-Channel Joint Router (SCJR), which endows the model with the capability of path customization based on both spatial and channel information of the input sample. To validate the effectiveness of our proposed DTNet, we conduct extensive experiments on the MS-COCO dataset and achieve new state-of-the-art performance on both the Karpathy split and the online test server.

Submitted 1 June, 2024; originally announced June 2024.

Comments: TNNLS24

arXiv:2405.20696 [pdf, other]

Directly Estimating Mixed-State Entanglement with Bell Measurement Assistance

Authors: Gong-Chu Li, Lei Chen, Si-Qi Zhang, Xu-Song Hong, You Zhou, Geng Chen, Chuan-Feng Li, Guang-Can Guo

Abstract: Entanglement plays a fundamental role in quantum physics and information processing. Here, we develop an unbiased estimator for mixed-state entanglement in the few-shot scenario and directly estimate it using random unitary evolution in a photonic system. As a supplement to traditional projective measurements, we incorporate Bell measurements on qubit-pairs, enriching the previous randomized measurement scheme, which is no-go in this task with only local unitary evolution. The scheme is scalable to n-qubits via Bell measurements on qubit-pairs. The estimator can be derived directly from a few consecutive outcomes while exhibiting greater robustness to system errors and noise compared to schemes based on shadow estimation. We find that, under a fixed measurement resource, the way with more versatile measurement settings with fewer repeats per setting is more efficient. Our protocol and demonstration advance the direct characterization of quantum states in practice.

Submitted 6 July, 2024; v1 submitted 31 May, 2024; originally announced May 2024.

Comments: 5 pages, 4 figures

arXiv:2405.17802 [pdf, other]

Multi-level Interaction Modeling for Protein Mutational Effect Prediction

Authors: Yuanle Mo, Xin Hong, Bowen Gao, Yinjun Jia, Yanyan Lan

Abstract: Protein-protein interactions are central mediators in many biological processes. Accurately predicting the effects of mutations on interactions is crucial for guiding the modulation of these interactions, thereby playing a significant role in therapeutic development and drug discovery. Mutations generally affect interactions hierarchically across three levels: mutated residues exhibit different sidechain conformations, which lead to changes in the backbone conformation, eventually affecting the binding affinity between proteins. However, existing methods typically focus only on sidechain-level interaction modeling, resulting in suboptimal predictions. In this work, we propose a self-supervised multi-level pre-training framework, ProMIM, to fully capture all three levels of interactions with well-designed pretraining objectives. Experiments show ProMIM outperforms all the baselines on the standard benchmark, especially on mutations where significant changes in backbone conformations may occur. In addition, leading results from zero-shot evaluations for SARS-CoV-2 mutational effect prediction and antibody optimization underscore the potential of ProMIM as a powerful next-generation tool for developing novel therapeutic approaches and new drugs.

Submitted 27 May, 2024; originally announced May 2024.

arXiv:2404.18456 [pdf, other]

Equivalence Checking of Parameterised Quantum Circuits

Authors: Xin Hong, Wei-Jia Huang, Wei-Chen Chien, Yuan Feng, Min-Hsiu Hsieh, Sanjiang Li, Mingsheng Ying

Abstract: Parameterised quantum circuits (PQCs) hold great promise for demonstrating quantum advantages in practical applications of quantum computation. Examples of successful applications include the variational quantum eigensolver, the quantum approximate optimisation algorithm, and quantum machine learning. However, before executing PQCs on real quantum devices, they undergo compilation and optimisation procedures. Given the inherent error-proneness of these processes, it becomes crucial to verify the equivalence between the original PQC and its compiled or optimised version. Unfortunately, most existing quantum circuit verifiers cannot directly handle parameterised quantum circuits; instead, they require parameter substitution to perform verification. In this paper, we address the critical challenge of equivalence checking for PQCs. We propose a novel compact representation for PQCs based on tensor decision diagrams. Leveraging this representation, we present an algorithm for verifying PQC equivalence without the need for instantiation. Our approach ensures both effectiveness and efficiency, as confirmed by experimental evaluations. The decision-diagram representations offer a powerful tool for analysing and verifying parameterised quantum circuits, bridging the gap between theoretical models and practical implementations.

Submitted 29 April, 2024; originally announced April 2024.

arXiv:2404.18060 [pdf, other]

Prompt Customization for Continual Learning

Authors: Yong Dai, Xiaopeng Hong, Yabin Wang, Zhiheng Ma, Dongmei Jiang, Yaowei Wang

Abstract: Contemporary continual learning approaches typically select prompts from a pool, which function as supplementary inputs to a pre-trained model. However, this strategy is hindered by the inherent noise of its selection approach when handling increasing tasks. In response to these challenges, we reformulate the prompting approach for continual learning and propose the prompt customization (PC) method. PC mainly comprises a prompt generation module (PGM) and a prompt modulation module (PMM). In contrast to conventional methods that employ hard prompt selection, PGM assigns different coefficients to prompts from a fixed-sized pool of prompts and generates tailored prompts. Moreover, PMM further modulates the prompts by adaptively assigning weights according to the correlations between input data and corresponding prompts. We evaluate our method on four benchmark datasets for three diverse settings, including the class, domain, and task-agnostic incremental learning tasks. Experimental results demonstrate consistent improvement (by up to 16.2%), yielded by the proposed method, over the state-of-the-art (SOTA) techniques.

Submitted 27 April, 2024; originally announced April 2024.

Comments: ACM MM

arXiv:2404.01174 [pdf, other]

SpikeMba: Multi-Modal Spiking Saliency Mamba for Temporal Video Grounding

Authors: Wenrui Li, Xiaopeng Hong, Ruiqin Xiong, Xiaopeng Fan

Abstract: Temporal video grounding (TVG) is a critical task in video content understanding, requiring precise alignment between video content and natural language instructions. Despite significant advancements, existing methods face challenges in managing confidence bias towards salient objects and capturing long-term dependencies in video sequences. To address these issues, we introduce SpikeMba: a multi-modal spiking saliency mamba for temporal video grounding. Our approach integrates Spiking Neural Networks (SNNs) with state space models (SSMs) to leverage their unique advantages in handling different aspects of the task. Specifically, we use SNNs to develop a spiking saliency detector that generates the proposal set. The detector emits spike signals when the input signal exceeds a predefined threshold, resulting in a dynamic and binary saliency proposal set. To enhance the model's capability to retain and infer contextual information, we introduce relevant slots, which are learnable tensors that encode prior knowledge. These slots work with the contextual moment reasoner to maintain a balance between preserving contextual information and exploring semantic relevance dynamically. The SSMs facilitate selective information propagation, addressing the challenge of long-term dependency in video content. By combining SNNs for proposal generation and SSMs for effective contextual reasoning, SpikeMba addresses confidence bias and long-term dependencies, thereby significantly enhancing fine-grained multimodal relationship capture.
Our experiments demonstrate the effectiveness of SpikeMba, which consistently outperforms state-of-the-art methods across mainstream benchmarks.

Submitted 23 May, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

arXiv:2404.00989 [pdf, other]

360+x: A Panoptic Multi-modal Scene Understanding Dataset

Authors: Hao Chen, Yuqi Hou, Chenyuan Qu, Irene Testini, Xiaohan Hong, Jianbo Jiao

Abstract: Human perception of the world is shaped by a multitude of viewpoints and modalities. While many existing datasets focus on scene understanding from a certain perspective (e.g. egocentric or third-person views), our dataset offers a panoptic perspective (i.e. multiple viewpoints with multiple data modalities). Specifically, we encapsulate third-person panoramic and front views, as well as egocentric monocular/binocular views with rich modalities including video, multi-channel audio, directional binaural delay, location data and textual scene descriptions within each scene captured, presenting comprehensive observation of the world. Figure 1 offers a glimpse of all 28 scene categories of our 360+x dataset. To the best of our knowledge, this is the first database that covers multiple viewpoints with multiple data modalities to mimic how daily information is accessed in the real world. Through our benchmark analysis, we presented 5 different scene understanding tasks on the proposed 360+x dataset to evaluate the impact and benefit of each data modality and perspective in panoptic scene understanding. We hope this unique dataset could broaden the scope of comprehensive scene understanding and encourage the community to approach these problems from more diverse perspectives.

Submitted 7 April, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

Comments: CVPR 2024 (Oral Presentation), Project page: https://x360dataset.github.io/

Journal ref: The IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR) 2024

arXiv:2403.20009 [pdf, other]

On Large Language Models' Hallucination with Regard to Known Facts

Authors: Che Jiang, Biqing Qi, Xiangyu Hong, Dayuan Fu, Yang Cheng, Fandong Meng, Mo Yu, Bowen Zhou, Jie Zhou

Abstract: Large language models are successful in answering factoid questions but are also prone to hallucination. We investigate the phenomenon of LLMs possessing correct answer knowledge yet still hallucinating from the perspective of inference dynamics, an area not previously covered in studies on hallucinations. We are able to conduct this analysis via two key ideas. First, we identify the factual questions that query the same triplet knowledge but result in different answers. The difference between the model behaviors on the correct and incorrect outputs hence suggests the patterns when hallucinations happen. Second, to measure the pattern, we utilize mappings from the residual streams to vocabulary space. We reveal the different dynamics of the output token probabilities along the depths of layers between the correct and hallucinated cases. In hallucinated cases, the output token's information rarely demonstrates abrupt increases and consistent superiority in the later stages of the model. Leveraging the dynamic curve as a feature, we build a classifier capable of accurately detecting hallucinatory predictions with an 88% success rate. Our study sheds light on understanding the reasons for LLMs' hallucinations on their known facts, and more importantly, on accurately predicting when they are hallucinating.

Submitted 29 March, 2024; originally announced March 2024.

Comments: Accepted by NAACL 2024 Main Conference

arXiv:2403.19952 [pdf, ps, other]

Theoretical investigation on the optical absorption spectra in cyclo[n]carbons (n=10, 14, 18)

Authors: Xuhai Hong, Lang Su, Jie Li

Abstract: The optical absorption spectra of cyclo[n]carbons (n=10, 14, 18) are investigated in the framework of time-dependent density functional theory. The collective plasmon excitations develop well with the increase of the ring size and the symmetry group of cyclo[n]carbons. An increase in intensity for the main peaks with the growing number of atoms in cyclo[n]carbons is observed. With the increase of the radius of the monocyclic ring, as more electrons participate in the dipole oscillation, the main excitation peaks are red-shifted to lower energy. The highly symmetrical structures of cyclo[n]carbons (D_{nh}) possess degenerate levels, leading to simpler spectra with fewer peaks. The Fourier transform of the induced electron density of the cyclo[n]carbons (n=10, 14, 18) is investigated at the excitation frequencies.

Submitted 28 March, 2024; originally announced March 2024.

arXiv:2403.12965 [pdf, other]

Wear-Any-Way: Manipulable Virtual Try-on via Sparse Correspondence Alignment

Authors: Mengting Chen, Xi Chen, Zhonghua Zhai, Chen Ju, Xuewen Hong, Jinsong Lan, Shuai Xiao

Abstract: This paper introduces a novel framework for virtual try-on, termed Wear-Any-Way. Different from previous methods, Wear-Any-Way is a customizable solution. Besides generating high-fidelity results, our method supports users to precisely manipulate the wearing style. To achieve this goal, we first construct a strong pipeline for standard virtual try-on, supporting single/multiple garment try-on and model-to-model settings in complicated scenarios. To make it manipulable, we propose sparse correspondence alignment which involves point-based control to guide the generation for specific locations. With this design, Wear-Any-Way gets state-of-the-art performance for the standard setting and provides a novel interaction form for customizing the wearing style. For instance, it supports users to drag the sleeve to make it rolled up, drag the coat to make it open, and utilize clicks to control the style of tuck, etc. Wear-Any-Way enables more liberated and flexible expressions of the attires, holding profound implications in the fashion industry.

Submitted 19 March, 2024; originally announced March 2024.

Comments: Project Page: https://mengtingchen.github.io/wear-any-way-page/

arXiv:2402.15297 [pdf, other]

Semi-supervised Counting via Pixel-by-pixel Density Distribution Modelling

Authors: Hui Lin, Zhiheng Ma, Rongrong Ji, Yaowei Wang, Zhou Su, Xiaopeng Hong, Deyu Meng

Abstract: This paper focuses on semi-supervised crowd counting, where only a small portion of the training data are labeled. We formulate the pixel-wise density value to regress as a probability distribution, instead of a single deterministic value. On this basis, we propose a semi-supervised crowd-counting model. Firstly, we design a pixel-wise distribution matching loss to measure the differences in the pixel-wise density distributions between the prediction and the ground truth. Secondly, we enhance the transformer decoder by using density tokens to specialize the forwards of decoders w.r.t. different density intervals. Thirdly, we design the interleaving consistency self-supervised learning mechanism to learn from unlabeled data efficiently. Extensive experiments on four datasets are performed to show that our method clearly outperforms the competitors by a large margin under various labeled ratio settings. Code will be released at https://github.com/LoraLinH/Semi-supervised-Counting-via-Pixel-by-pixel-Density-Distribution-Modelling.

Submitted 23 February, 2024; originally announced February 2024.

Comments: This is the technical report of a paper that was submitted to IEEE Transactions and is now under review

arXiv:2401.17132 [pdf, other]

Evolution of magnetic field of the Quasar 1604+159 at pc scale

Authors: Xu-Zhi Hu, Xiaoyu Hong, Wei Zhao, Liang Chen, Wei-Yang Wang, Linhui Wu

Abstract: We have analyzed the total intensity, spectral index, linear polarization, and RM distributions at pc scale for the quasar 1604+159. The source was observed in 2002 and 2020 with the VLBA. Combining the MOJAVE results, we studied the evolution of the magnetic field. We detected a core-jet structure. The jet extends to a distance of ~25 mas. The jet shape varies slightly with time. We divided the source structure into the central region and the jet region. In the jet region, we find the polarized emission varies with time. The flatter spectral index values and EVPA direction indicate the possible existence of shocks, contributing to the variation. In the central region, the derived core shift index k_r values indicate that the core in 2002 is close to the equipartition case while deviating from it in 2020. The measured magnetic field strength in 2020 is two orders of magnitude lower than that in 2002. We detected transverse RM gradients, evidence of a helical magnetic field, in the core. At 15 GHz, in the place close to the jet base, the polarization direction changes significantly with time from perpendicular to parallel to the jet direction. The evolution of RM and magnetic field structure are potential reasons for the observed polarization change. The core |RM| in 2020 increases with frequency following a power law with index a = 2.7, suggesting a fast electron density fall-off in the medium with distance from the jet base.

Submitted 1 February, 2024; v1 submitted 30 January, 2024; originally announced January 2024.

Comments: 24 pages, 14 figures, accepted for publication in ApJ

arXiv:2401.12164 [pdf, other]

Semi-supervised segmentation of land cover images using nonlinear canonical correlation analysis with multiple features and t-SNE

Authors: Hong Wei, James Xiao, Yichao Zhang, Xia Hong

Abstract: Image segmentation is a clustering task whereby each pixel is assigned a cluster label. Remote sensing data usually consists of multiple bands of spectral images in which there exist semantically meaningful land cover subregions, co-registered with other source data such as LIDAR (LIght Detection And Ranging) data, where available. This suggests that, in order to account for spatial correlation between pixels, a feature vector associated with each pixel may be a vectorized tensor representing the multiple bands and a local patch as appropriate. Similarly, multiple types of texture features based on a pixel's local patch would also be beneficial for encoding locally statistical information and spatial variations, without necessarily labelling pixel-wise a large amount of ground truth, then training a supervised model, which is sometimes impractical. In this work, by resorting to labelling only a small quantity of pixels, a new semi-supervised segmentation approach is proposed. Initially, over all pixels, an image data matrix is created in high dimensional feature space. Then, t-SNE projects the high dimensional data onto a 3D embedding. By using radial basis functions as input features, which use the labelled data samples as centres, to pair with the output class labels, a modified canonical correlation analysis algorithm, referred to as RBF-CCA, is introduced which learns the associated projection matrix via the small labelled data set. The associated canonical variables, obtained for the full image, are then clustered by the k-means algorithm.
The proposed semi-supervised RBF-CCA algorithm has been implemented on several remotely sensed multispectral images, demonstrating excellent segmentation results.

Submitted 22 January, 2024; originally announced January 2024.

arXiv:2401.08490 [pdf, other]

doi 10.1103/PhysRevB.109.235132

Multimechanism quantum anomalous Hall and Chern number tunable states in germanene (silicene, stanene)/$M$Bi$_2$Te$_4$ heterostructures

Authors: Zhe Li, Jiatong Zhang, Xiyu Hong, Xiao Feng, Ke He

Abstract: By constructing germanene (silicene, stanene)/$M$Bi$_2$Te$_4$ ($M$ = 3d-transition elements) heterostructures, we discovered and designed multimechanism quantum-anomalous-Hall (QAH) systems, including $Γ$-based QAH, $K$-$K'$-connected QAH, and valley-polarized $K$- or $K'$-based QAH states via first-principle computations. The unique systems possess a global gap and tunable Chern number. The coexisting conventional $Γ$-based QAH state of $M$Bi$_2$Te$_4$ and valley-polarized $K$($K'$)-based QAH state of germanene (silicene, stanene), with opposite chirality, can interact with each other. Adjusting magnetic configurations of $M$Bi$_2$Te$_4$-layers not only switches on (off) the QAH conductance, but also modulates Chern numbers exactly. For example, the germanene/bilayer-NiBi$_2$Te$_4$ possesses the Chern number $C = +1$ in ferromagnetic couplings and $C = +2$ in antiferromagnetic couplings. The novel multimechanism QAH insulators, which are achievable in experiments, provide a new approach to spintronics and valleytronics based on topological states of matter.

Submitted 17 June, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

Comments: 9 pages, 4 figures

Journal ref: Phys. Rev. B 109, 235132 (2024)

arXiv:2401.03870 [pdf, other]

Gramformer: Learning Crowd Counting via Graph-Modulated Transformer

Authors: Hui Lin, Zhiheng Ma, Xiaopeng Hong, Qinnan Shangguan, Deyu Meng

Abstract: Transformer has been popular in recent crowd counting work since it breaks the limited receptive field of traditional CNNs. However, since crowd images always contain a large number of similar patches, the self-attention mechanism in Transformer tends to find a homogenized solution where the attention maps of almost all patches are identical. In this paper, we address this problem by proposing Gramformer: a graph-modulated transformer to enhance the network by adjusting the attention and input node features respectively on the basis of two different types of graphs. Firstly, an attention graph is proposed to diversify attention maps to attend to complementary information. The graph is built upon the dissimilarities between patches, modulating the attention in an anti-similarity fashion. Secondly, a feature-based centrality encoding is proposed to discover the centrality positions or importance of nodes. We encode them with a proposed centrality indices scheme to modulate the node features and similarity relationships. Extensive experiments on four challenging crowd counting datasets have validated the competitiveness of the proposed method. Code is available at https://github.com/LoraLinH/Gramformer.

Submitted 8 January, 2024; originally announced January 2024.

Comments: This is the accepted version of the paper and supplemental material to appear in AAAI 2024. Please cite the final published version. Code is available at https://github.com/LoraLinH/Gramformer

arXiv:2401.02335 [pdf, other]

Linguistic Profiling of Deepfakes: An Open Database for Next-Generation Deepfake Detection

Authors: Yabin Wang, Zhiwu Huang, Zhiheng Ma, Xiaopeng Hong

Abstract: The emergence of text-to-image generative models has revolutionized the field of deepfakes, enabling the creation of realistic and convincing visual content directly from textual descriptions. However, this advancement presents considerably greater challenges in detecting the authenticity of such content. Existing deepfake detection datasets and methods often fall short in effectively capturing the extensive range of emerging deepfakes and offering satisfactory explanatory information for detection. To address the significant issue, this paper introduces a deepfake database (DFLIP-3K) for the development of convincing and explainable deepfake detection. It encompasses about 300K diverse deepfake samples from approximately 3K generative models, which boasts the largest number of deepfake models in the literature. Moreover, it collects around 190K linguistic footprints of these deepfakes. The two distinguished features enable DFLIP-3K to develop a benchmark that promotes progress in linguistic profiling of deepfakes, which includes three sub-tasks namely deepfake detection, model identification, and prompt prediction. The deepfake model and prompt are two essential components of each deepfake, and thus dissecting them linguistically allows for an invaluable exploration of trustworthy and interpretable evidence in deepfake detection, which we believe is the key for the next-generation deepfake detection. Furthermore, DFLIP-3K is envisioned as an open database that fosters transparency and encourages collaborative efforts to further enhance its growth.
Our extensive experiments on the developed benchmark verify that our DFLIP-3K database is capable of serving as a standardized resource for evaluating and comparing linguistic-based deepfake detection, identification, and prompt prediction techniques.

Submitted 4 January, 2024; originally announced January 2024.

arXiv:2312.14792 [pdf, ps, other]

The Rate-Distortion-Perception-Classification Tradeoff: Joint Source Coding and Modulation via Inverse-Domain GANs

Authors: Junli Fang, João F. C. Mota, Baoshan Lu, Weicheng Zhang, Xuemin Hong

Abstract: The joint source-channel coding (JSCC) framework leverages deep learning to learn from data the best codes for source and channel coding. When the output signal, rather than being binary, is directly mapped onto the IQ domain (complex-valued), we call the resulting framework joint source coding and modulation (JSCM). We consider a JSCM scenario and show the existence of a strict tradeoff between channel rate, distortion, perception, and classification accuracy, a tradeoff that we name RDPC. We then propose two image compression methods to navigate that tradeoff: the RDPCO algorithm which, under simple assumptions, directly solves the optimization problem characterizing the tradeoff, and an algorithm based on an inverse-domain generative adversarial network (ID-GAN), which is more general and achieves extreme compression. Simulation results corroborate the theoretical findings, showing that both algorithms exhibit the RDPC tradeoff. They also demonstrate that the proposed ID-GAN algorithm effectively balances image distortion, perception, and classification accuracy, and significantly outperforms traditional separation-based methods and recent deep JSCM architectures in terms of one or more of these metrics.

Submitted 6 June, 2024; v1 submitted 22 December, 2023; originally announced December 2023.

Comments: Paper accepted in IEEE Transactions on Signal Processing

arXiv:2312.07867 [pdf, other]

BESTMVQA: A Benchmark Evaluation System for Medical Visual Question Answering

Authors: Xiaojie Hong, Zixin Song, Liangzhi Li, Xiaoli Wang, Feiyan Liu

Abstract: Medical Visual Question Answering (Med-VQA) is a very important task in the healthcare industry, which answers a natural language question with a medical image. Existing VQA techniques in information systems can be directly applied to solving the task. However, they often suffer from (i) the data insufficiency problem, which makes it difficult to train state-of-the-art (SOTA) models for the domain-specific task, and (ii) the reproducibility problem, that many existing models have not been thoroughly evaluated in a unified experimental setup. To address these issues, this paper develops a Benchmark Evaluation SysTem for Medical Visual Question Answering, denoted by BESTMVQA. Given self-collected clinical data, our system provides a useful tool for users to automatically build Med-VQA datasets, which helps overcome the data insufficiency problem. Users can also conveniently select a wide spectrum of SOTA models from our model library to perform a comprehensive empirical study. With simple configurations, our system automatically trains and evaluates the selected models over a benchmark dataset, and reports the comprehensive results for users to develop new techniques or perform medical practice. Limitations of existing work are overcome (i) by the data generation tool, which automatically constructs new datasets from unstructured clinical data, and (ii) by evaluating SOTA models on benchmark datasets in a unified experimental setup. The demonstration video of our system can be found at https://youtu.be/QkEeFlu1x4A. Our code and data will be available soon.

Submitted 12 December, 2023; originally announced December 2023.

arXiv:2311.12221 [pdf, other]

Spinon heat transport in the three-dimensional quantum magnet PbCuTe$_2$O$_6$

Authors: Xiaochen Hong, Matthias Gillig, Abanoub R. N. Hanna, Shravani Chillal, A. T. M. Nazmul Islam, Bella Lake, Bernd Büchner, Christian Hess

Abstract: Quantum spin liquids (QSL) are novel phases of matter which remain quantum disordered even at the lowest temperature. They are characterized by emergent gauge fields and fractionalized quasiparticles. Here we show that the sub-Kelvin thermal transport of the three-dimensional $S=1/2$ hyper-hyperkagome quantum magnet PbCuTe$_2$O$_6$ is governed by a sizeable charge-neutral fermionic contribution which is compatible with the itinerant fractionalized excitations of a spinon Fermi surface. We demonstrate that this hallmark feature of the QSL state is remarkably robust against sample crystallinity, large magnetic field, and field-induced magnetic order, ruling out the imitation of QSL features by extrinsic effects. Our findings thus reveal the characteristic low-energy features of PbCuTe$_2$O$_6$ which qualify this compound as a true QSL material.

Submitted 20 November, 2023; originally announced November 2023.

arXiv:2311.07311 [pdf, other]

Do large language models and humans have similar behaviors in causal inference with script knowledge?

Authors: Xudong Hong, Margarita Ryzhova, Daniel Adrian Biondi, Vera Demberg

Abstract: Recently, large pre-trained language models (LLMs) have demonstrated superior language understanding abilities, including zero-shot causal reasoning. However, it is unclear to what extent their capabilities are similar to human ones. We here study the processing of an event $B$ in a script-based story, which causally depends on a previous event $A$. In our manipulation, event $A$ is stated, negated, or omitted in an earlier section of the text. We first conducted a self-paced reading experiment, which showed that humans exhibit significantly longer reading times when causal conflicts exist ($\neg A \rightarrow B$) than under logical conditions ($A \rightarrow B$). However, reading times remain similar when cause A is not explicitly mentioned, indicating that humans can easily infer event B from their script knowledge. We then tested a variety of LLMs on the same data to check to what extent the models replicate human behavior. Our experiments show that 1) only recent LLMs, like GPT-3 or Vicuna, correlate with human behavior in the $\neg A \rightarrow B$ condition. 2) Despite this correlation, all models still fail to predict that $nil \rightarrow B$ is less surprising than $\neg A \rightarrow B$, indicating that LLMs still have difficulties integrating script knowledge. Our code and collected data set are available at https://github.com/tony-hong/causal-script.

Submitted 13 November, 2023; originally announced November 2023.

Comments: 15 pages, 3 figures

ACM Class: I.2.7; I.2.0

arXiv:2311.06126 [pdf, other]

A centi-pc-scale compact radio core in the nearby galaxy M60

Authors: Xiaofeng Li, Jun Yang, Xiaopeng Cheng, Mai Liao, Xiaoyu Hong, Liming Dou, Tianle Zhao, Zhongying Fan, Fupeng Zhang, Weirong Huang

Abstract: M60, an elliptical galaxy located 16.5~Mpc away, has an active nucleus with a very low luminosity and an extremely low accretion rate. Its central supermassive black hole has a mass of $M_{\rm BH}\sim4.5\times10^{9}\, M_{\odot}$ and a Schwarzschild radius corresponding to $R_{\rm S}\sim5.4\,μ\mathrm{as}$. To investigate the nature of its innermost radio nucleus, data from the Very Long Baseline Array (VLBA) at 4.4 and 7.6~GHz were reduced. The VLBA images reveal a compact component with total flux densities of $\sim$20~mJy at both frequencies, a size of $\leq$0.27~mas (99.7$\%$ confidence level), about 0.022~pc ($50\,R_{\rm S}$) at 7.6~GHz, and a brightness temperature of $\geq6\times10^{9}$~K. This suggests that the observed centi-parsec-scale compact core could be attributed to a nonthermal jet base or an advection-dominated accretion flow (ADAF) with nonthermal electrons. The extremely compact structure also supports the presence of an SMBH in the center. Our results indicate that M60 is a promising target for broad-band VLBI observations at millimeter wavelengths to probe ADAF scenarios and tightly constrain the potential photon ring (about 28\,$μ$as) around its SMBH.

Submitted 10 November, 2023; originally announced November 2023.

Comments: 15 pages, 5 figures, 3 tables, accepted for publication in Astrophysical Journal

arXiv:2310.10352 [pdf, other]

Semi-Supervised Crowd Counting with Contextual Modeling: Facilitating Holistic Understanding of Crowd Scenes

Authors: Yifei Qian, Xiaopeng Hong, Zhongliang Guo, Ognjen Arandjelović, Carl R. Donovan

Abstract: To alleviate the heavy annotation burden for training a reliable crowd counting model and thus make the model more practicable and accurate by being able to benefit from more data, this paper presents a new semi-supervised method based on the mean teacher framework. When there is a scarcity of labeled data available, the model is prone to overfit local patches. Within such contexts, the conventional approach of solely improving the accuracy of local patch predictions through unlabeled data proves inadequate. Consequently, we propose a more nuanced approach: fostering the model's intrinsic 'subitizing' capability. This ability allows the model to accurately estimate the count in regions by leveraging its understanding of the crowd scenes, mirroring the human cognitive process. To achieve this goal, we apply masking on unlabeled data, guiding the model to make predictions for these masked patches based on the holistic cues. Furthermore, to help with feature learning, herein we incorporate a fine-grained density classification task. Our method is general and applicable to most existing crowd counting methods as it doesn't have strict structural or loss constraints. In addition, we observe that the model trained with our framework exhibits a 'subitizing'-like behavior. It accurately predicts low-density regions with only a 'glance', while incorporating local details to predict high-density regions.
Our method achieves state-of-the-art performance, surpassing previous approaches by a large margin on challenging benchmarks such as ShanghaiTech A and UCF-QNRF. The code is available at: https://github.com/cha15yq/MRC-Crowd.

Submitted 20 April, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

Comments: Accepted by TCSVT

arXiv:2310.04900 [pdf, other]

HowToCaption: Prompting LLMs to Transform Video Annotations at Scale

Authors: Nina Shvetsova, Anna Kukleva, Xudong Hong, Christian Rupprecht, Bernt Schiele, Hilde Kuehne

Abstract: Instructional videos are a common source for learning text-video or even multimodal representations by leveraging subtitles extracted with automatic speech recognition systems (ASR) from the audio signal in the videos. However, in contrast to human-annotated captions, both speech and subtitles naturally differ from the visual content of the videos and thus provide only noisy supervision. As a result, large-scale annotation-free web video training data remains sub-optimal for training text-video models. In this work, we propose to leverage the capabilities of large language models (LLMs) to obtain high-quality video descriptions aligned with videos at scale. Specifically, we prompt an LLM to create plausible video captions based on ASR subtitles of instructional videos. To this end, we introduce a prompting method that is able to take into account a longer text of subtitles, allowing us to capture the contextual information beyond one single sentence. We further prompt the LLM to generate timestamps for each produced caption based on the timestamps of the subtitles and finally align the generated captions to the video temporally. In this way, we obtain human-style video captions at scale without human supervision. We apply our method to the subtitles of the HowTo100M dataset, creating a new large-scale dataset, HowToCaption.
Our evaluation shows that the resulting captions not only significantly improve the performance over many different benchmark datasets for zero-shot text-video retrieval and video captioning, but also lead to a disentangling of textual narration from the audio, boosting the performance in text-video-audio tasks.

Submitted 7 September, 2024; v1 submitted 7 October, 2023; originally announced October 2023.

Comments: https://github.com/ninatu/howtocaption

arXiv:2310.04237 [pdf]

Written and spoken corpus of real and fake social media postings about COVID-19

Authors: Ng Bee Chin, Ng Zhi Ee Nicole, Kyla Kwan, Lee Yong Han Dylann, Liu Fang, Xu Hong

Abstract: This study investigates the linguistic traits of fake news and real news. There are two parts to this study: text data and speech data. The text data for this study consisted of 6420 COVID-19 related tweets re-filtered from Patwa et al. (2021). After cleaning, the dataset contained 3049 tweets, with 2161 labeled as 'real' and 888 as 'fake'. The speech data for this study was collected from TikTok, focusing on COVID-19 related videos. Research assistants fact-checked each video's content using credible sources and labeled them as 'Real', 'Fake', or 'Questionable', resulting in a dataset of 91 real entries and 109 fake entries from 200 TikTok videos with a total word count of 53,710 words. The data was analysed using the Linguistic Inquiry and Word Count (LIWC) software to detect patterns in linguistic data. The results indicate a set of linguistic features that distinguish fake news from real news in both written and speech data. This offers valuable insights into the role of language in shaping trust, social media interactions, and the propagation of fake news.

Submitted 6 October, 2023; originally announced October 2023.

Comments: 9 pages, 3 tables

arXiv:2309.04931 [pdf]

doi 10.1103/PhysRevLett.132.056204

Transport Anisotropy in One-dimensional Graphene Superlattice in the High Kronig-Penney Potential Limit

Authors: Tianlin Li, Hanying Chen, Kun Wang, Yifei Hao, Le Zhang, Kenji Watanabe, Takashi Taniguchi, Xia Hong

Abstract: One-dimensional graphene superlattice subjected to strong Kronig-Penney (KP) potential is promising for achieving electron lensing effect, while previous studies utilizing the modulated dielectric gates can only yield a moderate, spatially dispersed potential profile. Here, we realize high KP potential modulation of graphene via nanoscale ferroelectric domain gating. Graphene transistors are fabricated on PbZr$_{0.2}$Ti$_{0.8}$O$_{3}$ back-gates patterned with periodic, 100-200 nm wide stripe domains. Due to band reconstruction, the h-BN top-gating induces satellite Dirac points in samples with current along the superlattice vector $\hat{s}$, a feature absent in samples with current perpendicular to $\hat{s}$. The satellite Dirac point position scales with the superlattice period ($L$) as $\propto L^β$, with $β= -1.18 \pm 0.06$. These results can be well explained by the high KP potential scenario, with the Fermi velocity perpendicular to $\hat{s}$ quenched to about 1% of that for pristine graphene. Our study presents a promising material platform for realizing electron supercollimation and investigating flat band phenomena.

Submitted 9 September, 2023; originally announced September 2023.

Comments: 12 pages, 5 figures, and Supplemental Material

Journal ref: Phys. Rev. Lett. 132, 056204 (2024)

arXiv:2309.00781 [pdf, other]

Structured Radial Basis Function Network: Modelling Diversity for Multiple Hypotheses Prediction

Authors: Alejandro Rodriguez Dominguez, Muhammad Shahzad, Xia Hong

Abstract: Multi-modal problems can be effectively addressed using multiple hypothesis frameworks, but integrating these frameworks into learning models poses significant challenges. This paper introduces a Structured Radial Basis Function Network (s-RBFN) as an ensemble of multiple hypothesis predictors for regression. During the training of the predictors, first the centroidal Voronoi tessellations are formed based on their losses and the true labels, representing geometrically the set of multiple hypotheses. Then, the trained predictors are used to compute a structured dataset with their predictions, including centers and scales for the basis functions. A radial basis function network, with each basis function focused on a particular hypothesis, is subsequently trained using this structured dataset for multiple hypotheses prediction. The s-RBFN is designed to train efficiently while controlling diversity in ensemble learning parametrically. The least-squares approach for training the structured ensemble model provides a closed-form solution for multiple hypotheses and structured predictions. During the formation of the structured dataset, a parameter is employed to avoid mode collapse by controlling tessellation shapes. This parameter provides a mechanism to balance diversity and generalization performance for the s-RBFN.
The empirical validation on two multivariate prediction datasets (air quality and energy appliance predictions) demonstrates the superior generalization performance and computational efficiency of the structured ensemble model compared to other models and their single-hypothesis counterparts.

Submitted 20 September, 2024; v1 submitted 1 September, 2023; originally announced September 2023.

Comments: Accepted Paper for AI-2024 Forty-fourth SGAI International Conference on Artificial Intelligence CAMBRIDGE, ENGLAND 17-19 DECEMBER 2024

MSC Class: 28-08; 28-11; 26B25; 26C15; 46A03; 46T12; 49Q05; 51-08; 60D05; 62J02; 62H10; 62-08; 68W25; 68T07; 68T20

ACM Class: I.2.1; I.2.6; I.5.1; I.6.4; I.6.5

arXiv:2308.11066 [pdf, other]

doi 10.1109/ACCESS.2024.3446274

CSM-H-R: A Context Modeling Framework in Supporting Reasoning Automation for Interoperable Intelligent Systems and Privacy Protection

Authors: Songhui Yue, Xiaoyan Hong, Randy K. Smith

Abstract: The automation of High-Level Context (HLC) reasoning across intelligent systems at scale is imperative because of the unceasing accumulation of contextual data, the trend of the fusion of data from multiple sources (e.g., sensors, intelligent systems), and the intrinsic complexity and dynamism of context-based decision-making processes. To mitigate the challenges posed by these issues, we propose a novel Hierarchical Ontology-State Modeling (HOSM) framework CSM-H-R, which programmatically combines ontologies and states at the modeling phase and runtime phase for attaining the ability to recognize meaningful HLC. It builds on the model of our prior work on the Context State Machine (CSM) engine by incorporating the H (Hierarchy) and R (Relationship and tRansition) dimensions to take care of the dynamic aspects of context. The design of the framework supports the sharing and interoperation of context among intelligent systems and the components for handling CSMs and the management of hierarchy, relationship, and transition. Case studies are developed for IntellElevator and IntellRestaurant, two intelligent applications in a smart campus setting. The prototype implementation of the framework experiments on translating the HLC reasoning into vector and matrix computing and presents the potential of using advanced probabilistic models to reach the next level of automation in integrating intelligent systems; meanwhile, privacy protection support is achieved in the application domain by anonymization through indexing and reducing information correlation.
An implementation of the framework is available at https://github.com/songhui01/CSM-H-R.

Submitted 5 April, 2024; v1 submitted 21 August, 2023; originally announced August 2023.

Comments: 13 pages, 10 figures, Keywords: Automation, Context Dynamism, Context Modeling, Context Reasoning, Intelligent System, Interoperability, Privacy Protection, System Integration

arXiv:2308.02993

Plurifinely open sets and complex Monge-Ampère measures

Authors: Nguyen Xuan Hong

Abstract: The aim of the paper is to investigate the structure of plurifinely open sets. As an application, we will prove an equality on complex Monge-Ampère measures in plurifinely open sets.

Submitted 13 September, 2023; v1 submitted 5 August, 2023; originally announced August 2023.

Comments: There is an error in my article. Associate Professor Do Hoang Son pointed it out. I would like to thank Associate Professor Do Hoang Son. I would also like to thank Professor Mohamed for his comments

arXiv:2308.00440 [pdf, other]

Decision Diagrams for Symbolic Verification of Quantum Circuits

Authors: Xin Hong, Wei-Jia Huang, Wei-Chen Chien, Yuan Feng, Min-Hsiu Hsieh, Sanjiang Li, Chia-Shun Yeh, Mingsheng Ying

Abstract: With the rapid development of quantum computing, automatic verification of quantum circuits becomes more and more important. While several decision diagrams (DDs) have been introduced in quantum circuit simulation and verification, none of them supports symbolic computation. Algorithmic manipulations of symbolic objects, however, have been identified as crucial, if not indispensable, for several verification tasks. This paper proposes the first decision-diagram approach for operating symbolic objects and verifying quantum circuits with symbolic terms. As a notable example, our symbolic tensor decision diagrams (symbolic TDD) could verify the functionality of the 160-qubit quantum Fourier transform circuit within three minutes. Moreover, as demonstrated on Bernstein-Vazirani algorithm, Grover's algorithm, and the bit-flip error correction code, the symbolic TDD enables efficient verification of quantum circuits with user-supplied oracles and/or classical controls.

Submitted 1 August, 2023; originally announced August 2023.

arXiv:2307.15899 [pdf, other]

Exponential DG methods for Vlasov equations

Authors: Nicolas Crouseilles, Xue Hong

Abstract: In this work, an exponential Discontinuous Galerkin (DG) method is proposed to numerically solve Vlasov-type equations. The DG method is used for space discretization and is combined with an exponential Lawson Runge-Kutta method for time discretization to obtain high-order accuracy in time and space. In addition to providing high-order accuracy in time, the use of Lawson methods makes it possible to overcome the stringent condition on the time step induced by the linear part of the system. Moreover, it can be proved that a discrete Poisson equation is preserved. Numerical results on Vlasov-Poisson and Vlasov-Maxwell equations are presented to illustrate the good behavior of the exponential DG method.

Submitted 29 July, 2023; originally announced July 2023.

arXiv:2306.16963 [pdf, other]

doi 10.1038/s41535-024-00628-4

Phonon thermal transport shaped by strong spin-phonon scattering in a Kitaev material Na$_2$Co$_2$TeO$_6$

Authors: Xiaochen Hong, Matthias Gillig, Weiliang Yao, Lukas Janssen, Vilmos Kocsis, Sebastian Gass, Yuan Li, Anja U. B. Wolter, Bernd Büchner, Christian Hess

Abstract: The recent report of a half-quantized thermal Hall effect in the Kitaev material $α$-RuCl$_3$ has sparked a strong debate on whether it is generated by Majorana fermion edge currents or whether other more conventional mechanisms involving magnons or phonons are at its origin. A more direct evidence for Majorana fermions which could be expected to arise from a contribution to the longitudinal heat conductivity $κ_{xx}$ at $T\rightarrow0$ is elusive due to a very complex magnetic field dependence of $κ_{xx}$. Here, we report very low temperature (below 1~K) thermal conductivity ($κ$) of another candidate Kitaev material, Na$_2$Co$_2$TeO$_6$. The application of a magnetic field along different principal axes of the crystal reveals a strong directional-dependent magnetic-field ($\bf B$) impact on $κ$. We show that no evidence for mobile quasiparticles except phonons can be concluded at any field from 0~T to the field polarized state. In particular, severely scattered phonon transport is observed across the $B-T$ phase diagram, which is attributed to prominent magnetic fluctuations. Cascades of phase transitions are uncovered for all $\bf B$ directions by probing the strength of magnetic fluctuations via a precise record of $κ$($B$). Our results thus rule out recent proposals for itinerant magnetic excitations in Na$_2$Co$_2$TeO$_6$, and emphasise the importance of discriminating true spin liquid transport properties from scattered phonons in candidate materials.

Submitted 29 June, 2023; originally announced June 2023.

Journal ref: npj Quantum Mater. 9, 18 (2024)

arXiv:2305.12525 [pdf, other]

Unveiling the small-scale jets in the rapidly growing supermassive black hole IZw1

Authors: Xiaolong Yang, Su Yao, Luigi C. Gallo, Jun Yang, Luis C. Ho, Minfeng Gu, Willem A. Baan, Jiri Svoboda, Ran Wang, Xiang Liu, Xiaoyu Hong, Xue-Bing Wu, Wei Zhao

Abstract: Accretion of black holes at near-Eddington or super-Eddington rates is the most powerful episode that drives black hole growth, and it may work in several types of objects. However, the physics of accretion and jet-disc coupling in such a state remains unclear, mainly because the associated jets are not easily detectable due to the extremely weak emission or possibly episodic nature of the jets. Only a few near/super-Eddington systems have demonstrated radio activity, and it remains unclear whether there is a jet and what are their properties, in super-Eddington active galactic nuclei (AGNs) (and ultraluminous X-ray sources). The deficit is mainly due to the complex radio mixing between the origins of jets and others, such as star formation activity, photo-ionized gas, accretion disk wind, and coronal activity. In this work, we conducted high-resolution very long baseline interferometry (VLBI) observations to explore the jets in the highly accreting narrow-line Seyfert I system IZw1. Our observations successfully revealed small-scale jets (with a linear size of $\sim45$ parsec) at both 1.5 and 5 GHz, based on the high radio brightness temperature, radio morphology, and spectral index distribution. Interestingly, the lack of a flat-spectrum radio core and knotty jet structures imply episodic ejections in IZw1, which resemble the ejection process in Galactic X-ray binaries that are in the canonical very high state. The high accretion rates and jet properties in the AGN IZw1 may support the AGN/XRB analogy in the extreme state.

Submitted 21 May, 2023; originally announced May 2023.

Comments: 19 pages, 8 figures and 4 tables, submitted to ApJ. 2nd round referee report received. comments welcome

arXiv:2305.01928 [pdf, other]

Visual Transformation Telling

Authors: Wanqing Cui, Xin Hong, Yanyan Lan, Liang Pang, Jiafeng Guo, Xueqi Cheng

Abstract: Humans can naturally reason from superficial state differences (e.g. ground wetness) to transformations descriptions (e.g. raining) according to their life experience. In this paper, we propose a new visual reasoning task to test this transformation reasoning ability in real-world scenarios, called \textbf{V}isual \textbf{T}ransformation \textbf{T}elling (VTT). Given a series of states (i.e. images), VTT requires to describe the transformation occurring between every two adjacent states. Different from existing visual reasoning tasks that focus on surface state reasoning, the advantage of VTT is that it captures the underlying causes, e.g. actions or events, behind the differences among states. We collect a novel dataset to support the study of transformation reasoning from two existing instructional video datasets, CrossTask and COIN, comprising 13,547 samples. Each sample involves the key state images along with their transformation descriptions. Our dataset covers diverse real-world activities, providing a rich resource for training and evaluation. To construct an initial benchmark for VTT, we test several models, including traditional visual storytelling methods (CST, GLACNet, Densecap) and advanced multimodal large language models (LLaVA v1.5-7B, Qwen-VL-chat, Gemini Pro Vision, GPT-4o, and GPT-4). Experimental results reveal that even state-of-the-art models still face challenges in VTT, highlighting substantial areas for improvement.

Submitted 11 June, 2024; v1 submitted 3 May, 2023; originally announced May 2023.
