
Streaming and Communication Complexity of Load-Balancing via Matching Contractors

Authors: Sepehr Assadi, Aaron Bernstein, Zachary Langley, Lap Chi Lau, Robert Wang

Abstract: In the load-balancing problem, we have an $n$-vertex bipartite graph $G=(L, R, E)$ between a set of clients and servers. The goal is to find an assignment of all clients to the servers while minimizing the maximum load over all servers, where the load of a server is the number of clients assigned to it. We study load-balancing in the one-way communication model: the edges of the input graph are partitioned between Alice and Bob, and Alice needs to send a message to Bob for him to output the solution. We show that settling the one-way communication complexity of load-balancing is equivalent to a natural sparsification problem for load-balancing. We then prove a dual interpretation of this sparsifier, showing that the minimum density of a sparsifier is effectively the same as the maximum density one can achieve for an extremal graph family that is new to this paper, called Matching-Contractors; these graphs are intimately connected to the well-known Ruzsa-Szemerédi graphs and generalize them in certain aspects. Our chain of equivalences thus shows that the one-way communication complexity of load-balancing can be reduced to a purely graph-theoretic question: what is the maximum density of a Matching-Contractor on $n$ vertices? Finally, we present a novel combinatorial construction of somewhat dense Matching-Contractors, which implies a strong one-way communication lower bound for load-balancing: any one-way protocol (even randomized) with $\tilde{O}(n)$ communication cannot achieve a better than $n^{\frac14-o(1)}$-approximation.
Previously, no non-trivial lower bounds were known for protocols with even $O(n\log{n})$ bits of communication. Our result also implies the first non-trivial lower bounds for semi-streaming load-balancing in the edge-arrival model, ruling out $n^{\frac14-o(1)}$-approximation in a single pass.
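To make the objective concrete, the Python sketch below computes the maximum server load of an assignment and builds one with a simple greedy rule: each client, in input order, goes to its currently least-loaded neighbouring server. The dictionary representation of $G$, the greedy rule, and the names `max_load` and `greedy_assign` are illustrative assumptions; this is not the paper's protocol or sparsifier, and greedy is not guaranteed to minimize the maximum load.

```python
from collections import defaultdict


def max_load(assignment):
    """Maximum number of clients assigned to any one server (0 if nothing is assigned)."""
    loads = defaultdict(int)
    for server in assignment.values():
        loads[server] += 1
    return max(loads.values(), default=0)


def greedy_assign(adjacency):
    """Greedy heuristic (an assumption, not the paper's method): send each client,
    in input order, to its currently least-loaded neighbouring server.

    adjacency maps each client in L to the list of servers in R it is adjacent to;
    ties go to the server listed first.
    """
    loads = defaultdict(int)
    assignment = {}
    for client, servers in adjacency.items():
        if not servers:
            raise ValueError(f"client {client!r} has no adjacent server")
        best = min(servers, key=lambda s: loads[s])
        assignment[client] = best
        loads[best] += 1
    return assignment


# Hypothetical toy instance: three clients, two servers.
adjacency = {"c1": ["s1", "s2"], "c2": ["s1"], "c3": ["s1", "s2"]}
assignment = greedy_assign(adjacency)
print(assignment, max_load(assignment))
```

On this toy instance greedy reaches the optimum of 2, but it can do worse in general: for `{"a": ["s1", "s2"], "b": ["s1"]}` it places both clients on `s1` (maximum load 2), whereas sending `a` to `s2` achieves 1.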

Submitted 21 October, 2024; originally announced October 2024.

Comments: In SODA 2025

arXiv:2410.15907

Seismic Phase Picking

Authors: Yuchen Wang, Ruihuan Wang

Abstract: Seismic phase picking, which aims to determine the arrival times of P- and S-waves from seismic waveforms, is fundamental to earthquake monitoring. Manual phase picking is generally trustworthy, but with the increasing number of stations and seismic monitors worldwide, it becomes more challenging for humans to complete the task comprehensively. In this work, we explore multiple approaches to automatic phase picking, including traditional and learning-based methods.

Submitted 21 October, 2024; originally announced October 2024.

arXiv:2410.15789

High-resolution Observations of Clustered Dynamic Extreme-Ultraviolet Bright Tadpoles near the Footpoints of Corona Loops

Authors: Rui Wang, Ying D. Liu, L. P. Chitta, Huidong Hu, Xiaowei Zhao

Abstract: An extreme ultraviolet (EUV) close-up view of the Sun offers unprecedented detail of heating events in the solar corona. Images with enhanced temporal and spatial resolution obtained by the Solar Orbiter during its first science perihelion enabled us to identify clustered EUV bright tadpoles (CEBTs) occurring near the footpoints of coronal loops. Combining these with SDO/AIA observations, we determine the altitudes of six distinct CEBTs by stereoscopy, which range from $\sim$1300 to 3300 km. We also observe a substantial number of dark, cooler filamentary structures apparently beneath the CEBTs, displaying periodic up-and-down motions lasting 3 to 5 minutes. This periodic behavior suggests that the majority of CEBTs are associated with Type I spicules. Of the ten selected CEBTs with fast downward velocities, six exhibit corrected velocities close to or exceeding 50 km s$^{-1}$. These velocities notably surpass the typical speeds of Type I spicules. We explore how such velocities may be generated. Because spicules have previously been observed only to a limited extent at EUV wavelengths, these events may reveal novel observational features beyond our current understanding. Gaining insights into these features contributes to a better comprehension of small-scale coronal heating dynamics.

Submitted 21 October, 2024; originally announced October 2024.

Comments: 17 pages, 10 figures, accepted for publication in RAA

arXiv:2410.15657

CL-HOI: Cross-Level Human-Object Interaction Distillation from Vision Large Language Models

Authors: Jianjun Gao, Chen Cai, Ruoyu Wang, Wenyang Liu, Kim-Hui Yap, Kratika Garg, Boon-Siew Han

Abstract: Human-object interaction (HOI) detection has advanced with Vision Language Models (VLMs), but these methods often depend on extensive manual annotations. Vision Large Language Models (VLLMs) can inherently recognize and reason about interactions at the image level but are computationally heavy and not designed for instance-level HOI detection. To overcome these limitations, we propose a Cross-Level HOI distillation (CL-HOI) framework, which distills instance-level HOIs from VLLMs' image-level understanding without the need for manual annotations. Our approach involves two stages: context distillation, where a Visual Linguistic Translator (VLT) converts visual information into linguistic form, and interaction distillation, where an Interaction Cognition Network (ICN) reasons about spatial, visual, and context relations. We design contrastive distillation losses to transfer image-level context and interaction knowledge from the teacher to the student model, enabling instance-level HOI detection. Evaluations on the HICO-DET and V-COCO datasets demonstrate that CL-HOI surpasses existing weakly supervised methods and VLLM-supervised methods, showing its efficacy in detecting HOIs without manual labels.

Submitted 21 October, 2024; originally announced October 2024.

arXiv:2410.15026

A Recommendation Model Utilizing Separation Embedding and Self-Attention for Feature Mining

Authors: Wenyi Liu, Rui Wang, Yuanshuai Luo, Jianjun Wei, Zihao Zhao, Junming Huang

Abstract: With the explosive growth of Internet data, users face information overload, which makes it challenging to efficiently obtain the resources they need. Recommendation systems have emerged in this context. By filtering massive amounts of information, they provide users with content that meets their needs, playing a key role in scenarios such as advertising and product recommendation. However, traditional click-through rate prediction and TOP-K recommendation mechanisms are increasingly unable to meet the recommendation needs of modern scenarios due to high computational complexity, large memory consumption, long feature selection time, and insufficient feature interaction. This paper proposes a recommendation system model based on a separation embedding cross-network. The model uses an embedding neural network layer to transform sparse feature vectors into dense embedding vectors and can independently perform feature cross operations on different dimensions, thereby improving the accuracy and depth of feature mining. Experimental results show that the model has stronger adaptability and higher prediction accuracy on complex datasets, effectively addressing the shortcomings of existing models.
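As a rough illustration of the two ingredients named above, the sketch below maps sparse categorical IDs to dense vectors through per-field lookup tables and then applies one cross layer in the style of the Deep & Cross Network, $x_{l+1} = x_0 (x_l^\top w) + b + x_l$. This is a swapped-in, well-known cross-layer formulation, not the paper's separation embedding cross-network; the table contents, field names, and weights are hypothetical.

```python
# Hypothetical per-field embedding tables: row i is the dense vector for category ID i.
TABLES = {
    "user": [[0.1, 0.2], [0.3, 0.4]],
    "item": [[1.0, 0.0], [0.0, 1.0]],
}


def embed(sparse_ids, tables):
    """Turn {field: category_id} into one dense vector by concatenating per-field lookups."""
    dense = []
    for field, idx in sparse_ids.items():
        dense.extend(tables[field][idx])
    return dense


def cross_layer(x0, xl, w, b):
    """One DCN-style cross layer (not the paper's): x_{l+1} = x0 * (xl . w) + b + xl."""
    if not (len(x0) == len(xl) == len(w) == len(b)):
        raise ValueError("all vectors must have the same length")
    s = sum(x * wi for x, wi in zip(xl, w))  # scalar projection of xl onto w
    return [x0_i * s + b_i + xl_i for x0_i, b_i, xl_i in zip(x0, b, xl)]


x0 = embed({"user": 1, "item": 0}, TABLES)  # -> [0.3, 0.4, 1.0, 0.0]
x1 = cross_layer(x0, x0, w=[1.0, 0.0, 0.0, 0.0], b=[0.0] * 4)
print(x1)
```

Each cross layer adds an explicit multiplicative interaction between the original embedding $x_0$ and the current representation while keeping a residual path through $x_l$.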

Submitted 19 October, 2024; originally announced October 2024.

arXiv:2410.14200

E3D-GPT: Enhanced 3D Visual Foundation for Medical Vision-Language Model

Authors: Haoran Lai, Zihang Jiang, Qingsong Yao, Rongsheng Wang, Zhiyang He, Xiaodong Tao, Wei Wei, Weifu Lv, S. Kevin Zhou

Abstract: The development of 3D medical vision-language models holds significant potential for disease diagnosis and patient treatment. However, compared to 2D medical images, 3D medical images such as CT scans face challenges related to limited training data and high dimensionality, which severely restrict the progress of 3D medical vision-language models. To address these issues, we collect a large amount of unlabeled 3D CT data and use self-supervised learning to construct a 3D visual foundation model for extracting 3D visual features. Then, we apply 3D spatial convolutions to aggregate and project high-level image features, reducing computational complexity while preserving spatial information. We also construct two instruction-tuning datasets based on BIMCV-R and CT-RATE to fine-tune the 3D vision-language model. Our model demonstrates superior performance compared to existing methods in report generation, visual question answering, and disease diagnosis. Code and data will be made publicly available soon.

Submitted 18 October, 2024; originally announced October 2024.

arXiv:2410.14165

Automated Genre-Aware Article Scoring and Feedback Using Large Language Models

Authors: Chihang Wang, Yuxin Dong, Zhenhong Zhang, Ruotong Wang, Shuo Wang, Jiajing Chen

Abstract: This paper focuses on the development of an advanced intelligent article scoring system that not only assesses the overall quality of written work but also offers detailed feature-based scoring tailored to various article genres. By integrating the pre-trained BERT model with the large language model ChatGPT, the system gains a deep understanding of both the content and structure of the text, enabling it to provide a thorough evaluation along with targeted suggestions for improvement. Experimental results demonstrate that this system outperforms traditional scoring methods across multiple public datasets, particularly in feature-based assessments, offering a more accurate reflection of the quality of different article types. Moreover, the system generates personalized feedback to assist users in enhancing their writing skills, underscoring the potential and practical value of automated scoring technologies in educational contexts.

Submitted 18 October, 2024; originally announced October 2024.

arXiv:2410.14145

CAPE: A Chinese Dataset for Appraisal-based Emotional Generation using Large Language Models

Authors: June M. Liu, He Cao, Renliang Sun, Rui Wang, Yu Li, Jiaxing Zhang

Abstract: Generating emotionally appropriate responses in conversations with large language models presents a significant challenge due to the complexity of human emotions and cognitive processes, whose critical role in social interactions remains largely underexplored. In this study, we introduce a two-stage automatic data generation framework to create CAPE (Cognitive Appraisal theory-based Emotional corpus), a Chinese dataset. This corpus facilitates the generation of dialogues with contextually appropriate emotional responses by accounting for diverse personal and situational factors. We propose two tasks using this dataset: emotion prediction and next utterance prediction. Both automated and human evaluations demonstrate that agents trained on our dataset can deliver responses that are more aligned with human emotional expressions. Our study shows the potential for advancing emotional expression in conversational agents, paving the way for more nuanced and meaningful human-computer interactions.

Submitted 17 October, 2024; originally announced October 2024.

arXiv:2410.14099

ST-MoE-BERT: A Spatial-Temporal Mixture-of-Experts Framework for Long-Term Cross-City Mobility Prediction

Authors: Haoyu He, Haozheng Luo, Qi R. Wang

Abstract: Predicting human mobility across multiple cities presents significant challenges due to the complex and diverse spatial-temporal dynamics inherent in different urban environments. In this study, we propose a robust approach to predicting human mobility patterns called ST-MoE-BERT. Unlike existing methods, our approach frames the prediction task as a spatial-temporal classification problem. Our methodology integrates the Mixture-of-Experts architecture with the BERT model to capture complex mobility dynamics and perform the downstream human mobility prediction task. Additionally, transfer learning is incorporated to address data scarcity in cross-city prediction. We demonstrate the effectiveness of the proposed model using the GEO-BLEU and DTW metrics, comparing it to several state-of-the-art methods. Notably, ST-MoE-BERT achieves an average improvement of 8.29%.
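DTW (dynamic time warping), one of the two evaluation metrics named above, scores how closely two trajectories match even when they differ in length or pacing. The sketch below is textbook DTW over sequences of 2D points with Euclidean point costs; the challenge's official metric may differ in details such as coordinate units or normalization, so treat it as an illustration rather than the evaluation code used in the paper.

```python
import math


def dtw_distance(a, b):
    """Textbook dynamic time warping between two sequences of points (tuples of floats).

    Runs in O(len(a) * len(b)) time; the per-point cost is Euclidean distance.
    """
    n, m = len(a), len(b)
    # cost[i][j] = cheapest cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            # extend the cheapest of the three predecessor alignments
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Hypothetical trajectories on a grid: b skips the middle point of a.
a = [(0, 0), (1, 1), (2, 2)]
b = [(0, 0), (2, 2)]
print(dtw_distance(a, b))
```

Here the middle point $(1, 1)$ is warped onto one of its neighbours in `b`, costing $\sqrt{2}$; a trajectory compared with itself scores 0.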

Submitted 17 October, 2024; originally announced October 2024.

Comments: 2nd ACM SIGSPATIAL International Workshop on the Human Mobility Prediction Challenge

arXiv:2410.13748

Test of lepton flavour universality with $B_s^0 \rightarrow \phi\ell^+\ell^-$ decays

Authors: LHCb collaboration, R. Aaij, A. S. W. Abdelmotteleb, C. Abellan Beteta, F. Abudinén, T. Ackernley, A. A. Adefisoye, B. Adeva, M. Adinolfi, P. Adlarson, C. Agapopoulou, C. A. Aidala, Z. Ajaltouni, S. Akar, K. Akiba, P. Albicocco, J. Albrecht, F. Alessio, M. Alexander, Z. Aliouche, P. Alvarez Cartelle, R. Amalric, S. Amato, J. L. Amey, Y. Amhis, et al. (1124 additional authors not shown)

Abstract: Lepton flavour universality in rare $b\rightarrow s$ transitions is tested for the first time using $B_s^0$ meson decays. The measurements are performed using $pp$ collision data collected by the LHCb experiment between 2011 and 2018, corresponding to a total integrated luminosity of 9$\,{\rm fb}^{-1}$. Branching fraction ratios between the $B_s^0 \rightarrow \phi e^+e^-$ and $B_s^0 \rightarrow \phi\mu^+\mu^-$ decays are measured in three regions of dilepton mass squared, $q^2$, with $0.1 < q^2 < 1.1$, $1.1 < q^2 < 6.0$, and $15 < q^2 < 19\,{\rm GeV}^2/c^4$. The results agree with the Standard Model expectation of lepton flavour universality.
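The three $q^2$ regions quoted above can be encoded as a small lookup. The sketch below assigns a dilepton mass-squared value to its region using the strict inequalities from the abstract, so values outside or between the regions, and the shared boundary $q^2 = 1.1$, fall in no region. The region names are hypothetical labels, and the sketch covers only the binning, not the LHCb selection, efficiency corrections, or ratio fit.

```python
# q^2 regions from the abstract, in GeV^2/c^4, as open intervals (lower, upper).
# The region names are hypothetical labels, not LHCb's.
Q2_REGIONS = {
    "low": (0.1, 1.1),
    "central": (1.1, 6.0),
    "high": (15.0, 19.0),
}


def q2_region(q2):
    """Return the name of the region with lower < q2 < upper, or None if there is none."""
    for name, (lower, upper) in Q2_REGIONS.items():
        if lower < q2 < upper:
            return name
    return None


for value in (0.5, 1.1, 3.0, 10.0, 17.0):
    print(value, q2_region(value))
```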

Submitted 17 October, 2024; originally announced October 2024.

Comments: All figures and tables, along with machine-readable versions and any supplementary material and additional information, are available at https://lbfence.cern.ch/alcm/public/analysis/full-details/3513/ (LHCb public pages)

Report number: LHCb-PAPER-2024-032, CERN-EP-2024-255

arXiv:2410.13640

Latent Space Chain-of-Embedding Enables Output-free LLM Self-Evaluation

Authors: Yiming Wang, Pei Zhang, Baosong Yang, Derek F. Wong, Rui Wang

Abstract: LLM self-evaluation relies on the LLM's own ability to estimate response correctness, which can greatly improve its deployment reliability. Along this line of research, we propose the Chain-of-Embedding (CoE) in the latent space to enable LLMs to perform output-free self-evaluation. CoE consists of all progressive hidden states produced during inference, which can be treated as the latent thinking path of LLMs. We find that the CoE features of LLMs differ between correct and incorrect responses, and these discrepancies help us estimate LLM response correctness. Experiments in four diverse domains and seven LLMs demonstrate the effectiveness of our method. Meanwhile, its label-free, training-free design and millisecond-level computational cost enable real-time feedback in large-scale scenarios. More importantly, we provide interesting insights into LLM response correctness from the perspective of hidden state changes inside LLMs.
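The idea of reading correctness off the hidden-state path can be sketched as follows: given one vector per layer, measure how far the representation moves (magnitude of the difference) and how far it turns (angle) between consecutive layers. This is an illustrative assumption about what such path features could look like; the function name and feature choice are hypothetical, and the paper's actual CoE score may combine these quantities differently.

```python
import math


def _norm(v):
    return math.sqrt(sum(x * x for x in v))


def trajectory_features(hidden_states):
    """Hypothetical CoE-style features for one response (not the paper's exact score).

    hidden_states: one equal-length vector per layer (for example a pooled
    token representation). Returns (magnitude_change, angle_in_radians)
    for each pair of consecutive layers.
    """
    feats = []
    for prev, curr in zip(hidden_states, hidden_states[1:]):
        mag = _norm([c - p for c, p in zip(curr, prev)])  # how far the state moves
        denom = _norm(prev) * _norm(curr)
        cos = sum(p * c for p, c in zip(prev, curr)) / denom if denom else 1.0
        angle = math.acos(max(-1.0, min(1.0, cos)))  # how far it turns; clamp for rounding
        feats.append((mag, angle))
    return feats


states = [[1.0, 0.0], [0.0, 1.0], [0.0, 2.0]]  # toy 3-layer "path"
print(trajectory_features(states))
```

Because these features need only a forward pass and no labels, a downstream score built from them would be consistent with the output-free, training-free setting described above.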

Submitted 17 October, 2024; originally announced October 2024.

Comments: 33 pages, 18 figures, 12 tables

arXiv:2410.13217

MixEHR-Nest: Identifying Subphenotypes within Electronic Health Records through Hierarchical Guided-Topic Modeling

Authors: Ruohan Wang, Zilong Wang, Ziyang Song, David Buckeridge, Yue Li

Abstract: Automatic subphenotyping from electronic health records (EHRs) provides numerous opportunities to understand diseases with unique subgroups and enhance personalized medicine for patients. However, existing machine learning algorithms either focus on specific diseases for better interpretability or produce coarse-grained phenotype topics without considering nuanced disease patterns. In this study, we propose a guided topic model, MixEHR-Nest, to infer subphenotype topics from thousands of diseases using multi-modal EHR data. Specifically, MixEHR-Nest detects multiple subtopics from each phenotype topic, whose prior is guided by expert-curated phenotype concepts such as Phenotype Codes (PheCodes) or Clinical Classification Software (CCS) codes. We evaluated MixEHR-Nest on two EHR datasets: (1) the MIMIC-III dataset, consisting of over 38 thousand intensive care unit (ICU) patients from Beth Israel Deaconess Medical Center (BIDMC) in Boston, USA; and (2) the healthcare administrative database PopHR, comprising 1.3 million patients from Montreal, Canada. Experimental results demonstrate that MixEHR-Nest can identify subphenotypes with distinct patterns within each phenotype, which are predictive of disease progression and severity. Notably, MixEHR-Nest distinguishes between type 1 and type 2 diabetes by inferring subphenotypes using CCS codes, which do not differentiate these two subtype concepts.
Additionally, MixEHR-Nest not only improved the prediction accuracy of short-term mortality of ICU patients and initial insulin treatment in diabetic patients but also revealed the contributions of subphenotypes. In a longitudinal analysis, MixEHR-Nest identified subphenotypes with distinct age prevalence under the same phenotypes, such as asthma, leukemia, epilepsy, and depression. The MixEHR-Nest software is available on GitHub: https://github.com/li-lab-mcgill/MixEHR-Nest.

Submitted 17 October, 2024; originally announced October 2024.

ACM Class: J.3

arXiv:2410.12831

Segment as You Wish -- Free-Form Language-Based Segmentation for Medical Images

Authors: Longchao Da, Rui Wang, Xiaojian Xu, Parminder Bhatia, Taha Kass-Hout, Hua Wei, Cao Xiao

Abstract: Medical imaging is crucial for diagnosing a patient's health condition, and accurate segmentation of these images is essential for isolating regions of interest to ensure precise diagnosis and treatment planning. Existing methods primarily rely on bounding boxes or point-based prompts, while few have explored text-related prompts, despite clinicians often describing their observations and instructions in natural language. To address this gap, we first propose a RAG-based free-form text prompt generator that leverages the domain corpus to generate diverse and realistic descriptions. Then, we introduce FLanS, a novel medical image segmentation model that handles various free-form text prompts, including professional anatomy-informed queries, anatomy-agnostic position-driven queries, and anatomy-agnostic size-driven queries. Additionally, our model incorporates a symmetry-aware canonicalization module to ensure consistent, accurate segmentations across varying scan orientations and reduce confusion between the anatomical position of an organ and its appearance in the scan. FLanS is trained on a large-scale dataset of over 100k medical images from 7 public datasets. Comprehensive experiments demonstrate the model's superior language understanding and segmentation precision, along with a deep comprehension of the relationship between them, outperforming SOTA baselines on both in-domain and out-of-domain datasets.

Submitted 2 October, 2024; originally announced October 2024.

arXiv:2410.12624

Field-free superconducting diode effect and magnetochiral anisotropy in FeTe0.7Se0.3 junctions with the inherent asymmetric barrier

Authors: Shengyao Li, Ya Deng, Dianyi Hu, Chao Zhu, Zherui Yang, Wanghao Tian, Xueyan Wang, Ming Yue, Qiong Wu, Zheng Liu, Xiao Renshaw Wang

Abstract: Nonreciprocal electrical transport, characterized by an asymmetric relationship between current and voltage, plays a crucial role in the modern electronics industry. Recent studies have extended this phenomenon to superconductors, introducing the concept of the superconducting diode effect (SDE). The SDE is characterized by unequal critical supercurrents in opposite directions. Because it requires broken inversion symmetry, the SDE is commonly accompanied by electrical magnetochiral anisotropy (eMCA) in the resistive state. Achieving a magnetic-field-free SDE with field tunability is pivotal for advances in superconductor devices. Conventionally, the field-free SDE has been achieved in Josephson junctions by intentionally intercalating an asymmetric barrier layer. Alternatively, internal magnetism has been employed. Both approaches pose challenges in the selection of superconductors and fabrication processes, thereby impeding the development of SDE. Here, we present a field-free SDE in FeTe0.7Se0.3 (FTS) junctions with eMCA, a phenomenon absent in single FTS nanosheets. The field-free property is associated with the presence of a gradient oxide layer on the upper surface of each FTS nanosheet, while the eMCA is linked to spin splitting arising from the absence of inversion symmetry. Both the SDE and eMCA respond to magnetic fields with distinct temperature dependencies. This work presents a versatile and straightforward strategy for advancing superconducting electronics.
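A common way to quantify the SDE in the literature is the diode efficiency $\eta = (I_c^+ - |I_c^-|)/(I_c^+ + |I_c^-|)$, where $I_c^\pm$ are the critical currents for the two bias directions and $\eta = 0$ means no diode effect. The abstract does not state which figure of merit the authors use, so this sketch and its example currents are assumptions.

```python
def diode_efficiency(ic_pos, ic_neg):
    """Assumed figure of merit: eta = (|Ic+| - |Ic-|) / (|Ic+| + |Ic-|); 0 means no diode effect.

    ic_pos, ic_neg: critical currents for positive and negative bias
    (only magnitudes are used, so the sign convention of ic_neg does not matter).
    """
    pos, neg = abs(ic_pos), abs(ic_neg)
    if pos + neg == 0:
        raise ValueError("at least one critical current must be non-zero")
    return (pos - neg) / (pos + neg)


# Hypothetical critical currents in microamps.
print(diode_efficiency(10.0, -8.0))
```

The sign of $\eta$ indicates which direction carries the larger supercurrent, which is how a field-induced reversal of the diode polarity would show up.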

Submitted 16 October, 2024; originally announced October 2024.

arXiv:2410.12478

MlingConf: A Comprehensive Study of Multilingual Confidence Estimation on Large Language Models

Authors: Boyang Xue, Hongru Wang, Rui Wang, Sheng Wang, Zezhong Wang, Yiming Du, Bin Liang, Kam-Fai Wong

Abstract: The tendency of Large Language Models (LLMs) to generate hallucinations raises concerns regarding their reliability. Therefore, confidence estimates indicating the trustworthiness of the generations become essential. However, LLM confidence estimation in languages other than English remains underexplored. This paper addresses this gap by introducing a comprehensive investigation of Multilingual Confidence estimation (MlingConf) on LLMs, focusing on both language-agnostic (LA) and language-specific (LS) tasks to explore the performance and language dominance effects of multilingual confidence estimation on different tasks. The benchmark comprises four meticulously checked and human-evaluated high-quality multilingual datasets for LA tasks and one for the LS task tailored to the specific social, cultural, and geographical contexts of a language. Our experiments reveal that on LA tasks, English exhibits notably stronger linguistic dominance in confidence estimation than other languages, while on LS tasks, prompting LLMs in the language related to the question yields better linguistic dominance in multilingual confidence estimation. These phenomena inspire a simple yet effective native-tone prompting strategy that employs language-specific prompts for LS tasks, effectively improving LLMs' reliability and accuracy on LS tasks.
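Confidence estimates are often judged by how well they are calibrated. The sketch below computes expected calibration error (ECE) with equal-width bins, a widely used calibration metric; the abstract does not say which metrics the MlingConf benchmark reports, so the choice of ECE and the default bin count are assumptions.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE with equal-width confidence bins (an assumed metric, not necessarily MlingConf's).

    confidences: floats in [0, 1]; correct: matching booleans.
    """
    if len(confidences) != len(correct):
        raise ValueError("inputs must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes into the last bin
        bins[idx].append((conf, ok))
    ece = 0.0
    for b in bins:
        if b:
            avg_conf = sum(c for c, _ in b) / len(b)
            accuracy = sum(1 for _, ok in b if ok) / len(b)
            ece += len(b) / n * abs(avg_conf - accuracy)  # weight by bin size
    return ece


# Hypothetical model outputs: two confident answers (one right), two unconfident wrong ones.
print(expected_calibration_error([0.9, 0.9, 0.2, 0.2], [True, False, False, False]))
```

Comparing such a score across prompt languages on the same questions is one way to make "language dominance" in confidence estimation measurable.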

Submitted 17 October, 2024; v1 submitted 16 October, 2024; originally announced October 2024.

Comments: This work was intended as a replacement for arXiv:2402.13606, and any subsequent updates will appear there

arXiv:2410.12425

Perseus: Leveraging Common Data Patterns with Curriculum Learning for More Robust Graph Neural Networks

Authors: Kaiwen Xia, Huijun Wu, Duanyu Li, Min Xie, Ruibo Wang, Wenzhe Zhang

Abstract: Graph Neural Networks (GNNs) excel at handling graph data but remain vulnerable to adversarial attacks. Existing defense methods typically rely on assumptions like graph sparsity and homophily to either preprocess the graph or guide structure learning. However, preprocessing methods often struggle to accurately distinguish between normal edges and adversarial perturbations, leading to suboptimal results due to the loss of valuable edge information. Robust graph neural network models, on the other hand, train directly on graph data affected by adversarial perturbations, without preprocessing. This can cause the model to get stuck in poor local optima, negatively affecting its performance. To address these challenges, we propose Perseus, a novel adversarial defense method based on curriculum learning. Perseus assesses edge difficulty using global homophily and applies a curriculum learning strategy to adjust the learning order, guiding the model to learn the full graph structure while adaptively focusing on common data patterns. This approach mitigates the impact of adversarial perturbations. Experiments show that models trained with Perseus achieve superior performance and are significantly more robust to adversarial attacks.
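An edge curriculum of the kind described above can be sketched as: score each edge's difficulty, sort from easy to hard, and let the fraction of edges used in training grow over epochs. In this hypothetical sketch, difficulty is 1 minus the cosine similarity of the endpoint features (a local, feature-based stand-in for homophily, not the global homophily measure Perseus uses), and the pacing schedule is linear; both choices and all function names are assumptions.

```python
import math


def cosine(u, v):
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0 or nv == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def edge_difficulty(edges, features):
    """Hypothetical difficulty: 1 - cosine similarity of endpoint features (0 = easiest)."""
    return {e: 1.0 - cosine(features[e[0]], features[e[1]]) for e in edges}


def curriculum_edges(edges, features, epoch, total_epochs, start_frac=0.5):
    """Easiest fraction of edges for this epoch under an assumed linear pacing schedule."""
    frac = min(1.0, start_frac + (1.0 - start_frac) * epoch / total_epochs)
    scores = edge_difficulty(edges, features)
    ranked = sorted(edges, key=lambda e: scores[e])  # stable sort: ties keep input order
    k = max(1, math.ceil(frac * len(ranked)))
    return ranked[:k]


# Toy graph: nodes 0 and 1 look alike, node 2 does not.
features = {0: [1.0, 0.0], 1: [1.0, 0.0], 2: [0.0, 1.0]}
edges = [(0, 1), (0, 2), (1, 2)]
for epoch in (0, 4):
    print(epoch, curriculum_edges(edges, features, epoch, total_epochs=4))
```

Starting from the most "typical" edges and adding dissimilar ones later is the intuition behind letting common data patterns dominate early training, so that suspicious edges have less influence on where optimization settles.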

Submitted 16 October, 2024; originally announced October 2024.

arXiv:2410.11954

DAXA: Traversing the X-ray desert by Democratising Archival X-ray Astronomy

Authors: David J. Turner, Jessica E. Pilling, Megan Donahue, Paul A. Giles, Kathy Romer, Agrim Gupta, Toby Wallage, Ray Wang

Abstract: We introduce a new, open-source Python module for the acquisition and processing of archival data from many X-ray telescopes: Democratising Archival X-ray Astronomy (hereafter DAXA). Our software is built to increase access to, and use of, large archives of X-ray astronomy data, providing a unified, easy-to-use Python interface to the disparate archives and processing tools. We provide this interface for the majority of X-ray telescopes launched within the last 30 years. This module enables much greater access to X-ray data for non-specialists, while preserving low-level control of processing for X-ray experts. It is useful for identifying relevant observations of a single object of interest, but it excels at creating multi-mission datasets for serendipitous or targeted studies of large samples of X-ray emitting objects. The management and organization of datasets is also made easier; DAXA archives can be version controlled and updated if new data become available. Once relevant observations are identified, the raw data can be downloaded (and optionally processed) through DAXA, or pre-processed event lists, images, and exposure maps can be downloaded if they are available. X-ray observations are perfectly suited to serendipitous discoveries and archival analyses, and with a decade-long 'X-ray desert' potentially on the horizon, archival data will take on even greater importance; enhanced access to those archives will be vital to the continuation of X-ray astronomy.

Submitted 15 October, 2024; originally announced October 2024.

Comments: 5 pages, 1 figure, submitted to JOSS; GitHub repository - https://github.com/DavidT3/DAXA; Documentation - https://daxa.readthedocs.io/

arXiv:2410.11925

A Study of Decay Rate of Bound Negative Muons

Authors: Jian-Bo Deng, Miao-Yi Deng, Shi-Jie Ma, Rui-Bo Wang, Qi-Qi Fan, Peng-Zhang He, Yi-Peng He, Shuo-Wen Li, Xian-Ru Hu

Abstract: A number of experiments show that the decay lifetimes of muons bound to atomic nuclei are longer than those of free muons. In this paper, a scheme for extending quantum mechanics (EQM) is proposed to resolve this problem. The corresponding Schrödinger equation is derived to validate this approach. The decay ratio of bound muons is also calculated in EQM, and the result is in good agreement with the experimental data.

Submitted 15 October, 2024; originally announced October 2024.

Comments: 5 pages, 1 figure, 2 tables

arXiv:2410.11913

Development and Testing of a Wood Panels Bark Removal Equipment Based on Deep Learning

Authors: Rijun Wang, Guanghao Zhang, Hongyang Chen, Xinye Yu, Yesheng Chen, Fulong Liang, Xiangwei Mou, Bo Wang

Abstract: Applying deep learning methods to bark removal equipment for wood panels, in order to improve the quality and efficiency of bark removal, is a significant and challenging endeavor. This study develops and tests deep learning-based bark removal equipment for wood panels. In accordance with the practical requirements of sawmills, bark removal equipment fitted with a vision inspection system is designed. Based on a large collection of wood panel images obtained with this visual inspection system, the first general semantic segmentation dataset for wood panels is constructed to train the BiSeNetV1 model employed in this study. Furthermore, the methods and procedures for calculating the key data required in the bark removal process are presented in detail. Comparative experiments on the BiSeNetV1 model and tests of bark removal effectiveness are conducted in both laboratory and sawmill environments. The results of the comparative experiments indicate that applying the BiSeNetV1 segmentation model is rational and feasible. The bark removal effectiveness tests demonstrate a significant improvement in both the quality and efficiency of bark removal. The developed equipment fully meets the sawmill's requirements for precision and efficiency in bark removal processing.

Submitted 15 October, 2024; originally announced October 2024.

arXiv:2410.11848

A Robust Multisource Remote Sensing Image Matching Method Utilizing Attention and Feature Enhancement Against Noise Interference

Authors: Yuan Li, Dapeng Wu, Yaping Cui, Peng He, Yuan Zhang, Ruyan Wang

Abstract: Image matching is a fundamental and critical task in multisource remote sensing image applications. However, remote sensing images are susceptible to various kinds of noise, so achieving accurate matching on noisy images is a challenging problem. To address this issue, we propose a robust multisource remote sensing image matching method that uses attention and feature enhancement to counter noise interference. In the first stage, we combine deep convolution with the attention mechanism of the transformer to perform dense feature extraction, constructing feature descriptors with higher discriminability and robustness. We then employ a coarse-to-fine matching strategy to obtain dense matches. In the second stage, we introduce an outlier removal network based on a binary classification mechanism, which establishes effective and geometrically consistent correspondences between images: each correspondence is weighted and classified as an inlier or an outlier, and the outliers are removed from the dense matches. The result is more efficient and accurate matching. To validate the proposed method, we conduct experiments on multisource remote sensing image datasets, comparing it with other state-of-the-art methods under different scenarios, including noise-free images, additive random noise, and periodic stripe noise. Comparative results indicate that the proposed method offers more balanced performance and greater robustness.
The proposed method provides a valuable reference for solving the difficult problem of noisy image matching.

Submitted 30 September, 2024; originally announced October 2024.

Comments: 21 pages, 13 figures

arXiv:2410.11531

AGENTiGraph: An Interactive Knowledge Graph Platform for LLM-based Chatbots Utilizing Private Data

Authors: Xinjie Zhao, Moritz Blum, Rui Yang, Boming Yang, Luis Márquez Carpintero, Mónica Pina-Navarro, Tony Wang, Xin Li, Huitao Li, Yanran Fu, Rongrong Wang, Juntao Zhang, Irene Li

Abstract: Large Language Models (LLMs) have demonstrated capabilities across various applications but face challenges such as hallucination, limited reasoning abilities, and factual inconsistencies, especially when tackling complex, domain-specific tasks like question answering (QA). While Knowledge Graphs (KGs) have been shown to help mitigate these issues, research on integrating LLMs with background KGs remains limited. In particular, user accessibility and the flexibility of the underlying KG have not been thoroughly explored. We introduce AGENTiGraph (Adaptive Generative ENgine for Task-based Interaction and Graphical Representation), a platform for knowledge management through natural language interaction. It integrates knowledge extraction, integration, and real-time visualization. AGENTiGraph employs a multi-agent architecture to dynamically interpret user intents, manage tasks, and integrate new knowledge, ensuring adaptability to evolving user requirements and data contexts. Our approach demonstrates superior performance in knowledge graph interactions, particularly for complex domain-specific tasks. Experimental results on a dataset of 3,500 test cases show that AGENTiGraph significantly outperforms state-of-the-art zero-shot baselines, achieving 95.12% accuracy in task classification and a 90.45% success rate in task execution. User studies corroborate its effectiveness in real-world scenarios.
To showcase its versatility, we extended AGENTiGraph to the legislation and healthcare domains, constructing specialized KGs capable of answering complex queries in legal and medical contexts.

Submitted 15 October, 2024; originally announced October 2024.

Comments: 30 pages, 7 figures; Submitted to COLING 2025 System Demonstrations Track

arXiv:2410.11390

Experimental Design Using Interlacing Polynomials

Authors: Lap Chi Lau, Robert Wang, Hong Zhou

Abstract: We present a unified deterministic approach to experimental design problems using the method of interlacing polynomials. Our framework recovers the best-known approximation guarantees for the well-studied D/A/E-design problems with a simple analysis. Furthermore, we obtain an improved non-trivial approximation guarantee for E-design in the challenging small-budget regime. Additionally, our approach provides an optimal approximation guarantee for a generalized ratio objective that generalizes both D-design and A-design.

Submitted 15 October, 2024; originally announced October 2024.

Comments: 16 pages
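For reference, the D/A/E-design problems named above are usually stated as follows. This is the standard textbook formulation, assumed here rather than taken from the paper, which may use a variant (for example, allowing a vector to be chosen more than once). Given vectors $v_1, \dots, v_n \in \mathbb{R}^d$ and a budget $k \ge d$, choose a subset $S$ of at most $k$ vectors:

```latex
\begin{aligned}
X_S &= \sum_{i \in S} v_i v_i^{\top}, \qquad S \subseteq \{1, \dots, n\},\ |S| \le k, \\
\text{D-design:} &\quad \max_S \, \det(X_S)^{1/d}, \\
\text{A-design:} &\quad \min_S \, \operatorname{tr}\big(X_S^{-1}\big), \\
\text{E-design:} &\quad \max_S \, \lambda_{\min}(X_S).
\end{aligned}
```

An approximation guarantee then bounds how far the objective value of the selected $S$ can be from the optimal value.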

arXiv:2410.10689

Fully Programmable Spatial Photonic Ising Machine by Focal Plane Division

Authors: Daniele Veraldi, Davide Pierangeli, Silvia Gentilini, Marcello Calvanese Strinati, Jason Sakellariou, James S. Cummins, Airat Kamaletdinov, Marvin Syed, Richard Zhipeng Wang, Natalia G. Berloff, Dimitrios Karanikolopoulos, Pavlos G. Savvidis, Claudio Conti

Abstract: Ising machines are an emerging class of hardware that promises ultrafast and energy-efficient solutions to NP-hard combinatorial optimization problems. Spatial photonic Ising machines (SPIMs) exploit optical computing in free space to accelerate the computation, offering parallelism, scalability, and low power consumption. However, current SPIMs can implement only a restricted class of problems. This partial programmability is a critical limitation that hampers their benchmarking. Achieving full programmability of the device while preserving its scalability is an open challenge. Here, we report a fully programmable SPIM achieved through a novel operating method based on dividing the focal plane. In our scheme, a general Ising problem is decomposed into a set of Mattis Hamiltonians, whose energies are computed simultaneously and optically by measuring the intensity on different regions of the camera sensor. Exploiting this concept, we experimentally compute, with high success probability, the ground-state solutions of Ising models with up to 32 spins on unweighted maximum-cut graphs, with and without ferromagnetic bias. Simulations of the hardware show favorable scaling of the accuracy with the number of spins. Our fully programmable SPIM enables the implementation of many quadratic unconstrained binary optimization problems, further establishing SPIMs as a leading paradigm in non-von Neumann hardware.

Submitted 14 October, 2024; originally announced October 2024.
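To make the decomposition idea concrete, the sketch below writes a small Ising energy $H(\sigma) = -\sum_{i<j} J_{ij}\sigma_i\sigma_j$ as a weighted sum of Mattis energies $-(\xi \cdot \sigma)^2$ plus a constant, then applies it to a max-cut instance. It is an illustration under stated assumptions, not the paper's scheme: the one-term-per-coupling split, based on $\sigma_i\sigma_j = ((\sigma_i+\sigma_j)^2 - 2)/2$ (valid because $\sigma_i^2 = 1$), is chosen only because it is easy to check, and the optical method may use a different decomposition, such as one built from the eigenvectors of $J$. All function names are hypothetical, and the ground state is found by brute force rather than optically.

```python
from itertools import product


def ising_energy(J, spins):
    """Direct energy H = -sum_{i<j} J[i][j] * s_i * s_j (no external field)."""
    n = len(spins)
    return -sum(J[i][j] * spins[i] * spins[j]
                for i in range(n) for j in range(i + 1, n))


def mattis_terms(J):
    """Hypothetical split of J into weighted Mattis terms plus a constant.

    Uses s_i * s_j = ((s_i + s_j)**2 - 2) / 2, valid because s**2 == 1, so
    each nonzero coupling becomes one Mattis pattern xi = e_i + e_j.
    """
    n = len(J)
    terms, offset = [], 0.0
    for i in range(n):
        for j in range(i + 1, n):
            if J[i][j] != 0:
                xi = [0] * n
                xi[i] = xi[j] = 1
                terms.append((J[i][j] / 2, xi))
                offset += J[i][j]
    return terms, offset


def mattis_energy(terms, offset, spins):
    """Weighted sum of Mattis energies -(xi . s)**2, plus the constant offset.

    In an optical machine each squared overlap would roughly correspond to a
    measured intensity; here it is simply computed.
    """
    total = offset
    for weight, xi in terms:
        overlap = sum(x * s for x, s in zip(xi, spins))
        total -= weight * overlap ** 2
    return total


# Unweighted 5-cycle as a max-cut instance: J = -1 (antiferromagnetic) on each edge.
n = 5
edges = [(i, (i + 1) % n) for i in range(n)]
J = [[0] * n for _ in range(n)]
for a, b in edges:
    J[a][b] = J[b][a] = -1

terms, offset = mattis_terms(J)
# The decomposition must reproduce the Ising energy for every spin configuration.
for s in product((-1, 1), repeat=n):
    assert abs(ising_energy(J, s) - mattis_energy(terms, offset, s)) < 1e-9

# Brute-force ground state (the hardware would search for it instead).
ground = min(product((-1, 1), repeat=n), key=lambda s: mattis_energy(terms, offset, s))
cut = (len(edges) - ising_energy(J, ground)) / 2
print(cut)  # 4.0
```

With $J_{ij} = -1$ on every edge, the number of cut edges equals $(|E| - H)/2$, so minimizing $H$ maximizes the cut; an odd 5-cycle can have at most 4 of its 5 edges cut.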

arXiv:2410.10437

Towards Reliable Verification of Unauthorized Data Usage in Personalized Text-to-Image Diffusion Models

Authors: Boheng Li, Yanhao Wei, Yankai Fu, Zhenting Wang, Yiming Li, Jie Zhang, Run Wang, Tianwei Zhang

Abstract: Text-to-image diffusion models are pushing the boundaries of what generative AI can achieve in our lives. Beyond generating general images, new personalization techniques have been proposed to customize pre-trained base models for crafting images with specific themes or styles. Such a lightweight solution, which lets AI practitioners and developers easily build their own personalized models, also raises a new concern: whether the personalized models are trained on unauthorized data. A promising solution is to proactively enable data traceability in generative models, where data owners embed external coatings (e.g., image watermarks or backdoor triggers) into their datasets before releasing them. Models later trained on such datasets will also learn the coatings and unconsciously reproduce them in the generated mimicries, which can then be extracted and used as evidence of data usage. However, we find that existing coatings cannot be effectively learned in personalization tasks, making the corresponding verification less reliable. In this paper, we introduce SIREN, a novel methodology to proactively trace unauthorized data usage in black-box personalized text-to-image diffusion models. Our approach carefully optimizes the coating so that the model recognizes it as a feature relevant to the personalization task, significantly improving its learnability.
We also use a human perceptual-aware constraint, a hypersphere classification technique, and a hypothesis-testing-guided verification method to enhance the stealthiness and detection accuracy of the coating. The effectiveness of SIREN is verified through extensive experiments on a diverse set of benchmark datasets, models, and learning algorithms. SIREN is also effective in various real-world scenarios and is evaluated against potential countermeasures. Our code is publicly available.

Submitted 14 October, 2024; originally announced October 2024.

Comments: To appear in the IEEE Symposium on Security & Privacy, May 2025

arXiv:2410.09921

The Roles of Contextual Semantic Relevance Metrics in Human Visual Processing

Authors: Kun Sun, Rong Wang

Abstract: Semantic relevance metrics can capture both the inherent semantics of individual objects and their relationships to other elements within a visual scene. Numerous previous studies have demonstrated that these metrics can influence human visual processing. However, these studies often did not fully account for contextual information or employ recent deep learning models for more accurate computation. This study investigates human visual perception and processing by introducing metrics of contextual semantic relevance. We evaluate semantic relationships between target objects and their surroundings from both vision-based and language-based perspectives. Using a large eye-movement dataset from visual comprehension, we employ state-of-the-art deep learning techniques to compute these metrics and analyze their impact on fixation measures of human visual processing through advanced statistical models. These metrics could also simulate top-down and bottom-up processing in visual perception. This study further integrates vision-based and language-based metrics into a novel combined metric, addressing a critical gap in previous research, which often treated visual and semantic similarities separately. Results indicate that all metrics can precisely predict fixation measures in visual perception and processing, but with distinct roles in prediction. The combined metric outperforms the other metrics, supporting theories that emphasize the interaction between semantic and visual information in shaping visual perception and processing.
This finding aligns with the growing recognition of the importance of multimodal information processing in human cognition. These insights enhance our understanding of the cognitive mechanisms underlying visual processing and have implications for developing more accurate computational models in fields such as cognitive science and human-computer interaction.

Submitted 13 October, 2024; originally announced October 2024.

arXiv:2410.09531

doi 10.1145/3676536.3676661

PrivQuant: Communication-Efficient Private Inference with Quantized Network/Protocol Co-Optimization

Authors: Tianshi Xu, Shuzhang Zhong, Wenxuan Zeng, Runsheng Wang, Meng Li

Abstract: Private deep neural network (DNN) inference based on secure two-party computation (2PC) protects the privacy of both the server and the client. However, existing secure 2PC frameworks suffer from high inference latency due to heavy communication. Since the communication of both linear and non-linear DNN layers decreases with the bit widths of weights and activations, in this paper we propose PrivQuant, a framework that jointly optimizes the 2PC-based quantized inference protocols and the network quantization algorithm, enabling communication-efficient private inference. PrivQuant proposes DNN architecture-aware optimizations of the 2PC protocols for communication-intensive quantized operators and conducts graph-level operator fusion to reduce communication. Moreover, PrivQuant develops a communication-aware mixed-precision quantization algorithm that improves inference efficiency while maintaining high accuracy. This network/protocol co-optimization enables PrivQuant to outperform prior-art 2PC frameworks. Through extensive experiments, we demonstrate that PrivQuant reduces communication by $11\times$, $2.5\times$, and $2.8\times$, which results in $8.7\times$, $1.8\times$, and $2.4\times$ latency reductions compared with SiRNN, COINN, and CoPriv, respectively.

Submitted 12 October, 2024; originally announced October 2024.

Comments: ICCAD 2024

arXiv:2410.09289

AuD-Former: A Hierarchical Transformer Network for Multimodal Audio-Based Disease Prediction

Authors: Jinjin Cai, Ruiqi Wang, Dezhong Zhao, Ziqin Yuan, Victoria McKenna, Aaron Friedman, Rachel Foot, Susan Storey, Ryan Boente, Sudip Vhaduri, Byung-Cheol Min

Abstract: Audio-based disease prediction is emerging as a promising supplement to traditional medical diagnosis methods, facilitating early, convenient, and non-invasive disease detection and prevention. Multimodal fusion, which integrates features from various domains within or across bio-acoustic modalities, has proven effective in enhancing diagnostic performance. However, most existing methods in the field employ unilateral fusion strategies that focus solely on either intra-modal or inter-modal fusion. This approach limits full exploitation of the complementary nature of diverse acoustic feature domains and bio-acoustic modalities. Additionally, the inadequate and isolated exploration of latent dependencies within modality-specific and modality-shared spaces curtails their capacity to handle the inherent heterogeneity of multimodal data. To fill these gaps, we propose AuD-Former, a hierarchical transformer network designed for general multimodal audio-based disease prediction. Specifically, we integrate intra-modal and inter-modal fusion in a hierarchical manner and encode the necessary intra-modal and inter-modal complementary correlations, respectively. Comprehensive experiments demonstrate that AuD-Former achieves state-of-the-art performance in predicting three diseases: COVID-19, Parkinson's disease, and pathological dysarthria, showcasing its potential across a broad range of audio-based disease prediction tasks.
Additionally, extensive ablation studies and qualitative analyses highlight the significant benefits of each main component of our model.

Submitted 11 October, 2024; originally announced October 2024.

arXiv:2410.08920

Efficient Hyperparameter Importance Assessment for CNNs

Authors: Ruinan Wang, Ian Nabney, Mohammad Golbabaee

Abstract: Hyperparameter selection is an essential part of the machine learning pipeline, profoundly affecting models' robustness, stability, and generalization capabilities. Given the complex hyperparameter spaces of neural networks and the constraints of computational resources and time, optimizing all hyperparameters is impractical. In this context, hyperparameter importance assessment (HIA) can provide valuable guidance by narrowing down the search space. This enables machine learning practitioners to focus their optimization efforts on the hyperparameters with the most significant impact on model performance while conserving time and resources. This paper aims to quantify the importance weights of some hyperparameters in Convolutional Neural Networks (CNNs) with an algorithm called N-RReliefF, laying the groundwork for applying HIA methodologies in the Deep Learning field. We conduct an extensive study by training over ten thousand CNN models across ten popular image classification datasets, thereby acquiring a comprehensive dataset of hyperparameter configurations and their corresponding performance metrics. The results show that, among the investigated hyperparameters, the five most important for the CNN model are the number of convolutional layers, the learning rate, the dropout rate, the optimizer, and the number of epochs.

Submitted 11 October, 2024; originally announced October 2024.

Comments: 15 pages

arXiv:2410.08821

Retriever-and-Memory: Towards Adaptive Note-Enhanced Retrieval-Augmented Generation

Authors: Ruobing Wang, Daren Zha, Shi Yu, Qingfei Zhao, Yuxuan Chen, Yixuan Wang, Shuo Wang, Yukun Yan, Zhenghao Liu, Xu Han, Zhiyuan Liu, Maosong Sun

Abstract: Retrieval-Augmented Generation (RAG) mitigates the factual errors and hallucinated outputs of Large Language Models (LLMs) in open-domain question answering (OpenQA) by introducing external knowledge. For complex QA, however, existing RAG methods use LLMs to actively predict retrieval timing and directly use the retrieved information for generation, regardless of whether the retrieval timing accurately reflects the actual information needs or sufficiently considers previously retrieved knowledge; this may result in insufficient information gathering and interaction, yielding low-quality answers. To address these issues, we propose a generic RAG approach called Adaptive Note-Enhanced RAG (Adaptive-Note) for complex QA tasks, which comprises an iterative information collector, an adaptive memory reviewer, and a task-oriented generator, and follows a new Retriever-and-Memory paradigm. Specifically, Adaptive-Note takes an overarching view of knowledge growth, iteratively gathering new information in the form of notes and updating them into the existing optimal knowledge structure, which enhances high-quality knowledge interactions. In addition, we employ an adaptive, note-based stop-exploration strategy to decide "what to retrieve and when to stop," encouraging sufficient knowledge exploration. We conduct extensive experiments on five complex QA datasets, and the results demonstrate the superiority and effectiveness of our method and its components. The code and data are available at https://github.com/thunlp/Adaptive-Note.

Submitted 11 October, 2024; originally announced October 2024.

Comments: 15 pages, 2 figures

arXiv:2410.08628

Is the Gum Nebula an Important Interstellar Scattering Disk of Background Pulsars?

Authors: Rui Wang, Zhen Yan, Zhiqiang Shen, KeJia Lee, Yajun Wu, Rongbing Zhao, Zhipeng Huang, Xiaowei Wang, Jie Liu

Abstract: The Gum Nebula is a faint supernova remnant extending about 40 degrees across the southern sky, potentially affecting tens of background pulsars. Although the view that the Gum Nebula acts as a scattering screen for background pulsars has been mentioned repeatedly over the past five decades, it has not been directly confirmed. We chose the strong background pulsar PSR B0740−28 as a probe and monitored its diffractive interstellar scintillation (DISS) simultaneously at 2.25 and 8.60 GHz for about two years using the Shanghai Tian Ma Radio Telescope (TMRT). DISS was detected at both frequencies and quantified by two-dimensional autocorrelation analysis. We calculated the scattering spectral index $\alpha$ and found that 9/21 of the observations followed the theoretical predictions, while 4/21 clearly showed $\alpha < 4$. Owing to the large frequency lever arm and the simultaneity of our dual-frequency observations, this finding provides strong support for anomalous scattering along the pulsar's line of sight. Compared with the 2.25 GHz observations, scintillation arcs were observed in 10/21 of the secondary spectra of the 8.60 GHz observations. Consequently, the record for the highest frequency at which a pulsar scintillation arc has been detected now stands at 8.60 GHz.
Our fitting results provide the most direct evidence for the view that the Gum Nebula acts as the scattering screen for background pulsars, because both the distance ($245^{+69}_{-72}$ pc) and the transverse speed ($22.4^{+4.1}_{-4.2}$ km/s) of the scintillation screen are comparable with the related parameters of the Gum Nebula. Our findings indicate that anisotropic scattering explains the annual modulation of the scintillation arcs better than isotropic scattering does. The orientation of the long axis of the anisotropy was also fitted.

Submitted 11 October, 2024; originally announced October 2024.

Comments: Accepted by SCIENCE CHINA Physics, Mechanics & Astronomy
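The abstract reports the scattering spectral index $\alpha$ without stating how it is estimated. A common convention, assumed here, is that the scintillation (decorrelation) bandwidth scales as $\Delta\nu_d \propto \nu^{\alpha}$, with $\alpha = 4.4$ for a Kolmogorov turbulence spectrum and $\alpha = 4$ for a square-law (Gaussian) structure function, so two simultaneous bands give $\alpha = \ln(\Delta\nu_{d,2}/\Delta\nu_{d,1}) / \ln(\nu_2/\nu_1)$. The minimal Python sketch below applies only that two-point formula; the function name and every bandwidth value are hypothetical, and a real analysis would also propagate the bandwidth uncertainties before calling a value clearly below 4.

```python
import math


def scattering_spectral_index(nu1, bw1, nu2, bw2):
    """Two-frequency estimate of alpha, assuming bw is proportional to nu**alpha.

    nu1, nu2: observing frequencies (any common unit, e.g. GHz).
    bw1, bw2: scintillation bandwidths at nu1 and nu2 (any common unit,
    e.g. MHz). Only the ratios enter the result.
    """
    if min(nu1, bw1, nu2, bw2) <= 0:
        raise ValueError("frequencies and bandwidths must be positive")
    if nu1 == nu2:
        raise ValueError("need two distinct observing frequencies")
    return math.log(bw2 / bw1) / math.log(nu2 / nu1)


KOLMOGOROV_ALPHA = 4.4   # Kolmogorov turbulence spectrum
SQUARE_LAW_ALPHA = 4.0   # square-law (Gaussian) structure function

# Hypothetical bandwidths, NOT measurements from the paper: the 8.60 GHz
# values are generated from an assumed alpha so the estimator can be checked.
nu_low, nu_high = 2.25, 8.60          # GHz, the two observing bands
bw_low = 0.5                          # MHz, hypothetical
bw_high_kolmogorov = bw_low * (nu_high / nu_low) ** KOLMOGOROV_ALPHA
bw_high_shallow = bw_low * (nu_high / nu_low) ** 3.5

alpha_k = scattering_spectral_index(nu_low, bw_low, nu_high, bw_high_kolmogorov)
alpha_s = scattering_spectral_index(nu_low, bw_low, nu_high, bw_high_shallow)
print(round(alpha_k, 3), round(alpha_s, 3))  # 4.4 3.5
print(alpha_s < SQUARE_LAW_ALPHA)            # True: below both model values
```

The wide band separation ($8.60/2.25 \approx 3.8$) is the frequency lever arm: for given fractional errors in the two bandwidths, the error in $\alpha$ shrinks roughly as $1/\ln(\nu_2/\nu_1)$.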

arXiv:2410.08553

Balancing Innovation and Privacy: Data Security Strategies in Natural Language Processing Applications

Authors: Shaobo Liu, Guiran Liu, Binrong Zhu, Yuanshuai Luo, Linxiao Wu, Rui Wang

Abstract: This research addresses privacy protection in Natural Language Processing (NLP) by introducing a novel algorithm based on differential privacy, aimed at safeguarding user data in common applications such as chatbots, sentiment analysis, and machine translation. With the widespread application of NLP technology, the security and privacy protection of user data have become pressing issues. This paper proposes a new privacy protection algorithm designed to effectively prevent the leakage of users' sensitive information. By introducing a differential privacy mechanism, our model adds random noise while preserving the accuracy and reliability of data analysis results. This method not only reduces the risk caused by data leakage but also processes data effectively while protecting user privacy. Compared to traditional privacy methods such as data anonymization and homomorphic encryption, our approach offers significant advantages in computational efficiency and scalability while maintaining high accuracy in data analysis. The proposed algorithm's efficacy is demonstrated through performance metrics such as accuracy (0.89), precision (0.85), and recall (0.88), outperforming other methods in balancing privacy and utility. As privacy protection regulations become increasingly stringent, enterprises and developers must take effective measures to deal with privacy risks.
Our research provides an important reference for the application of privacy protection technology in NLP, emphasizing the need to balance technological innovation and user privacy. In the future, with the continuous advancement of technology, privacy protection will become a core element of data-driven applications and promote the healthy development of the entire industry.

Submitted 11 October, 2024; originally announced October 2024.
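The abstract says the model adds random noise through a differential privacy mechanism but does not name the mechanism. As a hedged illustration, the textbook Laplace mechanism (not necessarily the authors' algorithm) adds noise with scale $\Delta f / \varepsilon$ to a numeric query with $L_1$ sensitivity $\Delta f$, which gives $\varepsilon$-differential privacy. The sketch uses only the Python standard library; the function names and the example count are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Draw from Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def laplace_mechanism(true_value, sensitivity, epsilon, rng=None):
    """Release true_value plus Laplace noise with scale sensitivity / epsilon.

    This satisfies epsilon-differential privacy for a numeric query whose
    L1 sensitivity (max change caused by one record) is `sensitivity`.
    """
    if epsilon <= 0 or sensitivity <= 0:
        raise ValueError("epsilon and sensitivity must be positive")
    if rng is None:
        rng = random.Random()
    return true_value + laplace_noise(sensitivity / epsilon, rng)


# Hypothetical example: release how many chat messages mention a sensitive
# term. Adding or removing one message changes this count by at most 1,
# so the count query has sensitivity 1.
rng = random.Random(0)       # seeded only to make the demo repeatable
true_count = 42              # hypothetical value
noisy_count = laplace_mechanism(true_count, sensitivity=1.0, epsilon=0.5, rng=rng)
print(round(noisy_count, 2))
```

Smaller $\varepsilon$ means stronger privacy and a larger noise scale (here $1/0.5 = 2$), which is the privacy/utility trade-off the abstract refers to.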

arXiv:2410.08501

High-Throughput Discovery of Kagome Materials in Transition Metal Oxide Monolayers

Authors: Renhong Wang, Cong Wang, Deping Guo, Jiaqi Dai, Canbo Zong, Weihan Zhang, Wei Ji

Abstract: Kagome materials have been found to exhibit exotic physical properties such as spin frustration, charge density waves, and unconventional superconductivity. However, the number of materials with kagome lattice-related properties discovered so far is relatively small, limiting the exploration of the physical phenomena associated with kagome materials. Because of their weaker interlayer coupling, two-dimensional kagome materials are more likely to exhibit kagome lattice-related physical properties. Therefore, the search for potential two-dimensional kagome materials is crucial for understanding the underlying physics of kagome lattices. In this work, we used a high-throughput workflow to discover thermodynamically stable kagome transition metal oxide monolayers based on a "1+3" strategy. Starting from a pool of 349 candidate materials, we identified 12 globally stable kagome monolayers, including both magnetic and non-magnetic structures. These monolayers were classified into four categories based on their electronic structures, lattice types, symmetry, band gaps, and magnetic properties. A detailed analysis was performed on kagome structures exhibiting band features near the Fermi level. This study demonstrates the feasibility of the "1+3" strategy for constructing kagome lattices, providing a pathway for further theoretical and experimental exploration of kagome materials and their potential quantum phenomena.

Submitted 10 October, 2024; originally announced October 2024.

arXiv:2410.08487

Higher-Order Band Topology in Twisted Bilayer Kagome Lattice

Authors: Xiaolin Wan, Junjie Zeng, Ruixiang Zhu, Dong-Hui Xu, Baobing Zheng, Rui Wang

Abstract: Topologically protected corner states serve as a key indicator of two-dimensional higher-order topological insulators, yet they have not been experimentally identified in realistic materials. Here, using an effective tight-binding model and symmetry arguments, we establish a connection between higher-order topological insulators and twisted bilayer kagome lattices. We find that a topologically nontrivial bulk band gap arises in the twisted bilayer kagome lattice system due to twist-induced intervalley scattering, giving rise to higher-order topological insulator phases over a range of commensurate twist angles. The higher-order band topology is verified by the second Stiefel-Whitney number and by fractionally quantized corner charges. Moreover, we investigate the influence of disorder and charge density wave order on the stability of the higher-order topological insulator phases. The results show that the corner states of twisted bilayer kagome lattice systems are robust against disorder and charge density waves. Our work not only provides a feasible approach to realizing readily controllable higher-order topological insulator phases with a simple twist technique, but also demonstrates that the higher-order band topology of twisted bilayer kagome lattice systems is robust, making it feasible to test these predictions experimentally.

Submitted 14 October, 2024; v1 submitted 10 October, 2024; originally announced October 2024.

Comments: 7 pages, 4 figures

arXiv:2410.08103

A Multi-station Meteor Monitoring (M$^3$) System. II. system upgrade and a pathfinder network

Authors: Z. Li, H. Zou, J. Liu, J. Ma, Q. Meng, Y. Cai, X. Zhao, X. Li, Z. Tu, B. Zhang, R. Wang, S. Wang, F. Lu

Abstract: Meteors are important phenomena that reflect many properties of interplanetary dust particles. Studies of their origin, mass distribution, and orbital evolution all require large data volumes, which can only be obtained using large meteor networks. Following the meteor networks in Europe and America, we present the design and upgrades of a proposed network in China. The new designs are mainly aimed at facilitating the data-gathering process. Each newly designed meteor station can now support up to 4 cameras to cover the full sky. The newer version of the meteor station software works as an integrated system that streamlines the process of detecting, measuring, and uploading meteors. We have built a meteor data platform to store, process, and display the meteor data automatically. The software and data platform are designed to be easy to learn and use, so that they can attract more people to join and operate meteor stations. Four stations were installed as the first phase of the network; during 10 months of operation, the network detected 8,683 orbits, and we find that half of the orbits can be related to established meteoroid streams. The statistical analysis of sporadic meteoroids shows a bimodal velocity distribution, consistent with previous studies. The distribution of Tisserand parameters, $T_j$, shows two peaks at $T_j=0$ and 3, indicating the different orbits of parent bodies (isotropic and ecliptic), which are divided by $T_j=2$. The falling trajectory of a meteorite was also predicted using observational data from the network.
We are currently expanding the network, and in the future we will carry out a detailed analysis of the key parameters of the meteoroid distribution.
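The Tisserand parameter with respect to Jupiter comes from a meteoroid's heliocentric orbital elements through the standard formula $T_J = a_J/a + 2\cos i\,\sqrt{(a/a_J)(1-e^2)}$. The sketch below evaluates that formula and applies the $T_J=2$ split quoted in the abstract. It assumes $a_J \approx 5.2026$ AU and takes the inclination relative to the ecliptic as a proxy for Jupiter's orbital plane. The function names are hypothetical and are not taken from the network's software.

```python
import math

A_JUPITER_AU = 5.2026  # Jupiter's semi-major axis in AU (approximate, assumed)


def tisserand_jupiter(a_au, e, inc_deg):
    """Tisserand parameter w.r.t. Jupiter for a bound heliocentric orbit (0 <= e < 1).

    a_au: semi-major axis in AU; e: eccentricity; inc_deg: inclination to the ecliptic.
    """
    ratio = a_au / A_JUPITER_AU
    return 1.0 / ratio + 2.0 * math.cos(math.radians(inc_deg)) * math.sqrt(ratio * (1.0 - e * e))


def orbit_family(t_j):
    """Split quoted in the abstract: T_J < 2 -> isotropic, otherwise ecliptic."""
    return "isotropic" if t_j < 2.0 else "ecliptic"
```

As a sanity check, a circular, prograde orbit at Jupiter's distance gives $T_J = 1 + 2 = 3$ (ecliptic). The same orbit flown retrograde gives $T_J = 1 - 2 = -1$ (isotropic).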

Submitted 10 October, 2024; originally announced October 2024.

Comments: 24 pages, 25 figures, Accepted for publication in the Publications of the Astronomical Society of the Pacific (PASP)

arXiv:2410.07576

Simplified radar architecture based on information metasurface

Authors: Si Ran Wang, Zhan Ye Chen, Shao Nan Chen, Jun Yan Dai, Jun Wei Zhang, Zhen Jie Qi, Li Jie Wu, Meng Ke Sun, Qun Yan Zhou, Hui Dong Li, Zhang Jie Luo, Qiang Cheng, Tie Jun Cui

Abstract: Modern radar typically employs a chain architecture consisting of radio-frequency (RF) and intermediate-frequency (IF) units, a baseband digital signal processor, and an information display. However, this architecture often results in high costs, significant hardware demands, and integration challenges. Here we propose a simplified radar architecture based on space-time-coding (STC) information metasurfaces. With their powerful capabilities to generate multiple harmonic frequencies and customize their phases, the STC metasurfaces play a key role in chirp signal generation, transmission, and echo reception. Remarkably, the receiving STC metasurface can implement dechirp processing directly at the RF level and produce digital information outputs. This lowers the hardware requirements at the receiving end and can potentially shorten the time needed for conventional digital processing. As a proof of concept, the proposed metasurface radar is tested in a series of experiments on target detection and range/speed measurement, yielding results comparable to those obtained by conventional methods. This study offers inspiration for a new radar system paradigm that combines the RF front end and signal processor on an information metasurface platform, providing essential functionalities while significantly reducing system complexity and cost.
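Dechirp (stretch) processing can be illustrated digitally. Mixing a delayed linear chirp with the transmitted chirp cancels the quadratic phase and leaves a single tone at the beat frequency $f_b = \mu\tau$, where $\mu$ is the chirp slope and $\tau$ the round-trip delay. The range then follows as $R = c\tau/2$. The sketch below is a baseband simulation only: the sample rate, slope, and window length are assumed values, the echo is idealized as the chirp shifted over the whole window, and the STC metasurface itself is not modelled. The function names are hypothetical.

```python
import cmath
import math

C = 3.0e8  # speed of light in m/s (rounded)


def chirp(t, slope):
    """Complex baseband linear chirp exp(j*pi*slope*t^2), slope in Hz/s."""
    return cmath.exp(1j * math.pi * slope * t * t)


def dechirp_range(delay_s, slope=1.0e9, fs=1.0e6, n=256):
    """Dechirp an idealised delayed chirp and estimate target range from the beat tone.

    Assumed parameters: slope (Hz/s), sample rate fs (Hz), number of samples n.
    """
    times = [k / fs for k in range(n)]
    # Multiply the reference by the conjugate of the echo: the t^2 terms cancel,
    # leaving a tone at +slope * delay_s.
    beat = [chirp(t, slope) * chirp(t - delay_s, slope).conjugate() for t in times]
    # Naive O(n^2) DFT magnitude; an FFT would be used in practice.
    spectrum = [abs(sum(b * cmath.exp(-2j * math.pi * m * k / n) for k, b in enumerate(beat)))
                for m in range(n)]
    peak_bin = max(range(n), key=spectrum.__getitem__)
    beat_freq = peak_bin * fs / n   # Hz
    delay_est = beat_freq / slope   # round-trip delay in s
    return C * delay_est / 2.0      # range in m
```

With these assumed values the swept bandwidth is $B = \mu n / f_s = 256$ kHz, so the range bin width is $c/(2B) \approx 586$ m. A target at 5859.375 m therefore falls exactly on bin 10, and other delays are quantized to the nearest bin.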

Submitted 9 October, 2024; originally announced October 2024.

Comments: 25 pages, 10 figures

arXiv:2410.07019

Graph identification index

Authors: Runze Wang

Abstract: We introduce the \emph{ID-index} of a finite simple connected graph. For a graph $G=(V,E)$ with diameter $d$, let $f\colon V\to\mathbb{R}$ assign \emph{ranks} to the vertices. Under $f$, each vertex $v$ gets a \emph{string}, which is a $d$-vector whose $i$-th coordinate is the sum of the ranks of the vertices at distance $i$ from $v$. The \emph{ID-index} of $G$, denoted by $IDI(G)$, is defined to be the minimum number $k$ for which there is an $f$ with $|f(V)|=k$ such that each vertex gets a distinct string under $f$. We present some relations between ID-graphs, which were defined by Chartrand, Kono, and Zhang, and their ID-indices; give a lower bound on the ID-index of a graph; and determine the ID-indices of paths, grids, cycles, prisms, complete graphs, some complete multipartite graphs, and some caterpillars.
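The definition of a vertex's string translates directly into code. The sketch below computes all-pairs distances by breadth-first search, builds each vertex's $d$-vector of rank sums, and checks whether a given rank assignment $f$ is identifying. The function names are illustrative, not from the paper. The code only verifies a candidate $f$; computing $IDI(G)$ itself would also require a search over real-valued assignments, which this sketch does not attempt.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path distances from source in an unweighted connected graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def strings(adj, f):
    """Map each vertex v to its d-vector: entry i-1 = sum of f(u) over u at distance i."""
    dists = {v: bfs_distances(adj, v) for v in adj}
    diameter = max(max(d.values()) for d in dists.values())
    result = {}
    for v, d in dists.items():
        vec = [0] * diameter
        for u, k in d.items():
            if k > 0:  # the vertex itself (distance 0) is not counted
                vec[k - 1] += f[u]
        result[v] = tuple(vec)
    return result


def is_identifying(adj, f):
    """True if every vertex receives a distinct string under the rank assignment f."""
    s = strings(adj, f)
    return len(set(s.values())) == len(s)
```

For the path $a-b-c$, any constant rank $r$ gives both endpoints the string $(r, r)$. The ranks $f(a)=1$, $f(b)=f(c)=0$ yield the distinct strings $(0,0)$, $(1,0)$, $(0,1)$, so $IDI(P_3)=2$.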

Submitted 9 October, 2024; originally announced October 2024.

arXiv:2410.06333

Batched Bayesian optimization with correlated candidate uncertainties

Authors: Jenna Fromer, Runzhong Wang, Mrunali Manjrekar, Austin Tripp, José Miguel Hernández-Lobato, Connor W. Coley

Abstract: Batched Bayesian optimization (BO) can accelerate molecular design by efficiently identifying top-performing compounds from a large chemical library. Existing acquisition strategies for batch design in BO aim to balance exploration and exploitation. This often involves optimizing non-additive batch acquisition functions, which necessitates approximation via myopic construction and/or diversity heuristics. In this work, we propose qPO (multipoint Probability of Optimality), an acquisition strategy for discrete optimization that is motivated by pure exploitation. qPO maximizes the probability that the batch includes the true optimum. This probability can be expressed as a sum of individual acquisition scores, which circumvents the combinatorial challenge of optimizing a batch acquisition function. We differentiate the proposed strategy from parallel Thompson sampling and discuss how it implicitly captures diversity. Finally, we apply our method to the model-guided exploration of large chemical libraries and provide empirical evidence that it performs better than or on par with state-of-the-art methods in batched Bayesian optimization.
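The additivity rests on one fact: the events "candidate $i$ is the true optimum" are mutually exclusive. The probability that a batch contains the optimum is therefore the sum of its members' probabilities of optimality, and the best batch of size $q$ consists of the $q$ candidates with the highest such probabilities. The sketch below estimates these probabilities by Monte Carlo from joint posterior draws, which respects correlations between candidates. The sampling-based estimator and the function names are assumptions for illustration, not necessarily how the paper computes the scores.

```python
from collections import Counter


def probability_of_optimality(posterior_samples):
    """Estimate P(candidate i is the argmax) from joint posterior draws.

    posterior_samples: list of draws, each a list of predicted values for all candidates.
    """
    n_candidates = len(posterior_samples[0])
    wins = Counter(max(range(n_candidates), key=draw.__getitem__) for draw in posterior_samples)
    return [wins[i] / len(posterior_samples) for i in range(n_candidates)]


def select_batch(posterior_samples, q):
    """Pick the q candidates with the highest probability of optimality.

    The events are disjoint, so P(batch contains the optimum) is the sum of the scores.
    """
    scores = probability_of_optimality(posterior_samples)
    batch = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:q]
    return batch, sum(scores[i] for i in batch)
```

In practice the draws would come from a surrogate model's joint posterior over the library. Here, candidates with equal scores are ordered by index, since Python's sort is stable.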

Submitted 8 October, 2024; originally announced October 2024.

arXiv:2410.05940

doi 10.1145/3654777.3676330

TouchInsight: Uncertainty-aware Rapid Touch and Text Input for Mixed Reality from Egocentric Vision

Authors: Paul Streli, Mark Richardson, Fadi Botros, Shugao Ma, Robert Wang, Christian Holz

Abstract: While passive surfaces offer numerous benefits for interaction in mixed reality, reliably detecting touch input solely from head-mounted cameras has been a long-standing challenge. Camera specifics, hand self-occlusion, and rapid movements of both head and fingers introduce considerable uncertainty about the exact location of touch events. Existing methods have thus not achieved the performance needed for robust interaction. In this paper, we present a real-time pipeline that detects touch input from all ten fingers on any physical surface, based purely on egocentric hand tracking. Our method, TouchInsight, comprises a neural network that predicts the moment of a touch event, the finger making contact, and the touch location. TouchInsight represents locations through a bivariate Gaussian distribution to account for uncertainties due to sensing inaccuracies, which we resolve through contextual priors to accurately infer intended user input. We first evaluated our method offline and found that it locates input events with a mean error of 6.3 mm, accurately detects touch events (F1=0.99), and identifies the finger used (F1=0.96). In an online evaluation, we then demonstrated the effectiveness of our approach for a core application of dexterous touch input: two-handed text entry. In our study, participants typed 37.0 words per minute with an uncorrected error rate of 2.9% on average.
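A bivariate Gaussian touch estimate combines naturally with contextual priors through Bayes' rule. Each target's posterior is proportional to its prior times the predicted Gaussian density evaluated at the target's centre. The sketch below shows that decoding step for a hypothetical two-key layout. The key positions, covariance parameters, priors, and function names are invented for illustration. TouchInsight's network and its actual priors are not reproduced.

```python
import math


def gaussian_pdf_2d(point, mean, sx, sy, rho):
    """Bivariate normal density (std devs sx, sy, correlation rho) at point."""
    dx = (point[0] - mean[0]) / sx
    dy = (point[1] - mean[1]) / sy
    one_minus = 1.0 - rho * rho
    exponent = -(dx * dx - 2.0 * rho * dx * dy + dy * dy) / (2.0 * one_minus)
    return math.exp(exponent) / (2.0 * math.pi * sx * sy * math.sqrt(one_minus))


def decode_touch(mean, sx, sy, rho, key_centres, priors):
    """Posterior over keys given a Gaussian touch estimate and prior key probabilities."""
    scores = {k: priors[k] * gaussian_pdf_2d(c, mean, sx, sy, rho)
              for k, c in key_centres.items()}
    total = sum(scores.values())
    return {k: s / total for k, s in scores.items()}
```

Consider a touch estimated exactly halfway between keys `a` and `b`. The likelihoods are then equal, so the prior alone decides: with priors 0.7 and 0.3, the posterior of `a` is 0.7.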

Submitted 8 October, 2024; originally announced October 2024.

Comments: Proceedings of the 37th Annual ACM Symposium on User Interface Software and Technology (UIST'24)

ACM Class: I.4; I.5; H.5

arXiv:2410.05938

EMMA: Empowering Multi-modal Mamba with Structural and Hierarchical Alignment

Authors: Yifei Xing, Xiangyuan Lan, Ruiping Wang, Dongmei Jiang, Wenjun Huang, Qingfang Zheng, Yaowei Wang

Abstract: Mamba-based architectures have been shown to be a promising new direction for deep learning models owing to their competitive performance and sub-quadratic deployment speed. However, current Mamba multi-modal large language models (MLLMs) are insufficient at extracting visual features, leading to imbalanced cross-modal alignment between visual and textual latents, which negatively impacts performance on multi-modal tasks. In this work, we propose Empowering Multi-modal Mamba with Structural and Hierarchical Alignment (EMMA), which enables the MLLM to extract fine-grained visual information. Specifically, we propose a pixel-wise alignment module to autoregressively optimize the learning and processing of spatial image-level features along with textual tokens, enabling structural alignment at the image level. In addition, to prevent the degradation of visual information during the cross-modal alignment process, we propose a multi-scale feature fusion (MFF) module to combine multi-scale visual features from intermediate layers, enabling hierarchical alignment at the feature level. Extensive experiments are conducted across a variety of multi-modal benchmarks. Our model shows lower latency than other Mamba-based MLLMs and is nearly four times faster than transformer-based MLLMs of similar scale during inference. Due to better cross-modal alignment, our model exhibits less hallucination and enhanced sensitivity to visual details, which manifests in superior performance across diverse multi-modal benchmarks. Code will be provided.

Submitted 8 October, 2024; originally announced October 2024.

arXiv:2410.05791

FürElise: Capturing and Physically Synthesizing Hand Motions of Piano Performance

Authors: Ruocheng Wang, Pei Xu, Haochen Shi, Elizabeth Schumann, C. Karen Liu

Abstract: Piano playing requires agile, precise, and coordinated hand control that stretches the limits of dexterity. Hand motion models with the sophistication to accurately recreate piano playing have a wide range of applications in character animation, embodied AI, biomechanics, and VR/AR. In this paper, we construct a first-of-its-kind large-scale dataset that contains approximately 10 hours of 3D hand motion and audio from 15 elite-level pianists playing 153 pieces of classical music. To capture natural performances, we designed a markerless setup in which motions are reconstructed from multi-view videos using state-of-the-art pose estimation models. The motion data is further refined via inverse kinematics using high-resolution MIDI key-pressing data obtained from sensors in a specialized Yamaha Disklavier piano. Leveraging the collected dataset, we developed a pipeline that can synthesize physically plausible hand motions for musical scores outside of the dataset. Our approach combines imitation learning and reinforcement learning to obtain policies for physics-based bimanual control involving the interaction between hands and piano keys. To address the sampling efficiency problem posed by the large motion dataset, we use a diffusion model to generate natural reference motions, which provide high-level trajectory and fingering (finger order and placement) information. However, the generated reference motion alone does not provide sufficient accuracy for piano performance modeling.
We therefore further augment the data by using musical similarity to retrieve similar motions from the captured dataset, boosting the precision of the RL policy. With the proposed method, our model generates natural, dexterous motions that generalize to music from outside the training dataset.

Submitted 8 October, 2024; originally announced October 2024.

Comments: SIGGRAPH Asia 2024. Project page: https://for-elise.github.io/

arXiv:2410.05243

Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents

Authors: Boyu Gou, Ruohan Wang, Boyuan Zheng, Yanan Xie, Cheng Chang, Yiheng Shu, Huan Sun, Yu Su

Abstract: Multimodal large language models (MLLMs) are transforming the capabilities of graphical user interface (GUI) agents, facilitating their transition from controlled simulations to complex, real-world applications across various platforms. However, the effectiveness of these agents hinges on the robustness of their grounding capability. Current GUI agents predominantly utilize text-based representations such as HTML or accessibility trees, which, despite their utility, often introduce noise, incompleteness, and increased computational overhead. In this paper, we advocate a human-like embodiment for GUI agents that perceive the environment entirely visually and directly perform pixel-level operations on the GUI. The key is visual grounding models that can accurately map diverse referring expressions of GUI elements to their coordinates on the GUI across different platforms. We show that a simple recipe, which includes web-based synthetic data and slight adaptation of the LLaVA architecture, is surprisingly effective for training such visual grounding models. We collect the largest dataset for GUI visual grounding so far, containing 10M GUI elements and their referring expressions over 1.3M screenshots, and use it to train UGround, a strong universal visual grounding model for GUI agents.
Empirical results on six benchmarks spanning three categories (grounding, offline agent, and online agent) show that 1) UGround substantially outperforms existing visual grounding models for GUI agents, by up to 20% absolute, and 2) agents with UGround outperform state-of-the-art agents, even though existing agents use additional text-based input while ours use only visual perception. These results provide strong support for the feasibility and promise of GUI agents that navigate the digital world as humans do.

Submitted 7 October, 2024; originally announced October 2024.

arXiv:2410.04764

Double Oracle Neural Architecture Search for Game Theoretic Deep Learning Models

Authors: Aye Phyu Phyu Aung, Xinrun Wang, Ruiyu Wang, Hau Chan, Bo An, Xiaoli Li, J. Senthilnath

Abstract: In this paper, we propose a new approach to training deep learning models built on game-theoretic formulations, including Generative Adversarial Networks (GANs) and Adversarial Training (AT), in which we deploy a double-oracle framework using best-response oracles. A GAN is essentially a two-player zero-sum game between the generator and the discriminator. The same concept can be applied to AT, with the attacker and the classifier as players. Training these models is challenging because a pure Nash equilibrium may not exist. Even finding a mixed Nash equilibrium is difficult, since both GAN and AT training involve large-scale strategy spaces. Extending our preliminary model DO-GAN, we propose methods that apply the double-oracle framework to Adversarial Neural Architecture Search (NAS for GAN) and Adversarial Training (NAS for AT) algorithms. We first generalize the players' strategies as the trained models of the generator and discriminator from the best-response oracles. We then compute the meta-strategies using a linear program. Storing multiple best-response network models in memory limits scalability, so we prune the players' weakly dominated strategies to keep the oracles from becoming intractable. Finally, we conduct experiments on MNIST, CIFAR-10 and TinyImageNet for DONAS-GAN. We also evaluate robustness under FGSM and PGD attacks on CIFAR-10, SVHN and TinyImageNet for DONAS-AT.
We show that all our variants achieve significant improvements in both subjective qualitative evaluation and quantitative metrics compared with their respective base architectures.
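The double-oracle loop can be sketched on a small zero-sum matrix game. Each player keeps a restricted strategy set. The loop solves the restricted game for meta-strategies, adds each player's best response over the full strategy set, and stops when neither best response is new. In the paper the strategies are trained networks and the meta-strategies come from a linear program. Here, as a stdlib-only simplification, strategies are rows and columns of a payoff matrix, and the restricted game is solved approximately with fictitious play instead of an LP, so the result is only an approximate equilibrium. Pruning of dominated strategies is omitted, and the function names are hypothetical.

```python
def fictitious_play(payoff, iters=20000):
    """Approximate equilibrium of a zero-sum matrix game (row player maximises)."""
    m, n = len(payoff), len(payoff[0])
    row_counts, col_counts = [0] * m, [0] * n
    r, c = 0, 0
    for _ in range(iters):
        row_counts[r] += 1
        col_counts[c] += 1
        # Each player best-responds to the opponent's empirical mixture so far.
        r = max(range(m), key=lambda i: sum(payoff[i][j] * col_counts[j] for j in range(n)))
        c = min(range(n), key=lambda j: sum(payoff[i][j] * row_counts[i] for i in range(m)))
    return [x / iters for x in row_counts], [y / iters for y in col_counts]


def double_oracle(payoff):
    """Double oracle on a zero-sum matrix game; returns strategy sets and meta-strategies."""
    rows, cols = [0], [0]  # start from one arbitrary strategy per player
    while True:
        sub = [[payoff[i][j] for j in cols] for i in rows]
        x, y = fictitious_play(sub)  # approximate meta-strategies of the restricted game
        # Best responses over the FULL strategy sets against the meta-strategies.
        br_row = max(range(len(payoff)),
                     key=lambda i: sum(p * payoff[i][j] for p, j in zip(y, cols)))
        br_col = min(range(len(payoff[0])),
                     key=lambda j: sum(p * payoff[i][j] for p, i in zip(x, rows)))
        if br_row in rows and br_col in cols:
            return rows, cols, x, y
        if br_row not in rows:
            rows.append(br_row)
        if br_col not in cols:
            cols.append(br_col)
```

Take rock-paper-scissors, starting from rock against rock. The loop adds paper, then scissors, and stops once the restricted game is the full game, with meta-strategies that are approximately uniform.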

Submitted 7 October, 2024; originally announced October 2024.

arXiv:2410.04734

TLDR: Token-Level Detective Reward Model for Large Vision Language Models

Authors: Deqing Fu, Tong Xiao, Rui Wang, Wang Zhu, Pengchuan Zhang, Guan Pang, Robin Jia, Lawrence Chen

Abstract: Although reward models have been successful in improving multimodal large language models, the reward models themselves remain coarse and contain minimal information. Notably, existing reward models only mimic human annotations by assigning a single binary label to any text, no matter how long the text is. In the realm of multimodal language models, where models are required to process both images and texts, a naive reward model may learn implicit biases toward texts and become less grounded in images. In this paper, we propose a $\textbf{T}$oken-$\textbf{L}$evel $\textbf{D}$etective $\textbf{R}$eward Model ($\textbf{TLDR}$) to provide fine-grained annotations to each text token. We first introduce a perturbation-based method to generate synthetic hard negatives and their token-level labels to train TLDR models. We then show the usefulness of TLDR models both in assisting off-the-shelf models to self-correct their generations and in serving as a hallucination evaluation tool. Finally, we show that TLDR models can speed up human annotation by a factor of 3 when acquiring a broader range of high-quality vision language data.

Submitted 7 October, 2024; originally announced October 2024.

Comments: Work done at Meta

arXiv:2410.04482

uDiG-DIP: Unrolled Diffusion-Guided Deep Image Prior For Medical Image Reconstruction

Authors: Shijun Liang, Ismail Alkhouri, Qing Qu, Rongrong Wang, Saiprasad Ravishankar

Abstract: Deep learning (DL) methods have been extensively applied to various image recovery problems, including magnetic resonance imaging (MRI) and computed tomography (CT) reconstruction. Beyond supervised models, two key schemes have recently been explored. The first is Deep Image Prior (DIP), an unsupervised scan-adaptive method that leverages the network architecture as implicit regularization but can suffer from noise overfitting. The second is diffusion models (DMs), where the sampling procedure of a pre-trained generative model is modified to allow sampling from the measurement-conditioned distribution through approximations. In this paper, we propose combining DIP and DMs for MRI and CT reconstruction, motivated by (i) the impact of the DIP network input and (ii) the use of DMs as diffusion purifiers (DPs). Specifically, we propose an unrolled procedure that iteratively optimizes the DIP network with a DM-refined adaptive input using a loss with data consistency and autoencoding terms. We term the approach unrolled Diffusion-Guided DIP (uDiG-DIP). Our experimental results demonstrate that uDiG-DIP achieves superior reconstruction results compared to leading DM-based baselines and the original DIP for MRI and CT tasks.

Submitted 6 October, 2024; originally announced October 2024.

arXiv:2410.04479

SITCOM: Step-wise Triple-Consistent Diffusion Sampling for Inverse Problems

Authors: Ismail Alkhouri, Shijun Liang, Cheng-Han Huang, Jimmy Dai, Qing Qu, Saiprasad Ravishankar, Rongrong Wang

Abstract: Diffusion models (DMs) are a class of generative models that allow sampling from a distribution learned over a training set. When DMs are applied to inverse imaging problems (IPs), their reverse sampling steps are typically modified to approximately sample from a measurement-conditioned distribution in the image space. However, these modifications may be unsuitable for certain settings (such as in the presence of measurement noise) and for non-linear tasks. They often struggle to correct errors from earlier sampling steps and generally require a large number of optimization and/or sampling steps. To address these challenges, we state three conditions for achieving measurement-consistent diffusion trajectories. Building on these conditions, we propose a new optimization-based sampling method. Like previous studies, it enforces the standard data manifold measurement consistency and forward diffusion consistency. It also incorporates backward diffusion consistency, which maintains a diffusion trajectory by optimizing over the input of the pre-trained model at every sampling step. By enforcing these conditions, either implicitly or explicitly, our sampler requires significantly fewer reverse steps. We therefore refer to our accelerated method as Step-wise Triple-Consistent Sampling (SITCOM).
Our extensive experiments span five linear and three non-linear image restoration tasks under different levels of measurement noise. Compared with existing state-of-the-art baselines, SITCOM achieves competitive or superior results in terms of standard image similarity metrics while requiring significantly less run-time across all considered tasks.

Submitted 6 October, 2024; originally announced October 2024.

arXiv:2410.04425

LHAASO detection of very-high-energy gamma-ray emission surrounding PSR J0248+6021

Authors: Zhen Cao, F. Aharonian, Q. An, Axikegu, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, J. T. Cai, Q. Cao, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, Liang Chen, Lin Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. H. Chen, S. Z. Chen, et al. (255 additional authors not shown)

Abstract: We report the detection of an extended very-high-energy (VHE) gamma-ray source coincident with the location of the middle-aged (62.4 kyr) pulsar PSR J0248+6021, using 796 live days of LHAASO-WCDA data and 1216 live days of LHAASO-KM2A data. A significant excess of gamma-ray-induced showers is observed both by WCDA in the 1-25 TeV energy band and by KM2A above 25 TeV, at significances of 7.3$\sigma$ and 13.5$\sigma$, respectively. The best-fit position derived from the WCDA data is R.A. = 42.06$^\circ \pm$ 0.12$^\circ$ and Dec. = 60.24$^\circ \pm$ 0.13$^\circ$ with an extension of 0.69$^\circ\pm$0.15$^\circ$. The best-fit position from the KM2A data is R.A. = 42.29$^\circ \pm$ 0.13$^\circ$ and Dec. = 60.38$^\circ \pm$ 0.07$^\circ$ with an extension of 0.37$^\circ\pm$0.07$^\circ$. No clear extended multiwavelength counterpart of this LHAASO source has been found from the radio band to the GeV band. The most plausible explanation of the VHE gamma-ray emission is inverse Compton scattering by highly relativistic electrons and positrons injected by the pulsar. These electrons/positrons are hypothesized to be either confined within the pulsar wind nebula or to have already escaped into the interstellar medium, forming a pulsar halo.

Submitted 6 October, 2024; originally announced October 2024.

Comments: 12 pages, 10 figures, Accepted by Sci. China-Phys. Mech. Astron

arXiv:2410.04045

Neuron-Level Sequential Editing for Large Language Models

Authors: Houcheng Jiang, Junfeng Fang, Tianyu Zhang, An Zhang, Ruipeng Wang, Tao Liang, Xiang Wang

Abstract: This work explores sequential model editing in large language models (LLMs), a critical task that involves continuously modifying internal knowledge within LLMs through multi-round editing. Each round incorporates updates or corrections that adjust the model outputs without the need for costly retraining. Existing model editing methods, especially those that alter model parameters, typically focus on single-round editing and often face significant challenges in sequential model editing, most notably model forgetting and model failure. To address these challenges, we introduce a new model editing method, \textbf{N}euron-level \textbf{S}equential \textbf{E}diting (NSE), tailored to sequential model editing. Specifically, we optimize the target layer's hidden states using the model's original weights to prevent model failure. Furthermore, we iteratively select neurons in multiple layers for editing based on their activation values to mitigate model forgetting. Our empirical experiments demonstrate that NSE significantly outperforms current parameter-modifying model editing methods, marking a substantial advancement in the field of sequential model editing. Our code is released at \url{https://github.com/jianghoucheng/NSE}.

Submitted 5 October, 2024; originally announced October 2024.

arXiv:2410.03650

Thermodynamics of Schwarzschild-AdS black hole in non-commutative geometry

Authors: Rui-Bo Wang, Shi-Jie Ma, Lei You, Jian-Bo Deng, Xian-Ru Hu

Abstract: In this paper, we study the thermodynamics of Schwarzschild-anti-de Sitter black holes within the framework of non-commutative geometry. By solving Einstein's equations, we derive the corrected Schwarzschild-AdS black hole with a Lorentzian distribution and analyze its thermodynamics. Our results confirm that if the energy-momentum tensor outside the event horizon is related to the mass of the black hole, the conventional first law of thermodynamics is violated. The study of criticality reveals that the black hole undergoes a small black hole-large black hole phase transition similar to that of the Van der Waals system, with a critical point and a critical ratio slightly smaller than those of the Van der Waals fluid. As the non-commutative parameter increases, the phase transition process shortens, leading to a critical point and ultimately to the disappearance of the phase transition. The violation of the conventional first law results in a discontinuity of the Gibbs free energy during the phase transition, indicating the occurrence of a zeroth-order phase transition. Moreover, we investigate the Joule-Thomson expansion, obtaining the minimum inversion temperature and the minimum inversion mass.

Submitted 7 October, 2024; v1 submitted 4 October, 2024; originally announced October 2024.

Comments: 37 pages, 11 figures

arXiv:2410.03439

ToolGen: Unified Tool Retrieval and Calling via Generation

Authors: Renxi Wang, Xudong Han, Lei Ji, Shu Wang, Timothy Baldwin, Haonan Li

Abstract: As large language models (LLMs) advance, their inability to autonomously execute tasks by directly interacting with external tools remains a critical limitation. Traditional methods rely on supplying tool descriptions as context, which is constrained by context length and requires separate, often inefficient, retrieval mechanisms. We introduce ToolGen, a paradigm shift that integrates tool knowledge directly into the LLM's parameters by representing each tool as a unique token. This enables the LLM to generate tool calls and arguments as part of its next-token prediction, seamlessly blending tool invocation with language generation. Our framework allows the LLM to access and use a vast number of tools with no additional retrieval step, significantly enhancing both performance and scalability. Experimental results with over 47,000 tools show that ToolGen not only achieves superior results in both tool retrieval and autonomous task completion but also sets the stage for a new era of AI agents that can adapt to tools across diverse domains. By fundamentally transforming tool retrieval into a generative process, ToolGen paves the way for more versatile, efficient, and autonomous AI systems. ToolGen enables end-to-end tool learning and opens opportunities for integration with other advanced techniques such as chain-of-thought and reinforcement learning, thereby expanding the practical capabilities of LLMs.

Submitted 8 October, 2024; v1 submitted 4 October, 2024; originally announced October 2024.

ACM Class: I.2.7

arXiv:2410.03274

Performance assessment of the HERD calorimeter with a photo-diode read-out system for high-energy electron beams

Authors: O. Adriani, G. Ambrosi, M. Antonelli, Y. Bai, X. Bai, T. Bao, M. Barbanera, E. Berti, P. Betti, G. Bigongiari, M. Bongi, V. Bonvicini, S. Bottai, I. Cagnoli, W. Cao, J. Casaus, D. Cerasole, Z. Chen, X. Cui, R. D'Alessandro, L. Di Venere, C. Diaz, Y. Dong, S. Detti, M. Duranti, et al. (41 additional authors not shown)

Abstract: The measurement of cosmic rays at energies exceeding 100 TeV per nucleon is crucial for improving our understanding of high-energy particle propagation and acceleration models in the Galaxy. HERD is a space-borne calorimetric experiment that aims to extend current direct measurements of cosmic rays to unexplored energies. The payload is scheduled to be installed on the Chinese Space Station in 2027. The instrument's key distinguishing feature is its capability to measure particles coming from all directions, with the main detector being a deep, homogeneous, 3D calorimeter. The active elements are read out by two independent systems: one based on wavelength-shifter fibers coupled to CMOS cameras, and the other based on photo-diodes read out by custom front-end electronics. A large calorimeter prototype was tested in 2023 during an extensive beam test campaign at CERN. In this paper, the performance of the calorimeter for high-energy electron beams, as obtained from the photo-diode system data, is presented. The prototype demonstrated excellent performance, e.g., an energy resolution better than 1% for electrons at 250 GeV. A comparison between beam test data and Monte Carlo simulation data is also presented.

Submitted 4 October, 2024; originally announced October 2024.

Showing 1–50 of 3,969 results for author: Wang, R