
Cryogenic Control and Readout Integrated Circuits for Solid-State Quantum Computing

Authors: Lingxiao Lei, Heng Huang, Pingxing Chen, Mingtang Deng

Abstract: In the pursuit of quantum computing, solid-state quantum systems, particularly superconducting ones, have made remarkable advancements over the past two decades. However, achieving fault-tolerant quantum computing for next-generation applications necessitates the integration of several million qubits, which presents significant challenges in terms of interconnection complexity and latency that are currently unsolvable with state-of-the-art room-temperature control and readout electronics. Recently, cryogenic integrated circuits (ICs), including CMOS radio-frequency ICs and rapid-single-flux-quantum-logic ICs, have emerged as potential alternatives to room-temperature electronics. Unlike their room-temperature counterparts, these ICs are deployed within cryostats to enhance scalability by reducing the number and length of transmission lines. Additionally, operating at cryogenic temperatures can suppress electronic noise and improve qubit control fidelity. However, for CMOS ICs specifically, circuit design uncertainties arise due to a lack of reliable models for cryogenic field effect transistors as well as issues related to severe flicker noise and power dissipation at cryogenic temperatures. This paper provides a comprehensive review of recent research on both types of cryogenic control and readout ICs but primarily focuses on the more mature CMOS technology. The discussion encompasses principles underlying control and readout techniques employed in cryogenic CMOS ICs along with their architectural designs; characterization and modeling approaches for field effect transistors under cryogenic conditions; as well as fundamental concepts pertaining to rapid single flux quantum circuits.

Submitted 21 October, 2024; originally announced October 2024.

arXiv:2410.15257 [pdf, other]

Learning-Augmented Algorithms for the Bahncard Problem

Authors: Hailiang Zhao, Xueyan Tang, Peng Chen, Shuiguang Deng

Abstract: In this paper, we study learning-augmented algorithms for the Bahncard problem. The Bahncard problem is a generalization of the ski-rental problem, where a traveler needs to irrevocably and repeatedly decide between a cheap short-term solution and an expensive long-term one with an unknown future. Even though the problem is canonical, only a primal-dual-based learning-augmented algorithm was explicitly designed for it. We develop a new learning-augmented algorithm, named PFSUM, that incorporates both history and short-term future to improve online decision making. We derive the competitive ratio of PFSUM as a function of the prediction error and conduct extensive experiments to show that PFSUM outperforms the primal-dual-based algorithm.

Submitted 19 October, 2024; originally announced October 2024.

Comments: This paper has been accepted by the 38th Conference on Neural Information Processing Systems (NeurIPS 2024)

arXiv:2410.14195 [pdf, other]

Rethinking Transformer for Long Contextual Histopathology Whole Slide Image Analysis

Authors: Honglin Li, Yunlong Zhang, Pingyi Chen, Zhongyi Shui, Chenglu Zhu, Lin Yang

Abstract: Histopathology Whole Slide Image (WSI) analysis serves as the gold standard for clinical cancer diagnosis in the daily routines of doctors. To develop computer-aided diagnosis models for WSIs, previous methods typically employ Multi-Instance Learning to enable slide-level prediction given only slide-level labels. Among these models, vanilla attention mechanisms without pairwise interactions have traditionally been employed but are unable to model contextual information. More recently, self-attention models have been utilized to address this issue. To alleviate the computational complexity of long sequences in large WSIs, methods like HIPT use region-slicing, and TransMIL employs approximation of full self-attention. Both approaches suffer from suboptimal performance due to the loss of key information. Moreover, their use of absolute positional embedding struggles to effectively handle long contextual dependencies in shape-varying WSIs. In this paper, we first analyze how the low-rank nature of the long-sequence attention matrix constrains the representation ability of WSI modelling. Then, we demonstrate that the rank of the attention matrix can be improved by focusing on local interactions via a local attention mask. Our analysis shows that the local mask aligns with the attention patterns in the lower layers of the Transformer. Furthermore, the local attention mask can be implemented during chunked attention calculation, reducing the quadratic computational complexity to linear with a small local bandwidth. Building on this, we propose a local-global hybrid Transformer for both computational acceleration and modelling of local-global information interactions. Our method, Long-contextual MIL (LongMIL), is evaluated through extensive experiments on various WSI tasks to validate its superiority. Our code will be available at github.com/invoker-LL/Long-MIL.

Submitted 18 October, 2024; originally announced October 2024.

Comments: NeurIPS-2024. arXiv admin note: text overlap with arXiv:2311.12885

arXiv:2410.14182 [pdf, other]

LabSafety Bench: Benchmarking LLMs on Safety Issues in Scientific Labs

Authors: Yujun Zhou, Jingdong Yang, Kehan Guo, Pin-Yu Chen, Tian Gao, Werner Geyer, Nuno Moniz, Nitesh V Chawla, Xiangliang Zhang

Abstract: Laboratory accidents pose significant risks to human life and property, underscoring the importance of robust safety protocols. Despite advancements in safety training, laboratory personnel may still unknowingly engage in unsafe practices. With the increasing reliance on large language models (LLMs) for guidance in various fields, including laboratory settings, there is a growing concern about their reliability in critical safety-related decision-making. Unlike trained human researchers, LLMs lack formal lab safety education, raising questions about their ability to provide safe and accurate guidance. Existing research on LLM trustworthiness primarily focuses on issues such as ethical compliance, truthfulness, and fairness but fails to fully cover safety-critical real-world applications, like lab safety. To address this gap, we propose the Laboratory Safety Benchmark (LabSafety Bench), a comprehensive evaluation framework based on a new taxonomy aligned with Occupational Safety and Health Administration (OSHA) protocols. This benchmark includes 765 multiple-choice questions verified by human experts, assessing the performance of LLMs and vision language models (VLMs) in lab safety contexts. Our evaluations demonstrate that while GPT-4o outperforms human participants, it is still prone to critical errors, highlighting the risks of relying on LLMs in safety-critical environments. Our findings emphasize the need for specialized benchmarks to accurately assess the trustworthiness of LLMs in real-world safety applications.

Submitted 18 October, 2024; originally announced October 2024.

Comments: 50 pages, 19 figures

arXiv:2410.13907 [pdf, other]

NSmark: Null Space Based Black-box Watermarking Defense Framework for Pre-trained Language Models

Authors: Haodong Zhao, Jinming Hu, Peixuan Li, Fangqi Li, Jinrui Sha, Peixuan Chen, Zhuosheng Zhang, Gongshen Liu

Abstract: Pre-trained language models (PLMs) have emerged as critical intellectual property (IP) assets that necessitate protection. Although various watermarking strategies have been proposed, they remain vulnerable to Linear Functionality Equivalence Attacks (LFEA), which can invalidate most existing white-box watermarks without prior knowledge of the watermarking scheme or training data. This paper further analyzes and extends the attack scenarios of LFEA to the commonly employed black-box settings for PLMs by considering Last-Layer outputs (dubbed LL-LFEA). We discover that the null space of the output matrix remains invariant against LL-LFEA attacks. Based on this finding, we propose NSmark, a task-agnostic, black-box watermarking scheme capable of resisting LL-LFEA attacks. NSmark consists of three phases: (i) watermark generation using the digital signature of the owner, enhanced by spread spectrum modulation for increased robustness; (ii) watermark embedding through an output mapping extractor that preserves PLM performance while maximizing watermark capacity; (iii) watermark verification, assessed by extraction rate and null space conformity. Extensive experiments on both pre-training and downstream tasks confirm the effectiveness, reliability, fidelity, and robustness of our approach. Code is available at https://github.com/dongdongzhaoUP/NSmark.

Submitted 16 October, 2024; originally announced October 2024.

arXiv:2410.13178 [pdf, other]

GeSubNet: Gene Interaction Inference for Disease Subtype Network Generation

Authors: Ziwei Yang, Zheng Chen, Xin Liu, Rikuto Kotoge, Peng Chen, Yasuko Matsubara, Yasushi Sakurai, Jimeng Sun

Abstract: Retrieving gene functional networks from knowledge databases presents a challenge due to the mismatch between disease networks and subtype-specific variations. Current solutions, including statistical and deep learning methods, often fail to effectively integrate gene interaction knowledge from databases or explicitly learn subtype-specific interactions. To address this mismatch, we propose GeSubNet, which learns a unified representation capable of predicting gene interactions while distinguishing between different disease subtypes. Graphs generated by such representations can be considered subtype-specific networks. GeSubNet is a multi-step representation learning framework with three modules: First, a deep generative model learns distinct disease subtypes from patient gene expression profiles. Second, a graph neural network captures representations of prior gene networks from knowledge databases, ensuring accurate physical gene interactions. Finally, we integrate these two representations using an inference loss that leverages graph generation capabilities, conditioned on the patient separation loss, to refine subtype-specific information in the learned representation. GeSubNet consistently outperforms traditional methods, with average improvements of 30.6%, 21.0%, 20.1%, and 56.6% across four graph evaluation metrics, averaged over four cancer datasets. Particularly, we conduct a biological simulation experiment to assess how the behavior of selected genes from over 11,000 candidates affects subtypes or patient distributions. The results show that the generated network has the potential to identify subtype-specific genes with an 83% likelihood of impacting patient distribution shifts. The GeSubNet resource is available: https://anonymous.4open.science/r/GeSubNet/

Submitted 16 October, 2024; originally announced October 2024.

Comments: Under review as a conference paper at ICLR 2025

arXiv:2410.12655 [pdf, other]

Position Specific Scoring Is All You Need? Revisiting Protein Sequence Classification Tasks

Authors: Sarwan Ali, Taslim Murad, Prakash Chourasia, Haris Mansoor, Imdad Ullah Khan, Pin-Yu Chen, Murray Patterson

Abstract: Understanding the structural and functional characteristics of proteins is crucial for developing preventative and curative strategies that impact fields from drug discovery to policy development. An important and popular technique for examining how amino acids make up these characteristics of protein sequences is position-specific scoring (PSS). While the string kernel is crucial in natural language processing (NLP), it is unclear if string kernels can extract biologically meaningful information from protein sequences, despite the fact that they have been shown to be effective in general sequence analysis tasks. In this work, we propose a weighted PSS kernel matrix (or W-PSSKM) that combines a PSS representation of protein sequences, which encodes the frequency information of each amino acid in a sequence, with the notion of the string kernel. This results in a novel kernel function that outperforms many other approaches for protein sequence classification. We perform extensive experimentation to evaluate the proposed method. Our findings demonstrate that the W-PSSKM significantly outperforms existing baselines and state-of-the-art methods and achieves up to 45.1% improvement in classification accuracy.

Submitted 16 October, 2024; originally announced October 2024.

arXiv:2410.12056 [pdf]

Utilizing Spatiotemporal Data Analytics to Pinpoint Outage Location

Authors: Reddy Mandati, Po-Chen Chen, Vladyslav Anderson, Bishwa Sapkota, Michael Jarrell Warren, Bobby Besharati, Ankush Agarwal, Samuel Johnston III

Abstract: Understanding the exact fault location in the post-event analysis is the key to improving the accuracy of outage management. Unfortunately, the fault location is not generally well documented during the restoration process, creating a big challenge for post-event analysis. By utilizing various data source systems, including outage management system (OMS) data, asset geospatial information system (GIS) data, and vehicle location data, this paper creates a novel method to pinpoint the outage location accurately to create additional insights for distribution operations and performance teams during the post-event analysis.

Submitted 15 October, 2024; originally announced October 2024.

arXiv:2410.11967 [pdf]

Integrating Artificial Intelligence Models and Synthetic Image Data for Enhanced Asset Inspection and Defect Identification

Authors: Reddy Mandati, Vladyslav Anderson, Po-chen Chen, Ankush Agarwal, Tatjana Dokic, David Barnard, Michael Finn, Jesse Cromer, Andrew Mccauley, Clay Tutaj, Neha Dave, Bobby Besharati, Jamie Barnett, Timothy Krall

Abstract: In the past, utilities relied on in-field inspections to identify asset defects. Recently, utilities have started using drone-based inspections to enhance the field-inspection process. We consider a vast repository of drone images, providing a wealth of information about asset health and potential issues. However, making the collected imagery data useful for automated defect detection requires significant manual labeling effort. We propose a novel solution that combines synthetic asset defect images with manually labeled drone images. This solution has several benefits: it improves the performance of defect detection, reduces the number of hours spent on manual labeling, and makes it possible to generate realistic images of rare defects where not enough real-world data is available. We employ a workflow that combines 3D modeling tools such as Maya and Unreal Engine to create photorealistic 3D models and 2D renderings of defective assets and their surroundings. These synthetic images are then integrated into our training pipeline, augmenting the real data. This study implements an end-to-end Artificial Intelligence solution to detect assets and asset defects from the combined imagery repository. The unique contribution of this research lies in the application of advanced computer vision models and the generation of photorealistic 3D renderings of defective assets, aiming to transform the asset inspection process. Our asset detection model has achieved an accuracy of 92 percent, and we achieved a performance lift of 67 percent when introducing approximately 2,000 synthetic images of 2k resolution. In our tests, the defect detection model achieved an accuracy of 73 percent across two batches of images. Our analysis demonstrated that synthetic data can be successfully used in place of real-world manually labeled data to train a defect detection model.

Submitted 15 October, 2024; originally announced October 2024.

arXiv:2410.11802 [pdf, other]

FoundTS: Comprehensive and Unified Benchmarking of Foundation Models for Time Series Forecasting

Authors: Zhe Li, Xiangfei Qiu, Peng Chen, Yihang Wang, Hanyin Cheng, Yang Shu, Jilin Hu, Chenjuan Guo, Aoying Zhou, Qingsong Wen, Christian S. Jensen, Bin Yang

Abstract: Time Series Forecasting (TSF) is a key functionality in numerous fields, including finance, weather services, and energy management. While TSF methods are emerging these days, many of them require domain-specific data collection and model training and struggle with poor generalization performance on new domains. Foundation models aim to overcome this limitation. Pre-trained on large-scale language or time series data, they exhibit promising inferencing capabilities on new or unseen data. This has spurred a surge in new TSF foundation models. We propose a new benchmark, FoundTS, to enable thorough and fair evaluation and comparison of such models. FoundTS covers a variety of TSF foundation models, including those based on large language models and those pretrained on time series. Next, FoundTS supports different forecasting strategies, including zero-shot, few-shot, and full-shot, thereby facilitating more thorough evaluations. Finally, FoundTS offers a pipeline that standardizes evaluation processes such as dataset splitting, loading, normalization, and few-shot sampling, thereby facilitating fair evaluations. Building on this, we report on an extensive evaluation of TSF foundation models on a broad range of datasets from diverse domains and with different statistical characteristics. Specifically, we identify pros and cons and inherent limitations of existing foundation models, and we identify directions for future model design. We make our code and datasets available at https://anonymous.4open.science/r/FoundTS-C2B0.

Submitted 21 October, 2024; v1 submitted 15 October, 2024; originally announced October 2024.

arXiv:2410.11290 [pdf, other]

Backdoor Attack on Vertical Federated Graph Neural Network Learning

Authors: Jirui Yang, Peng Chen, Zhihui Lu, Ruijun Deng, Qiang Duan, Jianping Zeng

Abstract: Federated Graph Neural Network (FedGNN) is a privacy-preserving machine learning technology that combines federated learning (FL) and graph neural networks (GNNs). It offers a privacy-preserving solution for training GNNs using isolated graph data. Vertical Federated Graph Neural Network (VFGNN) is an important branch of FedGNN, where data features and labels are distributed among participants, and each participant has the same sample space. Due to the difficulty of accessing and modifying distributed data and labels, the vulnerability of VFGNN to backdoor attacks remains largely unexplored. In this context, we propose BVG, the first method for backdoor attacks in VFGNN. Without accessing or modifying labels, BVG uses multi-hop triggers and requires only four target class nodes for an effective backdoor attack. Experiments show that BVG achieves high attack success rates (ASR) across three datasets and three different GNN models, with minimal impact on main task accuracy (MTA). We also evaluate several defense methods, further validating the robustness and effectiveness of BVG. This finding also highlights the need for advanced defense mechanisms to counter sophisticated backdoor attacks in practical VFGNN applications.

Submitted 15 October, 2024; originally announced October 2024.

arXiv:2410.10280 [pdf, other]

Dual-Mode Calorimetric Superconducting Nanowire Single Photon Detectors

Authors: Hsin-Yeh Wu, Marc Besançon, Jia-Wern Chen, Pisin Chen, Jean-François Glicenstein, Shu-Xiao Liu, Yu-Jung Lu, Xavier-François Navick, Stathes Paganis, Boris Tuchming, Dimitra Tsionou, Feng-Yang Tsai

Abstract: A dual-operation mode SNSPD is demonstrated. In the conventional Geiger SNSPD mode the sensor operates at temperatures well below the critical temperature, Tc, working as an event counter without sensitivity to the number of photons impinging on the sensor. In the calorimetric mode, the detector is operated at temperatures just below Tc and displays photon-number sensitivity for wavelengths in the optical spectrum. In this energy sensitive mode, photon absorption causes Joule heating of the SNSPD that becomes partially resistive without the presence of latching. Depending on the application, by tuning the sample temperature and bias current using the same readout system, the SNSPD can readily switch between the two modes. In the calorimetric mode, SNSPD recovery times shorter than the ones in the Geiger mode are observed, reaching values as low as 580 ps. Dual-mode SNSPDs may provide significant advancements in spectroscopy and calorimetry, where precise timing, photon counting and energy resolution are required.

Submitted 14 October, 2024; originally announced October 2024.

Comments: Manuscript prepared for APL

arXiv:2410.07471 [pdf, other]

SEAL: Safety-enhanced Aligned LLM Fine-tuning via Bilevel Data Selection

Authors: Han Shen, Pin-Yu Chen, Payel Das, Tianyi Chen

Abstract: Fine-tuning on task-specific data to boost downstream performance is a crucial step for leveraging Large Language Models (LLMs). However, previous studies have demonstrated that fine-tuning the models on several adversarial samples or even benign data can greatly compromise the model's pre-equipped alignment and safety capabilities. In this work, we propose SEAL, a novel framework to enhance safety in LLM fine-tuning. SEAL learns a data ranker based on bilevel optimization to up-rank the safe and high-quality fine-tuning data and down-rank the unsafe or low-quality ones. Models trained with SEAL demonstrate superior quality over multiple baselines, with 8.5% and 9.7% win rate increases compared to random selection on the Llama-3-8b-Instruct and Merlinite-7b models, respectively. Our code is available on GitHub at https://github.com/hanshen95/SEAL.

Submitted 10 October, 2024; v1 submitted 9 October, 2024; originally announced October 2024.

arXiv:2410.06755 [pdf, other]

Magnetic field dependence of $V_B^-$ Defects in hexagonal boron nitride

Authors: Mulin Zheng, Shizhuo Ale, Peiqin Chen, Jingpu Tu, Qiang Zhou, Haizhi Song, You Wang, Junfeng Wang, Guangcan Guo, Guangwei Deng

Abstract: The interface with spin defects in hexagonal boron nitride has recently become a promising platform and has shown great potential in a wide range of quantum technologies. Various spin properties of $V_B^-$ defects in hexagonal boron nitride (hBN), such as their structure and coherent control, have been studied widely and in depth. However, little is known about the influence of off-axis magnetic fields on the coherence properties of $V_B^-$ defects in hBN. Here, using optically detected magnetic resonance (ODMR) spectroscopy, we systematically investigated the variations in ODMR resonance frequencies under different transverse and longitudinal external magnetic fields. In addition, we measured the ODMR spectra under off-axis magnetic fields of constant strength but various angles, and observed that the splitting of the resonance frequencies decreases as the angle increases, in agreement with our theoretical calculation based on the Hamiltonian, from which we derive a method for detecting the off-axis magnetic field angle. Through Rabi oscillation measurements, we found that the off-axis magnetic field suppresses the spin coherence time. These results are crucial for optimizing $V_B^-$ defects in hBN, establishing their significance as robust quantum sensors for quantum information processing and magnetic sensing in varied environments.

Submitted 9 October, 2024; originally announced October 2024.

Comments: 5 pages, 4 figures

arXiv:2410.05255 [pdf, other]

SePPO: Semi-Policy Preference Optimization for Diffusion Alignment

Authors: Daoan Zhang, Guangchen Lan, Dong-Jun Han, Wenlin Yao, Xiaoman Pan, Hongming Zhang, Mingxiao Li, Pengcheng Chen, Yu Dong, Christopher Brinton, Jiebo Luo

Abstract: Reinforcement learning from human feedback (RLHF) methods are emerging as a way to fine-tune diffusion models (DMs) for visual generation. However, commonly used on-policy strategies are limited by the generalization capability of the reward model, while off-policy approaches require large amounts of difficult-to-obtain paired human-annotated data, particularly in visual generation tasks. To address the limitations of both on- and off-policy RLHF, we propose a preference optimization method that aligns DMs with preferences without relying on reward models or paired human-annotated data. Specifically, we introduce a Semi-Policy Preference Optimization (SePPO) method. SePPO leverages previous checkpoints as reference models while using them to generate on-policy reference samples, which replace "losing images" in preference pairs. This approach allows us to optimize using only off-policy "winning images." Furthermore, we design a strategy for reference model selection that expands the exploration in the policy space. Notably, we do not simply treat reference samples as negative examples for learning. Instead, we design an anchor-based criterion to assess whether the reference samples are likely to be winning or losing images, allowing the model to selectively learn from the generated reference samples. This approach mitigates performance degradation caused by the uncertainty in reference sample quality. We validate SePPO across both text-to-image and text-to-video benchmarks. SePPO surpasses all previous approaches on the text-to-image benchmarks and also demonstrates outstanding performance on the text-to-video benchmarks. Code will be released at https://github.com/DwanZhang-AI/SePPO.

Submitted 7 October, 2024; originally announced October 2024.

arXiv:2410.05137 [pdf, other]

Field-angle evolution of the superconducting and magnetic phases of UTe$_2$ around the $b$ axis

Authors: Sylvia K. Lewin, Josephine J. Yu, Corey E. Frank, David Graf, Patrick Chen, Sheng Ran, Yun Suk Eo, Johnpierre Paglione, S. Raghu, Nicholas P. Butch

Abstract: We experimentally determine the bounds of the magnetic-field-induced superconducting and magnetic phases near the crystalline $b$ axis of uranium ditelluride (UTe$_2$). By measuring the magnetoresistance as a function of rotation angle and field strength in magnetic fields as large as 41.5 T, we have studied these boundaries in three dimensions of magnetic field direction. The phase boundaries in all cases obey crystallographic symmetries and no additional symmetries, evidence against any symmetry-breaking quadrupolar or higher magnetic order. We find that the upper critical field of the zero-field superconducting state is well-described by an anisotropic mass model. In contrast, the angular boundaries of the $b$-axis-oriented field-reentrant superconducting phase are nearly constant as a function of field up to the metamagnetic transition, with anisotropy between the $ab$ and $bc$ planes that is comparable to the angular anisotropy of the metamagnetic transition itself. We discuss the relationship between the observed superconducting boundaries and the underlying $\mathbf{d}$ vector that represents the spin-triplet order parameter. Additionally, we report an unexplained normal-state feature in resistance and track its evolution as a function of field strength and angle.

Submitted 7 October, 2024; originally announced October 2024.

Comments: 17 pages, 16 figures

arXiv:2410.05111 [pdf, other]

LiDAR-GS:Real-time LiDAR Re-Simulation using Gaussian Splatting

Authors: Qifeng Chen, Sheng Yang, Sicong Du, Tao Tang, Peng Chen, Yuchi Huo

Abstract: LiDAR simulation plays a crucial role in closed-loop simulation for autonomous driving. Although recent advancements, such as the use of reconstructed mesh and Neural Radiance Fields (NeRF), have made progress in simulating the physical properties of LiDAR, these methods have struggled to achieve satisfactory frame rates and rendering quality. To address these limitations, we present LiDAR-GS, the first LiDAR Gaussian Splatting method, for real-time high-fidelity re-simulation of LiDAR sensor scans in public urban road scenes. The vanilla Gaussian Splatting, designed for camera models, cannot be directly applied to LiDAR re-simulation. To bridge the gap between passive camera and active LiDAR, our LiDAR-GS designs a differentiable laser beam splatting, grounded in the LiDAR range view model. This innovation allows for precise surface splatting by projecting lasers onto micro cross-sections, effectively eliminating artifacts associated with local affine approximations. Additionally, LiDAR-GS leverages Neural Gaussian Fields, which further integrate view-dependent clues, to represent key LiDAR properties that are influenced by the incident angle and external factors. Combining these practices with some essential adaptations, e.g., dynamic instances decomposition, our approach succeeds in simultaneously re-simulating depth, intensity, and ray-drop channels, achieving state-of-the-art results in both rendering frame rate and quality on publicly available large scene datasets. Our source code will be made publicly available.

Submitted 7 October, 2024; originally announced October 2024.

arXiv:2410.04324 [pdf, other]

SONAR: A Synthetic AI-Audio Detection Framework and Benchmark

Authors: Xiang Li, Pin-Yu Chen, Wenqi Wei

Abstract: Recent advances in Text-to-Speech (TTS) and Voice-Conversion (VC) using generative Artificial Intelligence (AI) technology have made it possible to generate high-quality and realistic human-like audio. This introduces significant challenges to distinguishing AI-synthesized speech from the authentic human voice and could raise potential issues of misuse for malicious purposes such as impersonation and fraud, spreading misinformation, deepfakes, and scams. However, existing detection techniques for AI-synthesized audio have not kept pace and often exhibit poor generalization across diverse datasets. In this paper, we introduce SONAR, a synthetic AI-Audio Detection Framework and Benchmark, aiming to provide a comprehensive evaluation for distinguishing cutting-edge AI-synthesized auditory content. SONAR includes a novel evaluation dataset sourced from 9 diverse audio synthesis platforms, including leading TTS providers and state-of-the-art TTS models. It is the first framework to uniformly benchmark AI-audio detection across both traditional and foundation model-based deepfake detection systems. Through extensive experiments, we reveal the generalization limitations of existing detection methods and demonstrate that foundation models exhibit stronger generalization capabilities, which can be attributed to their model size and the scale and quality of pretraining data. Additionally, we explore the effectiveness and efficiency of few-shot fine-tuning in improving generalization, highlighting its potential for tailored applications, such as personalized detection systems for specific entities or individuals. Code and dataset are available at https://github.com/Jessegator/SONAR.

Submitted 10 October, 2024; v1 submitted 5 October, 2024; originally announced October 2024.

arXiv:2410.04041 [pdf, other]

Hybrid NeRF-Stereo Vision: Pioneering Depth Estimation and 3D Reconstruction in Endoscopy

Authors: Pengcheng Chen, Wenhao Li, Nicole Gunderson, Jeremy Ruthberg, Randall Bly, Waleed M. Abuzeid, Zhenglong Sun, Eric J. Seibel

Abstract: The 3D reconstruction of the surgical field in minimally invasive endoscopic surgery has posed a formidable challenge when using conventional monocular endoscopes. Existing 3D reconstruction methodologies are frequently encumbered by suboptimal accuracy and limited generalization capabilities. In this study, we introduce an innovative pipeline using Neural Radiance Fields (NeRF) for 3D reconstruction. Our approach utilizes a preliminary NeRF reconstruction that yields a coarse model, then creates a binocular scene within the reconstructed environment, which derives an initial depth map via stereo vision. This initial depth map serves as depth supervision for subsequent NeRF iterations, progressively refining the 3D reconstruction with enhanced accuracy. The binocular depth is iteratively recalculated, with the refinement process continuing until the depth map converges, and exhibits negligible variations. Through this recursive process, high-fidelity depth maps are generated from monocular endoscopic video of a realistic cranial phantom. By repeated measures of the final 3D reconstruction compared to X-ray computed tomography, all differences of relevant clinical distances result in sub-millimeter accuracy.

Submitted 10 October, 2024; v1 submitted 5 October, 2024; originally announced October 2024.

arXiv:2410.03920 [pdf, other]

Learning Object Properties Using Robot Proprioception via Differentiable Robot-Object Interaction

Authors: Peter Yichen Chen, Chao Liu, Pingchuan Ma, John Eastman, Daniela Rus, Dylan Randle, Yuri Ivanov, Wojciech Matusik

Abstract: Differentiable simulation has become a powerful tool for system identification. While prior work has focused on identifying robot properties using robot-specific data or object properties using object-specific data, our approach calibrates object properties by using information from the robot, without relying on data from the object itself. Specifically, we utilize robot joint encoder information, which is commonly available in standard robotic systems. Our key observation is that by analyzing the robot's reactions to manipulated objects, we can infer properties of those objects, such as inertia and softness. Leveraging this insight, we develop differentiable simulations of robot-object interactions to inversely identify the properties of the manipulated objects. Our approach relies solely on proprioception -- the robot's internal sensing capabilities -- and does not require external measurement tools or vision-based tracking systems. This general method is applicable to any articulated robot and requires only joint position information. We demonstrate the effectiveness of our method on a low-cost robotic platform, achieving accurate mass and elastic modulus estimations of manipulated objects with just a few seconds of computation on a laptop.

Submitted 4 October, 2024; originally announced October 2024.

arXiv:2410.03818 [pdf, other]

Large Language Models can be Strong Self-Detoxifiers

Authors: Ching-Yun Ko, Pin-Yu Chen, Payel Das, Youssef Mroueh, Soham Dan, Georgios Kollias, Subhajit Chaudhury, Tejaswini Pedapati, Luca Daniel

Abstract: Reducing the likelihood of generating harmful and toxic output is an essential task when aligning large language models (LLMs). Existing methods mainly rely on training an external reward model (i.e., another language model) or fine-tuning the LLM using self-generated data to influence the outcome. In this paper, we show that LLMs have the capability of self-detoxification without the use of an additional reward model or re-training. We propose Self-disciplined Autoregressive Sampling (SASA), a lightweight controlled decoding algorithm for toxicity reduction of LLMs. SASA leverages the contextual representations from an LLM to learn linear subspaces characterizing toxic vs. non-toxic output in analytical forms. When auto-completing a response token-by-token, SASA dynamically tracks the margin of the current output to steer the generation away from the toxic subspace, by adjusting the autoregressive sampling strategy. Evaluated on LLMs of different scale and nature, namely Llama-3.1-Instruct (8B), Llama-2 (7B), and GPT2-L models with the RealToxicityPrompts, BOLD, and AttaQ benchmarks, SASA markedly enhances the quality of the generated sentences relative to the original models and attains comparable performance to state-of-the-art detoxification techniques, significantly reducing the toxicity level by only using the LLM's internal representations.

Submitted 4 October, 2024; originally announced October 2024.

Comments: 20 pages

arXiv:2410.03312 [pdf, other]

Context and System Fusion in Post-ASR Emotion Recognition with Large Language Models

Authors: Pavel Stepachev, Pinzhen Chen, Barry Haddow

Abstract: Large language models (LLMs) have started to play a vital role in modelling speech and text. To explore the best use of context and multiple systems' outputs for post-ASR speech emotion prediction, we study LLM prompting on a recent task named GenSEC. Our techniques include ASR transcript ranking, variable conversation context, and system output fusion. We show that the conversation context has diminishing returns and the metric used to select the transcript for prediction is crucial. Finally, our best submission surpasses the provided baseline by 20% in absolute accuracy.

Submitted 4 October, 2024; originally announced October 2024.

arXiv:2410.03128 [pdf, other]

Spontaneously formed phonon frequency combs in van der Waals solid CrXTe$_3$ (X=Ge,Si)

Authors: Lebing Chen, Gaihua Ye, Cynthia Nnokwe, Xing-Chen Pan, Katsumi Tanigaki, Guanghui Cheng, Yong P. Chen, Jiaqiang Yan, David G. Mandrus, Andres E. Llacsahuanga Allcca, Nathan Giles-Donovan, Robert J. Birgeneau, Rui He

Abstract: Optical phonon engineering through nonlinear effects has been utilized in ultrafast control of material properties. However, nonlinear optical phonons typically exhibit rapid decay due to strong mode-mode couplings, limiting their effectiveness in temperature- or frequency-sensitive applications. In this study, we report the observation of long-lived nonlinear optical phonons through the spontaneous formation of phonon frequency combs in the van der Waals material CrXTe$_3$ (X=Ge, Si) using high-resolution Raman scattering. Unlike conventional optical phonons, the highest $A_g$ mode in CrGeTe$_3$ splits into equidistant, sharp peaks forming a frequency comb that persists for hundreds of oscillations and survives up to 100 K before decaying. These modes correspond to localized oscillations of Ge$_2$Te$_6$ clusters, isolated from Cr hexagons, behaving as independent quantum oscillators. Introducing a cubic nonlinear term to the harmonic oscillator model, we simulate the phonon time evolution and successfully replicate the observed comb structure. Similar frequency comb behavior is observed in CrSiTe$_3$, demonstrating the generalizability of this phenomenon. Our findings reveal that Raman scattering effectively probes high-frequency nonlinear phonon modes, providing new insight into generating long-lived, tunable phonon frequency combs with applications in ultrafast material control and phonon-based technologies.

Submitted 3 October, 2024; originally announced October 2024.

Comments: 22 pages, 10 figures

arXiv:2410.02736 [pdf, other]

Justice or Prejudice? Quantifying Biases in LLM-as-a-Judge

Authors: Jiayi Ye, Yanbo Wang, Yue Huang, Dongping Chen, Qihui Zhang, Nuno Moniz, Tian Gao, Werner Geyer, Chao Huang, Pin-Yu Chen, Nitesh V Chawla, Xiangliang Zhang

Abstract: LLM-as-a-Judge has been widely utilized as an evaluation method in various benchmarks and served as supervised rewards in model training. However, despite their excellence in many domains, potential issues are under-explored, undermining their reliability and the scope of their utility. Therefore, we identify 12 key potential biases and propose a new automated bias quantification framework, CALM, which systematically quantifies and analyzes each type of bias in LLM-as-a-Judge by using automated and principle-guided modification. Our experiments cover multiple popular language models, and the results indicate that while advanced models have achieved commendable overall performance, significant biases persist in certain specific tasks. Empirical results suggest that there remains room for improvement in the reliability of LLM-as-a-Judge. Moreover, we also discuss the explicit and implicit influence of these biases and give some suggestions for the reliable application of LLM-as-a-Judge. Our work highlights the need for stakeholders to address these issues and reminds users to exercise caution in LLM-as-a-Judge applications.

Submitted 3 October, 2024; v1 submitted 3 October, 2024; originally announced October 2024.

arXiv:2410.02167 [pdf, other]

Training Nonlinear Transformers for Chain-of-Thought Inference: A Theoretical Generalization Analysis

Authors: Hongkang Li, Meng Wang, Songtao Lu, Xiaodong Cui, Pin-Yu Chen

Abstract: Chain-of-Thought (CoT) is an efficient prompting method that enables the reasoning ability of large language models by augmenting the query using multiple examples with multiple intermediate steps. Despite the empirical success, the theoretical understanding of how to train a Transformer to achieve the CoT ability remains less explored. This is primarily due to the technical challenges involved in analyzing the nonconvex optimization on nonlinear attention models. To the best of our knowledge, this work provides the first theoretical study of training Transformers with nonlinear attention to obtain the CoT generalization capability so that the resulting model can perform inference on unseen tasks when the input is augmented by examples of the new task. We first quantify the required training samples and iterations to train a Transformer model towards CoT ability. We then prove the success of its CoT generalization on unseen tasks with distribution-shifted testing data. Moreover, we theoretically characterize the conditions for an accurate reasoning output by CoT even when the provided reasoning examples contain noise and are not always accurate. In contrast, in-context learning (ICL), which can be viewed as one-step CoT without intermediate steps, may fail to provide an accurate output when CoT does. These theoretical findings are justified through experiments.

Submitted 5 October, 2024; v1 submitted 2 October, 2024; originally announced October 2024.

arXiv:2410.01164 [pdf, ps, other]

On maximal functions generated by Hörmander-type spectral multipliers

Authors: Peng Chen, Xixi Lin, Liangchuan Wu, Lixin Yan

Abstract: Let $(X,d,μ)$ be a metric space with doubling measure and $L$ be a nonnegative self-adjoint operator on $L^2(X)$ whose heat kernel satisfies the Gaussian upper bound. We assume that there exists an $L$-harmonic function $h$ such that the semigroup $\exp(-tL)$, after applying the Doob transform related to $h$, satisfies the upper and lower Gaussian estimates. In this paper we apply the Doob transform and some techniques as in Grafakos-Honzík-Seeger \cite{GHS2006} to obtain an optimal $\sqrt{\log(1+N)}$ bound in $L^p$ for the maximal function $\sup_{1\leq i\leq N}|m_i(L)f|$ for multipliers $m_i,1\leq i\leq N,$ with uniform estimates. Based on this, we establish sufficient conditions on the bounded Borel function $m$ such that the maximal function $M_{m,L}f(x) = \sup_{t>0} |m(tL)f(x)|$ is bounded on $L^p(X)$. The applications include Schrödinger operators with inverse square potential, scattering operators, Bessel operators and Laplace-Beltrami operators.

Submitted 1 October, 2024; originally announced October 2024.

Comments: 37 pages

MSC Class: 42B15; 42B25; 47F10

arXiv:2410.00938 [pdf, other]

MoS: Unleashing Parameter Efficiency of Low-Rank Adaptation with Mixture of Shards

Authors: Sheng Wang, Liheng Chen, Pengan Chen, Jingwei Dong, Boyang Xue, Jiyue Jiang, Lingpeng Kong, Chuan Wu

Abstract: The rapid scaling of large language models necessitates more lightweight finetuning methods to reduce the explosive GPU memory overhead when numerous customized models are served simultaneously. Targeting more parameter-efficient low-rank adaptation (LoRA), parameter sharing presents a promising solution. Empirically, our research into high-level sharing principles highlights the indispensable role of differentiation in reversing the detrimental effects of pure sharing. Guided by this finding, we propose Mixture of Shards (MoS), incorporating both inter-layer and intra-layer sharing schemes, and integrating four nearly cost-free differentiation strategies, namely subset selection, pair dissociation, vector sharding, and shard privatization. Briefly, it selects a designated number of shards from global pools with a Mixture-of-Experts (MoE)-like routing mechanism before sequentially concatenating them to low-rank matrices. Hence, it retains all the advantages of LoRA while offering enhanced parameter efficiency, and effectively circumvents the drawbacks of peer parameter-sharing methods. Our empirical experiments demonstrate approximately 8x parameter savings in a standard LoRA setting. The ablation study confirms the significance of each component. Our insights into parameter sharing and MoS method may illuminate future developments of more parameter-efficient finetuning methods.

Submitted 1 October, 2024; originally announced October 2024.

arXiv:2409.18296 [pdf, other]

An Eccentric Binary with a Misaligned Circumbinary Disk

Authors: Zhecheng Hu, Wei Zhu, Fei Dai, Ping Chen, Yang Huang, Min Fang, Richard S. Post

Abstract: We present spectroscopic and photometric observations of Bernhard-2, which was previously identified as a candidate system to host a misaligned circumbinary disk. Our spectroscopic measurements confirm that Bernhard-2 indeed contains an eccentric ($e=0.69 \pm 0.08$) binary and thus that the periodic variability in the photometric light curve is best explained by the occultation by the misaligned circumbinary disk. By modeling the spectral energy distributions at different phases, we infer the system age to be $\sim 20\,$Myr and the masses of the two binary components to be $\sim 1.1\,M_\odot$ and $\sim 0.9\,M_\odot$, respectively. Our new photometric observations show clear deviations from the model prediction based on the archival data, suggesting ongoing precession of the circumbinary disk. The H$α$ line of Bernhard-2 also shows an inverse P-Cygni profile at epochs close to the pericenter passage, which could be attributed to the pulsed accretion around the pericenter. Bernhard-2 therefore closely resembles the well studied KH 15D system. Further detailed observations and studies of such rare systems can provide useful information about disk physics and evolution.

Submitted 26 September, 2024; originally announced September 2024.

Comments: 10 pages, 4 figures, submitted to AAS Journals, comments welcome

arXiv:2409.17892 [pdf, other]

EMMA-500: Enhancing Massively Multilingual Adaptation of Large Language Models

Authors: Shaoxiong Ji, Zihao Li, Indraneil Paul, Jaakko Paavola, Peiqin Lin, Pinzhen Chen, Dayyán O'Brien, Hengyu Luo, Hinrich Schütze, Jörg Tiedemann, Barry Haddow

Abstract: In this work, we introduce EMMA-500, a large-scale multilingual language model continue-trained on texts across 546 languages designed for enhanced multilingual performance, focusing on improving language coverage for low-resource languages. To facilitate continual pre-training, we compile the MaLA corpus, a comprehensive multilingual dataset enriched with curated datasets across diverse domains. Leveraging this corpus, we conduct extensive continual pre-training of the Llama 2 7B model, resulting in EMMA-500, which demonstrates robust performance across a wide collection of benchmarks, including a comprehensive set of multilingual tasks and PolyWrite, an open-ended generation benchmark developed in this study. Our results highlight the effectiveness of continual pre-training in expanding large language models' language capacity, particularly for underrepresented languages, demonstrating significant gains in cross-lingual transfer, task generalization, and language adaptability.

Submitted 26 September, 2024; originally announced September 2024.

arXiv:2409.15398 [pdf, other]

Attack Atlas: A Practitioner's Perspective on Challenges and Pitfalls in Red Teaming GenAI

Authors: Ambrish Rawat, Stefan Schoepf, Giulio Zizzo, Giandomenico Cornacchia, Muhammad Zaid Hameed, Kieran Fraser, Erik Miehling, Beat Buesser, Elizabeth M. Daly, Mark Purcell, Prasanna Sattigeri, Pin-Yu Chen, Kush R. Varshney

Abstract: As generative AI, particularly large language models (LLMs), becomes increasingly integrated into production applications, new attack surfaces and vulnerabilities emerge and put a focus on adversarial threats in natural language and multi-modal systems. Red-teaming has gained importance in proactively identifying weaknesses in these systems, while blue-teaming works to protect against such adversarial attacks. Despite growing academic interest in adversarial risks for generative AI, there is limited guidance tailored for practitioners to assess and mitigate these challenges in real-world environments. To address this, our contributions include: (1) a practical examination of red- and blue-teaming strategies for securing generative AI, (2) identification of key challenges and open questions in defense development and evaluation, and (3) the Attack Atlas, an intuitive framework that brings a practical approach to analyzing single-turn input attacks, placing it at the forefront for practitioners. This work aims to bridge the gap between academic insights and practical security measures for the protection of generative AI systems.

Submitted 23 September, 2024; originally announced September 2024.

arXiv:2409.14757 [pdf]

Giant and Flexible Toroidal Circular Dichroism from Planar Chiral Metasurface

Authors: Shijie Kang, Haitao Li, Jiayu Fan, Jiusi Yu, Boyang Qu, Peng Chen, Xiaoxiao Wu

Abstract: Chirality, a fundamental concept describing an object that cannot be superposed on its mirror image, is crucial in optics and photonics and leads to various exotic phenomena, such as circular dichroism and optical activity. Recent findings reveal that, besides electric and magnetic dipoles, toroidal dipoles, an elusive part of dynamic multipoles, can also contribute significantly to chirality. However, as toroidal dipoles are typically represented by solenoidal currents circulating on a three-dimensional (3D) torus, toroidal circular dichroism is usually observed in 3D intricate microstructures. Facing corresponding challenges in fabrication, integration and application, it is generally difficult to employ toroidal circular dichroism in compact metasurfaces for flexible modulation of chiral interactions between electromagnetic waves and matter. To overcome these stringent challenges, we propose and experimentally demonstrate the giant toroidal circular dichroism in a bilayer metasurface that is comprised of only planar layers, effectively bypassing various restrictions imposed by 3D microstructures. With the introduction of a displacement, or bilayer offset, between the opposite layers, we experimentally achieve giant chiral responses with the intrinsic circular dichroism (CD) reaching 0.69 in measurements, and the CD can be quantitatively manipulated in a simple manner. The giant intrinsic chirality primarily originates from distinct excitations of in-plane toroidal dipole moments under circularly polarized incidence, and the toroidal chiral response is quantitatively controlled by the bilayer offset. Therefore, our work provides a straightforward and versatile approach for development of giant and flexible intrinsic chirality through toroidal dipoles with inherently planar layers, important for applications in communications, sensing, and chiroptical devices.

Submitted 23 September, 2024; originally announced September 2024.

arXiv:2409.13058 [pdf, other]

Mixed Reality Tele-ultrasound over 750 km: a Clinical Study

Authors: Ryan Yeung, David Black, Patrick B. Chen, Victoria Lessoway, Janice Reid, Sergio Rangel-Suarez, Silvia D. Chang, Septimiu E. Salcudean

Abstract: Ultrasound is a hand-held, low-cost, non-invasive medical imaging modality which plays a vital role in diagnosing various diseases. Despite this, many rural and remote communities do not have access to ultrasound scans due to the lack of local experts trained to perform them. To address this challenge, we built a mixed reality and haptics-based tele-ultrasound system to enable an expert to precisely guide a novice remotely in carrying out an ultrasound exam. The precision and flexibility of our solution make it more practical than existing tele-ultrasound solutions. We tested the system in Skidegate on the islands of Haida Gwaii, BC, Canada, with the experts positioned 754 km away at the University of British Columbia, Vancouver, Canada. We performed 11 scans with 10 novices and 2 experts. The experts were tasked with acquiring 5 target images and measurements in the epigastric region. The novices of various backgrounds and ages were all inexperienced in mixed reality and were not required to have prior ultrasound experience. The captured images were evaluated by two radiologists who were not present for the tests. These results are discussed along with new insights into the human-computer interaction in such a system. We show that human teleoperation is feasible and can achieve high performance for completing remote ultrasound procedures, even at a large distance and with completely novice followers.

Submitted 19 September, 2024; originally announced September 2024.

Comments: 8 pages, 10 figures, submitted to IEEE VR 2025

arXiv:2409.12889 [pdf, other]

Can VLMs Play Action Role-Playing Games? Take Black Myth Wukong as a Study Case

Authors: Peng Chen, Pi Bu, Jun Song, Yuan Gao, Bo Zheng

Abstract: Recently, large language model (LLM)-based agents have made significant advances across various fields. One of the most popular research areas involves applying these agents to video games. Traditionally, these methods have relied on game APIs to access in-game environmental and action data. However, this approach is limited by the availability of APIs and does not reflect how humans play games. With the advent of vision language models (VLMs), agents now have enhanced visual understanding capabilities, enabling them to interact with games using only visual inputs. Despite these advances, current approaches still face challenges in action-oriented tasks, particularly in action role-playing games (ARPGs), where reinforcement learning methods are prevalent but suffer from poor generalization and require extensive training. To address these limitations, we select an ARPG, ``Black Myth: Wukong'', as a research platform to explore the capability boundaries of existing VLMs in scenarios requiring visual-only input and complex action output. We define 12 tasks within the game, with 75% focusing on combat, and incorporate several state-of-the-art VLMs into this benchmark. Additionally, we will release a human operation dataset containing recorded gameplay videos and operation logs, including mouse and keyboard actions. Moreover, we propose a novel VARP (Vision Action Role-Playing) agent framework, consisting of an action planning system and a visual trajectory system. Our framework demonstrates the ability to perform basic tasks and succeed in 90% of easy and medium-level combat scenarios. This research aims to provide new insights and directions for applying multimodal agents in complex action game environments. The code and datasets will be made available at https://varp-agent.github.io/.

Submitted 22 September, 2024; v1 submitted 19 September, 2024; originally announced September 2024.

arXiv:2409.12814 [pdf]

GeSn 320 $\times$ 256 Focal Plane Array for Silicon-Based Short-wave Infrared Imaging

Authors: Guoyin Xu, Hui Cong, Yue Li, Zhengjie Wu, Fenghe Fu, Ping Chen, Chao Zhao, Chi Xu, Chunlai Xue

Abstract: Short-wave infrared (SWIR) imaging arrays have demonstrated great potential in applications spanning from military to civilian consumer electronics. However, the current focal plane arrays (FPAs), which are based on compound semiconductors, have limited applications in civilian circumstances due to elevated manufacturing costs and prolonged fabrication cycle time. To address this, a high-performance 320 $\times$ 256 focal plane array based on group-IV semiconductors has been designed and manufactured on a Si substrate using a complementary metal-oxide semiconductor (CMOS) compatible fabrication process. The optical absorption layer is composed of GeSn alloy, whose bandgap can be tailored by choosing the appropriate Sn concentration. In this work, a 10% Sn concentration was employed, yielding a response cutoff wavelength of 2308 nm for the Si-based photodetector, which was measured at 298 K. Moreover, a specific detectivity of $9.7 \times 10^{11}$ cm$\cdot$Hz$^{1/2}\cdot$W$^{-1}$ has been achieved at 77 K, surpassing all previously reported GeSn devices and rivaling commercial extended InGaAs photodetectors. With the help of read-out circuits (ROIC), SWIR images have been successfully captured for the first time using a Si-based GeSn FPA. This work demonstrates the potential of group-IV imaging arrays for various applications in the commercial SWIR imaging field.

Submitted 19 September, 2024; originally announced September 2024.

arXiv:2409.11905 [pdf, other]

AlignBot: Aligning VLM-powered Customized Task Planning with User Reminders Through Fine-Tuning for Household Robots

Authors: Zhaxizhuoma, Pengan Chen, Ziniu Wu, Jiawei Sun, Dong Wang, Peng Zhou, Nieqing Cao, Yan Ding, Bin Zhao, Xuelong Li

Abstract: This paper presents AlignBot, a novel framework designed to optimize VLM-powered customized task planning for household robots by effectively aligning with user reminders. In domestic settings, aligning task planning with user reminders poses significant challenges due to the limited quantity, diversity, and multimodal nature of the reminders. To address these challenges, AlignBot employs a fine-tuned LLaVA-7B model, functioning as an adapter for GPT-4o. This adapter model internalizes diverse forms of user reminders (such as personalized preferences, corrective guidance, and contextual assistance) into structured instruction-formatted cues that prompt GPT-4o in generating customized task plans. Additionally, AlignBot integrates a dynamic retrieval mechanism that selects task-relevant historical successes as prompts for GPT-4o, further enhancing task planning accuracy. To validate the effectiveness of AlignBot, experiments are conducted in real-world household environments, which are constructed within the laboratory to replicate typical household settings. A multimodal dataset with over 1,500 entries derived from volunteer reminders is used for training and evaluation. The results demonstrate that AlignBot significantly improves customized task planning, outperforming existing LLM- and VLM-powered planners by interpreting and aligning with user reminders, achieving an 86.8% success rate compared to the vanilla GPT-4o baseline at 21.6%, reflecting a 65-percentage-point improvement and over four times greater effectiveness. Supplementary materials are available at: https://yding25.com/AlignBot/

Submitted 18 September, 2024; originally announced September 2024.

arXiv:2409.09668 [pdf, other]

EditBoard: Towards A Comprehensive Evaluation Benchmark for Text-based Video Editing Models

Authors: Yupeng Chen, Penglin Chen, Xiaoyu Zhang, Yixian Huang, Qian Xie

Abstract: The rapid development of diffusion models has significantly advanced AI-generated content (AIGC), particularly in Text-to-Image (T2I) and Text-to-Video (T2V) generation. Text-based video editing, leveraging these generative capabilities, has emerged as a promising field, enabling precise modifications to videos based on text prompts. Despite the proliferation of innovative video editing models, there is a conspicuous lack of comprehensive evaluation benchmarks that holistically assess these models' performance across various dimensions. Existing evaluations are limited and inconsistent, typically summarizing overall performance with a single score, which obscures models' effectiveness on individual editing tasks. To address this gap, we propose EditBoard, the first comprehensive evaluation benchmark for text-based video editing models. EditBoard encompasses nine automatic metrics across four dimensions, evaluating models on four task categories and introducing three new metrics to assess fidelity. This task-oriented benchmark facilitates objective evaluation by detailing model performance and providing insights into each model's strengths and weaknesses. By open-sourcing EditBoard, we aim to standardize evaluation and advance the development of robust video editing models.

Submitted 15 September, 2024; originally announced September 2024.

arXiv:2409.09601 [pdf, other]

A Survey of Foundation Models for Music Understanding

Authors: Wenjun Li, Ying Cai, Ziyang Wu, Wenyi Zhang, Yifan Chen, Rundong Qi, Mengqi Dong, Peigen Chen, Xiao Dong, Fenghao Shi, Lei Guo, Junwei Han, Bao Ge, Tianming Liu, Lin Gan, Tuo Zhang

Abstract: Music is essential in daily life, fulfilling emotional and entertainment needs, and connecting us personally, socially, and culturally. A better understanding of music can enhance our emotions, cognitive skills, and cultural connections. The rapid advancement of artificial intelligence (AI) has introduced new ways to analyze music, aiming to replicate human understanding of music and provide related services. While traditional models focused on audio features and simple tasks, recently developed large language models (LLMs) and foundation models (FMs), which excel in various fields by integrating semantic information and demonstrating strong reasoning abilities, could capture complex musical features and patterns, integrate music with language, and incorporate rich musical, emotional and psychological knowledge. Therefore, they have the potential to handle complex music understanding tasks from a semantic perspective, producing outputs closer to human perception. This work, to the best of our knowledge, is one of the early reviews of the intersection of AI techniques and music understanding. We investigated, analyzed, and tested recent large-scale music foundation models with respect to their music comprehension abilities. We also discussed their limitations and proposed possible future directions, offering insights for researchers in this field.

Submitted 14 September, 2024; originally announced September 2024.

Comments: 20 pages, 2 figures

arXiv:2409.09572 [pdf, other]

A Novel Aerial-Aquatic Locomotion Robot with Variable Stiffness Propulsion Module

Authors: Junzhe Hu, Pengyu Chen, Tianxiang Feng, Yuxuan Wen, Ke Wu, Janet Dong

Abstract: In recent years, the development of robots capable of operating in both aerial and aquatic environments has gained significant attention. This study presents the design and fabrication of a novel aerial-aquatic locomotion robot (AALR). Inspired by the diving beetle, the AALR incorporates a biomimetic propulsion mechanism with power and recovery strokes. The variable stiffness propulsion module (VSPM) uses low melting point alloy (LMPA) and variable stiffness joints (VSJ) to achieve efficient aquatic locomotion while reducing harm to marine life. The AALR's innovative design integrates the VSPM into the arms of a traditional quadrotor, allowing for effective aerial-aquatic locomotion. The VSPM adjusts joint stiffness through temperature control, meeting locomotion requirements in both aerial and aquatic modes. A dynamic model for the VSPM was developed, with optimized dimensional parameters to increase propulsion force. Experiments focused on aquatic mode analysis and demonstrated the AALR's swimming capability, achieving a maximum swimming speed of 77 mm/s underwater. The results confirm the AALR's effective performance in aquatic environments, highlighting its potential for versatile, eco-friendly operations.

Submitted 14 September, 2024; originally announced September 2024.

Comments: 8 pages, 10 figures, ICRA

arXiv:2409.09141 [pdf, other]

Sequential infinite-dimensional Bayesian optimal experimental design with derivative-informed latent attention neural operator

Authors: Jinwoo Go, Peng Chen

Abstract: We develop a new computational framework to solve sequential Bayesian optimal experimental design (SBOED) problems constrained by large-scale partial differential equations with infinite-dimensional random parameters. We propose an adaptive terminal formulation of the optimality criteria for SBOED to achieve adaptive global optimality. We also establish an equivalent optimization formulation to achieve computational simplicity enabled by Laplace and low-rank approximations of the posterior. To accelerate the solution of the SBOED problem, we develop a derivative-informed latent attention neural operator (LANO), a new neural network surrogate model that leverages (1) derivative-informed dimension reduction for latent encoding, (2) an attention mechanism to capture the dynamics in the latent space, and (3) efficient training in the latent space augmented by the projected Jacobian, which collectively lead to an efficient, accurate, and scalable surrogate for computing not only the parameter-to-observable (PtO) maps but also their Jacobians. We further develop the formulation for the computation of the MAP points, the eigenpairs, and the sampling from the posterior by LANO in the reduced spaces and use these computations to solve the SBOED problem. We demonstrate the superior accuracy of LANO compared to two other neural architectures and the high accuracy of LANO compared to the finite element method (FEM) for the computation of MAP points and eigenvalues in solving the SBOED problem with application to the experimental design of the time to take MRI images in monitoring tumor growth. We show that the proposed computational framework achieves an amortized $180\times$ speedup.

Submitted 2 October, 2024; v1 submitted 13 September, 2024; originally announced September 2024.

arXiv:2409.08365 [pdf, other]

Measurement of the nucleon spin structure functions for $0.01<Q^2<1$~GeV$^2$ using CLAS

Authors: A. Deur, S. E. Kuhn, M. Ripani, X. Zheng, A. G. Acar, P. Achenbach, K. P. Adhikari, J. S. Alvarado, M. J. Amaryan, W. R. Armstrong, H. Atac, H. Avakian, L. Baashen, N. A. Baltzell, L. Barion, M. Bashkanov, M. Battaglieri, B. Benkel, F. Benmokhtar, A. Bianconi, A. S. Biselli, W. A. Booth, F. Bossù, P. Bosted, S. Boiarinov, et al. (124 additional authors not shown)

Abstract: The spin structure functions of the proton and the deuteron were measured during the EG4 experiment at Jefferson Lab in 2006. Data were collected for longitudinally polarized electron scattering off longitudinally polarized NH$_3$ and ND$_3$ targets, for $Q^2$ values as small as 0.012 and 0.02 GeV$^2$, respectively, using the CEBAF Large Acceptance Spectrometer (CLAS). This is the archival paper of the EG4 experiment that summarizes the previously reported results of the polarized structure functions $g_1$, $A_1F_1$, and their moments $\overline{\Gamma}_1$, $\overline{\gamma}_0$, and $\overline{I}_{TT}$, for both the proton and the deuteron. In addition, we report on new results on the neutron $g_1$ extracted by combining proton and deuteron data and correcting for Fermi smearing, and on the neutron moments $\overline{\Gamma}_1$, $\overline{\gamma}_0$, and $\overline{I}_{TT}$ formed directly from those of the proton and the deuteron. Our data are in good agreement with the Gerasimov-Drell-Hearn sum rule for the proton, deuteron, and neutron. Furthermore, the isovector combination was formed for $g_1$ and the Bjorken integral $\overline{\Gamma}_1^{p-n}$, and compared to available theoretical predictions. All of our results provide for the first time extensive tests of spin observable predictions from chiral effective field theory ($\chi$EFT) in a $Q^2$ range commensurate with the pion mass. They motivate further improvement in $\chi$EFT calculations from other approaches such as the lattice gauge method.

Submitted 12 September, 2024; originally announced September 2024.

Comments: 33 pages. 26 figures. Data table provided in supplementary material (30 pages)

Report number: JLAB-PHY-24-4184, DOE/OR/23177-7672

arXiv:2409.06577 [pdf, other]

Compressed Sensing based Detection Schemes for Differential Spatial Modulation in Visible Light Communication Systems

Authors: Zichun Shi, Pu Miao, Peng Chen, Lei Xue, Li-Yang Zheng, Laiyuan Wang, Gaojie Chen

Abstract: Differential spatial modulation (DSM) exploits the time dimension to facilitate differential modulation, which can perfectly avoid the challenge of acquiring heavily entangled channel state information in visible light communication (VLC) systems. However, it has a huge search space and high complexity for a large number of transmitters. In this paper, a novel vector correction (VC)-based orthogonal matching pursuit (OMP) detection algorithm is proposed to reduce the complexity, which exploits the sparsity and relativity of all transmitters, and then employs a novel correction criterion by correcting the index vectors of the error estimation to improve the demodulation performance. To overcome the local optimum dilemma in the atom search, an OMP-assisted genetic algorithm is also proposed to further improve the bit error rate (BER) performance of the VLC-DSM system. Simulation results demonstrate that the proposed schemes can significantly reduce the computational complexity by at least 62.5% while achieving an excellent BER performance compared with the traditional maximum-likelihood-based receiver.

Submitted 10 September, 2024; originally announced September 2024.

Comments: This paper has been accepted by 2024 IEEE 24th International Conference on Communication Technology (ICCT 2024)

arXiv:2409.05987 [pdf, other]

doi 10.1117/12.3020603

Simulated performance of energy-resolving detectors towards exoplanet imaging with the Habitable Worlds Observatory

Authors: Sarah Steiger, Laurent Pueyo, Emiel H. Por, Pin Chen, Rémi Soummer, Raphaël Pourcelot, Iva Laginja, Vanessa P. Bailey

Abstract: One of the primary science goals of the Habitable Worlds Observatory (HWO) as defined by the Astro2020 decadal survey is the imaging of the first Earth-like planet around a Sun-like star. A key technology gap towards reaching this goal is the development of ultra-low-noise photon counting detectors capable of measuring the incredibly low count rates coming from these planets, which are at contrasts of $\sim 1 \times 10^{-10}$. Superconducting energy-resolving detectors (ERDs) are a promising technology for this purpose as, despite their technological challenges, needing to be cooled below their superconducting transition temperature ($< 1\,\mathrm{K}$), they have essentially zero read noise, dark current, or clock-induced charge, and can get the wavelength of each incident photon without the use of additional throughput-reducing filters or gratings that spread light over many pixels. The use of these detectors on HWO will not only impact the science of the mission by decreasing the required exposure times for exo-Earth detection and characterization, but also in a wavefront sensing and control context when used for starlight suppression to generate a dark zone. We show simulated results using both an EMCCD and an ERD to ``dig a dark zone'' demonstrating that ERDs can achieve the same final contrast as an EMCCD in about half of the total time. We also perform a simple case study using an exposure time calculator tool called the Error Budget Software (EBS) to determine the required integration times to detect water for HWO targets of interest using both EMCCDs and ERDs. This shows that once a dark zone is achieved, using an ERD can decrease these exposure times by factors of 1.5--2 depending on the specific host star properties.

Submitted 9 September, 2024; originally announced September 2024.

Comments: 13 pages, 7 figures

Journal ref: Proc. SPIE 13092, Space Telescopes and Instrumentation 2024: Optical, Infrared, and Millimeter Wave; 130921W

arXiv:2409.05337 [pdf, other]

Masses and radiative decay widths of the $D_{s0}^*(2317)$ and $D_{s1}^{\prime}(2460)$ and their bottom analogs

Authors: Zi-Le Zhang, Zhan-Wei Liu, Si-Qiang Luo, Ping Chen, Zhi-Hui Guo

Abstract: We study the mass spectra and radiative decays of $D_{s0}^*(2317)$ and $D_{s1}^{\prime}(2460)$ in an unquenched framework. In addition to coupled channel effects between the $c\bar{s}$ cores and $D^{(*)}K$ channels, $D^{(*)}K$-$D^{(*)}K$ self interactions are also considered in this work and we succeed in reproducing their mass spectra. Furthermore, we study the radiative decays of the $D_{s0}^*(2317)$ and $D_{s1}^{\prime}(2460)$ by simultaneously including the compound structures of conventional $c\bar{s}$ cores and $D^{(*)}K$ components. We also calculate their bottom analogs with heavy quark symmetry. Our study offers useful insights into the important unquenched effects in the formation of $D_{s0}^*(2317)$, $D_{s1}^{\prime}(2460)$ and the bottom counterparts.

Submitted 9 September, 2024; originally announced September 2024.

Comments: 18 pages, 3 tables, 7 figures

arXiv:2409.05064 [pdf, other]

Environmental effects as a key factor in shaping star-forming S0 galaxies

Authors: Pei-Bin Chen, Junfeng Wang, Yan-Mei Chen, Xiao-Yu Xu, Tian-Wen Cao

Abstract: The origins of lenticular galaxies (S0s) can be classified into two main categories: ``minor mergers'' in low-density environments (LDEs) and ``faded spirals'' in high-density environments (HDEs). The transitional phase in the evolution of S0s, namely, star-forming lenticular galaxies (SFS0s), can serve as an important probe for analyzing the complex processes involved in the transformation between different galaxy types and the quenching of star formation (SF). We attempt to find the impact of different environments on the global properties and spatially resolved quantities of SFS0s. We selected 71 SFS0s from the SDSS-IV MaNGA Survey, comprising 23 SFS0s in HDEs (SFS0s$\_$HE) and 48 SFS0s in LDEs (SFS0s$\_$LE). We examined the effects of the environment, by studying the global properties, concentration index, and radial profiles of the derived quantities. The varied environments of SFS0s do not lead to any significant difference in global properties (e.g., Sérsic index). By calculating $CI_{\rm H_\alpha/cont}$, we observe that different environments may cause varying concentrations of SF. Specifically, SFS0s$\_$LE, affected by external gas mergers or inflow, exhibit a more centrally concentrated SF (i.e., larger $CI_{\rm H_\alpha/cont}$). This trend is further supported by $CI_{\rm SFR, H_\alpha}$, which only considers the gas disk of the galaxy. This observation is aligned with the observed shrinking of gas disks in galaxies affected by ram-pressure stripping in HDEs. Furthermore, their $\Sigma_{\rm SFR}$ or resolved sSFR are comparable. On average, SFS0s$\_$LE display significantly higher values for both quantities. Finally, the observed D$_{\rm n}4000$ and gas-phase metallicity gradient correspond well to their assumed origins. However, we did not find a significantly lower gas-phase metallicity in SFS0s$\_$LE. (Abridged)

Submitted 8 September, 2024; originally announced September 2024.

Comments: 15 pages, 7 figures, 4 tables. Accepted for publication in A&A

arXiv:2409.04363 [pdf, other]

RCNet: Deep Recurrent Collaborative Network for Multi-View Low-Light Image Enhancement

Authors: Hao Luo, Baoliang Chen, Lingyu Zhu, Peilin Chen, Shiqi Wang

Abstract: Scene observation from multiple perspectives would bring a more comprehensive visual experience. However, in the context of acquiring multiple views in the dark, the highly correlated views are seriously alienated, making it challenging to improve scene understanding with auxiliary views. Recent single image-based enhancement methods may not be able to provide consistently desirable restoration performance for all views because they ignore potential feature correspondence among different views. To alleviate this issue, we make the first attempt to investigate multi-view low-light image enhancement. First, we construct a new dataset called Multi-View Low-light Triplets (MVLT), including 1,860 pairs of triple images with large illumination ranges and wide noise distribution. Each triplet is equipped with three different viewpoints towards the same scene. Second, we propose a deep multi-view enhancement framework based on the Recurrent Collaborative Network (RCNet). Specifically, in order to benefit from similar texture correspondence across different views, we design the recurrent feature enhancement, alignment and fusion (ReEAF) module, in which intra-view feature enhancement (Intra-view EN) followed by inter-view feature alignment and fusion (Inter-view AF) is performed to model the intra-view and inter-view feature propagation sequentially via multi-view collaboration. In addition, two different modules from enhancement to alignment (E2A) and from alignment to enhancement (A2E) are developed to enable the interactions between Intra-view EN and Inter-view AF, which explicitly utilize attentive feature weighting and sampling for enhancement and alignment, respectively. Experimental results demonstrate that our RCNet significantly outperforms other state-of-the-art methods. All of our dataset, code, and model will be available at https://github.com/hluo29/RCNet.

Submitted 6 September, 2024; originally announced September 2024.

Comments: 14 Pages, 10 Figures, Under Review

arXiv:2409.04200 [pdf, other]

ZTF SN Ia DR2: The diversity and relative rates of the thermonuclear SN population

Authors: G. Dimitriadis, U. Burgaz, M. Deckers, K. Maguire, J. Johansson, M. Smith, M. Rigault, C. Frohmaier, J. Sollerman, L. Galbany, Y.-L. Kim, C. Liu, A. A. Miller, P. E. Nugent, A. Alburai, P. Chen, S. Dhawan, M. Ginolin, A. Goobar, S. L. Groom, L. Harvey, W. D. Kenworthy, S. R. Kulkarni, B. Popovic, R. L. Riddle, et al. (5 additional authors not shown)

Abstract: The Zwicky Transient Facility SN Ia Data Release 2 (ZTF SN Ia DR2) contains more than 3,000 Type Ia supernovae (SNe Ia), providing the largest homogeneous low-redshift sample of SNe Ia. Having at least one spectrum per event, this data collection is ideal for large-scale statistical studies of the photometric, spectroscopic and host-galaxy properties of SNe Ia, particularly of the more rare "peculiar" subclasses. In this paper, we first present the method we developed to spectroscopically classify the SNe in the sample, and the techniques we used to model their multi-band light curves and explore their photometric properties. We then show a method to distinguish between the "peculiar" subtypes and the normal SNe Ia. We also explore the properties of their host galaxies and estimate their relative rates, focusing on the "peculiar" subtypes and their connection to the cosmologically useful SNe Ia. Finally, we discuss the implications of our study with respect to the progenitor systems of the "peculiar" SN Ia events.

Submitted 6 September, 2024; originally announced September 2024.

Comments: 19 pages, 13 figures, submitted to Astronomy and Astrophysics

arXiv:2409.02054 [pdf, other]

A cosmic formation site of silicon and sulphur revealed by a new type of supernova explosion

Authors: Steve Schulze, Avishay Gal-Yam, Luc Dessart, Adam A. Miller, Stan E. Woosley, Yi Yang, Mattia Bulla, Ofer Yaron, Jesper Sollerman, Alexei V. Filippenko, K-Ryan Hinds, Daniel A. Perley, Daichi Tsuna, Ragnhild Lunnan, Nikhil Sarin, Sean J. Brennan, Thomas G. Brink, Rachel J. Bruch, Ping Chen, Kaustav K. Das, Suhail Dhawan, Claes Fransson, Christoffer Fremling, Anjasha Gangopadhyay, Ido Irani, et al. (25 additional authors not shown)

Abstract: The cores of stars are the cosmic furnaces where light elements are fused into heavier nuclei. The fusion of hydrogen to helium initially powers all stars. The ashes of the fusion reactions are then predicted to serve as fuel in a series of stages, eventually transforming massive stars into a structure of concentric shells. These are composed of natal hydrogen on the outside, and consecutively heavier compositions inside, predicted to be dominated by helium, carbon/oxygen, oxygen/neon/magnesium, and oxygen/silicon/sulphur. Silicon and sulphur are fused into inert iron, leading to the collapse of the core and either a supernova explosion or the direct formation of a black hole. Stripped stars, where the outer hydrogen layer has been removed and the internal He-rich layer (in Wolf-Rayet WN stars) or even the C/O layer below it (in Wolf-Rayet WC/WO stars) are exposed, provide evidence for this shell structure, and the cosmic element production mechanism it reflects. The types of supernova explosions that arise from stripped stars embedded in shells of circumstellar material (most notably Type Ibn supernovae from stars with outer He layers, and Type Icn supernovae from stars with outer C/O layers) confirm this scenario. However, direct evidence for the most interior shells, which are responsible for the production of elements heavier than oxygen, is lacking. Here, we report the discovery of the first-of-its-kind supernova arising from a star peculiarly stripped all the way to the silicon and sulphur-rich internal layer. Whereas the concentric shell structure of massive stars is not under debate, it is the first time that such a thick, massive silicon and sulphur-rich shell, expelled by the progenitor shortly before the SN explosion, has been directly revealed.

Submitted 3 September, 2024; originally announced September 2024.

Comments: 48 pages, 12 figures and 10 tables. Submitted to a high-impact journal. The reduced spectra and photometry will be made available via the journal webpage and the WISeREP archive after the acceptance of the paper

arXiv:2409.02038 [pdf, other]

BEAVER: An Enterprise Benchmark for Text-to-SQL

Authors: Peter Baile Chen, Fabian Wenz, Yi Zhang, Moe Kayali, Nesime Tatbul, Michael Cafarella, Çağatay Demiralp, Michael Stonebraker

Abstract: Existing text-to-SQL benchmarks have largely been constructed using publicly available tables from the web with human-generated tests containing question and SQL statement pairs. They typically show very good results and lead people to think that LLMs are effective at text-to-SQL tasks. In this paper, we apply off-the-shelf LLMs to a benchmark containing enterprise data warehouse data. In this environment, LLMs perform poorly, even when standard prompt engineering and RAG techniques are utilized. As we will show, the reasons for poor performance are largely due to three characteristics: (1) public LLMs cannot train on enterprise data warehouses because they are largely in the "dark web", (2) schemas of enterprise tables are more complex than the schemas in public data, which leads the SQL-generation task innately harder, and (3) business-oriented questions are often more complex, requiring joins over multiple tables and aggregations. As a result, we propose a new dataset BEAVER, sourced from real enterprise data warehouses together with natural language queries and their correct SQL statements which we collected from actual user history. We evaluated this dataset using recent LLMs and demonstrated their poor performance on this task. We hope this dataset will facilitate future researchers building more sophisticated text-to-SQL systems which can do better on this important class of data.

Submitted 3 September, 2024; originally announced September 2024.

arXiv:2409.01821 [pdf, other]

When Does Visual Prompting Outperform Linear Probing for Vision-Language Models? A Likelihood Perspective

Authors: Hsi-Ai Tsao, Lei Hsiung, Pin-Yu Chen, Tsung-Yi Ho

Abstract: Adapting pre-trained models to new tasks can exhibit varying effectiveness across datasets. Visual prompting, a state-of-the-art parameter-efficient transfer learning method, can significantly improve the performance of out-of-distribution tasks. On the other hand, linear probing, a standard transfer learning method, can sometimes become the best approach. We propose a log-likelihood ratio (LLR) approach to analyze the comparative benefits of visual prompting and linear probing. By employing the LLR score alongside resource-efficient visual prompts approximations, our cost-effective measure attains up to a 100-fold reduction in run time compared to full training, while achieving prediction accuracies up to 91%. The source code is available at https://github.com/IBM/VP-LLR.

Submitted 4 September, 2024; v1 submitted 3 September, 2024; originally announced September 2024.

arXiv:2409.01030 [pdf, other]

doi 10.1109/TIFS.2024.3372773

Learning to Discover Forgery Cues for Face Forgery Detection

Authors: Jiahe Tian, Peng Chen, Cai Yu, Xiaomeng Fu, Xi Wang, Jiao Dai, Jizhong Han

Abstract: Locating manipulation maps, i.e., pixel-level annotation of forgery cues, is crucial for providing interpretable detection results in face forgery detection. Related learning objects have also been widely adopted as auxiliary tasks to improve the classification performance of detectors whereas they require comparisons between paired real and forged faces to obtain manipulation maps as supervision. This requirement restricts their applicability to unpaired faces and contradicts real-world scenarios. Moreover, the used comparison methods annotate all changed pixels, including noise introduced by compression and upsampling. Using such maps as supervision hinders the learning of exploitable cues and makes models prone to overfitting. To address these issues, we introduce a weakly supervised model in this paper, named Forgery Cue Discovery (FoCus), to locate forgery cues in unpaired faces. Unlike some detectors that claim to locate forged regions in attention maps, FoCus is designed to sidestep their shortcomings of capturing partial and inaccurate forgery cues. Specifically, we propose a classification attentive regions proposal module to locate forgery cues during classification and a complementary learning module to facilitate the learning of richer cues. The produced manipulation maps can serve as better supervision to enhance face forgery detectors. Visualization of the manipulation maps of the proposed FoCus exhibits superior interpretability and robustness compared to existing methods. Experiments on five datasets and four multi-task models demonstrate the effectiveness of FoCus in both in-dataset and cross-dataset evaluations.

Submitted 2 September, 2024; originally announced September 2024.

Comments: TIFS 2024

Showing 1–50 of 2,759 results for author: Chen, P