Interaction of the Prominence Plasma within the Magnetic Cloud of an ICME with the Earth's Bow Shock

Authors: Hadi Madanian, Li-Jen Chen, Jonathan Ng, Michael J. Starkey, Stephen A. Fuselier, Naoki Bessho, Daniel J. Gershman, Terry Z. Liu

Abstract: The magnetic cloud within an interplanetary coronal mass ejection (ICME) is characterized by high magnetic field intensities. In this study, we investigate the interaction of a magnetic cloud carrying a density structure with the Earth's bow shock during the ICME event on 24 April 2023. Elevated abundances of cold protons and heavier ions, namely alpha particles and singly charged helium ions associated with the prominence plasma, are observed within this structure. The plasma downstream of the bow shock exhibits an irregular compression pattern which could be due to the presence of heavy ions. Heavy ions carry a significant fraction of upstream flow energy; however, due to their different charge per mass ratio and rigidity, they are less scattered by the electromagnetic and electrostatic waves at the shock. We find that ion thermal energy is only a small fraction of the background magnetic energy density downstream of the shock, while increased ion fluxes reduce the characteristic wave speeds in that region. As such, we observe a transition state of an unstable bow shock in which the plasma flow is super-Alfvénic both upstream and downstream of the bow shock. Our findings help with understanding of the intense space weather impacts of such events.

Submitted 21 October, 2024; originally announced October 2024.

arXiv:2410.15970 [pdf, other]

doi 10.1145/3617829

Policy-driven Knowledge Selection and Response Generation for Document-grounded Dialogue

Authors: Longxuan Ma, Jiapeng Li, Mingda Li, Wei-Nan Zhang, Ting Liu

Abstract: Document-grounded dialogue (DGD) uses documents as external knowledge for dialogue generation. Correctly understanding the dialogue context is crucial for selecting knowledge from the document and generating proper responses. In this paper, we propose using a dialogue policy to help the dialogue understanding in DGD. Our dialogue policy consists of two kinds of guiding signals: utterance function and topic transfer intent. The utterance function reflects the purpose and style of an utterance, and the topic transfer intent reflects the topic and content of an utterance. We propose a novel framework exploiting our dialogue policy for two core tasks in DGD, namely knowledge selection (KS) and response generation (RG). The framework consists of two modules: the Policy planner leverages policy-aware dialogue representation to select knowledge and predict the policy of the response; the generator uses policy/knowledge-aware dialogue representation for response generation. Our policy-driven model gets state-of-the-art performance on three public benchmarks and we provide a detailed analysis of the experimental results. Our code/data will be released on GitHub.

Submitted 21 October, 2024; originally announced October 2024.

Comments: 29 pages, 9 figures, 14 tables, TOIS 2024

ACM Class: I.2.7

Journal ref: ACM Transactions on Information Systems, Volume 42, Issue 2, 08 November 2023

arXiv:2410.15913 [pdf, other]

The magnetic field in quiescent star-forming filament G16.96+0.27

Authors: Qi-Lao Gu, Tie Liu, Zhi-Qiang Shen, Sihan Jiao, Julien Montillaud, Mika Juvela, Xing Lu, Chang Won Lee, Junhao Liu, Pak Shing Li, Xunchuan Liu, Doug Johnstone, Woojin Kwon, Kee-Tae Kim, Ken'ichi Tatematsu, Patricio Sanhueza, Isabelle Ristorcelli, Patrick Koch, Qizhou Zhang, Kate Pattle, Naomi Hirano, Dana Alina, James Di Francesco

Abstract: We present 850 μm thermal dust polarization observations with a resolution of 14.4"(~ 0.13 pc) towards an infrared dark cloud G16.96+0.27 using JCMT/POL-2. The average magnetic field orientation, which roughly agrees with the larger-scale magnetic field orientation traced by the Planck 353 GHz data, is approximately perpendicular to the filament structure. The estimated plane-of-sky magnetic field strength is ~ 96 μG and ~ 60 μG using two variants of the Davis-Chandrasekhar-Fermi methods. We calculate the virial and magnetic critical parameters to evaluate the relative importance of gravity, the magnetic field, and turbulence. The magnetic field and turbulence are both weaker than gravity, but magnetic fields and turbulence together are equal to gravity, suggesting that G16.96+0.27 is in a quasi-equilibrium state. The cloud-magnetic-field alignment is found to have a trend moving away from perpendicularity in the dense regions, which may serve as a tracer of potential fragmentation in such quiescent filaments.

Submitted 21 October, 2024; originally announced October 2024.

Comments: Accepted by ApJ. 13 pages, 5 figures

arXiv:2410.15765 [pdf, other]

SeisLM: a Foundation Model for Seismic Waveforms

Authors: Tianlin Liu, Jannes Münchmeyer, Laura Laurenti, Chris Marone, Maarten V. de Hoop, Ivan Dokmanić

Abstract: We introduce the Seismic Language Model (SeisLM), a foundational model designed to analyze seismic waveforms -- signals generated by Earth's vibrations such as the ones originating from earthquakes. SeisLM is pretrained on a large collection of open-source seismic datasets using a self-supervised contrastive loss, akin to BERT in language modeling. This approach allows the model to learn general seismic waveform patterns from unlabeled data without being tied to specific downstream tasks. When fine-tuned, SeisLM excels in seismological tasks like event detection, phase-picking, onset time regression, and foreshock-aftershock classification. The code has been made publicly available on https://github.com/liutianlin0121/seisLM.

Submitted 21 October, 2024; originally announced October 2024.

arXiv:2410.15750 [pdf, ps, other]

Normalized solutions for a class of Sobolev critical Schrodinger systems

Authors: Houwang Li, Tianhao Liu, Wenming Zou

Abstract: This paper focuses on the existence and multiplicity of normalized solutions for the coupled Schrodinger system with Sobolev critical coupling term. We present several existence and multiplicity results under some explicit conditions. Furthermore, we present a non-existence result for the defocusing case. This paper, together with the paper [T. Bartsch, H. W. Li and W. M. Zou. Calc. Var. Partial Differential Equations 62 (2023) ], provides a more comprehensive understanding of normalized solutions for Sobolev critical systems. We believe our methods can also address the open problem of the multiplicity of normalized solutions for Schrodinger systems with Sobolev critical growth, with potential for future development and broader applicability.

Submitted 21 October, 2024; originally announced October 2024.

Comments: Any comments are welcome

arXiv:2410.15651 [pdf, other]

Understanding and Alleviating Memory Consumption in RLHF for LLMs

Authors: Jin Zhou, Hanmei Yang, Steven, Tang, Mingcan Xiang, Hui Guan, Tongping Liu

Abstract: Fine-tuning with Reinforcement Learning with Human Feedback (RLHF) is essential for aligning large language models (LLMs). However, RLHF often encounters significant memory challenges. This study is the first to examine memory usage in the RLHF context, exploring various memory management strategies and unveiling the reasons behind excessive memory consumption. Additionally, we introduce a simple yet effective approach that substantially reduces the memory required for RLHF fine-tuning.

Submitted 21 October, 2024; originally announced October 2024.

arXiv:2410.15333 [pdf, other]

The ALMA-QUARKS Survey: Fibers' role in star formation unveiled in an intermediate-mass protocluster region of the Vela D cloud

Authors: Dongting Yang, HongLi Liu, Tie Liu, Anandmayee Tej, Xunchuan Liu, Jinhua He, Guido Garay, Amelia Stutz, Lei Zhu, Sheng-Li Qin, Fengwei Xu, Pak-Shing Li, Mika Juvela, Pablo Garcia, Paul F. Goldsmith, Siju Zhang, Xindi Tang, Patricio Sanhueza, Shanghuo Li, Chang Won Lee, Swagat Ranjan Das, Wenyu Jiao, Xiaofeng Mai, Prasanta Gorai, Yichen Zhang, et al. (10 additional authors not shown)

Abstract: In this paper, we present a detailed analysis of the IRS 17 filament within the intermediate-mass protocluster IRAS 08448-4343 (of $\sim\,10^3\,\rm M_{\odot}$), using ALMA data from the ATOMS 3-mm and QUARKS 1.3-mm surveys. The IRS 17 filament, which spans $\sim$54000 au ($0.26\,\rm pc$) in length and $\sim$4000 au ($0.02\,\rm pc$) in width, exhibits a complex, multi-component velocity field, and harbours hierarchical substructures. These substructures include three bundles of seven velocity-coherent fibers, and 29 dense ($n\sim 10^8\,\rm cm^{-3}$) condensations. The fibers have a median length of $\sim 4500\,\rm au$ and a median width of $\sim 1400\,\rm au$. Among these fibers, four are identified as ``fertile", each hosting at least three dense condensations, which are regarded as the ``seeds" of star formation. While the detected cores are randomly spaced within the IRS\,17 filament based on the 3-mm dust continuum image, periodic spacing ($\sim1600\,\rm au$) of condensations is observed in the fertile fibers according to the 1.3-mm dust map, consistent with the predictions of linear isothermal cylinder fragmentation models. These findings underscore the crucial role of fibers in star formation and suggest a hierarchical fragmentation process that extends from the filament to the fibers, and ultimately, to the smallest-scale condensations.

Submitted 20 October, 2024; originally announced October 2024.

Comments: 19 pages, 10 figures, 4 tables, accepted by ApJ

arXiv:2410.15229 [pdf]

Deep Learning-based Detection of Bacterial Swarm Motion Using a Single Image

Authors: Yuzhu Li, Hao Li, Weijie Chen, Keelan O'Riordan, Neha Mani, Yuxuan Qi, Tairan Liu, Sridhar Mani, Aydogan Ozcan

Abstract: Distinguishing between swarming and swimming, the two principal forms of bacterial movement, holds significant conceptual and clinical relevance. This is because bacteria that exhibit swarming capabilities often possess unique properties crucial to the pathogenesis of infectious diseases and may also have therapeutic potential. Here, we report a deep learning-based swarming classifier that rapidly and autonomously predicts swarming probability using a single blurry image. Compared with traditional video-based, manually-processed approaches, our method is particularly suited for high-throughput environments and provides objective, quantitative assessments of swarming probability. The swarming classifier demonstrated in our work was trained on Enterobacter sp. SM3 and showed good performance when blindly tested on new swarming (positive) and swimming (negative) test images of SM3, achieving a sensitivity of 97.44% and a specificity of 100%. Furthermore, this classifier demonstrated robust external generalization capabilities when applied to unseen bacterial species, such as Serratia marcescens DB10 and Citrobacter koseri H6. It blindly achieved a sensitivity of 97.92% and a specificity of 96.77% for DB10, and a sensitivity of 100% and a specificity of 97.22% for H6. This competitive performance indicates the potential to adapt our approach for diagnostic applications through portable devices or even smartphones. This adaptation would facilitate rapid, objective, on-site screening for bacterial swarming motility, potentially enhancing the early detection and treatment assessment of various diseases, including inflammatory bowel diseases (IBD) and urinary tract infections (UTI).

Submitted 19 October, 2024; originally announced October 2024.

Comments: 17 Pages, 4 Figures

arXiv:2410.15061 [pdf, other]

Classifying extended, localized and critical states in quasiperiodic lattices via unsupervised learning

Authors: Bohan Zheng, Siyu Zhu, Xingping Zhou, Tong Liu

Abstract: Classification of quantum phases is one of the most important areas of research in condensed matter physics. In this work, we obtain the phase diagram of one-dimensional quasiperiodic models via unsupervised learning. Firstly, we choose two advanced unsupervised learning algorithms, Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and Ordering Points To Identify the Clustering Structure (OPTICS), to explore the distinct phases of Aubry-André-Harper model and quasiperiodic p-wave model. The unsupervised learning results match well with traditional numerical diagonalization. Finally, we compare the similarity of different algorithms and find that the highest similarity between the results of unsupervised learning algorithms and those of traditional algorithms has exceeded 98\%. Our work sheds light on applications of unsupervised learning for phase classification.

Submitted 19 October, 2024; originally announced October 2024.

arXiv:2410.14361 [pdf, other]

Efficiently Computing Susceptibility to Context in Language Models

Authors: Tianyu Liu, Kevin Du, Mrinmaya Sachan, Ryan Cotterell

Abstract: One strength of modern language models is their ability to incorporate information from a user-input context when answering queries. However, they are not equally sensitive to the subtle changes to that context. To quantify this, Du et al. (2024) gives an information-theoretic metric to measure such sensitivity. Their metric, susceptibility, is defined as the degree to which contexts can influence a model's response to a query at a distributional level. However, exactly computing susceptibility is difficult and, thus, Du et al. (2024) falls back on a Monte Carlo approximation. Due to the large number of samples required, the Monte Carlo approximation is inefficient in practice. As a faster alternative, we propose Fisher susceptibility, an efficient method to estimate the susceptibility based on Fisher information. Empirically, we validate that Fisher susceptibility is comparable to Monte Carlo estimated susceptibility across a diverse set of query domains despite its being $70\times$ faster. Exploiting the improved efficiency, we apply Fisher susceptibility to analyze factors affecting the susceptibility of language models. We observe that larger models are as susceptible as smaller ones.

Submitted 18 October, 2024; originally announced October 2024.

arXiv:2410.13515 [pdf, other]

Observation of a rare beta decay of the charmed baryon with a Graph Neural Network

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (637 additional authors not shown)

Abstract: The study of beta decay of the charmed baryon provides unique insights into the fundamental mechanism of the strong and electro-weak interactions. The $Λ_c^+$, being the lightest charmed baryon, undergoes disintegration solely through the charm quark weak decay. Its beta decay provides an ideal laboratory for investigating non-perturbative effects in quantum chromodynamics and for constraining the fundamental parameters of the Cabibbo-Kobayashi-Maskawa matrix in weak interaction theory. This article presents the first observation of the Cabibbo-suppressed $Λ_c^+$ beta decay into a neutron $Λ_c^+ \rightarrow n e^+ ν_{e}$, based on $4.5~\mathrm{fb}^{-1}$ of electron-positron annihilation data collected with the BESIII detector in the energy region above the $Λ^+_c\barΛ^-_c$ threshold. A novel machine learning technique, leveraging Graph Neural Networks, has been utilized to effectively separate signals from dominant backgrounds, particularly $Λ_c^+ \rightarrow Λe^+ ν_{e}$. This approach has yielded a statistical significance of more than $10σ$. The absolute branching fraction of $Λ_c^+ \rightarrow n e^+ ν_{e}$ is measured to be $(3.57\pm0.34_{\mathrm{stat}}\pm0.14_{\mathrm{syst}})\times 10^{-3}$. For the first time, the CKM matrix element $\left|V_{cd}\right|$ is extracted via a charmed baryon decay to be $0.208\pm0.011_{\rm exp.}\pm0.007_{\rm LQCD}\pm0.001_{τ_{Λ_c^+}}$. This study provides a new probe to further understand fundamental interactions in the charmed baryon sector, and demonstrates the power of modern machine learning techniques in enhancing experimental capability in high energy physics research.

Submitted 17 October, 2024; originally announced October 2024.

Comments: 28 pages, 6 figures

arXiv:2410.13496 [pdf, other]

State Estimation Transformers for Agile Legged Locomotion

Authors: Chen Yu, Yichu Yang, Tianlin Liu, Yangwei You, Mingliang Zhou, Diyun Xiang

Abstract: We propose a state estimation method that can accurately predict the robot's privileged states to push the limits of quadruped robots in executing advanced skills such as jumping in the wild. In particular, we present the State Estimation Transformers (SET), an architecture that casts the state estimation problem as conditional sequence modeling. SET outputs the robot states that are hard to obtain directly in the real world, such as the body height and velocities, by leveraging a causally masked Transformer. By conditioning an autoregressive model on the robot's past states, our SET model can predict these privileged observations accurately even in highly dynamic locomotions. We evaluate our methods on three tasks -- running jumping, running backflipping, and running sideslipping -- on a low-cost quadruped robot, Cyberdog2. Results show that SET can outperform other methods in estimation accuracy and transferability in the simulation as well as success rates of jumping and triggering a recovery controller in the real world, suggesting the superiority of such a Transformer-based explicit state estimator in highly dynamic locomotion tasks.

Submitted 17 October, 2024; originally announced October 2024.

Comments: Accepted by IROS 2024

arXiv:2410.13478 [pdf, other]

Observation of $χ_{c0}\toΣ^{+}\barΣ^{-}η$ and evidence for $χ_{c1,2}\toΣ^{+}\barΣ^{-}η$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (634 additional authors not shown)

Abstract: Using $(27.12\pm 0.14)\times10^{8}$ $ψ(3686)$ events collected with the BESIII detector, the decay $χ_{c0}\toΣ^{+}\barΣ^{-}η$ is observed for the first time with a statistical significance of $7.0σ$, and evidence for $χ_{c1}\toΣ^{+}\barΣ^{-}η$ and $χ_{c2}\toΣ^{+}\barΣ^{-}η$ is found with statistical significances of $4.3σ$ and $4.6σ$, respectively. The branching fractions are determined to be $\mathcal{B}(χ_{c0}\toΣ^{+}\barΣ^{-}η)=({1.26 \pm 0.20 \pm 0.13}) \times 10^{-4}, ~\mathcal{B}(χ_{c1}\toΣ^{+}\barΣ^{-}η)=({5.10 \pm 1.21 \pm 0.67}) \times 10^{-5}$, and $\mathcal{B}(χ_{c2}\toΣ^{+}\barΣ^{-}η)=({5.46 \pm 1.18 \pm 0.50}) \times 10^{-5}$, where the first uncertainties are statistical, and the second ones are systematic.

Submitted 17 October, 2024; originally announced October 2024.

arXiv:2410.13408 [pdf, other]

MoR: Mixture of Ranks for Low-Rank Adaptation Tuning

Authors: Chuanyu Tang, Yilong Chen, Zhenyu Zhang, Junyuan Shang, Wenyuan Zhang, Yong Huang, Tingwen Liu

Abstract: Low-Rank Adaptation (LoRA) drives research to align its performance with full fine-tuning. However, significant challenges remain: (1) Simply increasing the rank size of LoRA does not effectively capture high-rank information, which leads to a performance bottleneck. (2) MoE-style LoRA methods substantially increase parameters and inference latency, contradicting the goals of efficient fine-tuning and ease of application. To address these challenges, we introduce Mixture of Ranks (MoR), which learns rank-specific information for different tasks based on input and efficiently integrates multi-rank information. We first propose a new framework that equates the integration of multiple LoRAs to expanding the rank of LoRA. Moreover, we hypothesize that low-rank LoRA already captures sufficient intrinsic information, and MoR can derive high-rank information through mathematical transformations of the low-rank components. Thus, MoR reduces the learning difficulty of LoRA and enhances its multi-task capabilities. MoR achieves impressive results, delivering a 1.31\% performance improvement while using only 93.93\% of the parameters compared to baseline methods.

Submitted 17 October, 2024; v1 submitted 17 October, 2024; originally announced October 2024.

Comments: 11 pages, 7 figures

arXiv:2410.13369 [pdf, other]

A Neutron Capture Explanation for the 10 MeV Emission Line Seen in GRB 221009A

Authors: Jiahuan Zhu, Hua Feng, Tong Liu

Abstract: The brightest ever gamma-ray burst (GRB) 221009A displays a significant emission line component around 10 MeV. As the GRB central engine is neutron-rich, we propose that the emission line could be originally due to the 2.223 MeV gamma-rays following neutron capture with protons. The measured line profile can be adequately fitted with a neutron capture model that involves thermal broadening and a bulk Doppler shift. The spectral modeling reveals a Doppler factor varying from 5.1 to 2.1 for the neutron-rich component, along with a temperature increase from 300 keV to about 900 keV, during the time interval of 280--360 s since the trigger, with about $10^{-2}$ $M_\odot$ deuteriums produced in the process. We argue that neutron capture can take place in the outer shell of a structure jet. Disk winds could be another possible site.

Submitted 17 October, 2024; originally announced October 2024.

Comments: 5 pages, 2 figures, 1 table, Submitted to ApJ Letters on July 9, 2024, referees' reports not received so far

arXiv:2410.13368 [pdf, other]

Observation of the Singly Cabibbo-Suppressed Decay $Λ_c^{+}\to pπ^0$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (638 additional authors not shown)

Abstract: Utilizing 4.5${~\rm{fb}}^{-1}$ of $e^+e^-$ annihilation data collected with the BESIII detector at the BEPCII collider at center-of-mass energies between 4.600 and 4.699 GeV, the first observation of the singly Cabibbo-suppressed decay $Λ_c^{+}\to pπ^0$ is presented, with a statistical significance of $5.4σ$. The ratio of the branching fractions of $Λ_c^{+}\to pπ^0$ and $Λ_c^{+}\to pη$ is measured as $\mathcal{B}(Λ_c^{+}\to pπ^0)/\mathcal{B}(Λ_c^{+}\to pη)=(0.120\pm0.026_{\rm stat.}\pm0.007_{\rm syst.})$. This result resolves the longstanding discrepancy between earlier experimental searches, providing both a decisive conclusion and valuable input for QCD-inspired theoretical models. A sophisticated deep learning approach using a Transformer-based architecture is employed to distinguish the signal from the prevalent hadronic backgrounds, complemented by thorough validation and systematic uncertainty quantification.

Submitted 17 October, 2024; originally announced October 2024.

Comments: 9 pages, 4 figures

arXiv:2410.13351 [pdf, other]

Representation Learning of Structured Data for Medical Foundation Models

Authors: Vijay Prakash Dwivedi, Viktor Schlegel, Andy T. Liu, Thanh-Tung Nguyen, Abhinav Ramesh Kashyap, Jeng Wei, Wei-Hsian Yin, Stefan Winkler, Robby T. Tan

Abstract: Large Language Models (LLMs) have demonstrated remarkable performance across various domains, including healthcare. However, their ability to effectively represent structured non-textual data, such as the alphanumeric medical codes used in records like ICD-10 or SNOMED-CT, is limited and has been particularly exposed in recent research. This paper examines the challenges LLMs face in processing medical codes due to the shortcomings of current tokenization methods. As a result, we introduce the UniStruct architecture to design a multimodal medical foundation model of unstructured text and structured data, which addresses these challenges by adapting subword tokenization techniques specifically for the structured medical codes. Our approach is validated through model pre-training on both an extensive internal medical database and a public repository of structured medical records. Trained on over 1 billion tokens on the internal medical database, the proposed model achieves up to a 23% improvement in evaluation metrics, with around 2% gain attributed to our proposed tokenization. Additionally, when evaluated on the EHRSHOT public benchmark with a 1/1000 fraction of the pre-training data, the UniStruct model improves performance on over 42% of the downstream tasks. Our approach not only enhances the representation and generalization capabilities of patient-centric models but also bridges a critical gap in representation learning models' ability to handle complex structured medical data, alongside unstructured text.

Submitted 17 October, 2024; originally announced October 2024.

Comments: NeurIPS 2024 Workshop on Unifying Representations in Neural Models (UniReps 2024)

arXiv:2410.13051 [pdf, other]

Supply Chain Network Extraction and Entity Classification Leveraging Large Language Models

Authors: Tong Liu, Hadi Meidani

Abstract: Supply chain networks are critical to the operational efficiency of industries, yet their increasing complexity presents significant challenges in mapping relationships and identifying the roles of various entities. Traditional methods for constructing supply chain networks rely heavily on structured datasets and manual data collection, limiting their scope and efficiency. In contrast, recent advancements in Natural Language Processing (NLP) and large language models (LLMs) offer new opportunities for discovering and analyzing supply chain networks using unstructured text data. This paper proposes a novel approach that leverages LLMs to extract and process raw textual information from publicly available sources to construct a comprehensive supply chain graph. We focus on the civil engineering sector as a case study, demonstrating how LLMs can uncover hidden relationships among companies, projects, and other entities. Additionally, we fine-tune an LLM to classify entities within the supply chain graph, providing detailed insights into their roles and relationships. The results show that domain-specific fine-tuning improves classification accuracy, highlighting the potential of LLMs for industry-specific supply chain analysis. Our contributions include the development of a supply chain graph for the civil engineering sector, as well as a fine-tuned LLM model that enhances entity classification and understanding of supply chain networks.

Submitted 16 October, 2024; originally announced October 2024.

Comments: 11 pages, 4 figures

arXiv:2410.12620 [pdf, other]

Search for $e^{+}e^{-} \to φχ_{c0}$ and $φη_{c2}(1D)$ at center-of-mass energies from 4.47 to 4.95 GeV

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (644 additional authors not shown)

Abstract: Utilizing a data set of $6.7$ fb$^{-1}$ from electron-positron collisions recorded by the BESIII detector at the BEPCII storage ring, a search is conducted for the processes $e^{+}e^{-} \to φχ_{c0}$ and $φη_{c2}(1D)$ across center-of-mass energies from 4.47 to 4.95 GeV. In the absence of any significant signals, upper limits are set. These include limits on the Born cross sections for $e^{+}e^{-} \to φχ_{c0}$, as well as the product of the Born cross section for $e^{+}e^{-} \to φη_{c2}(1D)$ and a sum of five branching fractions. Furthermore, the product of the electronic width of $Y(4660)$ and the branching fraction of the $Y(4660) \to φχ_{c0}$, denoted as $Γ^{Y(4660)}_{e^{+}e^{-}} \mathcal{B}_{Y(4660) \to φχ_{c0}}$, is determined to be $< 0.40$ eV at the 90\% confidence level.

Submitted 16 October, 2024; originally announced October 2024.

Comments: 14 pages, 6 figures

arXiv:2410.12474 [pdf, other]

Mind the Gap Between Prototypes and Images in Cross-domain Finetuning

Authors: Hongduan Tian, Feng Liu, Zhanke Zhou, Tongliang Liu, Chengqi Zhang, Bo Han

Abstract: In cross-domain few-shot classification (CFC), recent works mainly focus on adapting a simple transformation head on top of a frozen pre-trained backbone with few labeled data to project embeddings into a task-specific metric space where classification can be performed by measuring similarities between image instance and prototype representations. Technically, an assumption implicitly adopted in such a framework is that the prototype and image instance embeddings share the same representation transformation. However, in this paper, we find that there naturally exists a gap, which resembles the modality gap, between the prototype and image instance embeddings extracted from the frozen pre-trained backbone, and simply applying the same transformation during the adaptation phase constrains exploring the optimal representations and shrinks the gap between prototype and image representations. To solve this problem, we propose a simple yet effective method, contrastive prototype-image adaptation (CoPA), to adapt different transformations respectively for prototypes and images similarly to CLIP by treating prototypes as text prompts. Extensive experiments on Meta-Dataset demonstrate that CoPA achieves the state-of-the-art performance more efficiently. Meanwhile, further analyses also indicate that CoPA can learn better representation clusters, enlarge the gap, and achieve minimal validation loss at the enlarged gap.

Submitted 20 October, 2024; v1 submitted 16 October, 2024; originally announced October 2024.

arXiv:2410.12089 [pdf, other]

BICEP/Keck XVIII: Measurement of BICEP3 polarization angles and consequences for constraining cosmic birefringence and inflation

Authors: BICEP/Keck Collaboration: P. A. R. Ade, Z. Ahmed, M. Amiri, D. Barkats, R. Basu Thakur, C. A. Bischoff, D. Beck, J. J. Bock, H. Boenish, V. Buza, J. R. Cheshire IV, J. Connors, J. Cornelison, M. Crumrine, A. J. Cukierman, E. Denison, L. Duband, M. Eiben, B. D. Elwood, S. Fatigoni, J. P. Filippini, A. Fortes, M. Gao, et al. (60 additional authors not shown)

Abstract: We use a custom-made calibrator to measure individual detectors' polarization angles of BICEP3, a small aperture telescope observing the cosmic microwave background (CMB) at 95 GHz from the South Pole. We describe our calibration strategy and the statistical and systematic uncertainties associated with the measurement. We reach an unprecedented precision for such a measurement on a CMB experiment, with a repeatability for each detector pair of $0.02°$. We show that the relative angles measured using this method are in excellent agreement with those extracted from CMB data. Because the absolute measurement is currently limited by a systematic uncertainty, we do not derive cosmic birefringence constraints from BICEP3 data in this work. Rather, we forecast the sensitivity of BICEP3 sky maps for such analysis. We investigate the relative contributions of instrument noise, lensing, and dust, as well as astrophysical and instrumental systematics. We also explore the constraining power of different angle estimators, depending on analysis choices. We establish that the BICEP3 2-year dataset (2017--2018) has an on-sky sensitivity to the cosmic birefringence angle of $σ= 0.078°$, which could be improved to $σ= 0.055°$ by adding all of the existing BICEP3 data (through 2023). Furthermore, we emphasize the possibility of using the BICEP3 sky patch as a polarization calibration source for CMB experiments, which with the present data could reach a precision of $0.035°$.
Finally, in the context of inflation searches, we investigate the impact of detector-to-detector variations in polarization angles as they may bias the tensor-to-scalar ratio r. We show that while the effect is expected to remain subdominant to other sources of systematic uncertainty, it can be reliably calibrated using polarization angle measurements such as the ones we present in this paper.

Submitted 15 October, 2024; originally announced October 2024.

Comments: 29 Pages, 17 Figures, 6 Tables, as submitted to PRD

arXiv:2410.11607 [pdf, other]

Observation of $χ_{cJ}\to p \bar p K^0_S K^- π^+ + c.c.$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann, et al. (648 additional authors not shown)

Abstract: By analyzing $(27.12\pm0.14)\times10^8$ $ψ(3686)$ events collected with the BESIII detector operating at the BEPCII collider, the decays of $χ_{cJ} \to p \bar{p} K^0_S K^- π^+ +c.c.(J=0, 1, 2)$ are observed for the first time with statistical significances greater than $10σ$. The branching fractions of these decays are determined to be $\mathcal{B}(χ_{c0}\to p \bar p K^{0}_{S} K^- π^+ + c.c.)=(2.61\pm0.27\pm0.32)\times10^{-5},$ $\mathcal{B}(χ_{c1}\to p \bar p K^{0}_{S} K^- π^+ + c.c.)=(4.16\pm0.24\pm0.46)\times10^{-5},$ and $\mathcal{B}(χ_{c2}\to p \bar p K^{0}_{S} K^- π^+ + c.c.)=(5.63\pm0.28\pm0.46)\times10^{-5}$, respectively. The processes $χ_{c1,2} \to \bar{p} Λ(1520) K^0_S π^{+} + c.c.$ are also observed, with statistical significances of 5.7$σ$ and 7.0$σ$, respectively. Evidence for $χ_{c0} \to\bar{p} Λ(1520) K^0_S π^{+} + c.c.$ is found with a statistical significance of 3.3$σ$. The corresponding branching fractions are determined to be $\mathcal{B}(χ_{c0}\to \bar{p} Λ(1520) K^0_S π^{+} + c.c.) =(1.61^{+0.68}_{-0.64}\pm0.23)\times10^{-5}$, $\mathcal{B}(χ_{c1}\to \bar{p} Λ(1520) K^0_S π^{+} + c.c.)=(4.06^{+0.80}_{-0.76}\pm0.52)\times10^{-5}$, and $\mathcal{B}(χ_{c2}\to \bar{p} Λ(1520) K^0_S π^{+} + c.c.)=(4.09^{+0.87}_{-0.84}\pm0.42)\times10^{-5}$. Here, the first uncertainties are statistical and the second ones are systematic.

Submitted 15 October, 2024; originally announced October 2024.

Comments: 12 pages, 5 figures

arXiv:2410.10118 [pdf, other]

Physical Consistency Bridges Heterogeneous Data in Molecular Multi-Task Learning

Authors: Yuxuan Ren, Dihan Zheng, Chang Liu, Peiran Jin, Yu Shi, Lin Huang, Jiyan He, Shengjie Luo, Tao Qin, Tie-Yan Liu

Abstract: In recent years, machine learning has demonstrated impressive capability in handling molecular science tasks. To support various molecular properties at scale, machine learning models are trained in the multi-task learning paradigm. Nevertheless, data of different molecular properties are often not aligned: some quantities, e.g. equilibrium structure, demand more cost to compute than others, e.g. energy, so their data are often generated by cheaper computational methods at the cost of lower accuracy, which cannot be directly overcome through multi-task learning. Moreover, it is not straightforward to leverage abundant data of other tasks to benefit a particular task. To handle such data heterogeneity challenges, we exploit the specialty of molecular tasks that there are physical laws connecting them, and design consistency training approaches that allow different tasks to exchange information directly so as to improve one another. Particularly, we demonstrate that the more accurate energy data can improve the accuracy of structure prediction. We also find that consistency training can directly leverage force and off-equilibrium structure data to improve structure prediction, demonstrating a broad capability for integrating heterogeneous data.

Submitted 13 October, 2024; originally announced October 2024.

Comments: Published as a conference paper at NeurIPS 2024

arXiv:2410.10100 [pdf, other]

Could the inter-band lag of active galactic nucleus vary randomly?

Authors: Zhen-Bo Su, Zhen-Yi Cai, Jun-Xian Wang, Tinggui Wang, Yongquan Xue, Min-Xuan Cai, Lulu Fan, Hengxiao Guo, Zhicheng He, Zizhao He, Xu-Fan Hu, Ji-an Jiang, Ning Jiang, Wen-Yong Kang, Lei Lei, Guilin Liu, Teng Liu, Zhengyan Liu, Zhenfeng Sheng, Mouyuan Sun, Wen Zhao

Abstract: The inter-band lags among the optical broad-band continua of active galactic nuclei (AGNs) have been intensively explored over the past decade. However, the nature of the lags remains under debate. Here, utilizing two distinct scenarios for AGN variability, i.e., the thermal fluctuation of the accretion disk and the reprocessing of both the accretion disk and clouds in the broad line region, we show that, owing to the random nature of AGN variability, the inter-band lags of an individual AGN would vary from one campaign with a finite baseline to another. Specifically, the thermal fluctuation scenario implies larger variations in the lags than the reprocessing scenario. Moreover, the former predicts a positive correlation between the lag and variation amplitude, while the latter does not result in such a correlation. For both scenarios, averaging the lags of an individual AGN measured with repeated and non-overlapping campaigns would give rise to a stable lag, which is larger for a longer baseline and saturates for a sufficiently long baseline. However, obtaining the stable lag for an individual AGN is very time-consuming. Alternatively, it can be equivalently inferred by averaging the lags of a sample of AGNs with similar physical properties, and thus can be properly compared with predictions of AGN models. In addition, we discuss several new observational tests suggested by our simulations, as well as the role of the deep high-cadence surveys of the Wide Field Survey Telescope in enriching our knowledge of the lags.

Submitted 13 October, 2024; originally announced October 2024.

Comments: 16 pages, 10 figures. Accepted for publication in Astrophysical Journal, comments are welcome!

arXiv:2410.09908 [pdf, other]

Retrieval Instead of Fine-tuning: A Retrieval-based Parameter Ensemble for Zero-shot Learning

Authors: Pengfei Jin, Peng Shu, Sekeun Kim, Qing Xiao, Sifan Song, Cheng Chen, Tianming Liu, Xiang Li, Quanzheng Li

Abstract: Foundation models have become a cornerstone in deep learning, with techniques like Low-Rank Adaptation (LoRA) offering efficient fine-tuning of large models. Similarly, methods such as Retrieval-Augmented Generation (RAG), which leverage vectorized databases, have further improved model performance by grounding outputs in external information. While these approaches have demonstrated notable success, they often require extensive training or labeled data, which can limit their adaptability in resource-constrained environments. To address these challenges, we introduce Retrieval-based Parameter Ensemble (RPE), a new method that creates a vectorized database of LoRAs, enabling efficient retrieval and application of model adaptations to new tasks. RPE minimizes the need for extensive training and eliminates the requirement for labeled data, making it particularly effective for zero-shot learning. Additionally, RPE is well-suited for privacy-sensitive domains like healthcare, as it modifies model parameters without accessing raw data. When applied to tasks such as medical report generation and image segmentation, RPE not only proved effective but also surpassed supervised fine-tuning methods in certain cases, highlighting its potential to enhance both computational efficiency and privacy in deep learning applications.

Submitted 13 October, 2024; originally announced October 2024.

arXiv:2410.09845 [pdf, other]

Understanding Robustness of Parameter-Efficient Tuning for Image Classification

Authors: Jiacheng Ruan, Xian Gao, Suncheng Xiang, Mingye Xie, Ting Liu, Yuzhuo Fu

Abstract: Parameter-efficient tuning (PET) techniques calibrate the model's predictions on downstream tasks by freezing the pre-trained models and introducing a small number of learnable parameters. However, despite the numerous PET methods proposed, their robustness has not been thoroughly investigated. In this paper, we systematically explore the robustness of four classical PET techniques (i.e., VPT, Adapter, AdaptFormer, and LoRA) under both white-box attacks and information perturbations. For white-box attack scenarios, we first analyze the performance of PET techniques using FGSM and PGD attacks. Subsequently, we further explore the transferability of adversarial samples and the impact of learnable parameter quantities on the robustness of PET methods. Under information perturbation attacks, we introduce four distinct perturbation strategies, including Patch-wise Drop, Pixel-wise Drop, Patch Shuffle, and Gaussian Noise, to comprehensively assess the robustness of these PET techniques in the presence of information loss. Via these extensive studies, we enhance the understanding of the robustness of PET methods, providing valuable insights for improving their performance in computer vision applications. The code is available at https://github.com/JCruan519/PETRobustness.

Submitted 13 October, 2024; originally announced October 2024.

Comments: 5 pages, 2 figures. Work in Progress

arXiv:2410.09674 [pdf, other]

EG-SpikeFormer: Eye-Gaze Guided Transformer on Spiking Neural Networks for Medical Image Analysis

Authors: Yi Pan, Hanqi Jiang, Junhao Chen, Yiwei Li, Huaqin Zhao, Yifan Zhou, Peng Shu, Zihao Wu, Zhengliang Liu, Dajiang Zhu, Xiang Li, Yohannes Abate, Tianming Liu

Abstract: Neuromorphic computing has emerged as a promising energy-efficient alternative to traditional artificial intelligence, predominantly utilizing spiking neural networks (SNNs) implemented on neuromorphic hardware. Significant advancements have been made in SNN-based convolutional neural networks (CNNs) and Transformer architectures. However, their applications in the medical imaging domain remain underexplored. In this study, we introduce EG-SpikeFormer, an SNN architecture designed for clinical tasks that integrates eye-gaze data to guide the model's focus on diagnostically relevant regions in medical images. This approach effectively addresses shortcut learning issues commonly observed in conventional models, especially in scenarios with limited clinical data and high demands for model reliability, generalizability, and transparency. Our EG-SpikeFormer not only demonstrates superior energy efficiency and performance in medical image classification tasks but also enhances clinical relevance. By incorporating eye-gaze data, the model improves interpretability and generalization, opening new directions for the application of neuromorphic computing in healthcare.

Submitted 12 October, 2024; originally announced October 2024.

arXiv:2410.09540 [pdf, ps, other]

Effects of orbital eccentricity on continuous gravitational waveforms from triaxially-deformed precessing neutron stars in tight binaries

Authors: Wen-Fan Feng, Tan Liu, Yan Wang, Lijing Shao

Abstract: The successful detection of continuous gravitational waves (GWs) from spinning neutron stars (NSs) will shape our understanding of the physical properties of dense matter under extreme conditions. Binary population synthesis simulations show that forthcoming space-borne GW detectors may be capable of detecting some tight Galactic double NSs (DNSs) with 10-minute orbital periods. Successfully searching for continuous GWs from such a close DNS demands extremely precise waveform templates considering the interaction between the NS and its companion. Unlike the isolated formation channel, the DNSs from the dynamical formation channel have moderate to high orbital eccentricities. To accommodate these systems, we generalize the analytical waveforms from triaxial nonaligned NSs under spin-orbit coupling derived by Feng et al. [Phys. Rev. D 108, 063035 (2023), https://journals.aps.org/prd/abstract/10.1103/PhysRevD.108.063035] to incorporate the effects of the orbital eccentricity. Our findings suggest that for DNSs formed through isolated binary evolution, the impact of eccentricity on the continuous GWs of their NSs can be neglected. In contrast, for DNSs formed through dynamical processes, it is necessary to consider eccentricity, as high-eccentricity orbits can result in a fitting factor of $\lesssim 0.97$ within approximately 0.5 to 2 years of a coherent search.
Once the GWs from spinning NSs in tight binaries are detected, the relative measurement accuracy of eccentricity can reach $Δe / e \sim O(10^{-7})$ for a signal-to-noise ratio of $O(100)$ based on the Fisher information matrix, bearing significant implications for understanding the formation mechanisms of DNSs.

Submitted 12 October, 2024; originally announced October 2024.

arXiv:2410.09401 [pdf, other]

A Novel Approach to Malicious Code Detection Using CNN-BiLSTM and Feature Fusion

Authors: Lixia Zhang, Tianxu Liu, Kaihui Shen, Cheng Chen

Abstract: With the rapid advancement of Internet technology, the threat of malware to computer systems and network security has intensified. Malware affects individual privacy and security and poses risks to critical infrastructures of enterprises and nations. The increasing quantity and complexity of malware, along with its concealment and diversity, challenge traditional detection techniques. Static detection methods struggle against variants and packed malware, while dynamic methods face high costs and risks that limit their application. Consequently, there is an urgent need for novel and efficient malware detection techniques to improve accuracy and robustness. This study first employs the minhash algorithm to convert binary files of malware into grayscale images, followed by the extraction of global and local texture features using GIST and LBP algorithms. Additionally, the study utilizes IDA Pro to decompile and extract opcode sequences, applying N-gram and tf-idf algorithms for feature vectorization. The fusion of these features enables the model to comprehensively capture the behavioral characteristics of malware. In terms of model construction, a CNN-BiLSTM fusion model is designed to simultaneously process image features and opcode sequences, enhancing classification performance. Experimental validation on multiple public datasets demonstrates that the proposed method significantly outperforms traditional detection techniques in terms of accuracy, recall, and F1 score, particularly in detecting variants and obfuscated malware with greater stability.
The research presented in this paper offers new insights into the development of malware detection technologies, validating the effectiveness of feature and model fusion, and holds promising application prospects.

Submitted 12 October, 2024; originally announced October 2024.

arXiv:2410.08613 [pdf, other]

Cross-Modal Bidirectional Interaction Model for Referring Remote Sensing Image Segmentation

Authors: Zhe Dong, Yuzhe Sun, Yanfeng Gu, Tianzhu Liu

Abstract: Given a natural language expression and a remote sensing image, the goal of referring remote sensing image segmentation (RRSIS) is to generate a pixel-level mask of the target object identified by the referring expression. In contrast to natural scenarios, expressions in RRSIS often involve complex geospatial relationships, with target objects of interest that vary significantly in scale and lack visual saliency, thereby increasing the difficulty of achieving precise segmentation. To address the aforementioned challenges, a novel RRSIS framework is proposed, termed the cross-modal bidirectional interaction model (CroBIM). Specifically, a context-aware prompt modulation (CAPM) module is designed to integrate spatial positional relationships and task-specific knowledge into the linguistic features, thereby enhancing the ability to capture the target object. Additionally, a language-guided feature aggregation (LGFA) module is introduced to integrate linguistic information into multi-scale visual features, incorporating an attention deficit compensation mechanism to enhance feature aggregation. Finally, a mutual-interaction decoder (MID) is designed to enhance cross-modal feature alignment through cascaded bidirectional cross-attention, thereby enabling precise segmentation mask prediction. To further foster the research of RRSIS, we also construct RISBench, a new large-scale benchmark dataset comprising 52,472 image-language-label triplets.
Extensive benchmarking on RISBench and two other prevalent datasets demonstrates the superior performance of the proposed CroBIM over existing state-of-the-art (SOTA) methods. The source code for CroBIM and the RISBench dataset will be publicly available at https://github.com/HIT-SIRS/CroBIM

Submitted 11 October, 2024; originally announced October 2024.

arXiv:2410.08603 [pdf, other]

Observation of $D^+\toη^\primeμ^+ν_μ$ and First Study of $D^+\to η^\prime \ell^+ν_\ell$ Decay Dynamics

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (643 additional authors not shown)

Abstract: Using $20.3\,\rm fb^{-1}$ of $e^+e^-$ collision data collected at the center-of-mass energy 3.773\,GeV with the BESIII detector, we report the first observation of the semileptonic decay $D^+\to η^\prime μ^+ν_μ$ with a significance of $8.6σ$ including systematic uncertainties, and an improved measurement of $D^+\to η^\prime e^+ν_e$. The branching fractions of $D^+\to η^\prime μ^+ν_μ$ and $D^+\to η^\prime e^+ν_e$ are determined to be $(1.92\pm0.28_{\rm stat}\pm 0.08_{\rm syst})\times 10^{-4}$ and $(1.79\pm0.19_{\rm stat}\pm 0.07_{\rm syst})\times 10^{-4}$, respectively. From an analysis of the $D^+\to η^\prime \ell^+ν_\ell$ decay dynamics, the product of the hadronic form factor $f_+^{η^{\prime}}(0)$ and the CKM matrix element $|V_{cd}|$ is measured for the first time, giving $f^{η^\prime}_+(0)|V_{cd}| = (5.92\pm0.56_{\rm stat}\pm0.13_{\rm syst})\times 10^{-2}$. No evidence for violation of $μ-e$ lepton-flavor universality is found in either the full range or several bins of $\ell^+ν_\ell$ four-momentum transfer. The $η-η^\prime$ mixing angle in the quark flavor basis is determined to be $φ_{\rm P} =(39.8\pm0.8_{\rm stat}\pm0.3_{\rm syst})^\circ$.

Submitted 11 October, 2024; originally announced October 2024.

arXiv:2410.07985 [pdf, other]

Omni-MATH: A Universal Olympiad Level Mathematic Benchmark For Large Language Models

Authors: Bofei Gao, Feifan Song, Zhe Yang, Zefan Cai, Yibo Miao, Qingxiu Dong, Lei Li, Chenghao Ma, Liang Chen, Runxin Xu, Zhengyang Tang, Benyou Wang, Daoguang Zan, Shanghaoran Quan, Ge Zhang, Lei Sha, Yichang Zhang, Xuancheng Ren, Tianyu Liu, Baobao Chang

Abstract: Recent advancements in large language models (LLMs) have led to significant breakthroughs in mathematical reasoning capabilities. However, existing benchmarks like GSM8K or MATH are now being solved with high accuracy (e.g., OpenAI o1 achieves 94.8% on MATH dataset), indicating their inadequacy for truly challenging these models. To bridge this gap, we propose a comprehensive and challenging benchmark specifically designed to assess LLMs' mathematical reasoning at the Olympiad level. Unlike existing Olympiad-related benchmarks, our dataset focuses exclusively on mathematics and comprises a vast collection of 4428 competition-level problems with rigorous human annotation. These problems are meticulously categorized into over 33 sub-domains and span more than 10 distinct difficulty levels, enabling a holistic assessment of model performance in Olympiad-mathematical reasoning. Furthermore, we conducted an in-depth analysis based on this benchmark. Our experimental results show that even the most advanced models, OpenAI o1-mini and OpenAI o1-preview, struggle with highly challenging Olympiad-level problems, with 60.54% and 52.55% accuracy, highlighting significant challenges in Olympiad-level mathematical reasoning.

Submitted 10 October, 2024; v1 submitted 10 October, 2024; originally announced October 2024.

Comments: 26 Pages, 17 Figures

arXiv:2410.07626 [pdf, other]

Precision Measurement of the Branching Fraction of $D^{+}\to μ^{+}ν_μ$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (643 additional authors not shown)

Abstract: Using $20.3~\mathrm{fb}^{-1}$ of $e^+e^-$ collision data collected at a center-of-mass energy of $E_{\rm cm}=3.773$ GeV with the BESIII detector operating at the BEPCII collider, we determine the branching fraction of the leptonic decay $D^+\toμ^+ν_μ$ to be $(3.981\pm0.079_{\rm stat}\pm0.040_{\rm syst})\times10^{-4}$. Interpreting our measurement with knowledge of the Fermi coupling constant $G_F$, the masses of the $D^+$ and $μ^+$ as well as the lifetime of the $D^+$, we determine $f_{D^+}|V_{cd}|=(47.53\pm0.48_{\rm stat}\pm0.24_{\rm syst}\pm0.12_{\rm input})~\mathrm{MeV}$. This result is a factor of 2.3 more precise than the previous best measurement. Using the value of the magnitude of the Cabibbo-Kobayashi-Maskawa matrix element $|V_{cd}|$ given by the global standard model fit, we obtain the $D^+$ decay constant $f_{D^+}=(211.5\pm2.3_{\rm stat}\pm1.1_{\rm syst}\pm0.8_{\rm input})$ MeV. Alternatively, using the value of $f_{D^+}$ from a precise lattice quantum chromodynamics calculation, we extract $|V_{cd}|=0.2242\pm0.0023_{\rm stat}\pm0.0011_{\rm syst}\pm0.0009_{\rm input}$.

Submitted 10 October, 2024; originally announced October 2024.

Comments: 9 pages, 2 figures

arXiv:2410.07594 [pdf]

Design and Characterization of High Efficiency Single-stage Electromagnetic Coil Guns

Authors: Sophia Chen, Annie Peng, Ava Chen, Takyiu Liu

Abstract: This study presents several novel approaches to improve the efficiency of a single-stage coil gun. Conventional designs typically feature a uniformly wound solenoid and a ferrite projectile. For our research, we constructed a microcontroller-based prototype to test several new enhancements, including the use of a bipolar current pulse, a stepped multilayer coil with non-uniform winding densities, and the replacement of conventional ferrite projectiles with a neodymium permanent magnet. These modifications were designed to reduce energy loss and improve projectile acceleration by changing magnetic field strength and effectively controlling the magnetic flux. The experimental results show that the proposed methods resulted in significant efficiency improvements, with the varying current pulse and stepped coil design providing enhanced magnetic force at key points in the projectile's path, and the permanent magnet projectile contributing to higher velocities and efficiencies by leveraging the current pulses. Our findings suggest that combining these enhancements significantly improves coil gun performance, achieving higher velocities and efficiencies. These findings can be applied to future coil gun developments, such as multi-stage coil gun systems.

Submitted 10 October, 2024; originally announced October 2024.

Comments: 10 pages, 23 figures

arXiv:2410.07524 [pdf, other]

Upcycling Large Language Models into Mixture of Experts

Authors: Ethan He, Abhinav Khattar, Ryan Prenger, Vijay Korthikanti, Zijie Yan, Tong Liu, Shiqing Fan, Ashwath Aithal, Mohammad Shoeybi, Bryan Catanzaro

Abstract: Upcycling pre-trained dense language models into sparse mixture-of-experts (MoE) models is an efficient approach to increase the model capacity of already trained models. However, optimal techniques for upcycling at scale remain unclear. In this work, we conduct an extensive study of upcycling methods and hyperparameters for billion-parameter scale language models. We propose a novel "virtual group" initialization scheme and weight scaling approach to enable upcycling into fine-grained MoE architectures. Through ablations, we find that upcycling outperforms continued dense model training. In addition, we show that softmax-then-topK expert routing improves over the topK-then-softmax approach and that higher granularity MoEs can help improve accuracy. Finally, we upcycled Nemotron-4 15B on 1T tokens and compared it to a continuously trained version of the same model on the same 1T tokens: the continuously trained model achieved 65.3% MMLU, whereas the upcycled model achieved 67.6%. Our results offer insights and best practices to effectively leverage upcycling for building MoE language models.

Submitted 9 October, 2024; originally announced October 2024.
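
The two routing orders named in the abstract above can be contrasted with a minimal pure-Python sketch. This is an illustration only, not the paper's implementation: the function names (`softmax_then_topk`, `topk_then_softmax`) and the example logits are assumptions introduced here. Both orders select the same experts; they differ in how the gate weights are normalized.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_indices(xs, k):
    """Indices of the k largest entries, largest first."""
    return sorted(range(len(xs)), key=lambda i: xs[i], reverse=True)[:k]


def softmax_then_topk(logits, k):
    """Normalize over ALL experts, then keep the k largest gate weights.

    The kept weights generally sum to less than 1.
    """
    probs = softmax(logits)
    return {i: probs[i] for i in top_k_indices(probs, k)}


def topk_then_softmax(logits, k):
    """Pick the k largest logits, then normalize over those k only.

    The kept weights always sum to 1.
    """
    idx = top_k_indices(logits, k)
    return dict(zip(idx, softmax([logits[i] for i in idx])))


# Hypothetical router logits for four experts (illustrative values).
logits = [2.0, 1.0, 0.5, -1.0]
gates_a = softmax_then_topk(logits, 2)
gates_b = topk_then_softmax(logits, 2)
```

In this sketch the ratio between the two selected gate weights is the same under both orders; only the overall scale differs, because softmax-then-topK retains information about how much probability mass went to the experts that were not selected.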

arXiv:2410.07198 [pdf, other]

Nonlinear Coupling between Magnetic Gears

Authors: Tianchi Liu

Abstract: This study investigates the complex nonlinear coupling of magnetic gears arranged in proximity on a plane. Acknowledging the rich array of geometric and electromagnetic parameters involved, we initiate our exploration with a simplified model. By nondimensionalizing the key variables, we derive a novel nonlinear dynamics framework that abstracts away electromagnetic dependencies. Our approach includes analytic results, numerical simulations and experimental validation, emphasizing two distinct operational modes: free motion and constant driving motion. The findings highlight the intricate behaviors arising from these interactions, contributing to a deeper understanding of magnetic gear dynamics.

Submitted 24 September, 2024; originally announced October 2024.

Comments: 11 pages, 7 figures

arXiv:2410.07033 [pdf, ps, other]

Probing blackbody components in gamma-ray bursts from black hole neutrino-dominated accretion flows

Authors: Xiao-Yan Li, Tong Liu, Bao-Quan Huang, Guo-Yu Li, Da-Bin Lin, Zhi-Lin Chen, Yun Wang

Abstract: A stellar-mass black hole (BH) surrounded by a neutrino-dominated accretion flow (NDAF) is generally considered to be the central engine of gamma-ray bursts (GRBs). Neutrinos escaping from the disk will annihilate out of the disk to produce the fireball that could power GRBs with blackbody (BB) components. The initial GRB jet power and fireball launch radius are related to the annihilation luminosity and annihilation height of the NDAFs, respectively. In this paper, we collect 7 GRBs with known redshifts and identified BB components to test whether the NDAF model works. We find that, in most cases, the values of the accretion rates and the central BH properties are all in the reasonable range, suggesting that these BB components indeed originate from the neutrino annihilation process.

Submitted 9 October, 2024; originally announced October 2024.

Comments: 8 pages, 1 table, 1 figure, accepted for publication in ApJ

arXiv:2410.06811 [pdf, other]

Rethinking the Evaluation of Visible and Infrared Image Fusion

Authors: Dayan Guan, Yixuan Wu, Tianzhu Liu, Alex C. Kot, Yanfeng Gu

Abstract: Visible and Infrared Image Fusion (VIF) has garnered significant interest across a wide range of high-level vision tasks, such as object detection and semantic segmentation. However, the evaluation of VIF methods remains challenging due to the absence of ground truth. This paper proposes a Segmentation-oriented Evaluation Approach (SEA) to assess VIF methods by incorporating the semantic segmentation task and leveraging segmentation labels available in the latest VIF datasets. Specifically, SEA utilizes universal segmentation models, capable of handling diverse images and classes, to predict segmentation outputs from fused images and compare these outputs with segmentation labels. Our evaluation of recent VIF methods using SEA reveals that their performance is comparable or even inferior to using visible images only, despite nearly half of the infrared images demonstrating better performance than visible images. Further analysis indicates that, among conventional VIF evaluation metrics, the two most correlated with our SEA are the gradient-based fusion metric $Q_{\text{ABF}}$ and the visual information fidelity metric $Q_{\text{VIFF}}$, which can serve as proxies when segmentation labels are unavailable. We hope that our evaluation will guide the development of novel and practical VIF methods. The code has been released at https://github.com/Yixuan-2002/SEA/.

Submitted 9 October, 2024; originally announced October 2024.

Comments: The code has been released at https://github.com/Yixuan-2002/SEA/

arXiv:2410.06730 [pdf, other]

Systematic collapse of the accretion disc in AGN confirmed by UV photometry and broad line spectra

Authors: Jia-Lai Kang, Chris Done, Scott Hagen, Matthew J. Temple, John D. Silverman, Junyao Li, Teng Liu

Abstract: A recent study on the spectral energy distribution (SED) of AGN combined unobscured X-ray sources from the eROSITA eFEDS Survey with high quality optical imaging from Subaru's Hyper Suprime-Cam (HSC). The HSC data enabled accurate host galaxy subtraction as well as giving a uniform black hole mass estimator from the stellar mass. The resulting stacked optical/X-ray SEDs for black holes at fixed mass show a dramatic transition, where the dominating disc component in bright AGN evaporates into an X-ray hot plasma below $L/L_{\rm Edd}\sim 0.01$. The models fit to these datasets predicted the largest change in SED in the rest frame UV ($< 3000\,Å$), but this waveband was not included in the original study. Here we use archival $u$-band and UV photometry to extend the SEDs into this range, and confirm the UV is indeed intrinsically faint in AGN below $L/L_{\rm Edd}\sim 0.01$ as predicted. This dramatic drop in UV photo-ionising flux is also seen from its effect on the broad emission lines. We stack the recently released SDSS DR18 optical spectra for this sample, and show that the broad H$β$ line disappears along with the UV bright component at $L/L_{\rm Edd}\sim 0.01$. This shows that there is a population of unobscured, X-ray bright, UV faint AGN which lack broad emission lines (true type 2 Seyferts).

Submitted 9 October, 2024; originally announced October 2024.

Comments: 10 pages, 3 figures, 2 appendices. Submitted to MNRAS. Comments are very welcome!

arXiv:2410.06511 [pdf, other]

TorchTitan: One-stop PyTorch native solution for production ready LLM pre-training

Authors: Wanchao Liang, Tianyu Liu, Less Wright, Will Constable, Andrew Gu, Chien-Chin Huang, Iris Zhang, Wei Feng, Howard Huang, Junjie Wang, Sanket Purandare, Gokul Nadathur, Stratos Idreos

Abstract: The development of large language models (LLMs) has been instrumental in advancing state-of-the-art natural language processing applications. Training LLMs with billions of parameters and trillions of tokens requires sophisticated distributed systems that enable composing and comparing several state-of-the-art techniques in order to efficiently scale across thousands of accelerators. However, existing solutions are complex, scattered across multiple libraries/repositories, lack interoperability, and are cumbersome to maintain. Thus, curating and empirically comparing training recipes require non-trivial engineering effort. This paper introduces TorchTitan, an open-source, PyTorch-native distributed training system that unifies state-of-the-art techniques, streamlining integration and reducing overhead. TorchTitan enables 3D parallelism in a modular manner with elastic scaling, providing comprehensive logging, checkpointing, and debugging tools for production-ready training. It also incorporates hardware-software co-designed solutions, leveraging features like Float8 training and SymmetricMemory. As a flexible test bed, TorchTitan facilitates custom recipe curation and comparison, allowing us to develop optimized training recipes for Llama 3.1 and provide guidance on selecting techniques for maximum efficiency based on our experiences. We thoroughly assess TorchTitan on the Llama 3.1 family of LLMs, spanning 8 billion to 405 billion parameters, and showcase its exceptional performance, modular composability, and elastic scalability. By stacking training optimizations, we demonstrate accelerations of 65.08% with 1D parallelism at the 128-GPU scale (Llama 3.1 8B), an additional 12.59% with 2D parallelism at the 256-GPU scale (Llama 3.1 70B), and an additional 30% with 3D parallelism at the 512-GPU scale (Llama 3.1 405B) on NVIDIA H100 GPUs over optimized baselines.

Submitted 8 October, 2024; originally announced October 2024.

arXiv:2410.06500 [pdf, other]

Search for the radiative decays $D^+\toγρ^+$ and $D^+\toγK^{*+}$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (648 additional authors not shown)

Abstract: We search for the radiative decays $D^{+} \to γρ^+$ and $D^{+} \to γK^{*+}$ using 20.3~fb$^{-1}$ of $e^+e^-$ annihilation data collected at the center-of-mass energy $\sqrt{s}=3.773$ GeV by the BESIII detector operating at the BEPCII collider. No significant signals are observed, and the upper limits on the branching fractions of $D^{+} \to γρ^+$ and $D^{+} \to γK^{*+}$ at 90\% confidence level are set to be $1.3\times10^{-5}$ and $1.8\times10^{-5}$, respectively.

Submitted 8 October, 2024; originally announced October 2024.

arXiv:2410.05808 [pdf, other]

Vision Transformer based Random Walk for Group Re-Identification

Authors: Guoqing Zhang, Tianqi Liu, Wenxuan Fang, Yuhui Zheng

Abstract: Group re-identification (re-ID) aims to match groups with the same people under different cameras, and mainly involves the challenges of changes in group membership and group layout. Most existing methods use the k-nearest neighbor algorithm to update node features to account for changes in group membership, but these methods cannot solve the problem of group layout changes. To this end, we propose a novel vision transformer based random walk framework for group re-ID. Specifically, we design a vision transformer based on a monocular depth estimation algorithm to construct a graph through the average depth value of pedestrian features to fully consider the impact of camera distance on group members' relationships. In addition, we propose a random walk module to reconstruct the graph by calculating affinity scores between target and gallery images to remove pedestrians who do not belong to the current group. Experimental results show that our framework is superior to most methods.

Submitted 8 October, 2024; originally announced October 2024.

Comments: 6 pages

arXiv:2410.05736 [pdf, ps, other]

Observation of an axial-vector state in the study of $ψ(3686) \to φηη'$ decay

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (625 additional authors not shown)

Abstract: Using (2712.4 $\pm$ 14.3)$\times 10^{6}$ $ψ(3686)$ events collected with the BESIII detector at BEPCII, a partial wave analysis of the decay $ψ(3686) \to φηη' $ is performed with the covariant tensor approach. An axial-vector state with a mass near 2.3 $\rm GeV/c^2$ is observed for the first time. Its mass and width are measured to be 2316 $\pm 9_{\mathrm{stat}} \pm 30_{\mathrm{syst}}\,\rm MeV/c^2$ and 89 $\pm 15_{\mathrm{stat}} \pm 26_{\mathrm{syst}}\,\rm MeV$, respectively. The product branching fractions of $\mathcal{B}(ψ(3686) \to X(2300) η') \mathcal{B}(X(2300)\to φη)$ and $\mathcal{B}(ψ(3686) \to X(2300) η)\mathcal{B}(X(2300)\to φη')$ are determined to be (4.8 $\pm 1.3_{\mathrm{stat}} \pm 0.7_{\mathrm{syst}})\times 10^{-6}$ and (2.2 $\pm 0.7_{\mathrm{stat}} \pm 0.7_{\mathrm{syst}})\times 10^{-6}$, respectively. The branching fraction $\mathcal{B}(ψ(3686) \to φηη')$ is measured for the first time to be (3.14$\pm0.17_{\mathrm{stat}}\pm0.24_{\mathrm{syst}})\times10^{-5}$. The first uncertainties are statistical and the second are systematic.

Submitted 8 October, 2024; originally announced October 2024.

arXiv:2410.05317 [pdf, other]

Accelerating Diffusion Transformers with Token-wise Feature Caching

Authors: Chang Zou, Xuyang Liu, Ting Liu, Siteng Huang, Linfeng Zhang

Abstract: Diffusion transformers have shown significant effectiveness in both image and video synthesis at the expense of huge computation costs. To address this problem, feature caching methods have been introduced to accelerate diffusion transformers by caching the features in previous timesteps and reusing them in the following timesteps. However, previous caching methods ignore that different tokens exhibit different sensitivities to feature caching, and feature caching on some tokens may lead to 10$\times$ more destruction to the overall generation quality compared with other tokens. In this paper, we introduce token-wise feature caching, allowing us to adaptively select the most suitable tokens for caching, and further enable us to apply different caching ratios to neural layers in different types and depths. Extensive experiments on PixArt-$α$, OpenSora, and DiT demonstrate our effectiveness in both image and video generation with no requirements for training. For instance, 2.36$\times$ and 1.93$\times$ acceleration are achieved on OpenSora and PixArt-$α$ with almost no drop in generation quality.

Submitted 14 October, 2024; v1 submitted 4 October, 2024; originally announced October 2024.

arXiv:2410.05281 [pdf, other]

Micrometer: Micromechanics Transformer for Predicting Mechanical Responses of Heterogeneous Materials

Authors: Sifan Wang, Tong-Rui Liu, Shyam Sankaran, Paris Perdikaris

Abstract: Heterogeneous materials, crucial in various engineering applications, exhibit complex multiscale behavior, which challenges the effectiveness of traditional computational methods. In this work, we introduce the Micromechanics Transformer (Micrometer), an artificial intelligence (AI) framework for predicting the mechanical response of heterogeneous materials, bridging the gap between advanced data-driven methods and complex solid mechanics problems. Trained on a large-scale high-resolution dataset of 2D fiber-reinforced composites, Micrometer can achieve state-of-the-art performance in predicting microscale strain fields across a wide range of microstructures and material properties under any loading conditions. We demonstrate the accuracy and computational efficiency of Micrometer through applications in computational homogenization and multiscale modeling, where Micrometer achieves 1\% error in predicting macroscale stress fields while reducing computational time by up to two orders of magnitude compared to conventional numerical solvers. We further showcase the adaptability of the proposed model through transfer learning experiments on new materials with limited data, highlighting its potential to tackle diverse scenarios in mechanical analysis of solid materials. Our work represents a significant step towards AI-driven innovation in computational solid mechanics, addressing the limitations of traditional numerical methods and paving the way for more efficient simulations of heterogeneous materials across various industrial applications.

Submitted 23 September, 2024; originally announced October 2024.

Comments: 36 pages, 12 figures, 9 tables

arXiv:2410.05218 [pdf, other]

Density estimation with LLMs: a geometric investigation of in-context learning trajectories

Authors: Toni J. B. Liu, Nicolas Boullé, Raphaël Sarfati, Christopher J. Earls

Abstract: Large language models (LLMs) demonstrate remarkable emergent abilities to perform in-context learning across various tasks, including time series forecasting. This work investigates LLMs' ability to estimate probability density functions (PDFs) from data observed in-context; such density estimation (DE) is a fundamental task underlying many probabilistic modeling problems. We leverage the Intensive Principal Component Analysis (InPCA) to visualize and analyze the in-context learning dynamics of LLaMA-2 models. Our main finding is that these LLMs all follow similar learning trajectories in a low-dimensional InPCA space, which are distinct from those of traditional density estimation methods like histograms and Gaussian kernel density estimation (KDE). We interpret the LLaMA in-context DE process as a KDE with an adaptive kernel width and shape. This custom kernel model captures a significant portion of LLaMA's behavior despite having only two parameters. We further speculate on why LLaMA's kernel width and shape differ from classical algorithms, providing insights into the mechanism of in-context probabilistic reasoning in LLMs.

Submitted 9 October, 2024; v1 submitted 7 October, 2024; originally announced October 2024.
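
For readers unfamiliar with the classical baseline mentioned in the abstract above, a fixed-bandwidth Gaussian KDE can be sketched in a few lines of pure Python. This is the textbook estimator, not the adaptive two-parameter kernel model the abstract describes; the function name `gaussian_kde`, the sample values, and the bandwidth are assumptions chosen for illustration.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a density estimate p(x) built from Gaussian kernels.

    p(x) = 1 / (n * h * sqrt(2*pi)) * sum_i exp(-0.5 * ((x - x_i) / h)**2)
    where h is a fixed bandwidth (kernel width) shared by all samples.
    """
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def pdf(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return pdf


# Hypothetical in-context observations (illustrative values).
samples = [-1.0, 0.0, 1.0]
pdf = gaussian_kde(samples, bandwidth=0.5)

# Riemann-sum check that the estimate integrates to roughly 1 on [-10, 10].
step = 0.01
area = sum(pdf(-10.0 + i * step) for i in range(2001)) * step
```

A smaller bandwidth makes the estimate follow individual samples more closely, while a larger one smooths them together; an adaptive scheme, as the abstract suggests for LLaMA, would vary the width (and shape) rather than fix it in advance.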

arXiv:2410.04524 [pdf, other]

Towards Secure Tuning: Mitigating Security Risks Arising from Benign Instruction Fine-Tuning

Authors: Yanrui Du, Sendong Zhao, Jiawei Cao, Ming Ma, Danyang Zhao, Fenglei Fan, Ting Liu, Bing Qin

Abstract: Instruction Fine-Tuning (IFT) has become an essential method for adapting base Large Language Models (LLMs) into variants for professional and private use. However, researchers have raised concerns over a significant decrease in LLMs' security following IFT, even when the IFT process involves entirely benign instructions (termed Benign IFT). Our study represents a pioneering effort to mitigate the security risks arising from Benign IFT. Specifically, we conduct a Module Robustness Analysis, aiming to investigate how LLMs' internal modules contribute to their security. Based on our analysis, we propose a novel IFT strategy, called the Modular Layer-wise Learning Rate (ML-LR) strategy. In our analysis, we implement a simple security feature classifier that serves as a proxy to measure the robustness of modules (e.g. $Q$/$K$/$V$, etc.). Our findings reveal that the module robustness shows clear patterns, varying regularly with the module type and the layer depth. Leveraging these insights, we develop a proxy-guided search algorithm to identify a robust subset of modules, termed Mods$_{Robust}$. During IFT, the ML-LR strategy employs differentiated learning rates for Mods$_{Robust}$ and the remaining modules. Our experimental results show that in security assessments, the application of our ML-LR strategy significantly mitigates the rise in harmfulness of LLMs following Benign IFT. Notably, our ML-LR strategy has little impact on the usability or expertise of LLMs following Benign IFT. Furthermore, we have conducted comprehensive analyses to verify the soundness and flexibility of our ML-LR strategy.

Submitted 6 October, 2024; originally announced October 2024.

arXiv:2410.04503 [pdf, other]

LRHP: Learning Representations for Human Preferences via Preference Pairs

Authors: Chenglong Wang, Yang Gan, Yifu Huo, Yongyu Mu, Qiaozhi He, Murun Yang, Tong Xiao, Chunliang Zhang, Tongran Liu, Jingbo Zhu

Abstract: To improve human-preference alignment training, current research has developed numerous preference datasets consisting of preference pairs labeled as "preferred" or "dispreferred". These preference pairs are typically used to encode human preferences into a single numerical value through reward modeling, which acts as a reward signal during reinforcement learning from human feedback (RLHF). However, representing these human preferences as a numerical value complicates the analysis of these preferences and restricts their broader applications other than RLHF. In contrast, in this work, we introduce a preference representation learning task that aims to construct a richer and more structured representation of human preferences. We further develop a more generalizable framework, Learning Representations for Human Preferences via preference pairs (namely LRHP), which extends beyond traditional reward modeling to tackle this task. We verify the utility of preference representations in two downstream tasks: preference data selection and preference margin prediction. Building upon the human preferences in representations, we achieve strong performance in both tasks, significantly outperforming baselines.

Submitted 6 October, 2024; originally announced October 2024.

arXiv:2410.04407 [pdf, other]

Lens: Rethinking Multilingual Enhancement for Large Language Models

Authors: Weixiang Zhao, Yulin Hu, Jiahe Guo, Xingyu Sui, Tongtong Wu, Yang Deng, Yanyan Zhao, Bing Qin, Wanxiang Che, Ting Liu

Abstract: Despite the growing global demand for large language models (LLMs) that serve users from diverse linguistic backgrounds, most cutting-edge LLMs remain predominantly English-centric. This creates a performance gap across languages, restricting access to advanced AI services for non-English speakers. Current methods to enhance multilingual capabilities largely rely on data-driven post-training techniques, such as multilingual instruction tuning or continual pre-training. However, these approaches encounter significant challenges, including the scarcity of high-quality multilingual datasets and the limited enhancement of multilingual capabilities. They often suffer from off-target issues and catastrophic forgetting of central language abilities. To this end, we propose Lens, a novel approach to enhance multilingual capabilities of LLMs by leveraging their internal language representation spaces. Specifically, Lens operates by manipulating the hidden representations within the language-agnostic and language-specific subspaces from top layers of LLMs. Using the central language as a pivot, the target language is drawn closer to it within the language-agnostic subspace, allowing it to inherit well-established semantic representations. Meanwhile, in the language-specific subspace, the representations of the target and central languages are pushed apart, enabling the target language to express itself distinctly. Extensive experiments on one English-centric and two multilingual LLMs demonstrate that Lens effectively improves multilingual performance without sacrificing the original central language capabilities of the backbone model, achieving superior results with far fewer computational resources compared to existing post-training approaches.

Submitted 6 October, 2024; originally announced October 2024.

Comments: 21 pages, 9 figures, 5 tables

arXiv:2410.04358 [pdf]

Enabling Clinical Use of Linear Energy Transfer in Proton Therapy for Head and Neck Cancer -- A Review of Implications for Treatment Planning and Adverse Events Study

Authors: Jingyuan Chen, Yunze Yang, Hongying Feng, Chenbin Liu, Lian Zhang, Jason M. Holmes, Zhengliang Liu, Haibo Lin, Tianming Liu, Charles B. Simone II, Nancy Y. Lee, Steven E. Frank, Daniel J. Ma, Samir H. Patel, Wei Liu

Abstract: Proton therapy offers significant advantages due to its unique physical and biological properties, particularly the Bragg peak, enabling precise dose delivery to tumors while sparing healthy tissues. However, the clinical implementation is challenged by the oversimplification of the relative biological effectiveness (RBE) as a fixed value of 1.1, which does not account for the complex interplay between dose, linear energy transfer (LET), and biological endpoints. Lack of heterogeneity control or the understanding of the complex interplay may result in unexpected adverse events and suboptimal patient outcomes. On the other hand, expanding our knowledge of variable tumor RBE and LET optimization may provide a better management strategy for radioresistant tumors. This review examines recent advancements in LET calculation methods, including analytical models and Monte Carlo simulations. The integration of LET into plan evaluation is assessed to enhance plan quality control. LET-guided robust optimization demonstrates promise in minimizing high-LET exposure to organs at risk, thereby reducing the risk of adverse events. Dosimetric seed spot analysis is discussed to show its importance in revealing the true LET-related effect upon the adverse event initialization by finding the lesion origins and eliminating the confounding factors from the biological processes. Dose-LET volume histograms (DLVH) are discussed as effective tools for correlating physical dose and LET with clinical outcomes, enabling the derivation of clinically relevant dose-LET volume constraints without reliance on uncertain RBE models. Based on DLVH, the dose-LET volume constraints (DLVC)-guided robust optimization is introduced to upgrade conventional dose-volume constraints-based robust optimization, which optimizes the joint distribution of dose and LET simultaneously.

Submitted 6 October, 2024; originally announced October 2024.
