Showing 1–50 of 28,992 results for author: Sun

  1. arXiv:2410.17018  [pdf, other]

    cs.CL

    Exploring Forgetting in Large Language Model Pre-Training

    Authors: Chonghua Liao, Ruobing Xie, Xingwu Sun, Haowen Sun, Zhanhui Kang

    Abstract: Catastrophic forgetting remains a formidable obstacle to building an omniscient model in large language models (LLMs). Despite the pioneering research on task-level forgetting in LLM fine-tuning, there is scant focus on forgetting during pre-training. We systematically explored the existence and measurement of forgetting in pre-training, questioning traditional metrics such as perplexity (PPL) and…

    Submitted 22 October, 2024; originally announced October 2024.

  2. arXiv:2410.16912  [pdf, ps, other]

    hep-ex

    Measurement of the branching fractions of the decays $Λ_{c}^{+}\rightarrowΛK_{S}^{0}K^{+}$, $Λ_{c}^{+}\rightarrowΛK_{S}^{0}π^{+}$ and $Λ_{c}^{+}\rightarrowΛK^{*+}$

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (639 additional authors not shown)

    Abstract: Studies are performed of the Cabibbo-favored decay $Λ_{c}^{+}\toΛK_{S}^{0}K^+$ and the singly Cabibbo-suppressed decay $Λ_{c}^{+}\toΛK_{S}^{0}π^+$, based on a sample of $e^{+}e^{-}$ collision data, corresponding to an integrated luminosity of 4.5 fb$^{-1}$, accumulated at center-of-mass energies between $4599.53$ MeV and $4698.82$ MeV with the BESIII detector. The decay…

    Submitted 22 October, 2024; originally announced October 2024.

  3. arXiv:2410.16830  [pdf, other]

    math.PR math.CO

    Random spanning trees in random environment

    Authors: Luca Makowiec, Michele Salvi, Rongfeng Sun

    Abstract: We introduce a new spanning tree model called the random spanning tree in random environment (RSTRE), which interpolates between the uniform spanning tree and the minimum spanning tree as the inverse temperature (disorder strength) $β$ varies. On the complete graph with $n$ vertices and i.i.d. uniform disorder variables on the edges, we identify: (1) a low disorder regime with $β\leq C n/\log n$,…

    Submitted 22 October, 2024; originally announced October 2024.

    Comments: 36 pages, 2 figures. Comments are welcome!

    MSC Class: 60K35 (Primary) 82B41; 82B44; 05C05 (Secondary)

  4. arXiv:2410.16720  [pdf, other]

    cs.DB cs.CR

    NodeOP: Optimizing Node Management for Decentralized Networks

    Authors: Angela Tsang, Jiankai Sun, Boo Xie, Azeem Khan, Ender Lu, Fletcher Fan, Maggie Wu, Jing Tang

    Abstract: We present NodeOP, a novel framework designed to optimize the management of General Node Operators in decentralized networks. By integrating Agent-Based Modeling (ABM) with a Tendermint Byzantine Fault Tolerance (BFT)-based consensus mechanism, NodeOP addresses key challenges in task allocation, consensus formation, and system stability. Through rigorous mathematical modeling and formal optimizati…

    Submitted 22 October, 2024; originally announced October 2024.

  5. arXiv:2410.16695  [pdf, other]

    cs.CV cs.AI

    MPT: A Large-scale Multi-Phytoplankton Tracking Benchmark

    Authors: Yang Yu, Yuezun Li, Xin Sun, Junyu Dong

    Abstract: Phytoplankton are a crucial component of aquatic ecosystems, and effective monitoring of them can provide valuable insights into ocean environments and ecosystem changes. Traditional phytoplankton monitoring methods are often complex and lack timely analysis. Therefore, deep learning algorithms offer a promising approach for automated phytoplankton monitoring. However, the lack of large-scale, hig…

    Submitted 22 October, 2024; originally announced October 2024.

  6. arXiv:2410.16663  [pdf, other]

    cs.LG

    FastAttention: Extend FlashAttention2 to NPUs and Low-resource GPUs

    Authors: Haoran Lin, Xianzhi Yu, Kang Zhao, Lu Hou, Zongyuan Zhan, Stanislav Kamenev, Han Bao, Ting Hu, Mingkai Wang, Qixin Chang, Siyue Sui, Weihao Sun, Jiaxin Hu, Jun Yao, Zekun Yin, Cheng Qian, Ying Zhang, Yinfei Pan, Yu Yang, Weiguo Liu

    Abstract: FlashAttention series has been widely applied in the inference of large language models (LLMs). However, FlashAttention series only supports the high-level GPU architectures, e.g., Ampere and Hopper. At present, FlashAttention series is not easily transferrable to NPUs and low-resource GPUs. Moreover, FlashAttention series is inefficient for multi- NPUs or GPUs inference scenarios. In this work, w…

    Submitted 21 October, 2024; originally announced October 2024.

  7. arXiv:2410.16638  [pdf, other]

    cs.AI cs.CL cs.LG

    LLMScan: Causal Scan for LLM Misbehavior Detection

    Authors: Mengdi Zhang, Kai Kiat Goh, Peixin Zhang, Jun Sun

    Abstract: Despite the success of Large Language Models (LLMs) across various fields, their potential to generate untruthful, biased and harmful responses poses significant risks, particularly in critical applications. This highlights the urgent need for systematic methods to detect and prevent such misbehavior. While existing approaches target specific issues such as harmful responses, this work introduces…

    Submitted 21 October, 2024; originally announced October 2024.

  8. arXiv:2410.16594  [pdf, other]

    astro-ph.SR astro-ph.GA

    The Impact of Initial Composition on Massive Star Evolution and Nucleosynthesis

    Authors: Christopher West, Alexander Heger, Benoit Cote, Lev Serxner, Haoxuan Sun

    Abstract: We study the sensitivity of presupernova evolution and supernova nucleosynthesis yields of massive stars to variations of the initial composition. We use the solar abundances from Lodders (2009), and compute two different initial stellar compositions: i) scaled solar abundances, and ii) the isotopic galactic chemical history model (GCH) developed by West and Heger (2013b). We run a grid of models…

    Submitted 21 October, 2024; originally announced October 2024.

  9. arXiv:2410.16565  [pdf, other]

    astro-ph.HE

    Search for gravitational waves emitted from SN 2023ixf

    Authors: The LIGO Scientific Collaboration, the Virgo Collaboration, the KAGRA Collaboration, A. G. Abac, R. Abbott, I. Abouelfettouh, F. Acernese, K. Ackley, S. Adhicary, N. Adhikari, R. X. Adhikari, V. K. Adkins, D. Agarwal, M. Agathos, M. Aghaei Abchouyeh, O. D. Aguiar, I. Aguilar, L. Aiello, A. Ain, T. Akutsu, S. Albanesi, R. A. Alfaidi, A. Al-Jodah, C. Alléné, A. Allocca, et al. (1758 additional authors not shown)

    Abstract: We present the results of a search for gravitational-wave transients associated with core-collapse supernova SN 2023ixf, which was observed in the galaxy Messier 101 via optical emission on 2023 May 19th, during the LIGO-Virgo-KAGRA 15th Engineering Run. We define a five-day on-source window during which an accompanying gravitational-wave signal may have occurred. No gravitational waves have been…

    Submitted 21 October, 2024; originally announced October 2024.

    Comments: Main paper: 6 pages, 4 figures and 1 table. Total with appendices: 20 pages, 4 figures, and 1 table

    Report number: LIGO-P2400125

  10. arXiv:2410.16561  [pdf, ps, other]

    cs.LG math.OC stat.ML

    Gradient Normalization with(out) Clipping Ensures Convergence of Nonconvex SGD under Heavy-Tailed Noise with Improved Results

    Authors: Tao Sun, Xinwang Liu, Kun Yuan

    Abstract: This paper investigates Gradient Normalization Stochastic Gradient Descent without Clipping (NSGDC) and its variance reduction variant (NSGDC-VR) for nonconvex optimization under heavy-tailed noise. We present significant improvements in the theoretical results for both algorithms, including the removal of logarithmic factors from the convergence rates and the recovery of the convergence rate to m…

    Submitted 21 October, 2024; originally announced October 2024.

  11. arXiv:2410.16446  [pdf, ps, other]

    physics.ins-det astro-ph.IM nucl-ex

    Lifetimes and Branching Ratios Apparatus (LIBRA)

    Authors: L. J. Sun, J. Dopfer, A. Adams, C. Wrede, A. Banerjee, B. A. Brown, J. Chen, E. A. M. Jensen, R. Mahajan, T. Rauscher, C. Sumithrarachchi, L. E. Weghorn, D. Weisshaar, T. Wheeler

    Abstract: The Particle X-ray Coincidence Technique (PXCT) was originally developed to measure average lifetimes in the $10^{-17}-10^{-15}$ s range for proton-unbound states populated by electron capture (EC). We have designed and built the Lifetimes and Branching Ratios Apparatus (LIBRA) to be used in the stopped-beam area at the Facility for Rare Isotope Beams that extends PXCT to measure both lifetimes an…

    Submitted 21 October, 2024; originally announced October 2024.

  12. arXiv:2410.16429  [pdf, other]

    cs.LO cs.AI cs.LG math.LO

    Pantograph: A Machine-to-Machine Interaction Interface for Advanced Theorem Proving, High Level Reasoning, and Data Extraction in Lean 4

    Authors: Leni Aniva, Chuyue Sun, Brando Miranda, Clark Barrett, Sanmi Koyejo

    Abstract: Machine-assisted theorem proving refers to the process of conducting structured reasoning to automatically generate proofs for mathematical theorems. Recently, there has been a surge of interest in using machine learning models in conjunction with proof assistants to perform this task. In this paper, we introduce Pantograph, a tool that provides a versatile interface to the Lean 4 proof assistant…

    Submitted 21 October, 2024; originally announced October 2024.

    ACM Class: F.4.1; I.2.3; I.2.7

  13. arXiv:2410.16404  [pdf, other]

    astro-ph.GA

    UVCANDELS: Catalogs of photometric redshifts and galaxy physical properties

    Authors: Vihang Mehta, Marc Rafelski, Ben Sunnquist, Harry I. Teplitz, Claudia Scarlata, Xin Wang, Adriano Fontana, Nimish P. Hathi, Kartheik G. Iyer, Anahita Alavi, James Colbert, Norman Grogin, Anton Koekemoer, Kalina V. Nedkova, Matthew Hayes, Laura Prichard, Brian Siana, Brent M. Smith, Rogier Windhorst, Teresa Ashcraft, Micaela Bagley, Ivano Baronchelli, Guillermo Barro, Alex Blanche, Adam Broussard, et al. (54 additional authors not shown)

    Abstract: The UltraViolet imaging of the Cosmic Assembly Near-infrared Deep Extragalactic Legacy Survey Fields (UVCANDELS) program provides deep HST F275W and F435W imaging over four CANDELS fields (GOODS-N, GOODS-S, COSMOS, and EGS). We combine this newly acquired UV imaging with existing HST imaging from CANDELS as well as existing ancillary data to obtain robust photometric redshifts and reliable estimat…

    Submitted 21 October, 2024; originally announced October 2024.

    Comments: 22 pages, 6 figures; accepted to ApJS; catalogs available via MAST

  14. arXiv:2410.16322  [pdf, other]

    cs.CL cs.AI cs.HC

    SouLLMate: An Application Enhancing Diverse Mental Health Support with Adaptive LLMs, Prompt Engineering, and RAG Techniques

    Authors: Qiming Guo, Jinwen Tang, Wenbo Sun, Haoteng Tang, Yi Shang, Wenlu Wang

    Abstract: Mental health issues significantly impact individuals' daily lives, yet many do not receive the help they need even with available online resources. This study aims to provide diverse, accessible, stigma-free, personalized, and real-time mental health support through cutting-edge AI technologies. It makes the following contributions: (1) Conducting an extensive survey of recent mental health suppo…

    Submitted 17 October, 2024; originally announced October 2024.

    Comments: 26 pages, 19 figures, 8 tables

  15. arXiv:2410.16271  [pdf, other]

    cs.CV

    FrugalNeRF: Fast Convergence for Few-shot Novel View Synthesis without Learned Priors

    Authors: Chin-Yang Lin, Chung-Ho Wu, Chang-Han Yeh, Shih-Han Yen, Cheng Sun, Yu-Lun Liu

    Abstract: Neural Radiance Fields (NeRF) face significant challenges in few-shot scenarios, primarily due to overfitting and long training times for high-fidelity rendering. Existing methods, such as FreeNeRF and SparseNeRF, use frequency regularization or pre-trained priors but struggle with complex scheduling and bias. We introduce FrugalNeRF, a novel few-shot NeRF framework that leverages weight-sharing v…

    Submitted 21 October, 2024; originally announced October 2024.

    Comments: Project page: https://linjohnss.github.io/frugalnerf/

  16. arXiv:2410.16240  [pdf, other]

    eess.SY

    Nonlinear Magnetics Model for Permanent Magnet Synchronous Machines Capturing Saturation and Temperature Effects

    Authors: Kishan Srinivasan, Heath Hofmann, Jing Sun

    Abstract: This paper proposes a nonlinear magnetics model for Permanent Magnet Synchronous Machines (PMSMs) that accurately captures the effects of magnetic saturation in the machine iron and variations in rotor temperature on the permanent magnet excitation. The proposed model considers the permanent magnet as a current source rather than the more commonly used flux-linkage source. A comparison of the two…

    Submitted 21 October, 2024; originally announced October 2024.

  17. arXiv:2410.16198  [pdf, other]

    cs.AI cs.CV

    Improve Vision Language Model Chain-of-thought Reasoning

    Authors: Ruohong Zhang, Bowen Zhang, Yanghao Li, Haotian Zhang, Zhiqing Sun, Zhe Gan, Yinfei Yang, Ruoming Pang, Yiming Yang

    Abstract: Chain-of-thought (CoT) reasoning in vision language models (VLMs) is crucial for improving interpretability and trustworthiness. However, current training recipes lack robust CoT reasoning data, relying on datasets dominated by short annotations with minimal rationales. In this work, we show that training VLM on short answers does not generalize well to reasoning tasks that require more detailed r…

    Submitted 21 October, 2024; originally announced October 2024.

    Comments: 10 pages + appendix

    MSC Class: 68T07

  18. arXiv:2410.16173  [pdf, other]

    eess.SY

    Fast Physics-Informed Model Predictive Control Approximation for Lyapunov Stability

    Authors: Josue N. Rivera, Jianqi Ruan, XiaoLin Xu, Shuting Yang, Dengfeng Sun, Neera Jain

    Abstract: At the forefront of control techniques is Model Predictive Control (MPC). While MPCs are effective, their requisite to recompute an optimal control given a new state leads to sparse response to the system and may make their implementation infeasible in small systems with low computational resources. To address these limitations in stability control, this research presents a small deterministic Phy…

    Submitted 21 October, 2024; originally announced October 2024.

  19. arXiv:2410.16135  [pdf, other]

    cs.LG cs.AI

    Beyond 2:4: exploring V:N:M sparsity for efficient transformer inference on GPUs

    Authors: Kang Zhao, Tao Yuan, Han Bao, Zhenfeng Su, Chang Gao, Zhaofeng Sun, Zichen Liang, Liping Jing, Jianfei Chen

    Abstract: To date, 2:4 sparsity has stood as the only sparse pattern that can be accelerated using sparse tensor cores on GPUs. In practice, 2:4 sparsity often possesses low actual speedups ($\leq 1.3$) and requires fixed sparse ratios, meaning that other ratios, such as 4:8, 8:16, or those exceeding 50% sparsity, do not incur any speedups on GPUs. Recent studies suggest that V:N:M sparsity is promising in…

    Submitted 21 October, 2024; originally announced October 2024.

  20. arXiv:2410.15832  [pdf, other]

    eess.SY

    Nonlinear Bayesian Filtering with Natural Gradient Gaussian Approximation

    Authors: Wenhan Cao, Tianyi Zhang, Zeju Sun, Chang Liu, Stephen S. -T. Yau, Shengbo Eben Li

    Abstract: Practical Bayes filters often assume the state distribution of each time step to be Gaussian for computational tractability, resulting in the so-called Gaussian filters. When facing nonlinear systems, Gaussian filters such as extended Kalman filter (EKF) or unscented Kalman filter (UKF) typically rely on certain linearization techniques, which can introduce large estimation errors. To address this…

    Submitted 21 October, 2024; originally announced October 2024.

  21. arXiv:2410.15817  [pdf, other]

    cs.CE

    Large Language Models Empower Personalized Valuation in Auction

    Authors: Jie Sun, Tianyu Zhang, Houcheng Jiang, Kexin Huang, Chi Luo, Junkang Wu, Jiancan Wu, An Zhang, Xiang Wang

    Abstract: Auctions, a fundamental economic mechanism, encompass the valuation of goods or services and the competitive bidding algorithms within a specific framework, serving to uncover the true market value. However, current research predominantly focuses on the bidding algorithms within a given auction mechanism, often overlooking the advantages of incorporating individual bidders' unique preferences and…

    Submitted 21 October, 2024; originally announced October 2024.

    Comments: 14 pages, 5 figures

  22. arXiv:2410.15774  [pdf, other]

    cs.RO cs.CV

    Generalizing Motion Planners with Mixture of Experts for Autonomous Driving

    Authors: Qiao Sun, Huimin Wang, Jiahao Zhan, Fan Nie, Xin Wen, Leimeng Xu, Kun Zhan, Peng Jia, Xianpeng Lang, Hang Zhao

    Abstract: Large real-world driving datasets have sparked significant research into various aspects of data-driven motion planners for autonomous driving. These include data augmentation, model architecture, reward design, training strategies, and planner pipelines. These planners promise better generalizations on complicated and few-shot cases than previous methods. However, experiment results show that man…

    Submitted 21 October, 2024; originally announced October 2024.

    Comments: 7 pages, 3 figures

  23. arXiv:2410.15755  [pdf, other]

    quant-ph

    Search for New Particles with Flying Quantum Sensors in Space

    Authors: Huang Xingming, Wang Yuanhong, Jiang Min, Kang Xiang, Su Haowen, Wang Zehao, Lin Qing, Zheng Wenqiang, Sun Yuan, Liu Liang, Peng Xinhua, Zhao Zhengguo, Du JiangFeng

    Abstract: Recent advancements in space science and technologies offer exciting prospects for investigating novel research that is unattainable within terrestrial laboratories. Here we propose the implementation of space-based quantum sensing to explore ultralight new particles beyond the standard model. The central idea involves probing long-range interactions between spin ensembles of space quantum sensors…

    Submitted 21 October, 2024; originally announced October 2024.

  24. arXiv:2410.15738  [pdf, ps, other]

    cs.GT

    A Fair Allocation is Approximately Optimal for Indivisible Chores, or Is It?

    Authors: Bo Li, Ankang Sun, Shiji Xing

    Abstract: In this paper, we study the allocation of indivisible chores and consider the problem of finding a fair allocation that is approximately efficient. We shift our attention from the multiplicative approximation to the additive one. Our results are twofold, with (1) bounding how the optimal social cost escalates resulting from fairness requirements and (2) presenting the hardness of approximation for…

    Submitted 21 October, 2024; originally announced October 2024.

    Comments: Appears in the 20th Conference on Web and Internet Economics (WINE), 2024

    ACM Class: F.2.2

  25. arXiv:2410.15732  [pdf, other]

    cs.CV

    ViMoE: An Empirical Study of Designing Vision Mixture-of-Experts

    Authors: Xumeng Han, Longhui Wei, Zhiyang Dou, Zipeng Wang, Chenhui Qiang, Xin He, Yingfei Sun, Zhenjun Han, Qi Tian

    Abstract: Mixture-of-Experts (MoE) models embody the divide-and-conquer concept and are a promising approach for increasing model capacity, demonstrating excellent scalability across multiple domains. In this paper, we integrate the MoE structure into the classic Vision Transformer (ViT), naming it ViMoE, and explore the potential of applying MoE to vision through a comprehensive study on image classificati…

    Submitted 21 October, 2024; originally announced October 2024.

  26. arXiv:2410.15698  [pdf, other]

    cs.LG

    Solving Continual Offline RL through Selective Weights Activation on Aligned Spaces

    Authors: Jifeng Hu, Sili Huang, Li Shen, Zhejian Yang, Shengchao Hu, Shisong Tang, Hechang Chen, Yi Chang, Dacheng Tao, Lichao Sun

    Abstract: Continual offline reinforcement learning (CORL) has shown impressive ability in diffusion-based lifelong learning systems by modeling the joint distributions of trajectories. However, most research only focuses on limited continual task settings where the tasks have the same observation and action space, which deviates from the realistic demands of training agents in various environments. In view…

    Submitted 21 October, 2024; originally announced October 2024.

  27. arXiv:2410.15633  [pdf, other]

    cs.CL cs.AI

    Selecting Influential Samples for Long Context Alignment via Homologous Models' Guidance and Contextual Awareness Measurement

    Authors: Shuzheng Si, Haozhe Zhao, Gang Chen, Yunshui Li, Kangyang Luo, Chuancheng Lv, Kaikai An, Fanchao Qi, Baobao Chang, Maosong Sun

    Abstract: The expansion of large language models to effectively handle instructions with extremely long contexts has yet to be fully investigated. The primary obstacle lies in constructing a high-quality long instruction-following dataset devised for long context alignment. Existing studies have attempted to scale up the available data volume by synthesizing long instruction-following samples. However, indi…

    Submitted 21 October, 2024; originally announced October 2024.

  28. arXiv:2410.15631  [pdf, other]

    cs.SE cs.CR

    Security of Language Models for Code: A Systematic Literature Review

    Authors: Yuchen Chen, Weisong Sun, Chunrong Fang, Zhenpeng Chen, Yifei Ge, Tingxu Han, Quanjun Zhang, Yang Liu, Zhenyu Chen, Baowen Xu

    Abstract: Language models for code (CodeLMs) have emerged as powerful tools for code-related tasks, outperforming traditional methods and standard machine learning approaches. However, these models are susceptible to security vulnerabilities, drawing increasing research attention from domains such as software engineering, artificial intelligence, and cybersecurity. Despite the growing body of research focus…

    Submitted 21 October, 2024; originally announced October 2024.

  29. arXiv:2410.15575  [pdf, other]

    cs.CL

    Neural Search Space in Gboard Decoder

    Authors: Yanxiang Zhang, Yuanbo Zhang, Haicheng Sun, Yun Wang, Billy Dou, Gary Sivek, Shumin Zhai

    Abstract: Gboard Decoder produces suggestions by looking for paths that best match input touch points on the context aware search space, which is backed by the language Finite State Transducers (FST). The language FST is currently an N-gram language model (LM). However, N-gram LMs, limited in context length, are known to have sparsity problem under device model size constraint. In this paper, we propose \te…

    Submitted 20 October, 2024; originally announced October 2024.

    Comments: 10 pages, 7 figures, 3 tables

  30. arXiv:2410.15567  [pdf, other]

    cs.LG cs.AI cs.CL

    Pruning Foundation Models for High Accuracy without Retraining

    Authors: Pu Zhao, Fei Sun, Xuan Shen, Pinrui Yu, Zhenglun Kong, Yanzhi Wang, Xue Lin

    Abstract: Despite the superior performance, it is challenging to deploy foundation models or large language models (LLMs) due to their massive parameters and computations. While pruning is a promising technique to reduce model size and accelerate the inference, the traditional pruning techniques can hardly be applied for LLMs as they need to finetune the model on the full dataset with multiple epochs consum…

    Submitted 20 October, 2024; originally announced October 2024.

    Comments: Accepted by EMNLP 2024 findings

  31. arXiv:2410.15536  [pdf, other]

    cs.RO cs.AI

    GRS: Generating Robotic Simulation Tasks from Real-World Images

    Authors: Alex Zook, Fan-Yun Sun, Josef Spjut, Valts Blukis, Stan Birchfield, Jonathan Tremblay

    Abstract: We introduce GRS (Generating Robotic Simulation tasks), a novel system to address the challenge of real-to-sim in robotics, computer vision, and AR/VR. GRS enables the creation of digital twin simulations from single real-world RGB-D observations, complete with diverse, solvable tasks for virtual agent training. We use state-of-the-art vision-language models (VLMs) to achieve a comprehensive real-…

    Submitted 20 October, 2024; originally announced October 2024.

  32. arXiv:2410.15529  [pdf, other]

    physics.ins-det hep-ex

    Measurement of gas properties for the ion-TPC of N$ν$DEx experiment

    Authors: Tianyu Liang, Meiqiang Zhan, Hulin Wang, Xianglun Wei, Dongliang Zhang, Jun Liu, Chengui Lu, Qiang Hu, Yichen Yang, Chaosong Gao, Le Xiao, Xiangming Sun, Feng Liu, Chengxin Zhao, Hao Qiu, Kai Chen

    Abstract: In the N$ν$DEx collaboration, a high-pressure gas TPC is being developed to search for the neutrinoless double beta decay. The use of electronegative $\mathrm{^{82}SeF_{6}}$ gas mandates an ion-TPC. The reconstruction of $z$ coordinate is to be realized exploiting the feature of multiple species of charge carriers. As the initial stage of the development, we studied the properties of the…

    Submitted 20 October, 2024; originally announced October 2024.

    Comments: 10 pages, 8 figures

  33. arXiv:2410.15526  [pdf, other]

    cs.LG cs.DC

    SDP4Bit: Toward 4-bit Communication Quantization in Sharded Data Parallelism for LLM Training

    Authors: Jinda Jia, Cong Xie, Hanlin Lu, Daoce Wang, Hao Feng, Chengming Zhang, Baixi Sun, Haibin Lin, Zhi Zhang, Xin Liu, Dingwen Tao

    Abstract: Recent years have witnessed a clear trend towards language models with an ever-increasing number of parameters, as well as the growing training overhead and memory usage. Distributed training, particularly through Sharded Data Parallelism (ShardedDP) which partitions optimizer states among workers, has emerged as a crucial technique to mitigate training time and memory usage. Yet, a major challeng…

    Submitted 20 October, 2024; originally announced October 2024.

    Comments: Accepted by NeurIPS 2024

  34. arXiv:2410.15397  [pdf, other]

    cs.LG cs.CL cs.CV

    IPO: Interpretable Prompt Optimization for Vision-Language Models

    Authors: Yingjun Du, Wenfang Sun, Cees G. M. Snoek

    Abstract: Pre-trained vision-language models like CLIP have remarkably adapted to various downstream tasks. Nonetheless, their performance heavily depends on the specificity of the input text prompts, which requires skillful prompt template engineering. Instead, current approaches to prompt optimization learn the prompts through gradient descent, where the prompts are treated as adjustable parameters. Howev…

    Submitted 20 October, 2024; originally announced October 2024.

    Comments: Accepted by NeurIPS 2024

  35. arXiv:2410.15261  [pdf]

    cond-mat.str-el cond-mat.dis-nn cond-mat.mtrl-sci

    Emerging quantum critical phase in a cluster spin-glass

    Authors: Fang Zhang, Tao Feng, Yurong Ruan, Xiaoyuan Ye, Bing Wen, Liang Zhou, Minglin He, Zhaotong Zhuang, Liusuo Wu, Hongtao He, Peijie Sun, Zhiyang Yu, Weishu Liu, Wenqing Zhang

    Abstract: Magnetic frustration has been recognized as pivotal to investigating new phases of matter in correlation-driven Kondo breakdown quantum phase transitions that are not clearly associated with broken symmetry. The nature of these new phases, however, remains underexplored. Here, we report quantum criticalities emerging from a cluster spin-glass in the heavy-fermion metal TiFe$_x$Cu$_{2x-1}$Sb, where…

    Submitted 19 October, 2024; originally announced October 2024.

    Comments: 18 pages, 4 figures, with Supplementary Information

  36. arXiv:2410.15252  [pdf, other]

    cs.CL cs.AI

    Lossless KV Cache Compression to 2%

    Authors: Zhen Yang, J. N. Han, Kan Wu, Ruobing Xie, An Wang, Xingwu Sun, Zhanhui Kang

    Abstract: Large language models have revolutionized data processing in numerous domains, with their ability to handle extended context reasoning receiving notable recognition. To speed up inference, maintaining a key-value (KV) cache memory is essential. Nonetheless, the growing demands for KV cache memory create significant hurdles for efficient implementation. This work introduces a novel architecture, Cr…

    Submitted 19 October, 2024; originally announced October 2024.

  37. arXiv:2410.15130  [pdf, ps, other]

    math.DS math.CO math.NT

    Seminorm estimates and joint ergodicity for pairwise independent Hardy sequences

    Authors: Sebastián Donoso, Andreas Koutsogiannis, Borys Kuca, Wenbo Sun, Konstantinos Tsinas

    Abstract: We develop a robust structure theory for multiple ergodic averages of commuting transformations along Hardy sequences of polynomial growth. We then apply it to derive a number of novel results on joint ergodicity, recurrence and convergence. Specifically, we construct a suitable generalization of Host-Kra and box seminorms that quantitatively controls the aforementioned averages subject to necessa…

    Submitted 19 October, 2024; originally announced October 2024.

    Comments: 105 pages. Comments welcome!

    MSC Class: Primary: 37A44; Secondary: 11B30; 28D05

  38. arXiv:2410.15100  [pdf]

    physics.optics

    A Flat Plasmonic Biosensing Interface on Optical Fiber End-Facet via SPP-MIM Hybridization

    Authors: Chenjia He, Xiaqing Sun, Hao Zhong, Qingfeng Meng, Xuetong Zhou, Sihang Liu, Li Zheng, Xiangyang Kong, Shengfu Chen, Shengce Tao, Tian Yang

    Abstract: We found that the specific dispersion of metal-insulator-metal (MIM) waveguide allows the hybridization of surface plasmon polaritons (SPPs) and the waveguide, which is not possible with dielectric waveguides. The SPP-MIM hybridization structure forms such a meta-film that integrates the previously incompatible respective merits of SPR and LSPR, including flat interfaces, high sensitivities, short…

    Submitted 19 October, 2024; originally announced October 2024.

    Comments: article + supplementary information

  39. arXiv:2410.15020  [pdf, other]

    cs.LG

    Iterative Methods via Locally Evolving Set Process

    Authors: Baojian Zhou, Yifan Sun, Reza Babanezhad Harikandeh, Xingzhi Guo, Deqing Yang, Yanghua Xiao

    Abstract: Given the damping factor $α$ and precision tolerance $ε$, \citet{andersen2006local} introduced Approximate Personalized PageRank (APPR), the \textit{de facto local method} for approximating the PPR vector, with runtime bounded by $Θ(1/(αε))$ independent of the graph size. Recently, \citet{fountoulakis2022open} asked whether faster local algorithms could be developed using $\tilde{O}(1/(\sqrtαε))$…

    Submitted 19 October, 2024; originally announced October 2024.

    Comments: 58 pages, 15 figures, NeurIPS 2024

  40. arXiv:2410.14961  [pdf, other]

    cs.LG cs.AI cs.SI

    LangGFM: A Large Language Model Alone Can be a Powerful Graph Foundation Model

    Authors: Tianqianjin Lin, Pengwei Yan, Kaisong Song, Zhuoren Jiang, Yangyang Kang, Jun Lin, Weikang Yuan, Junjie Cao, Changlong Sun, Xiaozhong Liu

    Abstract: Graph foundation models (GFMs) have recently gained significant attention. However, the unique data processing and evaluation setups employed by different studies hinder a deeper understanding of their progress. Additionally, current research tends to focus on specific subsets of graph learning tasks, such as structural tasks, node-level tasks, or classification tasks. As a result, they often inco…

    Submitted 18 October, 2024; originally announced October 2024.

    Comments: under review

  41. arXiv:2410.14940  [pdf, other]

    cs.LG cs.CL

    Baichuan Alignment Technical Report

    Authors: Mingan Lin, Fan Yang, Yanjun Shen, Haoze Sun, Tianpeng Li, Tao Zhang, Chenzheng Zhu, Tao Zhang, Miao Zheng, Xu Li, Yijie Zhou, Mingyang Chen, Yanzhao Qin, Youquan Li, Hao Liang, Fei Li, Yadong Li, Mang Wang, Guosheng Dong, Kun Fang, Jianhua Xu, Bin Cui, Wentao Zhang, Zenan Zhou, Weipeng Chen

    Abstract: We introduce Baichuan Alignment, a detailed analysis of the alignment techniques employed in the Baichuan series of models. This represents the industry's first comprehensive account of alignment methodologies, offering valuable insights for advancing AI research. We investigate the critical components that enhance model performance during the alignment process, including optimization methods, dat…

    Submitted 18 October, 2024; originally announced October 2024.

  42. arXiv:2410.14932  [pdf, other]

    physics.ao-ph cs.LG

    Can AI weather models predict out-of-distribution gray swan tropical cyclones?

    Authors: Y. Qiang Sun, Pedram Hassanzadeh, Mohsen Zand, Ashesh Chattopadhyay, Jonathan Weare, Dorian S. Abbot

    Abstract: Predicting gray swan weather extremes, which are possible but so rare that they are absent from the training dataset, is a major concern for AI weather/climate models. An important open question is whether AI models can extrapolate from weaker weather events present in the training set to stronger, unseen weather extremes. To test this, we train independent versions of the AI model FourCastNet on…

    Submitted 22 October, 2024; v1 submitted 18 October, 2024; originally announced October 2024.

  43. arXiv:2410.14900  [pdf, other]

    cs.CV

    DRACO: Differentiable Reconstruction for Arbitrary CBCT Orbits

    Authors: Chengze Ye, Linda-Sophie Schneider, Yipeng Sun, Mareike Thies, Siyuan Mei, Andreas Maier

    Abstract: This paper introduces a novel method for reconstructing cone beam computed tomography (CBCT) images for arbitrary orbits using a differentiable shift-variant filtered backprojection (FBP) neural network. Traditional CBCT reconstruction methods for arbitrary orbits, like iterative reconstruction algorithms, are computationally expensive and memory-intensive. The proposed method addresses these chal…

    Submitted 18 October, 2024; originally announced October 2024.

  44. arXiv:2410.14882  [pdf]

    cs.AR eess.SP

    Multi-diseases detection with memristive system on chip

    Authors: Zihan Wang, Daniel W. Yang, Zerui Liu, Evan Yan, Heming Sun, Ning Ge, Miao Hu, Wei Wu

    Abstract: This study presents the first implementation of multilayer neural networks on a memristor/CMOS integrated system on chip (SoC) to simultaneously detect multiple diseases. To overcome limitations in medical data, generative AI techniques are used to enhance the dataset, improving the classifier's robustness and diversity. The system achieves notable performance with low latency, high accuracy (91.8…

    Submitted 18 October, 2024; originally announced October 2024.

    Comments: 14 pages, 5 figures

    ACM Class: C.1.3; I.2.0

  45. arXiv:2410.14853  [pdf, other]

    cs.CL cs.AI

    DFlow: Diverse Dialogue Flow Simulation with Large Language Models

    Authors: Wanyu Du, Song Feng, James Gung, Lijia Sun, Yi Zhang, Saab Mansour, Yanjun Qi

    Abstract: Developing language model-based dialogue agents requires effective data to train models that can follow specific task logic. However, most existing data augmentation methods focus on increasing diversity in language, topics, or dialogue acts at the utterance level, largely neglecting a critical aspect of task logic diversity at the dialogue level. This paper proposes a novel data augmentation meth…

    Submitted 18 October, 2024; originally announced October 2024.

    Comments: 16 pages

  46. arXiv:2410.14804  [pdf, other]

    astro-ph.GA astro-ph.CO

    SMILES: Discovery of Higher Ionizing Photon Production Efficiency in Overdense Regions

    Authors: Yongda Zhu, Stacey Alberts, Jianwei Lyu, Jane Morrison, George H. Rieke, Yang Sun, Jakob M. Helton, Zhiyuan Ji, Rachana Bhatawdekar, Nina Bonaventura, Andrew J. Bunker, Xiaojing Lin, Marcia J. Rieke, Pierluigi Rinaldi, Irene Shivaei, Christopher N. A. Willmer, Junyu Zhang

    Abstract: The topology of reionization and the environments where galaxies efficiently produce ionizing photons are key open questions. For the first time, we investigate the correlation between ionizing photon production efficiency, $ξ_{\rm ion}$, and galaxy overdensity, $\log(1+δ)$. We analyze the ionizing properties of 93 galaxies between $0.7 < z < 6.9$ using JWST NIRSpec medium-resolution spectra from…

    Submitted 18 October, 2024; originally announced October 2024.

    Comments: 14 pages, 7 figures, 1 table. Submitted to AAS journals. The machine-readable table will be made available upon acceptance

  47. arXiv:2410.14795  [pdf, other]

    cs.CL

    Cross-Document Event-Keyed Summarization

    Authors: William Walden, Pavlo Kuchmiichuk, Alexander Martin, Chihsheng Jin, Angela Cao, Claire Sun, Curisia Allen, Aaron Steven White

    Abstract: Event-keyed summarization (EKS) requires generating a summary about a specific event described in a document, given the document and an event representation extracted from it. In this work, we extend EKS to the cross-document setting (CDEKS), in which summaries must synthesize information from accounts of the same event given by multiple sources. We introduce SEAMUS (Summaries of Events Across Mul…

    Submitted 18 October, 2024; originally announced October 2024.

  48. arXiv:2410.14660  [pdf, other]

    cs.LG

    A Large Language Model-Driven Reward Design Framework via Dynamic Feedback for Reinforcement Learning

    Authors: Shengjie Sun, Runze Liu, Jiafei Lyu, Jing-Wen Yang, Liangpeng Zhang, Xiu Li

    Abstract: Large Language Models (LLMs) have shown significant potential in designing reward functions for Reinforcement Learning (RL) tasks. However, obtaining high-quality reward code often involves human intervention, numerous LLM queries, or repetitive RL training. To address these issues, we propose CARD, an LLM-driven Reward Design framework that iteratively generates and improves reward function code.…

    Submitted 18 October, 2024; originally announced October 2024.

  49. arXiv:2410.14605  [pdf, ps, other]

    math.NT

    Universal sums via products of Ramanujan's theta functions

    Authors: Nasser Abdo Saeed Bulkhali, Zhi-Wei Sun

    Abstract: An integer-valued polynomial $P(x,y,z)$ is said to be universal (over $\mathbb Z$) if each nonnegative integer can be written as $P(x,y,z)$ with $x,y,z\in\mathbb Z$. In this paper, we mainly introduce a new technique to determine the universality of some sums in the form $x(a_1x+a_2)/2+y(b_1y+b_2)/2+z(c_1z+c_2)/2$ conjectured by Sun, using various identities of Ramanujan's theta functions.

    Submitted 18 October, 2024; originally announced October 2024.

    Comments: 20 pages

    MSC Class: 11D72; 11E20; 11E25; 11F27; 14H42

  50. arXiv:2410.14210  [pdf, other]

    cs.CV cs.NE

    Shape Transformation Driven by Active Contour for Class-Imbalanced Semi-Supervised Medical Image Segmentation

    Authors: Yuliang Gu, Yepeng Liu, Zhichao Sun, Jinchi Zhu, Yongchao Xu, Laurent Najman

    Abstract: Annotating 3D medical images demands expert knowledge and is time-consuming. As a result, semi-supervised learning (SSL) approaches have gained significant interest in 3D medical image segmentation. The significant size differences among various organs in the human body lead to imbalanced class distribution, which is a major challenge in the real-world application of these SSL approaches. To addre…

    Submitted 18 October, 2024; originally announced October 2024.

    Journal ref: 2024 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Dec 2024, Lisbon, Portugal