Showing 1–50 of 5,550 results for author: Sun, Y

  1. arXiv:2410.15732  [pdf, other]

    cs.CV

    ViMoE: An Empirical Study of Designing Vision Mixture-of-Experts

    Authors: Xumeng Han, Longhui Wei, Zhiyang Dou, Zipeng Wang, Chenhui Qiang, Xin He, Yingfei Sun, Zhenjun Han, Qi Tian

    Abstract: Mixture-of-Experts (MoE) models embody the divide-and-conquer concept and are a promising approach for increasing model capacity, demonstrating excellent scalability across multiple domains. In this paper, we integrate the MoE structure into the classic Vision Transformer (ViT), naming it ViMoE, and explore the potential of applying MoE to vision through a comprehensive study on image classificati…

    Submitted 21 October, 2024; originally announced October 2024.

  2. arXiv:2410.15020  [pdf, other]

    cs.LG

    Iterative Methods via Locally Evolving Set Process

    Authors: Baojian Zhou, Yifan Sun, Reza Babanezhad Harikandeh, Xingzhi Guo, Deqing Yang, Yanghua Xiao

    Abstract: Given the damping factor $α$ and precision tolerance $ε$, \citet{andersen2006local} introduced Approximate Personalized PageRank (APPR), the de facto local method for approximating the PPR vector, with runtime bounded by $Θ(1/(αε))$ independent of the graph size. Recently, \citet{fountoulakis2022open} asked whether faster local algorithms could be developed using $\tilde{O}(1/(\sqrt{α}ε))$…

    Submitted 19 October, 2024; originally announced October 2024.

    Comments: 58 pages, 15 figures, NeurIPS 2024

  3. arXiv:2410.14932  [pdf, other]

    physics.ao-ph cs.LG

    Can AI weather models predict out-of-distribution gray swan tropical cyclones?

    Authors: Y. Qiang Sun, Pedram Hassanzadeh, Mohsen Zand, Ashesh Chattopadhyay, Jonathan Weare, Dorian S. Abbot

    Abstract: Predicting gray swan weather extremes, which are possible but so rare that they are absent from the training dataset, is a major concern for AI weather/climate models. An important open question is whether AI models can extrapolate from weaker weather events present in the training set to stronger, unseen weather extremes. To test this, we train independent versions of the AI model FourCastNet on…

    Submitted 18 October, 2024; originally announced October 2024.

  4. arXiv:2410.14900  [pdf, other]

    cs.CV

    DRACO: Differentiable Reconstruction for Arbitrary CBCT Orbits

    Authors: Chengze Ye, Linda-Sophie Schneider, Yipeng Sun, Mareike Thies, Siyuan Mei, Andreas Maier

    Abstract: This paper introduces a novel method for reconstructing cone beam computed tomography (CBCT) images for arbitrary orbits using a differentiable shift-variant filtered backprojection (FBP) neural network. Traditional CBCT reconstruction methods for arbitrary orbits, like iterative reconstruction algorithms, are computationally expensive and memory-intensive. The proposed method addresses these chal…

    Submitted 18 October, 2024; originally announced October 2024.

  5. arXiv:2410.14804  [pdf, other]

    astro-ph.GA astro-ph.CO

    SMILES: Discovery of Higher Ionizing Photon Production Efficiency in Overdense Regions

    Authors: Yongda Zhu, Stacey Alberts, Jianwei Lyu, Jane Morrison, George H. Rieke, Yang Sun, Jakob M. Helton, Zhiyuan Ji, Rachana Bhatawdekar, Nina Bonaventura, Andrew J. Bunker, Xiaojing Lin, Marcia J. Rieke, Pierluigi Rinaldi, Irene Shivaei, Christopher N. A. Willmer, Junyu Zhang

    Abstract: The topology of reionization and the environments where galaxies efficiently produce ionizing photons are key open questions. For the first time, we investigate the correlation between ionizing photon production efficiency, $ξ_{\rm ion}$, and galaxy overdensity, $\log(1+δ)$. We analyze the ionizing properties of 93 galaxies between $0.7 < z < 6.9$ using JWST NIRSpec medium-resolution spectra from…

    Submitted 18 October, 2024; originally announced October 2024.

    Comments: 14 pages, 7 figures, 1 table. Submitted to AAS journals. The machine-readable table will be made available upon acceptance

  6. arXiv:2410.14054  [pdf, other]

    math.OC stat.ML

    Independently-Normalized SGD for Generalized-Smooth Nonconvex Optimization

    Authors: Yufeng Yang, Erin Tripp, Yifan Sun, Shaofeng Zou, Yi Zhou

    Abstract: Recent studies have shown that many nonconvex machine learning problems meet a so-called generalized-smooth condition that extends beyond traditional smooth nonconvex optimization. However, the existing algorithms designed for generalized-smooth nonconvex optimization encounter significant limitations in both their design and convergence analysis. In this work, we first study deterministic general…

    Submitted 17 October, 2024; originally announced October 2024.

    Comments: 3 figures, 30 pages

  7. arXiv:2410.14050  [pdf, other]

    cs.CL cs.CV cs.CY cs.HC

    Learning Multimodal Cues of Children's Uncertainty

    Authors: Qi Cheng, Mert İnan, Rahma Mbarki, Grace Grmek, Theresa Choi, Yiming Sun, Kimele Persaud, Jenny Wang, Malihe Alikhani

    Abstract: Understanding uncertainty plays a critical role in achieving common ground (Clark et al., 1983). This is especially important for multimodal AI systems that collaborate with users to solve a problem or guide the user through a challenging concept. In this work, for the first time, we present a dataset annotated in collaboration with developmental and cognitive psychologists for the purpose of study…

    Submitted 17 October, 2024; originally announced October 2024.

    Comments: SIGDIAL 2023

  8. arXiv:2410.13733  [pdf, other]

    cs.CV cs.MM

    Improving Multi-modal Large Language Model through Boosting Vision Capabilities

    Authors: Yanpeng Sun, Huaxin Zhang, Qiang Chen, Xinyu Zhang, Nong Sang, Gang Zhang, Jingdong Wang, Zechao Li

    Abstract: We focus on improving the visual understanding capability for boosting the vision-language models. We propose Arcana, a multimodal language model, which introduces two crucial techniques. First, we present Multimodal LoRA (MM-LoRA), a module designed to enhance the decoder. Unlike traditional language-driven decoders, MM-LoRA consists of two parallel LoRAs -- one for vision and one for la…

    Submitted 17 October, 2024; originally announced October 2024.

  9. arXiv:2410.13515  [pdf, other]

    hep-ex hep-lat hep-ph nucl-ex

    Observation of a rare beta decay of the charmed baryon with a Graph Neural Network

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (637 additional authors not shown)

    Abstract: The study of beta decay of the charmed baryon provides unique insights into the fundamental mechanism of the strong and electro-weak interactions. The $Λ_c^+$, being the lightest charmed baryon, undergoes disintegration solely through the charm quark weak decay. Its beta decay provides an ideal laboratory for investigating non-perturbative effects in quantum chromodynamics and for constraining the…

    Submitted 17 October, 2024; originally announced October 2024.

    Comments: 28 pages, 6 figures

  10. arXiv:2410.13478  [pdf, other]

    hep-ex

    Observation of $χ_{c0}\toΣ^{+}\barΣ^{-}η$ and evidence for $χ_{c1,2}\toΣ^{+}\barΣ^{-}η$

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (634 additional authors not shown)

    Abstract: Using $(27.12\pm 0.14)\times10^{8}$ $ψ(3686)$ events collected with the BESIII detector, the decay $χ_{c0}\toΣ^{+}\barΣ^{-}η$ is observed for the first time with a statistical significance of $7.0σ$, and evidence for $χ_{c1}\toΣ^{+}\barΣ^{-}η$ and $χ_{c2}\toΣ^{+}\barΣ^{-}η$ is found with statistical significances of $4.3σ$ and $4.6σ$, respectively. The branching fractions are determined to be…

    Submitted 17 October, 2024; originally announced October 2024.

  11. arXiv:2410.13368  [pdf, other]

    hep-ex hep-ph

    Observation of the Singly Cabibbo-Suppressed Decay $Λ_c^{+}\to pπ^0$

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (638 additional authors not shown)

    Abstract: Utilizing 4.5 fb$^{-1}$ of $e^+e^-$ annihilation data collected with the BESIII detector at the BEPCII collider at center-of-mass energies between 4.600 and 4.699 GeV, the first observation of the singly Cabibbo-suppressed decay $Λ_c^{+}\to pπ^0$ is presented, with a statistical significance of $5.4σ$. The ratio of the branching fractions of $Λ_c^{+}\to pπ^0$ and $Λ_c^{+}\to pη$ is measured…

    Submitted 17 October, 2024; originally announced October 2024.

    Comments: 9 pages, 4 figures

  12. arXiv:2410.13327  [pdf, other]

    cond-mat.supr-con cond-mat.str-el

    Cryogenic Digital Image Correlation as a Probe of Strain in Iron-Based Superconductors

    Authors: Ziye Mo, Chunyi Li, Wenting Zhang, Chang Liu, Yongxin Sun, Ruixian Liu, Xingye Lu

    Abstract: Uniaxial strain is a powerful tuning parameter that can control symmetry and anisotropic electronic properties in iron-based superconductors. However, accurately characterizing anisotropic strain can be challenging and complex. Here, we utilize a cryogenic optical system equipped with a high-spatial-resolution microscope to characterize surface strains in iron-based superconductors using the digit…

    Submitted 17 October, 2024; originally announced October 2024.

    Comments: 6 pages, 4 figures. Published online in Chinese Physics Letters. DOI 10.1088/0256-307X/41/10/107102

  13. arXiv:2410.13264  [pdf, other]

    cs.LG cs.AI

    The Latent Road to Atoms: Backmapping Coarse-grained Protein Structures with Latent Diffusion

    Authors: Xu Han, Yuancheng Sun, Kai Chen, Kang Liu, Qiwei Ye

    Abstract: Coarse-grained (CG) molecular dynamics simulations offer computational efficiency for exploring protein conformational ensembles and thermodynamic properties. Though coarse representations enable large-scale simulations across extended temporal and spatial ranges, the sacrifice of atomic-level details limits their utility in tasks such as ligand docking and protein-protein interaction prediction. B…

    Submitted 17 October, 2024; originally announced October 2024.

    Comments: Paper under review

  14. arXiv:2410.12620  [pdf, other]

    hep-ex

    Search for $e^{+}e^{-} \to φχ_{c0}$ and $φη_{c2}(1D)$ at center-of-mass energies from 4.47 to 4.95 GeV

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (644 additional authors not shown)

    Abstract: Utilizing a data set of $6.7$ fb$^{-1}$ from electron-positron collisions recorded by the BESIII detector at the BEPCII storage ring, a search is conducted for the processes $e^{+}e^{-} \to φχ_{c0}$ and $φη_{c2}(1D)$ across center-of-mass energies from 4.47 to 4.95 GeV. In the absence of any significant signals, upper limits are set. These include limits on the Born cross sections for…

    Submitted 16 October, 2024; originally announced October 2024.

    Comments: 14 pages, 6 figures

  15. arXiv:2410.11783  [pdf, other]

    cs.CV cs.RO

    Latent BKI: Open-Dictionary Continuous Mapping in Visual-Language Latent Spaces with Quantifiable Uncertainty

    Authors: Joey Wilson, Ruihan Xu, Yile Sun, Parker Ewen, Minghan Zhu, Kira Barton, Maani Ghaffari

    Abstract: This paper introduces a novel probabilistic mapping algorithm, Latent BKI, which enables open-vocabulary mapping with quantifiable uncertainty. Traditionally, semantic mapping algorithms focus on a fixed set of semantic categories which limits their applicability for complex robotic tasks. Vision-Language (VL) models have recently emerged as a technique to jointly model language and visual feature…

    Submitted 15 October, 2024; originally announced October 2024.

  16. arXiv:2410.11650  [pdf, other]

    cs.CV cs.AI

    ED-ViT: Splitting Vision Transformer for Distributed Inference on Edge Devices

    Authors: Xiang Liu, Yijun Song, Xia Li, Yifei Sun, Huiying Lan, Zemin Liu, Linshan Jiang, Jialin Li

    Abstract: Deep learning models are increasingly deployed on resource-constrained edge devices for real-time data analytics. In recent years, Vision Transformer models and their variants have demonstrated outstanding performance across various computer vision tasks. However, their high computational demands and inference latency pose significant challenges for model deployment on resource-constrained edge dev…

    Submitted 15 October, 2024; originally announced October 2024.

    Comments: 14 pages, 8 figures

  17. arXiv:2410.11607  [pdf, other]

    hep-ex

    Observation of $χ_{cJ}\to p \bar p K^0_S K^- π^+ + c.c.$

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann, et al. (648 additional authors not shown)

    Abstract: By analyzing $(27.12\pm0.14)\times10^8$ $ψ(3686)$ events collected with the BESIII detector operating at the BEPCII collider, the decays of $χ_{cJ} \to p \bar{p} K^0_S K^- π^+ +c.c.(J=0, 1, 2)$ are observed for the first time with statistical significances greater than $10σ$. The branching fractions of these decays are determined to be…

    Submitted 15 October, 2024; originally announced October 2024.

    Comments: 12 pages, 5 figures

  18. arXiv:2410.11464  [pdf, other]

    cs.IR cs.AI cs.LG

    CoActionGraphRec: Sequential Multi-Interest Recommendations Using Co-Action Graphs

    Authors: Yi Sun, Yuri M. Brovman

    Abstract: There are unique challenges to developing item recommender systems for e-commerce platforms like eBay due to sparse data and diverse user interests. While rich user-item interactions are important, eBay's data sparsity exceeds other e-commerce sites by an order of magnitude. To address this challenge, we propose CoActionGraphRec (CAGR), a text based two-tower deep learning model (Item Tower and Us…

    Submitted 15 October, 2024; originally announced October 2024.

  19. arXiv:2410.11310  [pdf, ps, other]

    hep-th gr-qc

    Pure geometric $f(R)$ branes

    Authors: Heng Guo, Cai-Ling Wang, Yong-Tao Lu, Yue Sun, Lang-Lang Wang

    Abstract: In this paper, we investigate pure geometric $f(R)$ cosmology branes embedded in five-dimensional spacetime. The form of $f(R)$ is chosen as a polynomial. The five-dimensional scalar curvature $R$ is assumed to be constant. Based on the value of the four-dimensional cosmological constant $λ_4$, the branes can be classified into Minkowski, de Sitter, and anti-de Sitter cases. Solutions for each…

    Submitted 15 October, 2024; originally announced October 2024.

    Comments: 16 pages, 8 figures

  20. arXiv:2410.10783  [pdf, other]

    cs.CV

    LiveXiv -- A Multi-Modal Live Benchmark Based on Arxiv Papers Content

    Authors: Nimrod Shabtay, Felipe Maia Polo, Sivan Doveh, Wei Lin, M. Jehanzeb Mirza, Leshem Chosen, Mikhail Yurochkin, Yuekai Sun, Assaf Arbelle, Leonid Karlinsky, Raja Giryes

    Abstract: The large-scale training of multi-modal models on data scraped from the web has shown outstanding utility in infusing these models with the required world knowledge to perform effectively on multiple downstream tasks. However, one downside of scraping data from the web can be the potential sacrifice of the benchmarks on which the abilities of these models are often evaluated. To safeguard against…

    Submitted 15 October, 2024; v1 submitted 14 October, 2024; originally announced October 2024.

  21. arXiv:2410.10453  [pdf, other]

    cs.CV

    Self-Assessed Generation: Trustworthy Label Generation for Optical Flow and Stereo Matching in Real-world

    Authors: Han Ling, Yinghui Sun, Quansen Sun, Ivor Tsang, Yuhui Zheng

    Abstract: A significant challenge facing current optical flow and stereo methods is the difficulty in generalizing them well to the real world. This is mainly due to the high costs required to produce datasets, and the limitations of existing self-supervised methods on fuzzy results and complex model training problems. To address the above challenges, we propose a unified self-supervised generalization fram…

    Submitted 14 October, 2024; originally announced October 2024.

  22. arXiv:2410.09909  [pdf, other]

    cs.CV cs.CR cs.LG

    UnSeg: One Universal Unlearnable Example Generator is Enough against All Image Segmentation

    Authors: Ye Sun, Hao Zhang, Tiehua Zhang, Xingjun Ma, Yu-Gang Jiang

    Abstract: Image segmentation is a crucial vision task that groups pixels within an image into semantically meaningful segments, which is pivotal in obtaining a fine-grained understanding of real-world scenes. However, an increasing privacy concern exists regarding training large-scale image segmentation models on unauthorized private data. In this work, we exploit the concept of unlearnable examples to make…

    Submitted 13 October, 2024; originally announced October 2024.

    Comments: NeurIPS 2024

  23. arXiv:2410.09836  [pdf, other]

    cs.LG stat.ML

    Learning Pattern-Specific Experts for Time Series Forecasting Under Patch-level Distribution Shift

    Authors: Yanru Sun, Zongxia Xie, Emadeldeen Eldele, Dongyue Chen, Qinghua Hu, Min Wu

    Abstract: Time series forecasting, which aims to predict future values based on historical data, has garnered significant attention due to its broad range of applications. However, real-world time series often exhibit complex non-uniform distribution with varying patterns across segments, such as season, operating condition, or semantic meaning, making accurate forecasting challenging. Existing approaches,…

    Submitted 13 October, 2024; originally announced October 2024.

  24. arXiv:2410.09775  [pdf, other]

    cs.AI cs.CL

    EasyJudge: an Easy-to-use Tool for Comprehensive Response Evaluation of LLMs

    Authors: Yijie Li, Yuan Sun

    Abstract: Recently, there has been a growing trend of employing large language models (LLMs) to judge the quality of other LLMs. Many studies have adopted closed-source models, mainly using GPT-4 as the evaluator. However, due to the closed-source nature of the GPT-4 model, employing it as an evaluator has resulted in issues including transparency, controllability, and cost-effectiveness. Some researchers h…

    Submitted 13 October, 2024; originally announced October 2024.

  25. arXiv:2410.09518  [pdf, ps, other]

    astro-ph.HE

    Follow-up timing of 12 pulsars discovered in Commensal Radio Astronomy FAST Survey

    Authors: D. Zhao, J. P. Yuan, N. Wang, D. Li, P. Wang, M. Y. Xue, W. W. Zhu, C. C. Miao, W. M. Yan, J. B. Wang, J. M. Yao, Q. D. Wu, S. Q. Wang, S. N. Sun, F. F. Kou, Y. T. Chen, S. J. Dang, Y. Feng, Z. J. Liu, X. L. Miao, L. Q. Meng, M. Yuan, C. H. Niu, J. R. Niu, L. Qian, et al. (18 additional authors not shown)

    Abstract: We present phase-connected timing ephemerides, polarization pulse profiles and Faraday rotation measurements of 12 pulsars discovered by the Five-hundred-meter Aperture Spherical radio Telescope (FAST) in the Commensal Radio Astronomy FAST Survey (CRAFTS). The observational data for each pulsar span at least one year. Among them, PSR J1840+2843 shows subpulse drifting, and five pulsars are detecte…

    Submitted 12 October, 2024; originally announced October 2024.

    Comments: 20 pages, 15 figures, accepted for publication in ApJ

  26. arXiv:2410.09517  [pdf, ps, other]

    math.NA

    Lower order mixed elements for the linear elasticity problem in 2D and 3D

    Authors: Jun Hu, Rui Ma, Yuanxun Sun

    Abstract: In this paper, we construct two lower order mixed elements for the linear elasticity problem in the Hellinger-Reissner formulation, one for the 2D problem and one for the 3D problem, both on macro-element meshes. The discrete stress spaces enrich the analogous $P_k$ stress spaces in [J. Hu and S. Zhang, arxiv, 2014, J. Hu and S. Zhang, Sci. China Math., 2015] with simple macro-element bubble funct…

    Submitted 12 October, 2024; originally announced October 2024.

    MSC Class: 65N30; 74B05

  27. arXiv:2410.09505  [pdf, other]

    cs.LG cs.NE

    HG2P: Hippocampus-inspired High-reward Graph and Model-Free Q-Gradient Penalty for Path Planning and Motion Control

    Authors: Haoran Wang, Yaoru Sun, Zeshen Tang

    Abstract: Goal-conditioned hierarchical reinforcement learning (HRL) decomposes complex reaching tasks into a sequence of simple subgoal-conditioned tasks, showing significant promise for addressing long-horizon planning in large-scale environments. This paper bridges the goal-conditioned HRL based on graph-based planning to brain mechanisms, proposing a hippocampus-striatum-like dual-controller hypothesis.…

    Submitted 12 October, 2024; originally announced October 2024.

  28. arXiv:2410.09426  [pdf, other]

    cs.CL cs.LG

    FlatQuant: Flatness Matters for LLM Quantization

    Authors: Yuxuan Sun, Ruikang Liu, Haoli Bai, Han Bao, Kang Zhao, Yuening Li, Jiaxin Hu, Xianzhi Yu, Lu Hou, Chun Yuan, Xin Jiang, Wulong Liu, Jun Yao

    Abstract: Recently, quantization has been widely used for the compression and acceleration of large language models (LLMs). Due to the outliers in LLMs, it is crucial to flatten weights and activations to minimize quantization error with the equally spaced quantization points. Prior research explores various pre-quantization transformations to suppress outliers, such as per-channel scaling and Hadamard tran…

    Submitted 12 October, 2024; originally announced October 2024.

    Comments: 23 pages

  29. arXiv:2410.09352  [pdf, other]

    cs.SE cs.CL

    LogLM: From Task-based to Instruction-based Automated Log Analysis

    Authors: Yilun Liu, Yuhe Ji, Shimin Tao, Minggui He, Weibin Meng, Shenglin Zhang, Yongqian Sun, Yuming Xie, Boxing Chen, Hao Yang

    Abstract: Automatic log analysis is essential for the efficient Operation and Maintenance (O&M) of software systems, providing critical insights into system behaviors. However, existing approaches mostly treat log analysis as training a model to perform an isolated task, using task-specific log-label pairs. These task-based approaches are inflexible in generalizing to complex scenarios, depend on task-speci…

    Submitted 11 October, 2024; originally announced October 2024.

  30. arXiv:2410.08934  [pdf, other]

    stat.ML cs.DC cs.LG math.ST stat.CO

    The Effect of Personalization in FedProx: A Fine-grained Analysis on Statistical Accuracy and Communication Efficiency

    Authors: Xin Yu, Zelin He, Ying Sun, Lingzhou Xue, Runze Li

    Abstract: FedProx is a simple yet effective federated learning method that enables model personalization via regularization. Despite remarkable success in practice, a rigorous analysis of how such a regularization provably improves the statistical accuracy of each client's local model hasn't been fully established. Setting the regularization strength heuristically presents a risk, as an inappropriate choice…

    Submitted 11 October, 2024; originally announced October 2024.

  31. arXiv:2410.08706  [pdf, other]

    cs.NI eess.SP

    Goal-Oriented Communications for Real-time Inference with Two-Way Delay

    Authors: Cagri Ari, Md Kamran Chowdhury Shisher, Yin Sun, Elif Uysal

    Abstract: We design a goal-oriented communication strategy for remote inference, where an intelligent model (e.g., a pre-trained neural network) at the receiver side predicts the real-time value of a target signal based on data packets transmitted from a remote location. The inference error depends on both the Age of Information (AoI) and the length of the data packets. Previous formulations of this problem…

    Submitted 11 October, 2024; originally announced October 2024.

    Comments: 12 pages, 8 figures

  32. arXiv:2410.08613  [pdf, other]

    cs.CV cs.AI

    Cross-Modal Bidirectional Interaction Model for Referring Remote Sensing Image Segmentation

    Authors: Zhe Dong, Yuzhe Sun, Yanfeng Gu, Tianzhu Liu

    Abstract: Given a natural language expression and a remote sensing image, the goal of referring remote sensing image segmentation (RRSIS) is to generate a pixel-level mask of the target object identified by the referring expression. In contrast to natural scenarios, expressions in RRSIS often involve complex geospatial relationships, with target objects of interest that vary significantly in scale and lack…

    Submitted 11 October, 2024; originally announced October 2024.

  33. arXiv:2410.08603  [pdf, other]

    hep-ex

    Observation of $D^+\toη^\primeμ^+ν_μ$ and First Study of $D^+\to η^\prime \ell^+ν_\ell$ Decay Dynamics

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (643 additional authors not shown)

    Abstract: Using $20.3\,\rm fb^{-1}$ of $e^+e^-$ collision data collected at the center-of-mass energy 3.773 GeV with the BESIII detector, we report the first observation of the semileptonic decay $D^+\to η^\prime μ^+ν_μ$ with significance of $8.6σ$ including systematic uncertainties, and an improved measurement of $D^+\to η^\prime e^+ν_e$. The branching fractions of $D^+\to η^\prime μ^+ν_μ$ and…

    Submitted 11 October, 2024; originally announced October 2024.

  34. arXiv:2410.08248  [pdf]

    physics.optics physics.app-ph

    Dual-band Photonic Filters with Wide Tunable Range Using Chirped Sampled Gratings

    Authors: Simeng Zhu, Bocheng Yuan, Weiqing Cheng, Yizhe Fan, Yiming Sun, Mohanad Al-Rubaiee, Jehan Akbar, John H. Marsh, Lianping Hou

    Abstract: We have developed a photonic filter featuring dual independently tunable passbands. Employing the reconstruction equivalent-chirp technique, we designed linearly chirped sampled Bragg gratings with two equivalent phase shifts positioned at 1/3 and 2/3 of the cavity, thus introducing two passbands in the +1st channel. Leveraging the significant thermo-optic effect of silicon, dual-band tuning is ac…

    Submitted 10 October, 2024; originally announced October 2024.

    Comments: 5 pages, 5 figures. arXiv admin note: substantial text overlap with arXiv:2410.07788

  35. arXiv:2410.07788  [pdf]

    physics.optics physics.app-ph

    Widely Tunable Photonic Filter Based on Equivalent Chirped Four-Phase-Shifted Sampled Bragg Gratings

    Authors: Simeng Zhu, Bocheng Yuan, Mohanad Al-Rubaiee, Yiming Sun, Yizhe Fan, Ahmet Seckin Hezarfen, Stephen J. Sweeney, John H. Marsh, Lianping Hou

    Abstract: We have developed an integrated dual-band photonic filter (PF) utilizing equivalent chirped four-phase-shifted sidewall-sampled Bragg gratings (4PS-SBG) on a silicon-on-insulator (SOI) platform. Using the reconstruction equivalent-chirp technique, we designed linearly chirped 4PS Bragg gratings with two π-phase shifts (π-PS) positioned at 1/3 and 2/3 of the grating cavity, introducing two passband…

    Submitted 10 October, 2024; originally announced October 2024.

    Comments: 9 pages, 11 figures

  36. arXiv:2410.07644  [pdf]

    cond-mat.soft physics.bio-ph

    Mechanics of soft-body rolling motion without external torque

    Authors: Xudong Liang, Yimiao Ding, Zihao Yuan, Junqi Jiang, Zongling Xie, Peng Fei, Yixuan Sun, Guoying Gu, Zheng Zhong, Feifei Chen, Guangwei Si, Zhefeng Gong

    Abstract: The Drosophila larva, a soft-body animal, can bend its body and roll efficiently to escape danger. However, contrary to common belief, this rolling motion is not driven by the imbalance of gravity and ground reaction forces. Through functional imaging and ablation experiments, we demonstrate that the sequential actuation of axial muscles within an appropriate range of angles is critical for genera…

    Submitted 10 October, 2024; originally announced October 2024.

  37. arXiv:2410.07626  [pdf, other]

    hep-ex

    Precision Measurement of the Branching Fraction of $D^{+}\to μ^{+}ν_μ$

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (643 additional authors not shown)

    Abstract: Using $20.3~\mathrm{fb}^{-1}$ of $e^+e^-$ collision data collected at a center-of-mass energy of $E_{\rm cm}=3.773$ GeV with the BESIII detector operating at the BEPCII collider, we determine the branching fraction of the leptonic decay $D^+\toμ^+ν_μ$ to be $(3.981\pm0.079_{\rm stat}\pm0.040_{\rm syst})\times10^{-4}$. Interpreting our measurement with knowledge of the Fermi coupling constant…

    Submitted 10 October, 2024; originally announced October 2024.

    Comments: 9 pages, 2 figures

  38. arXiv:2410.07021  [pdf, other]

    stat.ML cs.LG

    Do Contemporary CATE Models Capture Real-World Heterogeneity? Findings from a Large-Scale Benchmark

    Authors: Haining Yu, Yizhou Sun

    Abstract: We present unexpected findings from a large-scale benchmark study evaluating Conditional Average Treatment Effect (CATE) estimation algorithms. By running 16 modern CATE models across 43,200 datasets, we find that: (a) 62% of CATE estimates have a higher Mean Squared Error (MSE) than a trivial zero-effect predictor, rendering them ineffective; (b) in datasets with at least one useful CATE estimat…

    Submitted 9 October, 2024; originally announced October 2024.

  39. arXiv:2410.06554  [pdf, other]

    cs.CL cs.AI

    The Accuracy Paradox in RLHF: When Better Reward Models Don't Yield Better Language Models

    Authors: Yanjun Chen, Dawei Zhu, Yirong Sun, Xinghao Chen, Wei Zhang, Xiaoyu Shen

    Abstract: Reinforcement Learning from Human Feedback significantly enhances Natural Language Processing by aligning language models with human expectations. A critical factor in this alignment is the strength of reward models used during training. This study explores whether stronger reward models invariably lead to better language models. In this paper, through experiments on relevance, factuality, and com…

    Submitted 16 October, 2024; v1 submitted 9 October, 2024; originally announced October 2024.

    Comments: 10 pages, 27 figures (including 18 in the appendix), submitted to EMNLP 2024

  40. arXiv:2410.06539  [pdf, other]

    hep-ex

    DeepMuon: Accelerating Cosmic-Ray Muon Simulation Based on Optimal Transport

    Authors: Ao-Bo Wang, Chu-Cheng Pan, Xiang Dong, Yu-Chang Sun, Yu-Xuan Hu, Ao-Yan Cheng, Hao Cai, Xi-Long Fan

    Abstract: Cosmic muon imaging technology is increasingly being applied in various fields. However, simulating cosmic muons typically requires the rapid generation of a large number of muons and tracking their complex trajectories through intricate structures. This process is highly computationally demanding and consumes significant CPU time. To address these challenges, we introduce DeepMuon, an innovative…

    Submitted 9 October, 2024; v1 submitted 9 October, 2024; originally announced October 2024.

  41. arXiv:2410.06500  [pdf, other]

    hep-ex

    Search for the radiative decays $D^+\toγρ^+$ and $D^+\toγK^{*+}$

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (648 additional authors not shown)

    Abstract: We search for the radiative decays $D^{+} \to γρ^+$ and $D^{+} \to γK^{*+}$ using 20.3~fb$^{-1}$ of $e^+e^-$ annihilation data collected at the center-of-mass energy $\sqrt{s}=3.773$ GeV by the BESIII detector operating at the BEPCII collider. No significant signals are observed, and the upper limits on the branching fractions of $D^{+} \to γρ^+$ and $D^{+} \to γK^{*+}$ at 90\% confidence level ar…

    Submitted 8 October, 2024; originally announced October 2024.

  42. arXiv:2410.06366  [pdf, other]

    cs.LG cs.AI

    Physics-Informed Regularization for Domain-Agnostic Dynamical System Modeling

    Authors: Zijie Huang, Wanjia Zhao, Jingdong Gao, Ziniu Hu, Xiao Luo, Yadi Cao, Yuanzhou Chen, Yizhou Sun, Wei Wang

    Abstract: Learning complex physical dynamics purely from data is challenging due to the intrinsic properties of systems to be satisfied. Incorporating physics-informed priors, such as in Hamiltonian Neural Networks (HNNs), achieves high-precision modeling for energy-conservative systems. However, real-world systems often deviate from strict energy conservation and follow different physical priors. To addres…

    Submitted 8 October, 2024; originally announced October 2024.

    Comments: Accepted to The Thirty-eighth Annual Conference on Neural Information Processing Systems (NeurIPS 2024)

  43. arXiv:2410.05818  [pdf]

    cond-mat.mes-hall

    Hot electron lifetime exceeds 300 nanoseconds in quantum dots with high quantum efficiency

    Authors: Beibei Tang, Bo Li, Yingying Sun, Jianshun Li, Yanheng Guo, Jiaojiao Song, Xiaohan Yan, Huimin Zhang, Xiaosuo Wang, Fei Chen, Lei Wang, Jiangfeng Du, Huaibin Shen, Fengjia Fan

    Abstract: Hot electrons are theoretically predicted to be long-lived in strongly confined quantum dots, which could play vital roles in quantum dot-based optoelectronics; however, existing photoexcitation transient spectroscopy investigations reveal that their lifetime is less than 1 ps in well-passivated quantum dots because of the ultrafast electron-hole Auger-assisted cooling. Therefore, they are general…

    Submitted 8 October, 2024; originally announced October 2024.

  44. arXiv:2410.05736  [pdf, ps, other]

    hep-ex

    Observation of an axial-vector state in the study of $ψ(3686) \to φηη'$ decay

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (625 additional authors not shown)

    Abstract: Using (2712.4 $\pm$ 14.3)$\times 10^{6}$ $ψ(3686)$ events collected with the BESIII detector at BEPCII, a partial wave analysis of the decay $ψ(3686) \to φηη' $ is performed with the covariant tensor approach. An axial-vector state with a mass near 2.3 $\rm GeV/c^2$ is observed for the first time. Its mass and width are measured to be 2316…

    Submitted 8 October, 2024; originally announced October 2024.

  45. arXiv:2410.05648  [pdf, other]

    cs.LG cs.CL

    Does RoBERTa Perform Better than BERT in Continual Learning: An Attention Sink Perspective

    Authors: Xueying Bai, Yifan Sun, Niranjan Balasubramanian

    Abstract: Continual learning (CL) aims to train models that can sequentially learn new tasks without forgetting previous tasks' knowledge. Although previous works observed that pre-training can benefit CL, it remains unclear whether a pre-trained model with higher downstream capacity also performs better in CL. In this paper, we observe that pre-trained models may allocate high attention scores to some 'sin…

    Submitted 7 October, 2024; originally announced October 2024.

    Comments: COLM 2024

  46. arXiv:2410.05258  [pdf, other]

    cs.CL cs.LG

    Differential Transformer

    Authors: Tianzhu Ye, Li Dong, Yuqing Xia, Yutao Sun, Yi Zhu, Gao Huang, Furu Wei

    Abstract: Transformer tends to overallocate attention to irrelevant context. In this work, we introduce Diff Transformer, which amplifies attention to the relevant context while canceling noise. Specifically, the differential attention mechanism calculates attention scores as the difference between two separate softmax attention maps. The subtraction cancels noise, promoting the emergence of sparse attentio…

    Submitted 7 October, 2024; originally announced October 2024.

  47. arXiv:2410.04477  [pdf, other]

    stat.CO cs.CE

    Block Vecchia Approximation for Scalable and Efficient Gaussian Process Computations

    Authors: Qilong Pan, Sameh Abdulah, Marc G. Genton, Ying Sun

    Abstract: Gaussian Processes (GPs) are vital for modeling and predicting irregularly-spaced, large geospatial datasets. However, their computations often pose significant challenges in large-scale applications. One popular method to approximate GPs is the Vecchia approximation, which approximates the full likelihood via a series of conditional probabilities. The classical Vecchia approximation uses univaria…

    Submitted 6 October, 2024; originally announced October 2024.

  48. arXiv:2410.04350  [pdf, other]

    cs.CL

    TIS-DPO: Token-level Importance Sampling for Direct Preference Optimization With Estimated Weights

    Authors: Aiwei Liu, Haoping Bai, Zhiyun Lu, Yanchao Sun, Xiang Kong, Simon Wang, Jiulong Shan, Albin Madappally Jose, Xiaojiang Liu, Lijie Wen, Philip S. Yu, Meng Cao

    Abstract: Direct Preference Optimization (DPO) has been widely adopted for preference alignment of Large Language Models (LLMs) due to its simplicity and effectiveness. However, DPO is derived as a bandit problem in which the whole response is treated as a single arm, ignoring the importance differences between tokens, which may affect optimization efficiency and make it difficult to achieve optimal results…

    Submitted 6 October, 2024; originally announced October 2024.

    Comments: 27 pages, 7 figures, 2 tables

    MSC Class: 68T50; ACM Class: I.2.7

  49. arXiv:2410.04283  [pdf]

    cs.LG

    Applying Hybrid Graph Neural Networks to Strengthen Credit Risk Analysis

    Authors: Mengfang Sun, Wenying Sun, Ying Sun, Shaobo Liu, Mohan Jiang, Zhen Xu

    Abstract: This paper presents a novel approach to credit risk prediction by employing Graph Convolutional Neural Networks (GCNNs) to assess the creditworthiness of borrowers. Leveraging the power of big data and artificial intelligence, the proposed method addresses the challenges faced by traditional credit risk assessment models, particularly in handling imbalanced datasets and extracting meaningful featu…

    Submitted 5 October, 2024; originally announced October 2024.

  50. Constructing Cloze Questions Generatively

    Authors: Yicheng Sun, Jie Wang

    Abstract: We present a generative method called CQG for constructing cloze questions from a given article using neural networks and WordNet, with an emphasis on generating multigram distractors. Built on sense disambiguation, text-to-text transformation, WordNet's synset taxonomies and lexical labels, CQG selects an answer key for a given sentence, segments it into a sequence of instances, generates instanc…

    Submitted 5 October, 2024; originally announced October 2024.

    Comments: 8 pages, 5 figures, 5 tables, 2023 International Joint Conference on Neural Networks (IJCNN)

    ACM Class: I.2.7

    Journal ref: 2023 International Joint Conference on Neural Networks (IJCNN), Gold Coast, Australia, 2023, pp. 1-8