subscribe to arXiv mailings

D-FINE: Redefine Regression Task in DETRs as Fine-grained Distribution Refinement

Authors: Yansong Peng, Hebei Li, Peixi Wu, Yueyi Zhang, Xiaoyan Sun, Feng Wu

Abstract: We introduce D-FINE, a powerful real-time object detector that achieves outstanding localization precision by redefining the bounding box regression task in DETR models. D-FINE comprises two key components: Fine-grained Distribution Refinement (FDR) and Global Optimal Localization Self-Distillation (GO-LSD). FDR transforms the regression process from predicting fixed coordinates to iteratively ref… ▽ More We introduce D-FINE, a powerful real-time object detector that achieves outstanding localization precision by redefining the bounding box regression task in DETR models. D-FINE comprises two key components: Fine-grained Distribution Refinement (FDR) and Global Optimal Localization Self-Distillation (GO-LSD). FDR transforms the regression process from predicting fixed coordinates to iteratively refining probability distributions, providing a fine-grained intermediate representation that significantly enhances localization accuracy. GO-LSD is a bidirectional optimization strategy that transfers localization knowledge from refined distributions to shallower layers through self-distillation, while also simplifying the residual prediction tasks for deeper layers. Additionally, D-FINE incorporates lightweight optimizations in computationally intensive modules and operations, achieving a better balance between speed and accuracy. Specifically, D-FINE-L / X achieves 54.0% / 55.8% AP on the COCO dataset at 124 / 78 FPS on an NVIDIA T4 GPU. When pretrained on Objects365, D-FINE-L / X attains 57.1% / 59.3% AP, surpassing all existing real-time detectors. Furthermore, our method significantly enhances the performance of a wide range of DETR models by up to 5.3% AP with negligible extra parameters and training costs. Our code and pretrained models: https://github.com/Peterande/D-FINE. △ Less

Submitted 17 October, 2024; originally announced October 2024.

arXiv:2410.13515 [pdf, other]

Observation of a rare beta decay of the charmed baryon with a Graph Neural Network

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (637 additional authors not shown)

Abstract: The study of beta decay of the charmed baryon provides unique insights into the fundamental mechanism of the strong and electro-weak interactions. The $Λ_c^+$, being the lightest charmed baryon, undergoes disintegration solely through the charm quark weak decay. Its beta decay provides an ideal laboratory for investigating non-perturbative effects in quantum chromodynamics and for constraining the… ▽ More The study of beta decay of the charmed baryon provides unique insights into the fundamental mechanism of the strong and electro-weak interactions. The $Λ_c^+$, being the lightest charmed baryon, undergoes disintegration solely through the charm quark weak decay. Its beta decay provides an ideal laboratory for investigating non-perturbative effects in quantum chromodynamics and for constraining the fundamental parameters of the Cabibbo-Kobayashi-Maskawa matrix in weak interaction theory. This article presents the first observation of the Cabibbo-suppressed $Λ_c^+$ beta decay into a neutron $Λ_c^+ \rightarrow n e^+ ν_{e}$, based on $4.5~\mathrm{fb}^{-1}$ of electron-positron annihilation data collected with the BESIII detector in the energy region above the $Λ^+_c\barΛ^-_c$ threshold. A novel machine learning technique, leveraging Graph Neural Networks, has been utilized to effectively separate signals from dominant backgrounds, particularly $Λ_c^+ \rightarrow Λe^+ ν_{e}$. This approach has yielded a statistical significance of more than $10σ$. The absolute branching fraction of $Λ_c^+ \rightarrow n e^+ ν_{e}$ is measured to be $(3.57\pm0.34_{\mathrm{stat}}\pm0.14_{\mathrm{syst}})\times 10^{-3}$. For the first time, the CKM matrix element $\left|V_{cd}\right|$ is extracted via a charmed baryon decay to be $0.208\pm0.011_{\rm exp.}\pm0.007_{\rm LQCD}\pm0.001_{τ_{Λ_c^+}}$. This study provides a new probe to further understand fundamental interactions in the charmed baryon sector, and demonstrates the power of modern machine learning techniques in enhancing experimental capability in high energy physics research. △ Less

Submitted 17 October, 2024; originally announced October 2024.

Comments: 28 pages, 6 figures

arXiv:2410.13478 [pdf, other]

Observation of $χ_{c0}\toΣ^{+}\barΣ^{-}η$ and evidence for $χ_{c1,2}\toΣ^{+}\barΣ^{-}η$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (634 additional authors not shown)

Abstract: Using $(27.12\pm 0.14)\times10^{8}$ $ψ(3686)$ events collected with the BESIII detector, the decay $χ_{c0}\toΣ^{+}\barΣ^{-}η$ is observed for the first time with a statistical significance of $7.0σ$, and evidence for $χ_{c1}\toΣ^{+}\barΣ^{-}η$ and $χ_{c2}\toΣ^{+}\barΣ^{-}η$ is found with statistical significances of $4.3σ$ and $4.6σ$, respectively. The branching fractions are determined to be… ▽ More Using $(27.12\pm 0.14)\times10^{8}$ $ψ(3686)$ events collected with the BESIII detector, the decay $χ_{c0}\toΣ^{+}\barΣ^{-}η$ is observed for the first time with a statistical significance of $7.0σ$, and evidence for $χ_{c1}\toΣ^{+}\barΣ^{-}η$ and $χ_{c2}\toΣ^{+}\barΣ^{-}η$ is found with statistical significances of $4.3σ$ and $4.6σ$, respectively. The branching fractions are determined to be $\mathcal{B}(χ_{c0}\toΣ^{+}\barΣ^{-}η)=({1.26 \pm 0.20 \pm 0.13}) \times 10^{-4}, ~\mathcal{B}(χ_{c1}\toΣ^{+}\barΣ^{-}η)=({5.10 \pm 1.21 \pm 0.67}) \times 10^{-5}$, and $\mathcal{B}(χ_{c2}\toΣ^{+}\barΣ^{-}η)=({5.46 \pm 1.18 \pm 0.50}) \times 10^{-5}$, where the first uncertainties are statistical, and the second ones are systematic. △ Less

Submitted 17 October, 2024; originally announced October 2024.

arXiv:2410.13368 [pdf, other]

Observation of the Singly Cabibbo-Suppressed Decay $Λ_c^{+}\to pπ^0$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (638 additional authors not shown)

Abstract: Utilizing 4.5${~\rm{fb}}^{-1}$ of $e^+e^-$ annihilation data collected with the BESIII detector at the BEPCII collider at center-of-mass energies between 4.600 and 4.699 GeV, the first observation of the singly Cabibbo-suppressed decay $Λ_c^{+}\to pπ^0$ is presented, with a statistical significance of $5.4σ$. The ratio of the branching fractions of $Λ_c^{+}\to pπ^0$ and $Λ_c^{+}\to pη$ is measured… ▽ More Utilizing 4.5${~\rm{fb}}^{-1}$ of $e^+e^-$ annihilation data collected with the BESIII detector at the BEPCII collider at center-of-mass energies between 4.600 and 4.699 GeV, the first observation of the singly Cabibbo-suppressed decay $Λ_c^{+}\to pπ^0$ is presented, with a statistical significance of $5.4σ$. The ratio of the branching fractions of $Λ_c^{+}\to pπ^0$ and $Λ_c^{+}\to pη$ is measured as $\mathcal{B}(Λ_c^{+}\to pπ^0)/\mathcal{B}(Λ_c^{+}\to pη)=(0.120\pm0.026_{\rm stat.}\pm0.007_{\rm syst.})$. This result resolves the longstanding discrepancy between earlier experimental searches, providing both a decisive conclusion and valuable input for QCD-inspired theoretical models. A sophisticated deep learning approach using a Transformer-based architecture is employed to distinguish the signal from the prevalent hadronic backgrounds, complemented by thorough validation and systematic uncertainty quantification. △ Less

Submitted 17 October, 2024; originally announced October 2024.

Comments: 9 pages, 4 figures

arXiv:2410.12793 [pdf, other]

Environment Scan of Generative AI Infrastructure for Clinical and Translational Science

Authors: Betina Idnay, Zihan Xu, William G. Adams, Mohammad Adibuzzaman, Nicholas R. Anderson, Neil Bahroos, Douglas S. Bell, Cody Bumgardner, Thomas Campion, Mario Castro, James J. Cimino, I. Glenn Cohen, David Dorr, Peter L Elkin, Jungwei W. Fan, Todd Ferris, David J. Foran, David Hanauer, Mike Hogarth, Kun Huang, Jayashree Kalpathy-Cramer, Manoj Kandpal, Niranjan S. Karnik, Avnish Katoch, Albert M. Lai , et al. (32 additional authors not shown)

Abstract: This study reports a comprehensive environmental scan of the generative AI (GenAI) infrastructure in the national network for clinical and translational science across 36 institutions supported by the Clinical and Translational Science Award (CTSA) Program led by the National Center for Advancing Translational Sciences (NCATS) of the National Institutes of Health (NIH) at the United States. With t… ▽ More This study reports a comprehensive environmental scan of the generative AI (GenAI) infrastructure in the national network for clinical and translational science across 36 institutions supported by the Clinical and Translational Science Award (CTSA) Program led by the National Center for Advancing Translational Sciences (NCATS) of the National Institutes of Health (NIH) at the United States. With the rapid advancement of GenAI technologies, including large language models (LLMs), healthcare institutions face unprecedented opportunities and challenges. This research explores the current status of GenAI integration, focusing on stakeholder roles, governance structures, and ethical considerations by administering a survey among leaders of health institutions (i.e., representing academic medical centers and health systems) to assess the institutional readiness and approach towards GenAI adoption. Key findings indicate a diverse range of institutional strategies, with most organizations in the experimental phase of GenAI deployment. The study highlights significant variations in governance models, with a strong preference for centralized decision-making but notable gaps in workforce training and ethical oversight. Moreover, the results underscore the need for a more coordinated approach to GenAI governance, emphasizing collaboration among senior leaders, clinicians, information technology staff, and researchers. Our analysis also reveals concerns regarding GenAI bias, data security, and stakeholder trust, which must be addressed to ensure the ethical and effective implementation of GenAI technologies. This study offers valuable insights into the challenges and opportunities of GenAI integration in healthcare, providing a roadmap for institutions aiming to leverage GenAI for improved quality of care and operational efficiency. △ Less

Submitted 27 September, 2024; originally announced October 2024.

arXiv:2410.12620 [pdf, other]

Search for $e^{+}e^{-} \to φχ_{c0}$ and $φη_{c2}(1D)$ at center-of-mass energies from 4.47 to 4.95 GeV

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (644 additional authors not shown)

Abstract: Utilizing a data set of $6.7$ fb$^{-1}$ from electron-positron collisions recorded by the BESIII detector at the BEPCII storage ring, a search is conducted for the processes $e^{+}e^{-} \to φχ_{c0}$ and $φη_{c2}(1D)$ across center-of-mass energies from 4.47 to 4.95 GeV. In the absence of any significant signals, upper limits are set. These include limits on the Born cross sections for… ▽ More Utilizing a data set of $6.7$ fb$^{-1}$ from electron-positron collisions recorded by the BESIII detector at the BEPCII storage ring, a search is conducted for the processes $e^{+}e^{-} \to φχ_{c0}$ and $φη_{c2}(1D)$ across center-of-mass energies from 4.47 to 4.95 GeV. In the absence of any significant signals, upper limits are set. These include limits on the Born cross sections for $e^{+}e^{-} \to φχ_{c0}$, as well as the product of the Born cross section for $e^{+}e^{-} \to φη_{c2}(1D)$ and a sum of five branching fractions. Furthermore, the product of the electronic width of $Y(4660)$ and the branching fraction of the $Y(4660) \to φχ_{c0}$, denoted as $Γ^{Y(4660)}_{e^{+}e^{-}} \mathcal{B}_{Y(4660) \to φχ_{c0}}$, is determined to be $< 0.40$ eV at the 90\% confidence level. △ Less

Submitted 16 October, 2024; originally announced October 2024.

Comments: 14 pages, 6 figures

arXiv:2410.11607 [pdf, other]

Observation of $χ_{cJ}\to p \bar p K^0_S K^- π^+ + c.c.$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann , et al. (648 additional authors not shown)

Abstract: By analyzing $(27.12\pm0.14)\times10^8$ $ψ(3686)$ events collected with the BESIII detector operating at the BEPCII collider, the decays of $χ_{cJ} \to p \bar{p} K^0_S K^- π^+ +c.c.(J=0, 1, 2)$ are observed for the first time with statistical significances greater than $10σ$. The branching fractions of these decays are determined to be… ▽ More By analyzing $(27.12\pm0.14)\times10^8$ $ψ(3686)$ events collected with the BESIII detector operating at the BEPCII collider, the decays of $χ_{cJ} \to p \bar{p} K^0_S K^- π^+ +c.c.(J=0, 1, 2)$ are observed for the first time with statistical significances greater than $10σ$. The branching fractions of these decays are determined to be $\mathcal{B}(χ_{c0}\to p \bar p K^{0}_{S} K^- π^+ + c.c.)=(2.61\pm0.27\pm0.32)\times10^{-5},$ $\mathcal{B}(χ_{c1}\to p \bar p K^{0}_{S} K^- π^+ + c.c.)=(4.16\pm0.24\pm0.46)\times10^{-5},$ and $\mathcal{B}(χ_{c2}\to p \bar p K^{0}_{S} K^- π^+ + c.c.)=(5.63\pm0.28\pm0.46)\times10^{-5}$, respectively. The processes $χ_{c1,2} \to \bar{p} Λ(1520) K^0_S π^{+} + c.c.$ are also observed, with statistical significances of 5.7$σ$ and 7.0$σ$, respectively. Evidence for $χ_{c0} \to\bar{p} Λ(1520) K^0_S π^{+} + c.c.$ is found with statistical significances of 3.3$σ$ each. The corresponding branching fractions are determined to be $\mathcal{B}(χ_{c0}\to \bar{p} Λ(1520) K^0_S π^{+} + c.c.) =(1.61^{+0.68}_{-0.64}\pm0.23)\times10^{-5}$, $\mathcal{B}(χ_{c1}\to \bar{p} Λ(1520) K^0_S π^{+} + c.c.)=(4.06^{+0.80}_{-0.76}\pm0.52)\times10^{-5}$, and $\mathcal{B}(χ_{c2}\to \bar{p} Λ(1520) K^0_S π^{+} + c.c.)=(4.09^{+0.87}_{-0.84}\pm0.42)\times10^{-5}$. Here, the first uncertainties are statistical and the second ones are systematic. △ Less

Submitted 15 October, 2024; originally announced October 2024.

Comments: 12 pages, 5 figures

arXiv:2410.08634 [pdf, other]

GAI-Enabled Explainable Personalized Federated Semi-Supervised Learning

Authors: Yubo Peng, Feibo Jiang, Li Dong, Kezhi Wang, Kun Yang

Abstract: Federated learning (FL) is a commonly distributed algorithm for mobile users (MUs) training artificial intelligence (AI) models, however, several challenges arise when applying FL to real-world scenarios, such as label scarcity, non-IID data, and unexplainability. As a result, we propose an explainable personalized FL framework, called XPFL. First, we introduce a generative AI (GAI) assisted perso… ▽ More Federated learning (FL) is a commonly distributed algorithm for mobile users (MUs) training artificial intelligence (AI) models, however, several challenges arise when applying FL to real-world scenarios, such as label scarcity, non-IID data, and unexplainability. As a result, we propose an explainable personalized FL framework, called XPFL. First, we introduce a generative AI (GAI) assisted personalized federated semi-supervised learning, called GFed. Particularly, in local training, we utilize a GAI model to learn from large unlabeled data and apply knowledge distillation-based semi-supervised learning to train the local FL model using the knowledge acquired from the GAI model. In global aggregation, we obtain the new local FL model by fusing the local and global FL models in specific proportions, allowing each local model to incorporate knowledge from others while preserving its personalized characteristics. Second, we propose an explainable AI mechanism for FL, named XFed. Specifically, in local training, we apply a decision tree to match the input and output of the local FL model. In global aggregation, we utilize t-distributed stochastic neighbor embedding (t-SNE) to visualize the local models before and after aggregation. Finally, simulation results validate the effectiveness of the proposed XPFL framework. △ Less

Submitted 11 October, 2024; originally announced October 2024.

arXiv:2410.08603 [pdf, other]

Observation of $D^+\toη^\primeμ^+ν_μ$ and First Study of $D^+\to η^\prime \ell^+ν_\ell$ Decay Dynamics

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (643 additional authors not shown)

Abstract: Using $20.3\,\rm fb^{-1}$ of $e^+e^-$ collision data collected at the center-of-mass energy 3.773\,GeV with the BESIII detector, we report the first observation of the semileptonic decay $D^+\to η^\prime μ^+ν_μ$ with significance of $8.6σ$ including systematic uncertainties, and an improved measurement of $D^+\to η^\prime e^+ν_e$. The branching fractions of $D^+\to η^\prime μ^+ν_μ$ and… ▽ More Using $20.3\,\rm fb^{-1}$ of $e^+e^-$ collision data collected at the center-of-mass energy 3.773\,GeV with the BESIII detector, we report the first observation of the semileptonic decay $D^+\to η^\prime μ^+ν_μ$ with significance of $8.6σ$ including systematic uncertainties, and an improved measurement of $D^+\to η^\prime e^+ν_e$. The branching fractions of $D^+\to η^\prime μ^+ν_μ$ and $D^+\to η^\prime e^+ν_e$ are determined to be $(1.92\pm0.28_{\rm stat}\pm 0.08_{\rm syst})\times 10^{-4}$ and $(1.79\pm0.19_{\rm stat}\pm 0.07_{\rm syst})\times 10^{-4}$, respectively. From an analysis of the $D^+\to η^\prime \ell^+ν_\ell$ decay dynamics, the product of the hadronic form factor $f_+^{η^{\prime}}(0)$ and the CKM matrix element $|V_{cd}|$ is measured for the first time, giving $f^{η^\prime}_+(0)|V_{cd}| = (5.92\pm0.56_{\rm stat}\pm0.13_{\rm syst})\times 10^{-2}$. No evidence for violation of $μ-e$ lepton-flavor universality is found in both the full range and several bins of $\ell^+ν_\ell$ four-momentum transfer. The $η-η^\prime$ mixing angle in the quark flavor basis is determined to be $φ_{\rm P} =(39.8\pm0.8_{\rm stat}\pm0.3_{\rm syst})^\circ$. △ Less

Submitted 11 October, 2024; originally announced October 2024.

arXiv:2410.07654 [pdf, other]

Firzen: Firing Strict Cold-Start Items with Frozen Heterogeneous and Homogeneous Graphs for Recommendation

Authors: Hulingxiao He, Xiangteng He, Yuxin Peng, Zifei Shan, Xin Su

Abstract: Recommendation models utilizing unique identities (IDs) to represent distinct users and items have dominated the recommender systems literature for over a decade. Since multi-modal content of items (e.g., texts and images) and knowledge graphs (KGs) may reflect the interaction-related users' preferences and items' characteristics, they have been utilized as useful side information to further impro… ▽ More Recommendation models utilizing unique identities (IDs) to represent distinct users and items have dominated the recommender systems literature for over a decade. Since multi-modal content of items (e.g., texts and images) and knowledge graphs (KGs) may reflect the interaction-related users' preferences and items' characteristics, they have been utilized as useful side information to further improve the recommendation quality. However, the success of such methods often limits to either warm-start or strict cold-start item recommendation in which some items neither appear in the training data nor have any interactions in the test stage: (1) Some fail to learn the embedding of a strict cold-start item since side information is only utilized to enhance the warm-start ID representations; (2) The others deteriorate the performance of warm-start recommendation since unrelated multi-modal content or entities in KGs may blur the final representations. In this paper, we propose a unified framework incorporating multi-modal content of items and KGs to effectively solve both strict cold-start and warm-start recommendation termed Firzen, which extracts the user-item collaborative information over frozen heterogeneous graph (collaborative knowledge graph), and exploits the item-item semantic structures and user-user behavioral association over frozen homogeneous graphs (item-item relation graph and user-user co-occurrence graph). Furthermore, we build four unified strict cold-start evaluation benchmarks based on publicly available Amazon datasets and a real-world industrial dataset from Weixin Channels via rearranging the interaction data and constructing KGs. Extensive empirical results demonstrate that our model yields significant improvements for strict cold-start recommendation and outperforms or matches the state-of-the-art performance in the warm-start scenario. △ Less

Submitted 10 October, 2024; originally announced October 2024.

Comments: Accepted by ICDE 2024. The code is available at https://github.com/PKU-ICST-MIPL/Firzen_ICDE2024

arXiv:2410.07626 [pdf, other]

Precision Measurement of the Branching Fraction of $D^{+}\to μ^{+}ν_μ$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (643 additional authors not shown)

Abstract: Using $20.3~\mathrm{fb}^{-1}$ of $e^+e^-$ collision data collected at a center-of-mass energy of $E_{\rm cm}=3.773$ GeV with the BESIII detector operating at the BEPCII collider, we determine the branching fraction of the leptonic decay $D^+\toμ^+ν_μ$ to be $(3.981\pm0.079_{\rm stat}\pm0.040_{\rm syst})\times10^{-4}$. Interpreting our measurement with knowledge of the Fermi coupling constant… ▽ More Using $20.3~\mathrm{fb}^{-1}$ of $e^+e^-$ collision data collected at a center-of-mass energy of $E_{\rm cm}=3.773$ GeV with the BESIII detector operating at the BEPCII collider, we determine the branching fraction of the leptonic decay $D^+\toμ^+ν_μ$ to be $(3.981\pm0.079_{\rm stat}\pm0.040_{\rm syst})\times10^{-4}$. Interpreting our measurement with knowledge of the Fermi coupling constant $G_F$, the masses of the $D^+$ and $μ^+$ as well as the lifetime of the $D^+$, we determine $f_{D^+}|V_{cd}|=(47.53\pm0.48_{\rm stat}\pm0.24_{\rm syst}\pm0.12_{\rm input})~\mathrm{MeV}$. This result is a factor of 2.3 more precise than the previous best measurement. Using the value of the magnitude of the Cabibbo-Kobayashi-Maskawa matrix element $|V_{cd}|$ given by the global standard model fit, we obtain the $D^+$ decay constant $f_{D^+}=(211.5\pm2.3_{\rm stat}\pm1.1_{\rm syst}\pm0.8_{\rm input})$ MeV. Alternatively, using the value of $f_{D^+}$ from a precise lattice quantum chromodynamics calculation, we extract $|V_{cd}|=0.2242\pm0.0023_{\rm stat}\pm0.0011_{\rm syst}\pm0.0009_{\rm input}$. △ Less

Submitted 10 October, 2024; originally announced October 2024.

Comments: 9 pages, 2 figures

arXiv:2410.07528 [pdf, other]

CountMamba: Exploring Multi-directional Selective State-Space Models for Plant Counting

Authors: Hulingxiao He, Yaqi Zhang, Jinglin Xu, Yuxin Peng

Abstract: Plant counting is essential in every stage of agriculture, including seed breeding, germination, cultivation, fertilization, pollination yield estimation, and harvesting. Inspired by the fact that humans count objects in high-resolution images by sequential scanning, we explore the potential of handling plant counting tasks via state space models (SSMs) for generating counting results. In this pap… ▽ More Plant counting is essential in every stage of agriculture, including seed breeding, germination, cultivation, fertilization, pollination yield estimation, and harvesting. Inspired by the fact that humans count objects in high-resolution images by sequential scanning, we explore the potential of handling plant counting tasks via state space models (SSMs) for generating counting results. In this paper, we propose a new counting approach named CountMamba that constructs multiple counting experts to scan from various directions simultaneously. Specifically, we design a Multi-directional State-Space Group to process the image patch sequences in multiple orders and aim to simulate different counting experts. We also design Global-Local Adaptive Fusion to adaptively aggregate global features extracted from multiple directions and local features extracted from the CNN branch in a sample-wise manner. Extensive experiments demonstrate that the proposed CountMamba performs competitively on various plant counting tasks, including maize tassels, wheat ears, and sorghum head counting. △ Less

Submitted 9 October, 2024; originally announced October 2024.

Comments: Accepted by PRCV 2024

arXiv:2410.06500 [pdf, other]

Search for the radiative decays $D^+\toγρ^+$ and $D^+\toγK^{*+}$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (648 additional authors not shown)

Abstract: We search for the radiative decays $D^{+} \to γρ^+$ and $D^{+} \to γK^{*+}$ using 20.3~fb$^{-1}$ of $e^+e^-$ annihilation data collected at the center-of-mass energy $\sqrt{s}=3.773$ GeV by the BESIII detector operating at the BEPCII collider. No significant signals are observed, and the upper limits on the branching fractions of $D^{+} \to γρ^+$ and $D^{+} \to γK^{*+}$ at 90\% confidence level ar… ▽ More We search for the radiative decays $D^{+} \to γρ^+$ and $D^{+} \to γK^{*+}$ using 20.3~fb$^{-1}$ of $e^+e^-$ annihilation data collected at the center-of-mass energy $\sqrt{s}=3.773$ GeV by the BESIII detector operating at the BEPCII collider. No significant signals are observed, and the upper limits on the branching fractions of $D^{+} \to γρ^+$ and $D^{+} \to γK^{*+}$ at 90\% confidence level are set to be $1.3\times10^{-5}$ and $1.8\times10^{-5}$, respectively. △ Less

Submitted 8 October, 2024; originally announced October 2024.

arXiv:2410.05966 [pdf, other]

FLOPS: Forward Learning with OPtimal Sampling

Authors: Tao Ren, Zishi Zhang, Jinyang Jiang, Guanghao Li, Zeliang Zhang, Mingqian Feng, Yijie Peng

Abstract: Given the limitations of backpropagation, perturbation-based gradient computation methods have recently gained focus for learning with only forward passes, also referred to as queries. Conventional forward learning consumes enormous queries on each data point for accurate gradient estimation through Monte Carlo sampling, which hinders the scalability of those algorithms. However, not all data poin… ▽ More Given the limitations of backpropagation, perturbation-based gradient computation methods have recently gained focus for learning with only forward passes, also referred to as queries. Conventional forward learning consumes enormous queries on each data point for accurate gradient estimation through Monte Carlo sampling, which hinders the scalability of those algorithms. However, not all data points deserve equal queries for gradient estimation. In this paper, we study the problem of improving the forward learning efficiency from a novel perspective: how to reduce the gradient estimation variance with minimum cost? For this, we propose to allocate the optimal number of queries over each data in one batch during training to achieve a good balance between estimation accuracy and computational efficiency. Specifically, with a simplified proxy objective and a reparameterization technique, we derive a novel plug-and-play query allocator with minimal parameters. Theoretical results are carried out to verify its optimality. We conduct extensive experiments for fine-tuning Vision Transformers on various datasets and further deploy the allocator to two black-box applications: prompt tuning and multimodal alignment for foundation models. All findings demonstrate that our proposed allocator significantly enhances the scalability of forward-learning algorithms, paving the way for real-world applications. △ Less

Submitted 17 October, 2024; v1 submitted 8 October, 2024; originally announced October 2024.

arXiv:2410.05736 [pdf, ps, other]

Observation of an axial-vector state in the study of $ψ(3686) \to φηη'$ decay

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (625 additional authors not shown)

Abstract: Using (2712.4 $\pm$ 14.3)$\times 10^{6}$ $ψ(3686)$ events collected with the BESIII detector at BEPCII, a partial wave analysis of the decay $ψ(3686) \to φηη' $ is performed with the covariant tensor approach. An axial-vector state with a mass near 2.3 $\rm GeV/c^2$ is observed for the first time. Its mass and width are measured to be 2316… ▽ More Using (2712.4 $\pm$ 14.3)$\times 10^{6}$ $ψ(3686)$ events collected with the BESIII detector at BEPCII, a partial wave analysis of the decay $ψ(3686) \to φηη' $ is performed with the covariant tensor approach. An axial-vector state with a mass near 2.3 $\rm GeV/c^2$ is observed for the first time. Its mass and width are measured to be 2316 $\pm 9_{\mathrm{stat}} \pm 30_{\mathrm{syst}}\,\rm MeV/c^2$ and 89 $\pm 15_{\mathrm{stat}} \pm 26_{\mathrm{syst}}\,\rm MeV$, respectively. The product branching fractions of $\mathcal{B}(ψ(3686) \to X(2300) η') \mathcal{B}(X(2300)\to φη)$ and $\mathcal{B}(ψ(3686) \to X(2300) η)\mathcal{B}(X(2300)\to φη')$ are determined to be (4.8 $\pm 1.3_{\mathrm{stat}} \pm 0.7_{\mathrm{syst}})\times 10^{-6}$ and (2.2 $\pm 0.7_{\mathrm{stat}} \pm 0.7_{\mathrm{syst}})\times 10^{-6}$, respectively. The branching fraction $\mathcal{B}(ψ(3686) \to φηη')$ is measured for the first time to be (3.14$\pm0.17_{\mathrm{stat}}\pm0.24_{\mathrm{syst}})\times10^{-5}$. The first uncertainties are statistical and the second are systematic. △ Less

Submitted 8 October, 2024; originally announced October 2024.

arXiv:2410.05650 [pdf, other]

doi 10.1145/3664647.3680642

SIA-OVD: Shape-Invariant Adapter for Bridging the Image-Region Gap in Open-Vocabulary Detection

Authors: Zishuo Wang, Wenhao Zhou, Jinglin Xu, Yuxin Peng

Abstract: Open-vocabulary detection (OVD) aims to detect novel objects without instance-level annotations to achieve open-world object detection at a lower cost. Existing OVD methods mainly rely on the powerful open-vocabulary image-text alignment capability of Vision-Language Pretrained Models (VLM) such as CLIP. However, CLIP is trained on image-text pairs and lacks the perceptual ability for local region… ▽ More Open-vocabulary detection (OVD) aims to detect novel objects without instance-level annotations to achieve open-world object detection at a lower cost. Existing OVD methods mainly rely on the powerful open-vocabulary image-text alignment capability of Vision-Language Pretrained Models (VLM) such as CLIP. However, CLIP is trained on image-text pairs and lacks the perceptual ability for local regions within an image, resulting in the gap between image and region representations. Directly using CLIP for OVD causes inaccurate region classification. We find the image-region gap is primarily caused by the deformation of region feature maps during region of interest (RoI) extraction. To mitigate the inaccurate region classification in OVD, we propose a new Shape-Invariant Adapter named SIA-OVD to bridge the image-region gap in the OVD task. SIA-OVD learns a set of feature adapters for regions with different shapes and designs a new adapter allocation mechanism to select the optimal adapter for each region. The adapted region representations can align better with text representations learned by CLIP. Extensive experiments demonstrate that SIA-OVD effectively improves the classification accuracy for regions by addressing the gap between images and regions caused by shape deformation. SIA-OVD achieves substantial improvements over representative methods on the COCO-OVD benchmark. The code is available at https://github.com/PKU-ICST-MIPL/SIA-OVD_ACMMM2024. △ Less

Submitted 7 October, 2024; originally announced October 2024.

Comments: 9 pages, 7 figures

ACM Class: I.2.10

arXiv:2410.02825 [pdf, other]

Ingest-And-Ground: Dispelling Hallucinations from Continually-Pretrained LLMs with RAG

Authors: Chenhao Fang, Derek Larson, Shitong Zhu, Sophie Zeng, Wendy Summer, Yanqing Peng, Yuriy Hulovatyy, Rajeev Rao, Gabriel Forgues, Arya Pudota, Alex Goncalves, Hervé Robert

Abstract: This paper presents new methods that have the potential to improve privacy process efficiency with LLM and RAG. To reduce hallucination, we continually pre-train the base LLM model with a privacy-specific knowledge base and then augment it with a semantic RAG layer. Our evaluations demonstrate that this approach enhances the model performance (as much as doubled metrics compared to out-of-box LLM)… ▽ More This paper presents new methods that have the potential to improve privacy process efficiency with LLM and RAG. To reduce hallucination, we continually pre-train the base LLM model with a privacy-specific knowledge base and then augment it with a semantic RAG layer. Our evaluations demonstrate that this approach enhances the model performance (as much as doubled metrics compared to out-of-box LLM) in handling privacy-related queries, by grounding responses with factual information which reduces inaccuracies. △ Less

Submitted 11 October, 2024; v1 submitted 30 September, 2024; originally announced October 2024.

arXiv:2410.02450 [pdf, other]

Personalized Federated Learning for Generative AI-Assisted Semantic Communications

Authors: Yubo Peng, Feibo Jiang, Li Dong, Kezhi Wang, Kun Yang

Abstract: Semantic Communication (SC) focuses on transmitting only the semantic information rather than the raw data. This approach offers an efficient solution to the issue of spectrum resource utilization caused by the various intelligent applications on Mobile Users (MUs). Generative Artificial Intelligence (GAI) models have recently exhibited remarkable content generation and signal processing capabilit… ▽ More Semantic Communication (SC) focuses on transmitting only the semantic information rather than the raw data. This approach offers an efficient solution to the issue of spectrum resource utilization caused by the various intelligent applications on Mobile Users (MUs). Generative Artificial Intelligence (GAI) models have recently exhibited remarkable content generation and signal processing capabilities, presenting new opportunities for enhancing SC. Therefore, we propose a GAI-assisted SC (GSC) model deployed between MUs and the Base Station (BS). Then, to train the GSC model using the local data of MUs while ensuring privacy and accommodating heterogeneous requirements of MUs, we introduce Personalized Semantic Federated Learning (PSFL). This approach incorporates a novel Personalized Local Distillation (PLD) and Adaptive Global Pruning (AGP). In PLD, each MU selects a personalized GSC model as a mentor tailored to its local resources and a unified Convolutional Neural Networks (CNN)-based SC (CSC) model as a student. This mentor model is then distilled into the student model for global aggregation. In AGP, we perform network pruning on the aggregated global model according to real-time communication environments, reducing communication energy. Finally, numerical results demonstrate the feasibility and efficiency of the proposed PSFL scheme. △ Less

Submitted 3 October, 2024; originally announced October 2024.

arXiv:2410.02421 [pdf, other]

Search for lepton number violating decays of $D_s^+\to h^-h^0e^+e^+$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (650 additional authors not shown)

Abstract: Based on 7.33 fb$^{-1}$ of $e^+e^-$ collision data collected by the BESIII detector operating at the BEPCII collider at center-of-mass energies from 4.128 to 4.226 GeV, a search for the Majorana neutrino $ν_m$ is conducted in the lepton-number-violating decays of $D_s^+\to h^-h^0e^+e^+$. Here, $h^-$ represents a $K^-$ or $π^-$, and $h^0$ represents a $π^0$, $K_S^0$ or $φ$. No significant signal is… ▽ More Based on 7.33 fb$^{-1}$ of $e^+e^-$ collision data collected by the BESIII detector operating at the BEPCII collider at center-of-mass energies from 4.128 to 4.226 GeV, a search for the Majorana neutrino $ν_m$ is conducted in the lepton-number-violating decays of $D_s^+\to h^-h^0e^+e^+$. Here, $h^-$ represents a $K^-$ or $π^-$, and $h^0$ represents a $π^0$, $K_S^0$ or $φ$. No significant signal is observed, and the upper limits of their branching fractions at the 90\% confidence level are determined to be $\mathcal{B}(D_s^+\to φπ^-e^+e^+) < 6.9 \times 10^{-5}$, $\mathcal{B}(D_s^+\to φK^-e^+e^+) < 9.9 \times 10^{-5}$, $\mathcal{B}(D_s^+\to K_S^0π^-e^+e^+) < 1.3 \times 10^{-5}$, $\mathcal{B}(D_s^+\to K_S^0K^-e^+e^+) < 2.9 \times 10^{-5}$, $\mathcal{B}(D_s^+\to π^-π^0e^+e^+) < 2.9 \times 10^{-5}$ and $\mathcal{B}(D_s^+\to K^-π^0e^+e^+) < 3.4 \times 10^{-5}$. The Majorana neutrino is searched for with different mass assumptions within the range [0.20, 0.80] GeV$/c^2$ in the decay of $D_s^+\toφe^+ν_m$ with $ν_m\toπ^-e^+$, and the upper limits of the branching fractions at the 90\% confidence level are at the level of $10^{-5}-10^{-2}$, depending on the mass of the Majorana neutrino. △ Less

Submitted 3 October, 2024; originally announced October 2024.

arXiv:2410.01231 [pdf, other]

Revisiting the Index Construction of Proximity Graph-Based Approximate Nearest Neighbor Search

Authors: Shuo Yang, Jiadong Xie, Yingfan Liu, Jeffrey Xu Yu, Xiyue Gao, Qianru Wang, Yanguo Peng, Jiangtao Cui

Abstract: Proximity graphs (PG) have gained increasing popularity as the state-of-the-art (SOTA) solutions to $k$-approximate nearest neighbor ($k$-ANN) search on high-dimensional data, which serves as a fundamental function in various fields, e.g. information retrieval and retrieval-augmented generation~(RAG). Although PG-based approaches have the best $k$-ANN search performance, their index construction c… ▽ More Proximity graphs (PG) have gained increasing popularity as the state-of-the-art (SOTA) solutions to $k$-approximate nearest neighbor ($k$-ANN) search on high-dimensional data, which serves as a fundamental function in various fields, e.g. information retrieval and retrieval-augmented generation~(RAG). Although PG-based approaches have the best $k$-ANN search performance, their index construction cost is superlinear to the number of points, since they have to identify close neighbors for each point to establish the edges. Such superlinear cost substantially limits their scalability in the era of big data. Hence, the goal of this paper is to accelerate the construction of PG-based methods without compromising their $k$-ANN search performance. To achieve this goal, two mainstream categories of PG are revisited: relative neighborhood graph (RNG) and navigable small world graph (NSWG). By revisiting their construction process, we find the issues of construction efficiency. To address these issues, we propose a new construction framework with a novel pruning strategy for edge selection, which accelerates RNG construction while keeping its $k$-ANN search performance. Then, we integrate this framework into NSWG construction to enhance both the construction efficiency and $k$-ANN search performance. Moreover, extensive experiments are conducted to validate our construction framework for both RNG and NSWG. The results demonstrate that it significantly reduces the PG construction cost, achieving up to 5.6x speedup while not compromising the $k$-ANN search performance. △ Less

Submitted 2 October, 2024; originally announced October 2024.

arXiv:2410.00201 [pdf]

DreamStruct: Understanding Slides and User Interfaces via Synthetic Data Generation

Authors: Yi-Hao Peng, Faria Huq, Yue Jiang, Jason Wu, Amanda Xin Yue Li, Jeffrey Bigham, Amy Pavel

Abstract: Enabling machines to understand structured visuals like slides and user interfaces is essential for making them accessible to people with disabilities. However, achieving such understanding computationally has required manual data collection and annotation, which is time-consuming and labor-intensive. To overcome this challenge, we present a method to generate synthetic, structured visuals with ta… ▽ More Enabling machines to understand structured visuals like slides and user interfaces is essential for making them accessible to people with disabilities. However, achieving such understanding computationally has required manual data collection and annotation, which is time-consuming and labor-intensive. To overcome this challenge, we present a method to generate synthetic, structured visuals with target labels using code generation. Our method allows people to create datasets with built-in labels and train models with a small number of human-annotated examples. We demonstrate performance improvements in three tasks for understanding slides and UIs: recognizing visual elements, describing visual content, and classifying visual content types. △ Less

Submitted 30 September, 2024; originally announced October 2024.

Comments: ECCV 2024

arXiv:2409.19599 [pdf, other]

Gradient is All You Need: Gradient-Based Attention Fusion for Infrared Small Target Detection

Authors: Chen Hu, Yian Huang, Kexuan Li, Luping Zhang, Yiming Zhu, Yufei Peng, Tian Pu, Zhenming Peng

Abstract: Infrared small target detection (IRSTD) is widely used in civilian and military applications. However, IRSTD encounters several challenges, including the tendency for small and dim targets to be obscured by complex backgrounds. To address this issue, we propose the Gradient Network (GaNet), which aims to extract and preserve edge and gradient information of small targets. GaNet employs the Gradien… ▽ More Infrared small target detection (IRSTD) is widely used in civilian and military applications. However, IRSTD encounters several challenges, including the tendency for small and dim targets to be obscured by complex backgrounds. To address this issue, we propose the Gradient Network (GaNet), which aims to extract and preserve edge and gradient information of small targets. GaNet employs the Gradient Transformer (GradFormer) module, simulating central difference convolutions (CDC) to extract and integrate gradient features with deeper features. Furthermore, we propose a global feature extraction model (GFEM) that offers a comprehensive perspective to prevent the network from focusing solely on details while neglecting the background information. We compare the network with state-of-the-art (SOTA) approaches, and the results demonstrate that our method performs effectively. Our source code is available at https://github.com/greekinRoma/Gradient-Transformer. △ Less

Submitted 29 September, 2024; originally announced September 2024.

arXiv:2409.19256 [pdf, other]

doi 10.1145/3689031.3696075

HybridFlow: A Flexible and Efficient RLHF Framework

Authors: Guangming Sheng, Chi Zhang, Zilingfeng Ye, Xibin Wu, Wang Zhang, Ru Zhang, Yanghua Peng, Haibin Lin, Chuan Wu

Abstract: Reinforcement Learning from Human Feedback (RLHF) is widely used in Large Language Model (LLM) alignment. Traditional RL can be modeled as a dataflow, where each node represents computation of a neural network (NN) and each edge denotes data dependencies between the NNs. RLHF complicates the dataflow by expanding each node into a distributed LLM training or generation program, and each edge into a… ▽ More Reinforcement Learning from Human Feedback (RLHF) is widely used in Large Language Model (LLM) alignment. Traditional RL can be modeled as a dataflow, where each node represents computation of a neural network (NN) and each edge denotes data dependencies between the NNs. RLHF complicates the dataflow by expanding each node into a distributed LLM training or generation program, and each edge into a many-to-many multicast. Traditional RL frameworks execute the dataflow using a single controller to instruct both intra-node computation and inter-node communication, which can be inefficient in RLHF due to large control dispatch overhead for distributed intra-node computation. Existing RLHF systems adopt a multi-controller paradigm, which can be inflexible due to nesting distributed computation and data communication. We propose HybridFlow, which combines single-controller and multi-controller paradigms in a hybrid manner to enable flexible representation and efficient execution of the RLHF dataflow. We carefully design a set of hierarchical APIs that decouple and encapsulate computation and data dependencies in the complex RLHF dataflow, allowing efficient operation orchestration to implement RLHF algorithms and flexible mapping of the computation onto various devices. We further design a 3D-HybridEngine for efficient actor model resharding between training and generation phases, with zero memory redundancy and significantly reduced communication overhead. Our experimental results demonstrate 1.53$\times$~20.57$\times$ throughput improvement when running various RLHF algorithms using HybridFlow, as compared with state-of-the-art baselines. HybridFlow source code will be available at https://github.com/volcengine/verl. △ Less

Submitted 2 October, 2024; v1 submitted 28 September, 2024; originally announced September 2024.

ACM Class: I.2

arXiv:2409.19199 [pdf, other]

Anisotropic multi-orbital Hubbard model simulated with impurity approximation

Authors: Yan Peng, Mi Jiang

Abstract: Motivated by the recent experimental findings on the orbital ordering of cuprate SC, we have investigated the multi-orbital Hubbard model in the framework of Cu impurity approximation embedded in the O lattice by incorporating the 3d$^{8}$ multiplet structure coupled to a full O-2p band.Our systematic investigation on the impact of anisotropy of various parameters reveal rich phenomena in terms of… ▽ More Motivated by the recent experimental findings on the orbital ordering of cuprate SC, we have investigated the multi-orbital Hubbard model in the framework of Cu impurity approximation embedded in the O lattice by incorporating the 3d$^{8}$ multiplet structure coupled to a full O-2p band.Our systematic investigation on the impact of anisotropy of various parameters reveal rich phenomena in terms of the ground state (GS) weight asymmetry between $\hat{x}$ and $\hat{y}$ directions.The numerical evidence demonstrate that the GS weight of Zhang-Rice singlet (ZRS) can be affected by the asymmetry of these parameters to distinct extent. Although the experimentally motivated asymmetric charge transfer energy only induces tiny weight difference, the asymmetric $d$-$p$ hybridization can result in considerable change of the weight. Besides, the nearest-neighbor $V_{pd}$ has much stronger impact than the local $U_{pp}$, which stems from the nature of ZRS consisting of nearest-neighbor two holes. Our systematic exploration provide valuable knowledge on the role of the artificial symmetry breaking on the two-hole GS nature and serves as the starting point of more sophisticated many-body simulations to uncover more interesting physics of multi-orbital Hubbard model within the symmetry breaking setup. △ Less

Submitted 27 September, 2024; originally announced September 2024.

Comments: 8 pages, 9 figures

arXiv:2409.17930 [pdf, other]

Codesigned counterdiabatic quantum optimization on a photonic quantum processor

Authors: Xiao-Wen Shang, Xuan Chen, Narendra N. Hegade, Ze-Feng Lan, Xuan-Kun Li, Hao Tang, Yu-Quan Peng, Enrique Solano, Xian-Min Jin

Abstract: Codesign, an integral part of computer architecture referring to the information interaction in hardware-software stack, is able to boost the algorithm mapping and execution in the computer hardware. This well applies to the noisy intermediate-scale quantum era, where quantum algorithms and quantum processors both need to be shaped to allow for advantages in experimental implementations. The state… ▽ More Codesign, an integral part of computer architecture referring to the information interaction in hardware-software stack, is able to boost the algorithm mapping and execution in the computer hardware. This well applies to the noisy intermediate-scale quantum era, where quantum algorithms and quantum processors both need to be shaped to allow for advantages in experimental implementations. The state-of-the-art quantum adiabatic optimization algorithm faces challenges for scaling up, where the deteriorating optimization performance is not necessarily alleviated by increasing the circuit depth given the noise in the hardware. The counterdiabatic term can be introduced to accelerate the convergence, but decomposing the unitary operator corresponding to the counterdiabatic terms into one and two-qubit gates may add additional burden to the digital circuit depth. In this work, we focus on the counterdiabatic protocol with a codesigned approach to implement this algorithm on a photonic quantum processor. The tunable Mach-Zehnder interferometer mesh provides rich programmable parameters for local and global manipulation, making it able to perform arbitrary unitary evolutions. Accordingly, we directly implement the unitary operation associated to the counterdiabatic quantum optimization on our processor without prior digitization. Furthermore, we develop and implement an optimized counterdiabatic method by tackling the higher-order many-body interaction terms. Moreover, we benchmark the performance in the case of factorization, by comparing the final success probability and the convergence speed. In conclusion, we experimentally demonstrate the advantages of a codesigned mapping of counterdiabatic quantum dynamics for quantum computing on photonic platforms. △ Less

Submitted 26 September, 2024; originally announced September 2024.

Comments: 9 pages, 4 figures. Comments are welcome

arXiv:2409.17484 [pdf, other]

Crafting Synthetic Realities: Examining Visual Realism and Misinformation Potential of Photorealistic AI-Generated Images

Authors: Qiyao Peng, Yingdan Lu, Yilang Peng, Sijia Qian, Xinyi Liu, Cuihua Shen

Abstract: Advances in generative models have created Artificial Intelligence-Generated Images (AIGIs) nearly indistinguishable from real photographs. Leveraging a large corpus of 30,824 AIGIs collected from Instagram and Twitter, and combining quantitative content analysis with qualitative analysis, this study unpacks AI photorealism of AIGIs from four key dimensions, content, human, aesthetic, and producti… ▽ More Advances in generative models have created Artificial Intelligence-Generated Images (AIGIs) nearly indistinguishable from real photographs. Leveraging a large corpus of 30,824 AIGIs collected from Instagram and Twitter, and combining quantitative content analysis with qualitative analysis, this study unpacks AI photorealism of AIGIs from four key dimensions, content, human, aesthetic, and production features. We find that photorealistic AIGIs often depict human figures, especially celebrities and politicians, with a high degree of surrealism and aesthetic professionalism, alongside a low degree of overt signals of AI production. This study is the first to empirically investigate photorealistic AIGIs across multiple platforms using a mixed-methods approach. Our findings provide important implications and insights for understanding visual misinformation and mitigating potential risks associated with photorealistic AIGIs. We also propose design recommendations to enhance the responsible use of AIGIs. △ Less

Submitted 25 September, 2024; originally announced September 2024.

arXiv:2409.17258 [pdf, other]

doi 10.3847/2041-8213/acd779

The kinematic bimodality: Efficient feedback and cold gas deficiency in slow-rotating galaxies

Authors: Bitao Wang, Yingjie Peng

Abstract: The bimodality in the stellar spin of low redshift (massive) galaxies, ubiquitously existing at all star formation levels and in diverse environment, suggests that galaxies grow and quench through two diverged evolutionary pathways. For spheroid-dominated galaxies of slow stellar rotation, the age composition and metallicity of their stellar populations evidence a fast quenching history with signi… ▽ More The bimodality in the stellar spin of low redshift (massive) galaxies, ubiquitously existing at all star formation levels and in diverse environment, suggests that galaxies grow and quench through two diverged evolutionary pathways. For spheroid-dominated galaxies of slow stellar rotation, the age composition and metallicity of their stellar populations evidence a fast quenching history with significant gas outflows. In this work, we measure the spin parameter $λ_{R_{\rm e}}$, i.e. the normalized specific angular momentum of stars, out of the MaNGA integral field spectroscopy for about 10000 galaxies. Among the two thirds with HI follow-up observations ($z\lesssim0.05$), we find that, compared to fast-rotating galaxies of the same stellar mass and star formation, the galaxy population of slower rotation are generally more HI gas-poor, robust against further environmental restriction and with non-detections taken into proper account using stacking technique. This cold gas deficit of slow-rotating galaxies is most apparent at high mass $\sim10^{11}\mathcal{M}_{\odot}$ below the star formation main sequence, supporting the pivotal role of gas outflows in their quenching history. With hints from HI velocity distributions, we suspect that massive gas outflows among the slow-rotating population are facilitated by high ejective feedback efficiency, which is a result of extensive coupling between disturbed volume-filling cold gas and (commonly) biconical feedback from central black holes. By contrast, in fast-rotating disc galaxies the feedback energy mostly goes to the hot circumgalactic medium rather than directly impacts the dense and planar cold gas, thus making the feedback mainly preventive against further gas inflow. △ Less

Submitted 25 September, 2024; originally announced September 2024.

Comments: Companion of the [Universal Kinematic Bimodality] paper: see fig.6 for a concluding schematic plot

Journal ref: 2023 ApJL 950 L22

arXiv:2409.17257 [pdf, other]

doi 10.1038/s41550-024-02376-8

Universal bimodality in kinematic morphology and the divergent pathways to galaxy quenching

Authors: Bitao Wang, Yingjie Peng, Michele Cappellari

Abstract: The hierarchical structure formation of our Universe inherently involves violent and chaotic episodes of mass assembly such as galaxy mergers. The level of bulk rotation of the collisionless stellar systems of galaxies reflects to what extent the galaxies, on the other hand, have assembled their stars during tranquil and ordered formation history, which fosters the growth of cohesively rotating st… ▽ More The hierarchical structure formation of our Universe inherently involves violent and chaotic episodes of mass assembly such as galaxy mergers. The level of bulk rotation of the collisionless stellar systems of galaxies reflects to what extent the galaxies, on the other hand, have assembled their stars during tranquil and ordered formation history, which fosters the growth of cohesively rotating structures. Observationally, galaxy populations show a wide spectrum of morphology and shapes, with different levels of rotational support. Despite the obvious variety and complexity, in this work we find that at a given stellar mass of galaxies, the distribution of the intrinsic spin parameter $λ_{R_{\rm e},\mathrm{intr}}$, i.e. the normalized specific angular momentum of stars, appears to be universally bimodal among galaxies in all star formation states and also in different environments. This ubiquitous bimodality in kinematic morphology evolves systematically with star formation and is particularly apparent for transitional galaxies of intermediate star formation rates, indicating that star formation quenching is proceeding separately within two distinct kinematic populations dominated by cold discs and hot spheroids. We show that the two populations also have contrasting recent star formation histories and metal enrichment histories, which reveal their divergent pathways to formation and quenching. △ Less

Submitted 25 September, 2024; originally announced September 2024.

Comments: Nature Astronomy in press. https://www.nature.com/articles/s41550-024-02376-8

arXiv:2409.16563 [pdf, other]

Enhancing disease detection in radiology reports through fine-tuning lightweight LLM on weak labels

Authors: Yishu Wei, Xindi Wang, Hanley Ong, Yiliang Zhou, Adam Flanders, George Shih, Yifan Peng

Abstract: Despite significant progress in applying large language models (LLMs) to the medical domain, several limitations still prevent them from practical applications. Among these are the constraints on model size and the lack of cohort-specific labeled datasets. In this work, we investigated the potential of improving a lightweight LLM, such as Llama 3.1-8B, through fine-tuning with datasets using synth… ▽ More Despite significant progress in applying large language models (LLMs) to the medical domain, several limitations still prevent them from practical applications. Among these are the constraints on model size and the lack of cohort-specific labeled datasets. In this work, we investigated the potential of improving a lightweight LLM, such as Llama 3.1-8B, through fine-tuning with datasets using synthetic labels. Two tasks are jointly trained by combining their respective instruction datasets. When the quality of the task-specific synthetic labels is relatively high (e.g., generated by GPT4- o), Llama 3.1-8B achieves satisfactory performance on the open-ended disease detection task, with a micro F1 score of 0.91. Conversely, when the quality of the task-relevant synthetic labels is relatively low (e.g., from the MIMIC-CXR dataset), fine-tuned Llama 3.1-8B is able to surpass its noisy teacher labels (micro F1 score of 0.67 v.s. 0.63) when calibrated against curated labels, indicating the strong inherent underlying capability of the model. These findings demonstrate the potential of fine-tuning LLMs with synthetic labels, offering a promising direction for future research on LLM specialization in the medical domain. △ Less

Submitted 24 September, 2024; originally announced September 2024.

arXiv:2409.15044 [pdf, ps, other]

Search for $D^0\to K^-ηe^+ν_e$, $D^+\to K_S^0 ηe^+ν_e$ and $D^+\to ηηe^+ν_e$ decays

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (634 additional authors not shown)

Abstract: By analyzing $e^+e^-$ annihilation data corresponding to an integrated luminosity of 7.93 fb$^{-1}$, collected at the center-of-mass energy of 3.773 GeV with the BESIII detector, we search for the semileptonic decays $D^0\to K^-ηe^+ν_e$, $D^+\to K_S^0 ηe^+ν_e$ and $D^+\to ηηe^+ν_e$ for the first time. We present evidence for $D^0\to K^-ηe^+ν_e$ with a significance of $3.3σ$. The branching fraction… ▽ More By analyzing $e^+e^-$ annihilation data corresponding to an integrated luminosity of 7.93 fb$^{-1}$, collected at the center-of-mass energy of 3.773 GeV with the BESIII detector, we search for the semileptonic decays $D^0\to K^-ηe^+ν_e$, $D^+\to K_S^0 ηe^+ν_e$ and $D^+\to ηηe^+ν_e$ for the first time. We present evidence for $D^0\to K^-ηe^+ν_e$ with a significance of $3.3σ$. The branching fraction of $D^0\to K^-ηe^+ν_e$ is measured to be $(0.84_{-0.34}^{+0.29}\pm0.22)\times 10^{-4}$. Here, the first uncertainties are statistical and the second ones are systematic. No significant signals are observed for the decays $D^+\to K_S^0 ηe^+ν_e$ and $D^+\to ηηe^+ν_e$ and we set the upper limits on their branching fractions. △ Less

Submitted 24 September, 2024; v1 submitted 23 September, 2024; originally announced September 2024.

Comments: 10 pages,4 figures

arXiv:2409.14411 [pdf, other]

Scaling Diffusion Policy in Transformer to 1 Billion Parameters for Robotic Manipulation

Authors: Minjie Zhu, Yichen Zhu, Jinming Li, Junjie Wen, Zhiyuan Xu, Ning Liu, Ran Cheng, Chaomin Shen, Yaxin Peng, Feifei Feng, Jian Tang

Abstract: Diffusion Policy is a powerful technique tool for learning end-to-end visuomotor robot control. It is expected that Diffusion Policy possesses scalability, a key attribute for deep neural networks, typically suggesting that increasing model size would lead to enhanced performance. However, our observations indicate that Diffusion Policy in transformer architecture (\DP) struggles to scale effectiv… ▽ More Diffusion Policy is a powerful technique tool for learning end-to-end visuomotor robot control. It is expected that Diffusion Policy possesses scalability, a key attribute for deep neural networks, typically suggesting that increasing model size would lead to enhanced performance. However, our observations indicate that Diffusion Policy in transformer architecture (\DP) struggles to scale effectively; even minor additions of layers can deteriorate training outcomes. To address this issue, we introduce Scalable Diffusion Transformer Policy for visuomotor learning. Our proposed method, namely \textbf{\methodname}, introduces two modules that improve the training dynamic of Diffusion Policy and allow the network to better handle multimodal action distribution. First, we identify that \DP~suffers from large gradient issues, making the optimization of Diffusion Policy unstable. To resolve this issue, we factorize the feature embedding of observation into multiple affine layers, and integrate it into the transformer blocks. Additionally, our utilize non-causal attention which allows the policy network to \enquote{see} future actions during prediction, helping to reduce compounding errors. We demonstrate that our proposed method successfully scales the Diffusion Policy from 10 million to 1 billion parameters. This new model, named \methodname, can effectively scale up the model size with improved performance and generalization. We benchmark \methodname~across 50 different tasks from MetaWorld and find that our largest \methodname~outperforms \DP~with an average improvement of 21.6\%. Across 7 real-world robot tasks, our ScaleDP demonstrates an average improvement of 36.25\% over DP-T on four single-arm tasks and 75\% on three bimanual tasks. We believe our work paves the way for scaling up models for visuomotor learning. The project page is available at scaling-diffusion-policy.github.io. △ Less

Submitted 22 September, 2024; originally announced September 2024.

arXiv:2409.13700 [pdf, other]

MAS4POI: a Multi-Agents Collaboration System for Next POI Recommendation

Authors: Yuqian Wu, Yuhong Peng, Jiapeng Yu, Raymond S. T. Lee

Abstract: LLM-based Multi-Agent Systems have potential benefits of complex decision-making tasks management across various domains but their applications in the next Point-of-Interest (POI) recommendation remain underexplored. This paper proposes a novel MAS4POI system designed to enhance next POI recommendations through multi-agent interactions. MAS4POI supports Large Language Models (LLMs) specializing in… ▽ More LLM-based Multi-Agent Systems have potential benefits of complex decision-making tasks management across various domains but their applications in the next Point-of-Interest (POI) recommendation remain underexplored. This paper proposes a novel MAS4POI system designed to enhance next POI recommendations through multi-agent interactions. MAS4POI supports Large Language Models (LLMs) specializing in distinct agents such as DataAgent, Manager, Analyst, and Navigator with each contributes to a collaborative process of generating the next POI recommendations.The system is examined by integrating six distinct LLMs and evaluated by two real-world datasets for recommendation accuracy improvement in real-world scenarios. Our code is available at https://github.com/yuqian2003/MAS4POI. △ Less

Submitted 4 September, 2024; originally announced September 2024.

Comments: 14 pages, 4 figures

arXiv:2409.13538 [pdf, other]

First Place Solution to the Multiple-choice Video QA Track of The Second Perception Test Challenge

Authors: Yingzhe Peng, Yixiao Yuan, Zitian Ao, Huapeng Zhou, Kangqi Wang, Qipeng Zhu, Xu Yang

Abstract: In this report, we present our first-place solution to the Multiple-choice Video Question Answering (QA) track of The Second Perception Test Challenge. This competition posed a complex video understanding task, requiring models to accurately comprehend and answer questions about video content. To address this challenge, we leveraged the powerful QwenVL2 (7B) model and fine-tune it on the provided… ▽ More In this report, we present our first-place solution to the Multiple-choice Video Question Answering (QA) track of The Second Perception Test Challenge. This competition posed a complex video understanding task, requiring models to accurately comprehend and answer questions about video content. To address this challenge, we leveraged the powerful QwenVL2 (7B) model and fine-tune it on the provided training set. Additionally, we employed model ensemble strategies and Test Time Augmentation to boost performance. Through continuous optimization, our approach achieved a Top-1 Accuracy of 0.7647 on the leaderboard. △ Less

Submitted 20 September, 2024; originally announced September 2024.

arXiv:2409.13231 [pdf, other]

A survey of sulfur-bearing molecular lines toward the dense cores in eleven massive protoclusters

Authors: Mengyao Tang, Sheng-Li Qin, Tie Liu, Luis A. Zapata, Xunchuan Liu, Yaping Peng, Fengwei Xu, Chao Zhang, Ken'ichi Tatematsu

Abstract: Sulfur-bearing molecules are commonly detected in dense cores within star-forming regions, but the total sulfur budget is significantly low, when compared to the interstellar medium (ISM) value. The properties of sulfur-bearing molecules are not well understood due to the absence of large sample studies with uniform observational configurations. To deepen our understanding of this subject, we cond… ▽ More Sulfur-bearing molecules are commonly detected in dense cores within star-forming regions, but the total sulfur budget is significantly low, when compared to the interstellar medium (ISM) value. The properties of sulfur-bearing molecules are not well understood due to the absence of large sample studies with uniform observational configurations. To deepen our understanding of this subject, we conducted a study using ALMA 870 \micron~observations of 11 massive protoclusters. By checking the spectra of 248 dense cores in 11 massive protoclusters, a total of 10 sulfur-bearing species (CS, SO, \htcs, NS, \sot, \ttso, \tfsot, \ttsot, \seoo, \octfs) were identified. The parameters including systemic velocities, line widths, gas temperatures, column densities, and abundances were derived. Our results indicate that SO appears to be more easily detected in a wider range of physical environments than \htcs, despite these two species show similarities in gas distributions and abundances. \tfsot~and \htcs~are good tracers of the temperature of sulfur-bearing species, in which \htcs~traces the outer warm envelope and \tfsot~is associated with high-temperature central-regions. High-mass star-forming feedback (outflow and other non-thermal motions) significantly elevates the sulfur-bearing molecular abundances and detection rates specifically for \sot~and SO. A positive correlation between the \sot~abundance increasing factor ($F$) and temperatures suggests that \sot~could serve as a sulfur reservoir on the grain mantles of dense cores and then can be desorbed from dust to gas phase as the temperature rises. This work shows the importance of a large and unbiased survey to understand the sulfur depletion in dense cores. △ Less

Submitted 20 September, 2024; originally announced September 2024.

arXiv:2409.12514 [pdf, other]

TinyVLA: Towards Fast, Data-Efficient Vision-Language-Action Models for Robotic Manipulation

Authors: Junjie Wen, Yichen Zhu, Jinming Li, Minjie Zhu, Kun Wu, Zhiyuan Xu, Ning Liu, Ran Cheng, Chaomin Shen, Yaxin Peng, Feifei Feng, Jian Tang

Abstract: Vision-Language-Action (VLA) models have shown remarkable potential in visuomotor control and instruction comprehension through end-to-end learning processes. However, current VLA models face significant challenges: they are slow during inference and require extensive pre-training on large amounts of robotic data, making real-world deployment difficult. In this paper, we introduce a new family of… ▽ More Vision-Language-Action (VLA) models have shown remarkable potential in visuomotor control and instruction comprehension through end-to-end learning processes. However, current VLA models face significant challenges: they are slow during inference and require extensive pre-training on large amounts of robotic data, making real-world deployment difficult. In this paper, we introduce a new family of compact vision-language-action models, called TinyVLA, which offers two key advantages over existing VLA models: (1) faster inference speeds, and (2) improved data efficiency, eliminating the need for pre-training stage. Our framework incorporates two essential components to build TinyVLA: (1) initializing the policy backbone with robust, high-speed multimodal models, and (2) integrating a diffusion policy decoder during fine-tuning to enable precise robot actions. We conducted extensive evaluations of TinyVLA in both simulation and on real robots, demonstrating that our approach significantly outperforms the state-of-the-art VLA model, OpenVLA, in terms of speed and data efficiency, while delivering comparable or superior performance. Additionally, TinyVLA exhibits strong generalization capabilities across various dimensions, including language instructions, novel objects, unseen positions, changes in object appearance, background variations, and environmental shifts, often matching or exceeding the performance of OpenVLA. We believe that \methodname offers an interesting perspective on utilizing pre-trained multimodal models for policy learning. Our project is at https://tiny-vla.github.io. △ Less

Submitted 27 September, 2024; v1 submitted 19 September, 2024; originally announced September 2024.

Comments: add more citations

arXiv:2409.12370 [pdf, other]

Robust Audiovisual Speech Recognition Models with Mixture-of-Experts

Authors: Yihan Wu, Yifan Peng, Yichen Lu, Xuankai Chang, Ruihua Song, Shinji Watanabe

Abstract: Visual signals can enhance audiovisual speech recognition accuracy by providing additional contextual information. Given the complexity of visual signals, an audiovisual speech recognition model requires robust generalization capabilities across diverse video scenarios, presenting a significant challenge. In this paper, we introduce EVA, leveraging the mixture-of-Experts for audioVisual ASR to per… ▽ More Visual signals can enhance audiovisual speech recognition accuracy by providing additional contextual information. Given the complexity of visual signals, an audiovisual speech recognition model requires robust generalization capabilities across diverse video scenarios, presenting a significant challenge. In this paper, we introduce EVA, leveraging the mixture-of-Experts for audioVisual ASR to perform robust speech recognition for ``in-the-wild'' videos. Specifically, we first encode visual information into visual tokens sequence and map them into speech space by a lightweight projection. Then, we build EVA upon a robust pretrained speech recognition model, ensuring its generalization ability. Moreover, to incorporate visual information effectively, we inject visual information into the ASR model through a mixture-of-experts module. Experiments show our model achieves state-of-the-art results on three benchmarks, which demonstrates the generalization ability of EVA across diverse video domains. △ Less

Submitted 18 September, 2024; originally announced September 2024.

Comments: 6 pages, 2 figures, accepted by IEEE Spoken Language Technology Workshop 2024

arXiv:2409.09506 [pdf, other]

ESPnet-EZ: Python-only ESPnet for Easy Fine-tuning and Integration

Authors: Masao Someki, Kwanghee Choi, Siddhant Arora, William Chen, Samuele Cornell, Jionghao Han, Yifan Peng, Jiatong Shi, Vaibhav Srivastav, Shinji Watanabe

Abstract: We introduce ESPnet-EZ, an extension of the open-source speech processing toolkit ESPnet, aimed at quick and easy development of speech models. ESPnet-EZ focuses on two major aspects: (i) easy fine-tuning and inference of existing ESPnet models on various tasks and (ii) easy integration with popular deep neural network frameworks such as PyTorch-Lightning, Hugging Face transformers and datasets, a… ▽ More We introduce ESPnet-EZ, an extension of the open-source speech processing toolkit ESPnet, aimed at quick and easy development of speech models. ESPnet-EZ focuses on two major aspects: (i) easy fine-tuning and inference of existing ESPnet models on various tasks and (ii) easy integration with popular deep neural network frameworks such as PyTorch-Lightning, Hugging Face transformers and datasets, and Lhotse. By replacing ESPnet design choices inherited from Kaldi with a Python-only, Bash-free interface, we dramatically reduce the effort required to build, debug, and use a new model. For example, to fine-tune a speech foundation model, ESPnet-EZ, compared to ESPnet, reduces the number of newly written code by 2.7x and the amount of dependent code by 6.7x while dramatically reducing the Bash script dependencies. The codebase of ESPnet-EZ is publicly available. △ Less

Submitted 14 September, 2024; originally announced September 2024.

Comments: Accepted to SLT 2024

arXiv:2409.09216 [pdf, other]

Spectral U-Net: Enhancing Medical Image Segmentation via Spectral Decomposition

Authors: Yaopeng Peng, Milan Sonka, Danny Z. Chen

Abstract: This paper introduces Spectral U-Net, a novel deep learning network based on spectral decomposition, by exploiting Dual Tree Complex Wavelet Transform (DTCWT) for down-sampling and inverse Dual Tree Complex Wavelet Transform (iDTCWT) for up-sampling. We devise the corresponding Wave-Block and iWave-Block, integrated into the U-Net architecture, aiming at mitigating information loss during down-sam… ▽ More This paper introduces Spectral U-Net, a novel deep learning network based on spectral decomposition, by exploiting Dual Tree Complex Wavelet Transform (DTCWT) for down-sampling and inverse Dual Tree Complex Wavelet Transform (iDTCWT) for up-sampling. We devise the corresponding Wave-Block and iWave-Block, integrated into the U-Net architecture, aiming at mitigating information loss during down-sampling and enhancing detail reconstruction during up-sampling. In the encoder, we first decompose the feature map into high and low-frequency components using DTCWT, enabling down-sampling while mitigating information loss. In the decoder, we utilize iDTCWT to reconstruct higher-resolution feature maps from down-sampled features. Evaluations on the Retina Fluid, Brain Tumor, and Liver Tumor segmentation datasets with the nnU-Net framework demonstrate the superiority of the proposed Spectral U-Net. △ Less

Submitted 13 September, 2024; originally announced September 2024.

arXiv:2409.09188 [pdf, other]

FiAt-Net: Detecting Fibroatheroma Plaque Cap in 3D Intravascular OCT Images

Authors: Yaopeng Peng, Zhi Chen, Andreas Wahle, Tomas Kovarnik, Milan Sonk, Danny Z. Chen

Abstract: The key manifestation of coronary artery disease (CAD) is development of fibroatheromatous plaque, the cap of which may rupture and subsequently lead to coronary artery blocking and heart attack. As such, quantitative analysis of coronary plaque, its plaque cap, and consequently the cap's likelihood to rupture are of critical importance when assessing a risk of cardiovascular events. This paper re… ▽ More The key manifestation of coronary artery disease (CAD) is development of fibroatheromatous plaque, the cap of which may rupture and subsequently lead to coronary artery blocking and heart attack. As such, quantitative analysis of coronary plaque, its plaque cap, and consequently the cap's likelihood to rupture are of critical importance when assessing a risk of cardiovascular events. This paper reports a new deep learning based approach, called FiAt-Net, for detecting angular extent of fibroatheroma (FA) and segmenting its cap in 3D intravascular optical coherence tomography (IVOCT) images. IVOCT 2D image frames are first associated with distinct clusters and data from each cluster are used for model training. As plaque is typically focal and thus unevenly distributed, a binary partitioning method is employed to identify FA plaque areas to focus on to mitigate the data imbalance issue. Additional image representations (called auxiliary images) are generated to capture IVOCT intensity changes to help distinguish FA and non-FA areas on the coronary wall. Information in varying scales is derived from the original IVOCT and auxiliary images, and a multi-head self-attention mechanism is employed to fuse such information. Our FiAt-Net achieved high performance on a 3D IVOCT coronary image dataset, demonstrating its effectiveness in accurately detecting FA cap in IVOCT images. △ Less

Submitted 13 September, 2024; originally announced September 2024.

arXiv:2409.07197 [pdf, other]

Measurements of the $CP$-even fractions of $D^0\toπ^{+}π^{-}π^{0}$ and $D^0\to K^{+}K^{-}π^{0}$ at BESIII

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (648 additional authors not shown)

Abstract: The $CP$-even fractions ($F_{+}$) of the decays $D^0\toπ^{+}π^{-}π^{0}$ and $D^0\to K^{+}K^{-}π^{0}$ are measured with a quantum-correlated $ψ(3770)\to D\bar{D}$ data sample collected by the BESIII experiment corresponding to an integrated luminosity of 7.93 $\mathrm{fb}^{-1}$. The results are $F_{+}^{π^{+}π^{-}π^{0}}=0.9406\pm0.0036\pm0.0021$ and $F_{+}^{K^{+}K^{-}π^{0}}=0.631\pm0.014\pm0.011$, w… ▽ More The $CP$-even fractions ($F_{+}$) of the decays $D^0\toπ^{+}π^{-}π^{0}$ and $D^0\to K^{+}K^{-}π^{0}$ are measured with a quantum-correlated $ψ(3770)\to D\bar{D}$ data sample collected by the BESIII experiment corresponding to an integrated luminosity of 7.93 $\mathrm{fb}^{-1}$. The results are $F_{+}^{π^{+}π^{-}π^{0}}=0.9406\pm0.0036\pm0.0021$ and $F_{+}^{K^{+}K^{-}π^{0}}=0.631\pm0.014\pm0.011$, where the first uncertainties are statistical and the second systematic. These measurements are consistent with the previous determinations, and the uncertainties for $F_{+}^{π^{+}π^{-}π^{0}}$ and $F_{+}^{K^{+}K^{-}π^{0}}$ are reduced by factors of 3.9 and 2.6, respectively. The reported results provide important inputs for the precise measurement of the angle $γ$ of the Cabibbo-Kobayashi-Maskawa matrix and indirect $CP$ violation in charm mixing. △ Less

Submitted 11 September, 2024; originally announced September 2024.

Comments: 19 pages, 8 figures

arXiv:2409.06981 [pdf]

Robust Square Root Unscented Kalman filter of graph signals

Authors: Jinhui Hu, Haiquan Zhao, Yi Peng

Abstract: Considering the problem of nonlinear and non-gaussian filtering of the graph signal, in this paper, a robust square root unscented Kalman filter based on graph signal processing is proposed. The algorithm uses a graph topology to generate measurements and an unscented transformation is used to obtain the priori state estimates. In addition, in order to enhance the numerical stability of the unscen… ▽ More Considering the problem of nonlinear and non-gaussian filtering of the graph signal, in this paper, a robust square root unscented Kalman filter based on graph signal processing is proposed. The algorithm uses a graph topology to generate measurements and an unscented transformation is used to obtain the priori state estimates. In addition, in order to enhance the numerical stability of the unscented Kalman filter, the algorithm combines the double square root decomposition method to update the covariance matrix in the graph frequency domain. Furthermore, to handle the non-Gaussian noise problem in the state estimation process, an error augmentation model is constructed in the graph frequency domain by unifying the measurement error and state error, which utilizes the Laplace matrix of the graph to effectively reduce the cumulative error at each vertex. Then the general robust cost function is adopted as the optimal criterion to deal with the error, which has more parameter options so that effectively suppresses the problems of random outliers and abnormal measurement values in the state estimation process. Finally, the convergence of the error of the proposed algorithm is firstly verified theoretically, and then the robustness of the proposed algorithm is verified by experimental simulation. △ Less

Submitted 10 September, 2024; originally announced September 2024.

Comments: 8-page

MSC Class: 93-05; 93-08 ACM Class: I.6; K.m

arXiv:2409.05522 [pdf, other]

Design and Implementation of TAO DAQ System

Authors: Shuihan Zhang, Chao Chen, Xiaolu Ji, Fei Li, Yu Peng, Fabrizio Petrucci, Yinhui Wu, Zezhong Yu, Tingxuan Zeng, Kejun Zhu

Abstract: Purpose: The Taishan Antineutrino Observatory (TAO) is a satellite experiment of the Jiangmen Underground Neutrino Observatory (JUNO), also known as JUNO-TAO. Located close to one of the reactors of the Taishan Nuclear Power Plant, TAO will measure the antineutrino energy spectrum precisely as a reference spectrum for JUNO. The data acquisition (DAQ) system is designed to acquire data from the TAO… ▽ More Purpose: The Taishan Antineutrino Observatory (TAO) is a satellite experiment of the Jiangmen Underground Neutrino Observatory (JUNO), also known as JUNO-TAO. Located close to one of the reactors of the Taishan Nuclear Power Plant, TAO will measure the antineutrino energy spectrum precisely as a reference spectrum for JUNO. The data acquisition (DAQ) system is designed to acquire data from the TAO readout electronics and process it with software trigger and data compression algorithms. The data storage bandwidth is limited by the onsite network to be less than 100 Mb/s. Methods: The system is designed based on a distributed architecture, with fully decoupled modules to facilitate customized design and implementation. It is divided into two main components: the data flow system and the online software. The online software serves as the foundation, providing the electronics configuration, the process management, the run control, and the information sharing. The data flow system facilitates continuous data acquisition from various electronic boards or trigger systems, assembles and processes raw data, and ultimately stores it on the disk. Results: The core functionality of the system has been designed and developed. The usability of the data flow system interface and the software trigger results have been verified during the pre-installation testing phase. Conclusion: The DAQ system has been deployed for the TAO experiment. It has also successfully been applied to the integration test of the detector and electronics prototypes. △ Less

Submitted 9 September, 2024; originally announced September 2024.

arXiv:2409.04276 [pdf, ps, other]

Study of the decay $D^0\rightarrow ρ(770)^-e^+ν_e$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (646 additional authors not shown)

Abstract: We present a study of the semileptonic decay $D^0\rightarrow π^-π^0e^{+}ν_{e}$ using an $e^+e^-$ annihilation data sample of $7.93~\mathrm{fb}^{-1}$ collected at the center-of-mass energy of 3.773 GeV with the BESIII detector. The branching fraction of $D^0\to ρ(770)^-e^+ν_e$ is measured to be $(1.439 \pm 0.033(\rm stat.) \pm 0.027(\rm syst.)) \times10^{-3}$, which is a factor 1.6 more precise tha… ▽ More We present a study of the semileptonic decay $D^0\rightarrow π^-π^0e^{+}ν_{e}$ using an $e^+e^-$ annihilation data sample of $7.93~\mathrm{fb}^{-1}$ collected at the center-of-mass energy of 3.773 GeV with the BESIII detector. The branching fraction of $D^0\to ρ(770)^-e^+ν_e$ is measured to be $(1.439 \pm 0.033(\rm stat.) \pm 0.027(\rm syst.)) \times10^{-3}$, which is a factor 1.6 more precise than previous measurements. By performing an amplitude analysis, we measure the hadronic form-factor ratios of $D^0\to ρ(770)^-e^+ν_e$ at $q^2=0$ assuming the single-pole-dominance parametrization: $r_{V}=V(0)/A_1(0)=1.548\pm0.079(\rm stat.)\pm0.041(\rm syst.)$ and $r_{2}=A_2(0)/A_1(0)=0.823\pm0.056(\rm stat.)\pm0.026(\rm syst.)$. △ Less

Submitted 6 September, 2024; originally announced September 2024.

Comments: 12 pages, 3 figures

arXiv:2409.04056 [pdf, other]

doi 10.1145/3627673.3679156

Refining Wikidata Taxonomy using Large Language Models

Authors: Yiwen Peng, Thomas Bonald, Mehwish Alam

Abstract: Due to its collaborative nature, Wikidata is known to have a complex taxonomy, with recurrent issues like the ambiguity between instances and classes, the inaccuracy of some taxonomic paths, the presence of cycles, and the high level of redundancy across classes. Manual efforts to clean up this taxonomy are time-consuming and prone to errors or subjective decisions. We present WiKC, a new version… ▽ More Due to its collaborative nature, Wikidata is known to have a complex taxonomy, with recurrent issues like the ambiguity between instances and classes, the inaccuracy of some taxonomic paths, the presence of cycles, and the high level of redundancy across classes. Manual efforts to clean up this taxonomy are time-consuming and prone to errors or subjective decisions. We present WiKC, a new version of Wikidata taxonomy cleaned automatically using a combination of Large Language Models (LLMs) and graph mining techniques. Operations on the taxonomy, such as cutting links or merging classes, are performed with the help of zero-shot prompting on an open-source LLM. The quality of the refined taxonomy is evaluated from both intrinsic and extrinsic perspectives, on a task of entity typing for the latter, showing the practical interest of WiKC. △ Less

Submitted 6 September, 2024; originally announced September 2024.

Comments: ACM International Conference on Information and Knowledge Management, Oct 2024, Boise, Idaho, United States

arXiv:2409.03407 [pdf, ps, other]

Tight minimum degree condition to guarantee $C_{2k+1}$-free graphs to be $r$-partite

Authors: Zilong Yan, Yuejian Peng, Xiaoli Yuan

Abstract: Erdős and Simonovits asked the following question: For an integer $c\geq 2$ and a family of graphs $\mathcal{F}$, what is the infimum of $α$ such that any $\mathcal{F}$-free $n$-vertex graph with minimum degree at least $αn$ has chromatic number at most $c$? Denote the infimum as $δ_χ(\mathcal{F}, c)$. A fundamental result of Erdős, Stone and Simonovits \cite{ES46, ES66} implies that for any… ▽ More Erdős and Simonovits asked the following question: For an integer $c\geq 2$ and a family of graphs $\mathcal{F}$, what is the infimum of $α$ such that any $\mathcal{F}$-free $n$-vertex graph with minimum degree at least $αn$ has chromatic number at most $c$? Denote the infimum as $δ_χ(\mathcal{F}, c)$. A fundamental result of Erdős, Stone and Simonovits \cite{ES46, ES66} implies that for any $c\le r-1$, $δ_χ(\mathcal{F}, c)=1-{1 \over r}$, where $r+1=χ(\mathcal{F})=\min\{χ(F): F\in \mathcal{F}\}$. So the remaining challenge is to determine $δ_χ(\mathcal{F}, c)$ for $c\ge χ(\mathcal{F})-1$. Most previous known results are under the condition $c= χ(\mathcal{F})-1$. When $c\ge χ(\mathcal{F})$, the only known exact results are $δ_χ(K_3, 3)$ by Häggkvist and Jin, and $δ_χ(K_3, c)$ for every $c\ge4$ by Brandt and Thomassé, $δ_χ(K_r, r)$ and $δ_χ(K_r, r+1)$ by Nikiforov. In this paper, we focus on odd cycles. Combining results of Thomassen and Ma, we have $Ω\bigg((c+1)^{-8(k+1)}\bigg)=δ_χ(C_{2k+1}, c)=O(\frac{k}{c})$ for $c\ge 3$. In this paper, we determine $δ_χ(C_{2k+1}, c)$ for all $c\ge 2$ and $k\ge 3c+4$ ($k\ge 5$ if $c=2$). We also obtain the following corollary. If $G$ is a non-$c$-partite graph on $n$ vertices with $c\ge 3$ and $δ(G)> {n \over 2c+2}$, then $C_{2k+1} \subset G$ for all $k\in [3c+4, {n \over 108(c+1)^c}]$. △ Less

Submitted 5 September, 2024; originally announced September 2024.

Comments: arXiv admin note: text overlap with arXiv:2408.15487

arXiv:2409.03168 [pdf, other]

The HI reservoir in central spiral galaxies and the implied star formation process

Authors: Jing Dou, Yingjie Peng, Qiusheng Gu, Alvio Renzini, Luis C. Ho, Filippo Mannucci, Emanuele Daddi, Chengpeng Zhang, Jiaxuan Li, Yong Shi, Tao Wang, Dingyi Zhao, Cheqiu Lyu, Di Li, Feng Yuan, Roberto Maiolino, Yulong Gao

Abstract: The cold interstellar medium (ISM) as the raw material for star formation is critical to understanding galaxy evolution. It is generally understood that galaxies stop making stars when, in one way or another, they run out of gas. However, here we provide evidence that central spiral galaxies remain rich in atomic gas even if their star formation rate and molecular gas fraction have dropped signifi… ▽ More The cold interstellar medium (ISM) as the raw material for star formation is critical to understanding galaxy evolution. It is generally understood that galaxies stop making stars when, in one way or another, they run out of gas. However, here we provide evidence that central spiral galaxies remain rich in atomic gas even if their star formation rate and molecular gas fraction have dropped significantly compared to "normal" star-forming galaxies of the same mass. Since HI is sensitive to external processes, here we investigate central spiral galaxies using a combined sample from SDSS, ALFALFA, and xGASS surveys. After proper incompleteness corrections, we find that the key HI scaling relations for central spirals show significant but regular systematic dependence on stellar mass. At any given stellar mass, the HI gas mass fraction is about constant with changing specific star formation rate (sSFR), which suggests that HI reservoir is ubiquitous in central spirals with any star formation status down to M* ~ 10^9 Msun. Together with the tight correlation between the molecular gas mass fraction and sSFR for galaxies across a wide range of different properties, it suggests that the decline of SFR of all central spirals in the local universe is due to the halt of H2 supply, though there is plenty of HI gas around. These hence provide critical observations of the dramatically different behavior of the cold multi-phase ISM, and a key to understand the star formation process and quenching mechanism. △ Less

Submitted 4 September, 2024; originally announced September 2024.

Comments: 18 pages, 7 figures; Accepted for publication in the ApJL; This is the fourth paper in the "From Haloes to Galaxies" series

arXiv:2409.03121 [pdf, other]

QHDOPT: A Software for Nonlinear Optimization with Quantum Hamiltonian Descent

Authors: Samuel Kushnir, Jiaqi Leng, Yuxiang Peng, Lei Fan, Xiaodi Wu

Abstract: We develop an open-source, end-to-end software (named QHDOPT), which can solve nonlinear optimization problems using the quantum Hamiltonian descent (QHD) algorithm. QHDOPT offers an accessible interface and automatically maps tasks to various supported quantum backends (i.e., quantum hardware machines). These features enable users, even those without prior knowledge or experience in quantum compu… ▽ More We develop an open-source, end-to-end software (named QHDOPT), which can solve nonlinear optimization problems using the quantum Hamiltonian descent (QHD) algorithm. QHDOPT offers an accessible interface and automatically maps tasks to various supported quantum backends (i.e., quantum hardware machines). These features enable users, even those without prior knowledge or experience in quantum computing, to utilize the power of existing quantum devices for nonlinear and nonconvex optimization tasks. In its intermediate compilation layer, QHDOPT employs SimuQ, an efficient interface for Hamiltonian-oriented programming, to facilitate multiple algorithmic specifications and ensure compatible cross-hardware deployment. The detailed documentation of QHDOPT is available at https://github.com/jiaqileng/QHDOPT. △ Less

Submitted 4 September, 2024; originally announced September 2024.

Comments: 23 pages, 7 figures. The full repository is available at https://github.com/jiaqileng/QHDOPT

arXiv:2409.02578 [pdf, other]

Search for the massless dark photon with $D^0\toωγ'$ and $D^0\toγγ'$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann , et al. (648 additional authors not shown)

Abstract: Using $7.9~\rm{fb^{-1}}$ of $e^+e^-$ collision data collected at $\sqrt{s}=3.773$ GeV with the BESIII detector at the BEPCII collider, we search for the massless dark photon with the flavor-changing neutral current processes $D^0\toωγ'$ and $D^0\toγγ'$ for the first time. No significant signals are observed, and the upper limits at the 90% confidence level on the massless dark photon branching fra… ▽ More Using $7.9~\rm{fb^{-1}}$ of $e^+e^-$ collision data collected at $\sqrt{s}=3.773$ GeV with the BESIII detector at the BEPCII collider, we search for the massless dark photon with the flavor-changing neutral current processes $D^0\toωγ'$ and $D^0\toγγ'$ for the first time. No significant signals are observed, and the upper limits at the 90% confidence level on the massless dark photon branching fraction are set to be $1.1\times10^{-5}$ and $2.0\times10^{-6}$ for $D^0\toωγ'$ and $D^0\toγγ'$, respectively. These results provide the most stringent constraint on the new physics energy scale associated with $cuγ'$ coupling in the world, with the new physics energy scale related parameter $|\mathbb{C}|^2+|\mathbb{C}_5|^2<8.2\times10^{-17}~\rm{GeV}^{-2}$ at the 90% confidence level. △ Less

Submitted 14 October, 2024; v1 submitted 4 September, 2024; originally announced September 2024.

Comments: 10 pages, 3 figures

arXiv:2409.01704 [pdf, other]

General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

Authors: Haoran Wei, Chenglong Liu, Jinyue Chen, Jia Wang, Lingyu Kong, Yanming Xu, Zheng Ge, Liang Zhao, Jianjian Sun, Yuang Peng, Chunrui Han, Xiangyu Zhang

Abstract: Traditional OCR systems (OCR-1.0) are increasingly unable to meet people's usage due to the growing demand for intelligent processing of man-made optical characters. In this paper, we collectively refer to all artificial optical signals (e.g., plain texts, math/molecular formulas, tables, charts, sheet music, and even geometric shapes) as "characters" and propose the General OCR Theory along with… ▽ More Traditional OCR systems (OCR-1.0) are increasingly unable to meet people's usage due to the growing demand for intelligent processing of man-made optical characters. In this paper, we collectively refer to all artificial optical signals (e.g., plain texts, math/molecular formulas, tables, charts, sheet music, and even geometric shapes) as "characters" and propose the General OCR Theory along with an excellent model, namely GOT, to promote the arrival of OCR-2.0. The GOT, with 580M parameters, is a unified, elegant, and end-to-end model, consisting of a high-compression encoder and a long-contexts decoder. As an OCR-2.0 model, GOT can handle all the above "characters" under various OCR tasks. On the input side, the model supports commonly used scene- and document-style images in slice and whole-page styles. On the output side, GOT can generate plain or formatted results (markdown/tikz/smiles/kern) via an easy prompt. Besides, the model enjoys interactive OCR features, i.e., region-level recognition guided by coordinates or colors. Furthermore, we also adapt dynamic resolution and multi-page OCR technologies to GOT for better practicality. In experiments, we provide sufficient results to prove the superiority of our model. △ Less

Submitted 3 September, 2024; originally announced September 2024.

arXiv:2409.01419 [pdf, other]

Study of $D^{+} \to K_{S}^{0}K^{*}(892)^{+}$ in $D^{+} \to K_{S}^{0} K_{S}^{0} π^{+}$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (638 additional authors not shown)

Abstract: Using a data sample of $e^+e^-$ collisions corresponding to an integrated luminosity of 7.93 $\rm fb^{-1}$ collected with the BESIII detector at the center-of-mass energy 3.773~GeV, we perform the first amplitude analysis of the decay $D^{+} \to K_{S}^{0} K_{S}^{0} π^{+}$. The absolute branching fraction of $D^{+} \to K_{S}^{0}K_{S}^{0} π^{+}$ is measured to be… ▽ More Using a data sample of $e^+e^-$ collisions corresponding to an integrated luminosity of 7.93 $\rm fb^{-1}$ collected with the BESIII detector at the center-of-mass energy 3.773~GeV, we perform the first amplitude analysis of the decay $D^{+} \to K_{S}^{0} K_{S}^{0} π^{+}$. The absolute branching fraction of $D^{+} \to K_{S}^{0}K_{S}^{0} π^{+}$ is measured to be $(2.97 \pm 0.09_{\rm stat.} \pm 0.05_{\rm syst.})\times10^{-3}$. The dominant intermediate process is $D^{+} \to K_{S}^{0}K^{*}(892)^{+}$, whose branching fraction is determined to be $(8.72 \pm 0.28_{\rm stat.} \pm 0.15_{\rm syst.}) \times 10^{-3}$, including all the $K^*(892)^+$ decays. △ Less

Submitted 2 September, 2024; originally announced September 2024.

Showing 1–50 of 1,528 results for author: Peng, Y