Joint Top-Down and Bottom-Up Frameworks for 3D Visual Grounding

Abstract: This paper tackles the challenging task of 3D visual grounding, i.e., locating a specific object in a 3D point cloud scene based on text descriptions. Existing methods fall into two categories: top-down and bottom-up methods. Top-down methods rely on a pre-trained 3D detector to generate and select the best bounding box, resulting in time-consuming processes. Bottom-up methods directly regress object bounding boxes with coarse-grained features, producing worse results. To combine their strengths while addressing their limitations, we propose a joint top-down and bottom-up framework, aiming to enhance the performance while improving the efficiency. Specifically, in the first stage, we propose a bottom-up based proposal generation module, which utilizes lightweight neural layers to efficiently regress and cluster several coarse object proposals instead of using a complex 3D detector. Then, in the second stage, we introduce a top-down based proposal consolidation module, which utilizes graph design to effectively aggregate and propagate the query-related object contexts among the generated proposals for further refinement. By jointly training these two modules, we can avoid the inherent drawbacks of the complex proposals in the top-down framework and the coarse proposals in the bottom-up framework. Experimental results on the ScanRefer benchmark show that our framework is able to achieve the state-of-the-art performance.

Submitted 20 October, 2024; originally announced October 2024.

Comments: Accepted by ICPR2024

arXiv:2410.14228 [pdf, other]

Towards High-Speed Passive Visible Light Communication with Event Cameras and Digital Micro-Mirrors

Authors: Yanxiang Wang, Yiran Shen, Kenuo Xu, Guangrong Zhao, Mahbub Hassan, Chenren Xu, Wen Hu

Abstract: Passive visible light communication (VLC) modulates light propagation or reflection to transmit data without directly modulating the light source. Thus, passive VLC provides an alternative to conventional VLC, enabling communication where the light source cannot be directly controlled. There have been ongoing efforts to explore new methods and devices for modulating light propagation or reflection. The state-of-the-art has broken the 100 kbps data rate barrier for passive VLC by using a digital micro-mirror device (DMD) as the light modulating platform, or transmitter, and a photo-diode as the receiver. We significantly extend this work by proposing a massive spatial data channel framework for DMDs, where individual channels can be decoded in parallel using an event camera at the receiver. For the event camera, we introduce event processing algorithms to detect numerous channels and decode bits from individual channels with high reliability. Our prototype, built with off-the-shelf event cameras and DMDs, can decode up to $\sim$2,000 parallel channels, achieving a data transmission rate of 1.6 Mbps, markedly surpassing current benchmarks by 16x.

Submitted 21 October, 2024; v1 submitted 18 October, 2024; originally announced October 2024.

Comments: 14 pages, 21 figures, nonacm

arXiv:2410.13748 [pdf, other]

Test of lepton flavour universality with $B_s^0 \rightarrow φ\ell^+\ell^-$ decays

Authors: LHCb collaboration, R. Aaij, A. S. W. Abdelmotteleb, C. Abellan Beteta, F. Abudinén, T. Ackernley, A. A. Adefisoye, B. Adeva, M. Adinolfi, P. Adlarson, C. Agapopoulou, C. A. Aidala, Z. Ajaltouni, S. Akar, K. Akiba, P. Albicocco, J. Albrecht, F. Alessio, M. Alexander, Z. Aliouche, P. Alvarez Cartelle, R. Amalric, S. Amato, J. L. Amey, Y. Amhis, et al. (1124 additional authors not shown)

Abstract: Lepton flavour universality in rare $b\rightarrow s$ transitions is tested for the first time using $B_s^0$ meson decays. The measurements are performed using $pp$ collision data collected by the LHCb experiment between 2011 and 2018, corresponding to a total integrated luminosity of 9$\,{\rm fb}^{-1}$. Branching fraction ratios between the $B_s^0 \rightarrow φe^+e^-$ and $B_s^0 \rightarrow φμ^+μ^-$ decays are measured in three regions of dilepton mass squared, $q^2$, with $0.1 < q^2 < 1.1$, $1.1 < q^2 < 6.0$, and $15 < q^2 < 19\,{\rm GeV}^2/c^4$. The results agree with the Standard Model expectation of lepton flavour universality.

Submitted 17 October, 2024; originally announced October 2024.

Comments: All figures and tables, along with machine-readable versions and any supplementary material and additional information, are available at https://lbfence.cern.ch/alcm/public/analysis/full-details/3513/ (LHCb public pages)

Report number: LHCb-PAPER-2024-032, CERN-EP-2024-255

arXiv:2410.13597 [pdf, other]

Text-Guided Multi-Property Molecular Optimization with a Diffusion Language Model

Authors: Yida Xiong, Kun Li, Weiwei Liu, Jia Wu, Bo Du, Shirui Pan, Wenbin Hu

Abstract: Molecular optimization (MO) is a crucial stage in drug discovery in which task-oriented generated molecules are optimized to meet practical industrial requirements. Existing mainstream MO approaches primarily utilize external property predictors to guide iterative property optimization. However, learning all molecular samples in the vast chemical space is unrealistic for predictors. As a result, errors and noise are inevitably introduced during property prediction due to the nature of approximation. This leads to discrepancy accumulation, generalization reduction and suboptimal molecular candidates. In this paper, we propose a text-guided multi-property molecular optimization method utilizing transformer-based diffusion language model (TransDLM). TransDLM leverages standardized chemical nomenclature as semantic representations of molecules and implicitly embeds property requirements into textual descriptions, thereby preventing error propagation during diffusion process. Guided by physically and chemically detailed textual descriptions, TransDLM samples and optimizes encoded source molecules, retaining core scaffolds of source molecules and ensuring structural similarities. Moreover, TransDLM enables simultaneous sampling of multiple molecules, making it ideal for scalable, efficient large-scale optimization through distributed computation on web platforms. Furthermore, our approach surpasses state-of-the-art methods in optimizing molecular structural similarity and enhancing chemical properties on the benchmark dataset.
The code is available at: https://anonymous.4open.science/r/TransDLM-A901.

Submitted 17 October, 2024; originally announced October 2024.

arXiv:2410.12288 [pdf, other]

A Prompt-Based Knowledge Graph Foundation Model for Universal In-Context Reasoning

Authors: Yuanning Cui, Zequn Sun, Wei Hu

Abstract: Extensive knowledge graphs (KGs) have been constructed to facilitate knowledge-driven tasks across various scenarios. However, existing work usually develops separate reasoning models for different KGs, lacking the ability to generalize and transfer knowledge across diverse KGs and reasoning settings. In this paper, we propose a prompt-based KG foundation model via in-context learning, namely KG-ICL, to achieve a universal reasoning ability. Specifically, we introduce a prompt graph centered with a query-related example fact as context to understand the query relation. To encode prompt graphs with the generalization ability to unseen entities and relations in queries, we first propose a unified tokenizer that maps entities and relations in prompt graphs to predefined tokens. Then, we propose two message passing neural networks to perform prompt encoding and KG reasoning, respectively. We conduct evaluation on 43 different KGs in both transductive and inductive settings. Results indicate that the proposed KG-ICL outperforms baselines on most datasets, showcasing its outstanding generalization and universal reasoning capabilities. The source code is accessible on GitHub: https://github.com/nju-websoft/KG-ICL.

Submitted 16 October, 2024; originally announced October 2024.

Comments: Accepted in the 38th Conference on Neural Information Processing Systems (NeurIPS 2024)

arXiv:2410.08530 [pdf, other]

Ego3DT: Tracking Every 3D Object in Ego-centric Videos

Authors: Shengyu Hao, Wenhao Chai, Zhonghan Zhao, Meiqi Sun, Wendi Hu, Jieyang Zhou, Yixian Zhao, Qi Li, Yizhou Wang, Xi Li, Gaoang Wang

Abstract: The growing interest in embodied intelligence has brought ego-centric perspectives to contemporary research. One significant challenge within this realm is the accurate localization and tracking of objects in ego-centric videos, primarily due to the substantial variability in viewing angles. Addressing this issue, this paper introduces a novel zero-shot approach for the 3D reconstruction and tracking of all objects from the ego-centric video. We present Ego3DT, a novel framework that initially identifies and extracts detection and segmentation information of objects within the ego environment. Utilizing information from adjacent video frames, Ego3DT dynamically constructs a 3D scene of the ego view using a pre-trained 3D scene reconstruction model. Additionally, we have innovated a dynamic hierarchical association mechanism for creating stable 3D tracking trajectories of objects in ego-centric videos. Moreover, the efficacy of our approach is corroborated by extensive experiments on two newly compiled datasets, with 1.04x - 2.90x in HOTA, showcasing the robustness and accuracy of our method in diverse ego-centric scenarios.

Submitted 11 October, 2024; originally announced October 2024.

Comments: Accepted by ACM Multimedia 2024

arXiv:2410.08309 [pdf, other]

Dynamics of Concept Learning and Compositional Generalization

Authors: Yongyi Yang, Core Francisco Park, Ekdeep Singh Lubana, Maya Okawa, Wei Hu, Hidenori Tanaka

Abstract: Prior work has shown that text-conditioned diffusion models can learn to identify and manipulate primitive concepts underlying a compositional data-generating process, enabling generalization to entirely novel, out-of-distribution compositions. Beyond performance evaluations, these studies develop a rich empirical phenomenology of learning dynamics, showing that models generalize sequentially, respecting the compositional hierarchy of the data-generating process. Moreover, concept-centric structures within the data significantly influence a model's speed of learning the ability to manipulate a concept. In this paper, we aim to better characterize these empirical results from a theoretical standpoint. Specifically, we propose an abstraction of prior work's compositional generalization problem by introducing a structured identity mapping (SIM) task, where a model is trained to learn the identity mapping on a Gaussian mixture with structurally organized centroids. We mathematically analyze the learning dynamics of neural networks trained on this SIM task and show that, despite its simplicity, SIM's learning dynamics capture and help explain key empirical observations on compositional generalization with diffusion models identified in prior work. Our theory also offers several new insights -- e.g., we find a novel mechanism for non-monotonic learning dynamics of test loss in early phases of training. We validate our new predictions by training a text-conditioned diffusion model, bridging our simplified framework and complex generative models.
Overall, this work establishes the SIM task as a meaningful theoretical abstraction of concept learning dynamics in modern generative models.

Submitted 10 October, 2024; originally announced October 2024.

arXiv:2410.08182 [pdf, other]

MRAG-Bench: Vision-Centric Evaluation for Retrieval-Augmented Multimodal Models

Authors: Wenbo Hu, Jia-Chen Gu, Zi-Yi Dou, Mohsen Fayyaz, Pan Lu, Kai-Wei Chang, Nanyun Peng

Abstract: Existing multimodal retrieval benchmarks primarily focus on evaluating whether models can retrieve and utilize external textual knowledge for question answering. However, there are scenarios where retrieving visual information is either more beneficial or easier to access than textual data. In this paper, we introduce a multimodal retrieval-augmented generation benchmark, MRAG-Bench, in which we systematically identify and categorize scenarios where visually augmented knowledge is better than textual knowledge, for instance, more images from varying viewpoints. MRAG-Bench consists of 16,130 images and 1,353 human-annotated multiple-choice questions across 9 distinct scenarios. With MRAG-Bench, we conduct an evaluation of 10 open-source and 4 proprietary large vision-language models (LVLMs). Our results show that all LVLMs exhibit greater improvements when augmented with images compared to textual knowledge, confirming that MRAG-Bench is vision-centric. Additionally, we conduct extensive analysis with MRAG-Bench, which offers valuable insights into retrieval-augmented LVLMs. Notably, the top-performing model, GPT-4o, faces challenges in effectively leveraging retrieved knowledge, achieving only a 5.82% improvement with ground-truth information, in contrast to a 33.16% improvement observed in human participants. These findings highlight the importance of MRAG-Bench in encouraging the community to enhance LVLMs' ability to utilize retrieved visual knowledge more effectively.

Submitted 10 October, 2024; originally announced October 2024.

Comments: https://mragbench.github.io

arXiv:2410.07746 [pdf, other]

Benign Overfitting in Single-Head Attention

Authors: Roey Magen, Shuning Shang, Zhiwei Xu, Spencer Frei, Wei Hu, Gal Vardi

Abstract: The phenomenon of benign overfitting, where a trained neural network perfectly fits noisy training data but still achieves near-optimal test performance, has been extensively studied in recent years for linear models and fully-connected/convolutional networks. In this work, we study benign overfitting in a single-head softmax attention model, which is the fundamental building block of Transformers. We prove that under appropriate conditions, the model exhibits benign overfitting in a classification setting already after two steps of gradient descent. Moreover, we show conditions where a minimum-norm/maximum-margin interpolator exhibits benign overfitting. We study how the overfitting behavior depends on the signal-to-noise ratio (SNR) of the data distribution, namely, the ratio between norms of signal and noise tokens, and prove that a sufficiently large SNR is both necessary and sufficient for benign overfitting.

Submitted 10 October, 2024; originally announced October 2024.

arXiv:2410.03680 [pdf, other]

Leafeon: Towards Accurate, Robust and Low-cost Leaf Water Content Sensing Using mmWave Radar

Authors: Mark Cardamis, Hong Jia, Hao Qian, Wenyao Chen, Yihe Yan, Oula Ghannoum, Aaron Quigley, Chung Tung Chou, Wen Hu

Abstract: Plant sensing plays an important role in modern smart agriculture and the farming industry. Remote radio sensing allows for monitoring essential indicators of plant health, such as leaf water content. While recent studies have shown the potential of using millimeter-wave (mmWave) radar for plant sensing, many overlook crucial factors such as leaf structure and surface roughness, which can impact the accuracy of the measurements. In this paper, we introduce Leafeon, which leverages mmWave radar to measure leaf water content non-invasively. Utilizing electronic beam steering, multiple leaf perspectives are sent to a custom deep neural network, which discerns unique reflection patterns from subtle antenna variations, ensuring accurate and robust leaf water content estimations. We implement a prototype of Leafeon using a Commercial Off-The-Shelf mmWave radar and evaluate its performance with a variety of different leaf types. Leafeon was trained in-lab using high-resolution destructive leaf measurements, achieving a Mean Absolute Error (MAE) of leaf water content as low as 3.17% for the Avocado leaf, significantly outperforming the state-of-the-art approaches with an MAE reduction of up to 55.7%. Furthermore, we conducted experiments on live plants in both indoor and glasshouse experimental farm environments (see Fig. 1). Our results showed a strong correlation between predicted leaf water content levels and drought events.

Submitted 20 September, 2024; originally announced October 2024.

arXiv:2410.03679 [pdf, other]

MotionLeaf: Fine-grained Multi-Leaf Damped Vibration Monitoring for Plant Water Stress using Low-Cost mmWave Sensors

Authors: Mark Cardamis, Chun Tung Chou, Wen Hu

Abstract: In this paper, we introduce MotionLeaf, a novel mmWave-based multi-point vibration frequency measurement system that can estimate plant stress by analyzing the surface vibrations of multiple leaves. MotionLeaf features a novel signal processing pipeline that accurately estimates fine-grained damped vibration frequencies based on noisy micro-displacement measurements from a mmWave radar. Specifically, we explore the Interquartile Mean (IQM) of coherent phase differences from neighboring Frequency-Modulated Continuous Wave (FMCW) radar chirps to calculate micro-displacements. Furthermore, we use the measurements from multiple receive antennas in the radar to estimate the vibration signals of different leaves via a Blind Source Separation (BSS) method. Experimental results demonstrate that MotionLeaf can accurately measure the frequency of multiple leaves in a plant with an average error of 0.0176 Hz, which is less than 50% of that (0.0416 Hz) of the state-of-the-art approach (mmVib). Additionally, the estimated natural vibration frequencies from MotionLeaf are shown to be an excellent feature to detect the water stress in the plant during 7-day drought experiments.

Submitted 20 September, 2024; originally announced October 2024.

arXiv:2410.02740 [pdf, other]

Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models

Authors: Zhengfeng Lai, Vasileios Saveris, Chen Chen, Hong-You Chen, Haotian Zhang, Bowen Zhang, Juan Lao Tebar, Wenze Hu, Zhe Gan, Peter Grasch, Meng Cao, Yinfei Yang

Abstract: Recent advancements in multimodal models highlight the value of rewritten captions for improving performance, yet key challenges remain. For example, while synthetic captions often provide superior quality and image-text alignment, it is not clear whether they can fully replace AltTexts: the role of synthetic captions and their interaction with original web-crawled AltTexts in pre-training is still not well understood. Moreover, different multimodal foundation models may have unique preferences for specific caption formats, but efforts to identify the optimal captions for each model remain limited. In this work, we propose a novel, controllable, and scalable captioning pipeline designed to generate diverse caption formats tailored to various multimodal models. By examining Short Synthetic Captions (SSC) towards Dense Synthetic Captions (DSC+) as case studies, we systematically explore their effects and interactions with AltTexts across models such as CLIP, multimodal LLMs, and diffusion models. Our findings reveal that a hybrid approach that keeps both synthetic captions and AltTexts can outperform the use of synthetic captions alone, improving both alignment and performance, with each model demonstrating preferences for particular caption formats. This comprehensive analysis provides valuable insights into optimizing captioning strategies, thereby advancing the pre-training of multimodal foundation models.

Submitted 3 October, 2024; originally announced October 2024.

Comments: CV/ML

arXiv:2410.02502 [pdf, other]

Measurement of the effective leptonic weak mixing angle

Authors: LHCb collaboration, R. Aaij, A. S. W. Abdelmotteleb, C. Abellan Beteta, F. Abudinén, T. Ackernley, A. A. Adefisoye, B. Adeva, M. Adinolfi, P. Adlarson, C. Agapopoulou, C. A. Aidala, Z. Ajaltouni, S. Akar, K. Akiba, P. Albicocco, J. Albrecht, F. Alessio, M. Alexander, Z. Aliouche, P. Alvarez Cartelle, R. Amalric, S. Amato, J. L. Amey, Y. Amhis, et al. (1117 additional authors not shown)

Abstract: Using $pp$ collision data at $\sqrt{s}=13$ TeV, recorded by the LHCb experiment between 2016 and 2018 and corresponding to an integrated luminosity of $5.4$ fb$^{-1}$, the forward-backward asymmetry in the $pp \to Z/γ^{*} \to μ^+μ^-$ process is measured. The measurement is carried out in ten intervals of the difference between the muon pseudorapidities, within a fiducial region covering dimuon masses between $66$ and $116$ GeV, muon pseudorapidities between $2.0$ and $4.5$ and muon transverse momenta above $20$ GeV. These forward-backward asymmetries are compared with predictions, at next-to-leading order in the strong and electroweak couplings. The measured effective leptonic weak mixing angle is $\sin^2θ_{\rm eff}^\ell = 0.23147 \pm 0.00044 \pm 0.00005 \pm 0.00023$, where the first uncertainty is statistical, the second arises from systematic uncertainties associated with the asymmetry measurement, and the third arises from uncertainties in the fit model used to extract $\sin^2θ_{\rm eff}^\ell$ from the asymmetry measurement. This result is based on an arithmetic average of results using the CT18, MSHT20, and NNPDF31 parameterisations of the proton internal structure, and is consistent with previous measurements and with predictions from the global electroweak fit.

Submitted 3 October, 2024; originally announced October 2024.

Comments: All figures and tables, along with machine-readable versions and any supplementary material and additional information, are available at https://lbfence.cern.ch/alcm/public/analysis/full-details/3360/ (LHCb public pages)

Report number: LHCb-PAPER-2024-028, CERN-EP-2024-230

arXiv:2410.00236 [pdf, other]

CLASSY XI: Tracing Neutral Gas Properties using UV Absorption Lines and 21-cm Observations

Authors: Kaelee S. Parker, Danielle A. Berg, Simon Gazagnes, John Chisholm, Bethan L. James, Matthew Hayes, Timothy Heckman, Alaina Henry, Michelle A. Berg, Karla Z. Arellano-Cordova, Xinfeng Xu, Dawn K. Erb, Crystal L. Martin, Weida Hu, Evan D. Skillman, Kristen B. W. McQuinn, Zuyi Chen, Dan P. Stark

Abstract: Rest-frame far-ultraviolet (FUV) observations from JWST are revolutionizing our understanding of the high-z galaxies that drove reionization and the mechanisms by which they accomplished it. To fully interpret these observations, we must be able to diagnose how properties of the interstellar medium (ISM; e.g., column density, covering fraction, outflow velocity) directly relate to the absorption features produced. Using the high-S/N and high-resolution FUV spectra of 45 nearby star-forming galaxies from CLASSY, we present the largest uniform, simultaneous characterization of neutral and low-ionization state (LIS) interstellar UV absorption lines (OI, SiII, SII, CII, AlII) across a wide range of galaxy properties. We also present 21-cm HI observations for 35 galaxies, multiple of which are gas-poor or non-detected, possibly indicating the onset of a post-starburst phase. We find that our simultaneous 1-component Voigt profile fits are capable of accurately modeling the LIS absorption for ~75% of galaxies, mitigating challenges associated with saturation, infilling, and degeneracies. While the most massive galaxies require additional components, our 1-component fits return average properties of the absorbing gas and follow the scaling relations described by a single gas cloud. We explore connections between LIS absorption and direct tracers of the neutral ISM (OI, Ly-alpha, HI 21-cm), finding that CII most closely traces the neutral gas trends while other ions exhibit weaker correlations.
Given the challenges with directly observing HI at higher-z, we demonstrate that LIS absorption can be a powerful means to study the neutral ISM and present empirical relationships for predicting neutral gas properties.

Submitted 30 September, 2024; originally announced October 2024.

Comments: 25 pages with 16 figures and 2 tables. Long appendix with figure sets and tables. Accepted to ApJ

arXiv:2409.20449 [pdf, other]

Linear Projections of Teacher Embeddings for Few-Class Distillation

Authors: Noel Loo, Fotis Iliopoulos, Wei Hu, Erik Vee

Abstract: Knowledge Distillation (KD) has emerged as a promising approach for transferring knowledge from a larger, more complex teacher model to a smaller student model. Traditionally, KD involves training the student to mimic the teacher's output probabilities, while more advanced techniques have explored guiding the student to adopt the teacher's internal representations. Despite its widespread success, the performance of KD in binary classification and few-class problems has been less satisfactory. This is because the information about the teacher model's generalization patterns scales directly with the number of classes. Moreover, several sophisticated distillation methods may not be universally applicable or effective for data types beyond Computer Vision. Consequently, effective distillation techniques remain elusive for a range of key real-world applications, such as sentiment analysis, search query understanding, and advertisement-query relevance assessment. Taking these observations into account, we introduce a novel method for distilling knowledge from the teacher's model representations, which we term Learning Embedding Linear Projections (LELP). Inspired by recent findings about the structure of final-layer representations, LELP works by identifying informative linear subspaces in the teacher's embedding space, and splitting them into pseudo-subclasses. The student model is then trained to replicate these pseudo-classes.
Our experimental evaluation on large-scale NLP benchmarks like Amazon Reviews and Sentiment140 demonstrates that LELP is consistently competitive with, and typically superior to, existing state-of-the-art distillation algorithms for binary and few-class problems, where most KD methods suffer.

Submitted 1 October, 2024; v1 submitted 30 September, 2024; originally announced September 2024.

arXiv:2409.18523 [pdf, other]

Token Caching for Diffusion Transformer Acceleration

Authors: Jinming Lou, Wenyang Luo, Yufan Liu, Bing Li, Xinmiao Ding, Weiming Hu, Jiajiong Cao, Yuming Li, Chenguang Ma

Abstract: Diffusion transformers have gained substantial interest in diffusion generative modeling due to their outstanding performance. However, their high computational cost, arising from the quadratic computational complexity of attention mechanisms and multi-step inference, presents a significant bottleneck. To address this challenge, we propose TokenCache, a novel post-training acceleration method that leverages the token-based multi-block architecture of transformers to reduce redundant computations among tokens across inference steps. TokenCache specifically addresses three critical questions in the context of diffusion transformers: (1) which tokens should be pruned to eliminate redundancy, (2) which blocks should be targeted for efficient pruning, and (3) at which time steps caching should be applied to balance speed and quality. In response to these challenges, TokenCache introduces a Cache Predictor that assigns importance scores to tokens, enabling selective pruning without compromising model performance. Furthermore, we propose an adaptive block selection strategy to focus on blocks with minimal impact on the network's output, along with a Two-Phase Round-Robin (TPRR) scheduling policy to optimize caching intervals throughout the denoising process. Experimental results across various models demonstrate that TokenCache achieves an effective trade-off between generation quality and inference speed for diffusion transformers. Our code will be publicly available.

Submitted 27 September, 2024; originally announced September 2024.

arXiv:2409.17983 [pdf, other]

GRB 240529A: A Tale of Two Shocks

Authors: Tian-Rui Sun, Jin-Jun Geng, Jing-Zhi Yan, You-Dong Hu, Xue-Feng Wu, Alberto J. Castro-Tirado, Chao Yang, Yi-Ding Ping, Chen-Ran Hu, Fan Xu, Hao-Xuan Gao, Ji-An Jiang, Yan-Tian Zhu, Yongquan Xue, Ignacio Pérez-García, Si-Yu Wu, Emilio Fernández-García, María D. Caballero-García, Rubén Sánchez-Ramírez, Sergiy Guziy, Ignacio Olivares, Carlos Jesus Pérez del Pulgar, A. Castellón, Sebastián Castillo, Ding-Rong Xiong, et al. (44 additional authors not shown)

Abstract: Thanks to the rapidly increasing time-domain facilities, we are entering a golden era of research on gamma-ray bursts (GRBs). In this Letter, we report our observations of GRB 240529A with the Burst Optical Observer and Transient Exploring System, the 1.5-meter telescope at Observatorio Sierra Nevada, the 2.5-meter Wide Field Survey Telescope of China, the Large Binocular Telescope, and the Telescopio Nazionale Galileo. The prompt emission of GRB 240529A shows two comparably energetic episodes separated by a quiescence time of roughly 400 s. Combining all available data on the GRB Coordinates Network, we reveal the simultaneous apparent X-ray plateau and optical re-brightening around $10^3-10^4$ s after the burst. Rather than the energy injection from the magnetar as widely invoked for similar GRBs, the multi-wavelength emissions could be better explained as two shocks launched from the central engine separately. The optical peak time and our numerical modeling suggest that the initial bulk Lorentz factor of the later shock is roughly 50, which indicates that the later jet should be accretion-driven and have a higher mass loading than a typical one. The quiescence time between the two prompt emission episodes may be caused by the transition between different accretion states of a central magnetar or black hole, or the fall-back accretion process. A sample of similar bursts with multiple emission episodes in the prompt phase and sufficient follow-up could help to probe the underlying physics of GRB central engines.

Submitted 26 September, 2024; originally announced September 2024.

Comments: Resubmitted to ApJL after addressing the referee's comments; comments are welcome

arXiv:2409.17209 [pdf, other]

Search for $B_{(s)}^{*0}\toμ^+μ^-$ in $B_c^+\toπ^+μ^+μ^-$ decays

Authors: LHCb collaboration, R. Aaij, A. S. W. Abdelmotteleb, C. Abellan Beteta, F. Abudinén, T. Ackernley, A. A. Adefisoye, B. Adeva, M. Adinolfi, P. Adlarson, C. Agapopoulou, C. A. Aidala, Z. Ajaltouni, S. Akar, K. Akiba, P. Albicocco, J. Albrecht, F. Alessio, M. Alexander, Z. Aliouche, P. Alvarez Cartelle, R. Amalric, S. Amato, J. L. Amey, Y. Amhis, et al. (1113 additional authors not shown)

Abstract: A search for the very rare $B^{*0}\toμ^+μ^-$ and $B_{s}^{*0}\toμ^+μ^-$ decays is conducted by analysing the $B_c^+\to π^+μ^+μ^-$ process. The analysis uses proton-proton collision data collected with the LHCb detector between 2011 and 2018, corresponding to an integrated luminosity of 9$\text{\,fb}^{-1}$. The signal signatures correspond to simultaneous peaks in the $μ^+μ^-$ and $π^+μ^+μ^-$ invariant masses. No evidence for an excess of events over background is observed for either signal decay mode. Upper limits at the $90\%$ confidence level are set on the branching fractions relative to that for $B_c^+\to J\mskip -3mu/\mskip -2muψπ^+$ decays, \begin{align*} {\cal R}_{B^{*0}(μ^+μ^-)π^+/J\mskip -3mu/\mskip -2muψπ^+} &< 3.8\times 10^{-5}\ \text{ and } {\cal R}_{B_{s}^{*0}(μ^+μ^-)π^+/J\mskip -3mu/\mskip -2muψπ^+} &< 5.0\times 10^{-5}\,. \end{align*}

Submitted 25 September, 2024; originally announced September 2024.

Comments: All figures and tables, along with machine-readable versions and any supplementary material and additional information, are available at https://lbfence.cern.ch/alcm/public/analysis/full-details/1796/ (LHCb public pages)

Report number: CERN-EP-2024-220, LHCb-PAPER-2024-026

arXiv:2409.16092 [pdf]

Modulating dislocation reactions through preferential hydrogen segregation in bcc metals

Authors: Jie Hou, Ducheng Peng, Xiang-Shan Kong, Huiqiu Deng, Wangyu Hu, Cheng Chen, Jun Song

Abstract: The interaction between dislocations is fundamental to plastic deformation, work hardening, and defect accumulation. While extensive research has focused on the impact of solutes on individual dislocations, how solutes affect dislocation-dislocation reactions remains largely unexplored. Here, using atomistic simulations of iron as a model bcc system, we demonstrate that hydrogen solutes enable two <111>/2 screw dislocations to react and form a <001> edge dislocation junction, a process that is otherwise unfavorable in hydrogen-free environments. This phenomenon arises from the preferential segregation of hydrogen around the <001> dislocation, which reduces the energy of the reaction product. The resulting <001> dislocation demonstrates remarkable stability and transforms into a <001> vacancy-type dislocation loop under strain. These vacancy-type dislocation loops can accumulate during continuous deformation and dislocation reactions, serving as precursors for the initiation of structural damage, such as cracking and blistering. Our findings highlight the pivotal role of hydrogen in dislocation reactions, uncover a novel defect accumulation mechanism crucial for interpreting recent experimental observations, and represent a significant advance in understanding hydrogen-induced damage in bcc metals.

Submitted 24 September, 2024; originally announced September 2024.

arXiv:2409.14605 [pdf]

First Field Trial of LLM-Powered AI Agent for Lifecycle Management of Autonomous Driving Optical Networks

Authors: Xiaomin Liu, Qizhi Qiu, Yihao Zhang, Yuming Cheng, Lilin Yi, Weisheng Hu, Qunbi Zhuge

Abstract: We design and demonstrate the first field trial of LLM-powered AI Agent for ADON. Three operation modes of the Agent are proposed for network lifecycle management. The Agent efficiently processes wavelength add/drop and soft/hard failures, and achieves comparable performance to human-designed algorithms for power optimization.

Submitted 24 September, 2024; v1 submitted 22 September, 2024; originally announced September 2024.

Comments: Version submitted to ECOC PDP 2024 on September 6th

arXiv:2409.14400 [pdf]

Preamble Design for Joint Frame Synchronization, Frequency Offset Estimation, and Channel Estimation in Upstream Burst-mode Detection of Coherent PONs

Authors: Yongxin Sun, Hexun Jiang, Yicheng Xu, Mengfan Fu, Yixiao Zhu, Lilin Yi, Weisheng Hu, Qunbi Zhuge

Abstract: Coherent optics has demonstrated significant potential as a viable solution for achieving 100 Gb/s and higher speeds in single-wavelength passive optical networks (PON). However, upstream burst-mode coherent detection is a major challenge when adopting coherent optics in access networks. To accelerate digital signal processing (DSP) convergence with a minimal preamble length, we propose a novel burst-mode preamble design based on a constant amplitude zero auto-correlation sequence. This design facilitates comprehensive estimation of linear channel effects in the frequency domain, including polarization state rotation, differential group delay, chromatic dispersion, and polarization-dependent loss, providing overall system response information for channel equalization pre-convergence. Additionally, this preamble utilizes the same training unit to jointly achieve three key DSP functions: frame synchronization, frequency offset estimation, and channel estimation. This integration contributes to a significant reduction in the preamble length. The feasibility of the proposed preamble with a length of 272 symbols and corresponding DSP was experimentally verified in a 15 Gbaud coherent system using dual-polarization 16 quadrature amplitude modulation. The experimental results based on this scheme showed superior convergence acceleration.

Submitted 22 September, 2024; originally announced September 2024.

Comments: 10 pages, 12 figures
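
The constant amplitude zero auto-correlation (CAZAC) property named in the abstract above can be illustrated with a Zadoff-Chu sequence, one well-known CAZAC family. This is a minimal sketch and not the authors' preamble: the listing does not say which CAZAC sequence the paper uses, and the length 17, the root 3, and the function names are hypothetical choices made for illustration.

```python
import cmath
from math import gcd


def zadoff_chu(length, root):
    """Return a Zadoff-Chu sequence (odd length, root coprime to length)."""
    if length % 2 == 0 or gcd(root, length) != 1:
        raise ValueError("this sketch needs an odd length and a coprime root")
    return [cmath.exp(-1j * cmath.pi * root * n * (n + 1) / length)
            for n in range(length)]


def periodic_autocorrelation(seq, lag):
    """Circular autocorrelation of seq at the given lag."""
    n_total = len(seq)
    return sum(seq[n] * seq[(n + lag) % n_total].conjugate()
               for n in range(n_total))


# Hypothetical parameters, chosen only for illustration.
sequence = zadoff_chu(length=17, root=3)

# Constant amplitude: every sample should sit on the unit circle.
max_amplitude_error = max(abs(abs(x) - 1.0) for x in sequence)

# Zero autocorrelation: every nonzero circular lag should vanish.
max_sidelobe = max(abs(periodic_autocorrelation(sequence, lag))
                   for lag in range(1, len(sequence)))
```

Both `max_amplitude_error` and `max_sidelobe` should be zero up to floating-point rounding, while the lag-0 autocorrelation equals the sequence length. A sharp correlation peak of this kind is what makes such sequences attractive for synchronization.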

arXiv:2409.14122 [pdf, other]

Efficient and Effective Model Extraction

Authors: Hongyu Zhu, Wentao Hu, Sichu Liang, Fangqi Li, Wenwen Wang, Shilin Wang

Abstract: Model extraction aims to create a functionally similar copy from a machine learning as a service (MLaaS) API with minimal overhead, typically for illicit profit or as a precursor to further attacks, posing a significant threat to the MLaaS ecosystem. However, recent studies have shown that model extraction is highly inefficient, particularly when the target task distribution is unavailable. In such cases, even substantially increasing the attack budget fails to produce a sufficiently similar replica, reducing the adversary's motivation to pursue extraction attacks. In this paper, we revisit the elementary design choices throughout the extraction lifecycle. We propose an embarrassingly simple yet dramatically effective algorithm, Efficient and Effective Model Extraction (E3), focusing on both query preparation and training routine. E3 achieves superior generalization compared to state-of-the-art methods while minimizing computational costs. For instance, with only 0.005 times the query budget and less than 0.2 times the runtime, E3 outperforms classical generative model based data-free model extraction by an absolute accuracy improvement of over 50% on CIFAR-10. Our findings underscore the persistent threat posed by model extraction and suggest that it could serve as a valuable benchmarking algorithm for future security evaluations.

Submitted 24 September, 2024; v1 submitted 21 September, 2024; originally announced September 2024.

arXiv:2409.12961 [pdf, other]

Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution

Authors: Zuyan Liu, Yuhao Dong, Ziwei Liu, Winston Hu, Jiwen Lu, Yongming Rao

Abstract: Visual data comes in various forms, ranging from small icons of just a few pixels to long videos spanning hours. Existing multi-modal LLMs usually standardize these diverse visual inputs to a fixed resolution for visual encoders and yield similar numbers of tokens for LLMs. This approach is non-optimal for multimodal understanding and inefficient for processing inputs with long and short visual contents. To solve the problem, we propose Oryx, a unified multimodal architecture for the spatial-temporal understanding of images, videos, and multi-view 3D scenes. Oryx offers an on-demand solution to seamlessly and efficiently process visual inputs with arbitrary spatial sizes and temporal lengths through two core innovations: 1) a pre-trained OryxViT model that can encode images at any resolution into LLM-friendly visual representations; 2) a dynamic compressor module that supports 1x to 16x compression on visual tokens by request. These design features enable Oryx to accommodate extremely long visual contexts, such as videos, with lower resolution and high compression while maintaining high recognition precision for tasks like document understanding with native resolution and no compression. Beyond the architectural improvements, enhanced data curation and specialized training on long-context retrieval and spatial-aware data help Oryx achieve strong capabilities in image, video, and 3D multimodal understanding simultaneously. Our work is open-sourced at https://github.com/Oryx-mllm/Oryx.

Submitted 19 September, 2024; originally announced September 2024.

arXiv:2409.12629 [pdf, other]

Analysis of $\itΛ^\mathrm{0}_b \rightarrow pK^-μ^+μ^-$ decays

Authors: LHCb collaboration, R. Aaij, A. S. W. Abdelmotteleb, C. Abellan Beteta, F. Abudinén, T. Ackernley, A. A. Adefisoye, B. Adeva, M. Adinolfi, P. Adlarson, C. Agapopoulou, C. A. Aidala, Z. Ajaltouni, S. Akar, K. Akiba, P. Albicocco, J. Albrecht, F. Alessio, M. Alexander, Z. Aliouche, P. Alvarez Cartelle, R. Amalric, S. Amato, J. L. Amey, Y. Amhis, et al. (1114 additional authors not shown)

Abstract: The differential branching fraction and angular coefficients of $\itΛ^\mathrm{0}_b \rightarrow pK^-μ^+μ^-$ decays are measured in bins of the dimuon mass squared and dihadron mass. The analysis is performed using a data set corresponding to 9$\text{\,fb}^{-1}$ of integrated luminosity collected with the LHCb detector between 2011 and 2018. The data are consistent with receiving contributions from a mixture of $\itΛ$ resonances with different spin-parity quantum numbers. The angular coefficients show a pattern of vector--axial vector interference that is a characteristic of the type of flavour-changing neutral-current transition relevant for these decays.

Submitted 19 September, 2024; originally announced September 2024.

Comments: All figures and tables, along with machine-readable versions and any supplementary material and additional information, are available at https://cern.ch/lhcbproject/Publications/p/3264.html (LHCb public pages)

Report number: CERN-EP-2024-212, LHCb-PAPER-2024-024

arXiv:2409.12136 [pdf, other]

GRIN: GRadient-INformed MoE

Authors: Liyuan Liu, Young Jin Kim, Shuohang Wang, Chen Liang, Yelong Shen, Hao Cheng, Xiaodong Liu, Masahiro Tanaka, Xiaoxia Wu, Wenxiang Hu, Vishrav Chaudhary, Zeqi Lin, Chenruidong Zhang, Jilong Xue, Hany Awadalla, Jianfeng Gao, Weizhu Chen

Abstract: Mixture-of-Experts (MoE) models scale more effectively than dense models due to sparse computation through expert routing, selectively activating only a small subset of expert modules. However, sparse computation challenges traditional training practices, as discrete expert routing hinders standard backpropagation and thus gradient-based optimization, which are the cornerstone of deep learning. To better pursue the scaling power of MoE, we introduce GRIN (GRadient-INformed MoE training), which incorporates sparse gradient estimation for expert routing and configures model parallelism to avoid token dropping. Applying GRIN to autoregressive language modeling, we develop a top-2 16$\times$3.8B MoE model. Our model, with only 6.6B activated parameters, outperforms a 7B dense model and matches the performance of a 14B dense model trained on the same data. Extensive evaluations across diverse tasks demonstrate the potential of GRIN to significantly enhance MoE efficacy, achieving 79.4 on MMLU, 83.7 on HellaSwag, 74.4 on HumanEval, and 58.9 on MATH.

Submitted 18 September, 2024; originally announced September 2024.

Comments: 58 pages
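
The "top-2" expert routing mentioned in the abstract above can be sketched generically as follows. This is an illustrative toy and not GRIN's method: it shows only conventional softmax gating with a hard top-2 selection, and it does not implement the paper's sparse gradient estimation. The toy experts, the logit values, the function names, and the choice to renormalize the two selected gate weights are all assumptions.

```python
import math


def top2_route(router_logits):
    """Pick the two highest-scoring experts and renormalize their gate weights."""
    if len(router_logits) < 2:
        raise ValueError("need at least two experts")
    # Numerically stable softmax over all experts.
    peak = max(router_logits)
    exps = [math.exp(x - peak) for x in router_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Indices of the two largest probabilities (a discrete, non-differentiable choice).
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:2]
    mass = sum(probs[i] for i in chosen)
    return [(i, probs[i] / mass) for i in chosen]


def moe_output(token, experts, router_logits):
    """Weighted sum of the outputs of the two selected experts only."""
    return sum(weight * experts[i](token)
               for i, weight in top2_route(router_logits))


# Hypothetical toy experts: 16 scalar functions standing in for expert networks.
experts = [lambda x, k=k: k * x for k in range(16)]
logits = [0.0] * 16
logits[5], logits[9] = 2.0, 1.0

routes = top2_route(logits)
output = moe_output(1.0, experts, logits)
```

Only two of the sixteen experts are evaluated per token, which is where the sparse-computation savings come from. The hard selection step is also the part that blocks ordinary backpropagation through the router, which is the difficulty the abstract describes.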

arXiv:2409.08626 [pdf, ps, other]

Convex Reformulation of Information Constrained Linear State Estimation with Mixed-Binary Variables for Outlier Accommodation

Authors: Wang Hu, Zeyi Jiang, Hamed Mohsenian-Rad, Jay A. Farrell

Abstract: This article considers the challenge of accommodating outlier measurements in state estimation. The Risk-Averse Performance-Specified (RAPS) state estimation approach addresses outliers as a measurement selection Bayesian risk minimization problem subject to an information accuracy constraint, which is a non-convex optimization problem. Prior explorations into RAPS rely on exhaustive search, which becomes computationally infeasible as the number of measurements increases. This paper derives a convex formulation for the RAPS optimization problems via transforming the mixed-binary variables into linear constraints. The convex reformulation herein can be solved by convex programming toolboxes, significantly enhancing computational efficiency. We explore two specifications: Full-RAPS, utilizing the full information matrix, and Diag-RAPS, focusing on diagonal elements only. The simulation comparison demonstrates that Diag-RAPS is faster and more efficient than Full-RAPS. In comparison with Kalman Filter (KF) and Threshold Decisions (TD), Diag-RAPS consistently achieves the lowest risk, while achieving the performance specification when it is feasible.

Submitted 13 September, 2024; originally announced September 2024.

Comments: Accepted by the 2024 IEEE Conference on Decision and Control

arXiv:2409.08526 [pdf, other]

Deep Picard Iteration for High-Dimensional Nonlinear PDEs

Authors: Jiequn Han, Wei Hu, Jihao Long, Yue Zhao

Abstract: We present the Deep Picard Iteration (DPI) method, a new deep learning approach for solving high-dimensional partial differential equations (PDEs). The core innovation of DPI lies in its use of Picard iteration to reformulate the typically complex training objectives of neural network-based PDE solutions into much simpler, standard regression tasks based on function values and gradients. This design not only greatly simplifies the optimization process but also offers the potential for further scalability through parallel data generation. Crucially, to fully realize the benefits of regressing on both function values and gradients in the DPI method, we address the issue of infinite variance in the estimators of gradients by incorporating a control variate, supported by our theoretical analysis. Our experiments on problems up to 100 dimensions demonstrate that DPI consistently outperforms existing state-of-the-art methods, with greater robustness to hyperparameters, particularly in challenging scenarios with long time horizons and strong nonlinearity.

Submitted 13 September, 2024; originally announced September 2024.
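
For readers unfamiliar with the Picard iteration that the abstract above builds on, the classical version for an ordinary differential equation can be sketched as follows. This is not the DPI method, which targets high-dimensional PDEs with neural networks. The sketch only shows the underlying fixed-point idea on the textbook problem y' = y, y(0) = 1, and the function names are hypothetical.

```python
from fractions import Fraction


def integrate_from_zero(coeffs):
    """Integrate a polynomial (coefficients in ascending order) from 0 to t."""
    return [Fraction(0)] + [c / (k + 1) for k, c in enumerate(coeffs)]


def picard_step(coeffs, initial_value):
    """One Picard update for y' = y: y_next(t) = y(0) + integral_0^t y(s) ds."""
    updated = integrate_from_zero(coeffs)
    updated[0] += initial_value
    return updated


def picard_iterates(initial_value, steps):
    """Start from the constant function y_0(t) = y(0) and iterate."""
    current = [Fraction(initial_value)]
    for _ in range(steps):
        current = picard_step(current, Fraction(initial_value))
    return current


# Four iterations give the degree-4 Taylor polynomial of exp(t).
approx = picard_iterates(initial_value=1, steps=4)
```

Here `approx` holds the coefficients 1, 1, 1/2, 1/6, 1/24, so each iteration adds one more term of the exponential series. Every step is an explicit update computed from the previous iterate, which is the structural feature the abstract exploits to turn training into a sequence of regression problems.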

arXiv:2409.07447 [pdf, other]

StereoCrafter: Diffusion-based Generation of Long and High-fidelity Stereoscopic 3D from Monocular Videos

Authors: Sijie Zhao, Wenbo Hu, Xiaodong Cun, Yong Zhang, Xiaoyu Li, Zhe Kong, Xiangjun Gao, Muyao Niu, Ying Shan

Abstract: This paper presents a novel framework for converting 2D videos to immersive stereoscopic 3D, addressing the growing demand for 3D content in immersive experience. Leveraging foundation models as priors, our approach overcomes the limitations of traditional methods and boosts the performance to ensure the high-fidelity generation required by the display devices. The proposed system consists of two main steps: depth-based video splatting for warping and extracting occlusion mask, and stereo video inpainting. We utilize pre-trained stable video diffusion as the backbone and introduce a fine-tuning protocol for the stereo video inpainting task. To handle input video with varying lengths and resolutions, we explore auto-regressive strategies and tiled processing. Finally, a sophisticated data processing pipeline has been developed to reconstruct a large-scale and high-quality dataset to support our training. Our framework demonstrates significant improvements in 2D-to-3D video conversion, offering a practical solution for creating immersive content for 3D devices like Apple Vision Pro and 3D displays. In summary, this work contributes to the field by presenting an effective method for generating high-quality stereoscopic videos from monocular input, potentially transforming how we experience digital media.

Submitted 11 September, 2024; originally announced September 2024.

Comments: 11 pages, 10 figures

ACM Class: I.3.0; I.4.0

arXiv:2409.07391 [pdf, other]

Improve Sensitivity Analysis Synthesizing Randomized Clinical Trials With Limited Overlap

Authors: Kuan Jiang, Wenjie Hu, Shu Yang, Xinxing Lai, Xiaohua Zhou

Abstract: To estimate the average treatment effect in real-world populations, observational studies are typically designed around real-world cohorts. However, even when study samples from these designs represent the population, unmeasured confounders can introduce bias. Sensitivity analysis is often used to estimate bounds for the average treatment effect without relying on the strict mathematical assumptions of other existing methods. This article introduces a new approach that improves sensitivity analysis in observational studies by incorporating randomized clinical trial data, even with limited overlap due to inclusion/exclusion criteria. Theoretical proof and simulations show that this method provides a tighter bound width than existing approaches. We also apply this method to both a trial dataset and a real-world drug effectiveness comparison dataset for practical analysis.

Submitted 1 October, 2024; v1 submitted 11 September, 2024; originally announced September 2024.

arXiv:2409.07075 [pdf, ps, other]

Analytical approach for pure high, even-order dispersion solitons

Authors: Xing Liao, Jiahan Huang, Daquan Lu, Wei Hu

Abstract: We theoretically solve the nonlinear Schrödinger equation describing the propagation of pure high, even-order dispersion (PHEOD) solitons by a variational approach. The Lagrangian for nonlinear pulse transmission systems is given for each dispersion order, and the analytical solutions of PHEOD solitons are obtained and compared with the numerical results. It is shown that the variational results approximate the numerical ones very well for lower orders of dispersion ($\le 8$) and get worse as the order increases. In addition, using linear stability analysis, we demonstrate that all PHEOD solitons are stable and obtain the soliton internal modes that accompany soliton transmission. These results are helpful for the application of PHEOD solitons in high-energy lasers.

Submitted 11 September, 2024; originally announced September 2024.

Comments: 4 figures

arXiv:2409.06483 [pdf, other]

Dual Gravitational Wave Signatures of Instant Preheating

Authors: Wei-Yu Hu, Kazunori Nakayama, Volodymyr Takhistov, Yong Tang

Abstract: In the instant preheating scenario efficient particle production occurs immediately following the period of inflationary expansion in the early Universe. We demonstrate that instant preheating predicts unique gravitational wave (GW) signals arising from two distinct origins. One source is the bremsstrahlung GWs produced through the decay of superheavy particles, an inevitable consequence of instant preheating. The other is GWs generated from the nonlinear dynamics of the inflaton and coupled scalar fields. Using numerical simulations, we show that the peak of the GW spectrum shifts depending on the coupling constants of the theory. The detection of these dual GW signatures, characteristic of instant preheating, provides novel opportunities for probing the dynamics of the early Universe.

Submitted 10 September, 2024; originally announced September 2024.

Comments: 28 pages, 8 figures

Report number: TU-1240, KEK-QUP-2024-0020, KEK-TH-2650, KEK-Cosmo-0356

arXiv:2409.06277 [pdf, other]

Ferret: Federated Full-Parameter Tuning at Scale for Large Language Models

Authors: Yao Shu, Wenyang Hu, See-Kiong Ng, Bryan Kian Hsiang Low, Fei Richard Yu

Abstract: Large Language Models (LLMs) have become indispensable in numerous real-world applications. Unfortunately, fine-tuning these models at scale, especially in federated settings where data privacy and communication efficiency are critical, presents significant challenges. Existing methods often resort to parameter-efficient fine-tuning (PEFT) to mitigate communication overhead, but this typically comes at the cost of model accuracy. To address these limitations, we propose federated full-parameter tuning at scale for LLMs (Ferret), the first first-order method with shared randomness to enable scalable full-parameter tuning of LLMs across decentralized data sources while maintaining competitive model accuracy. Ferret accomplishes this through three aspects: (1) it employs widely applied first-order methods for efficient local updates; (2) it projects these updates into a low-dimensional space to considerably reduce communication overhead; and (3) it reconstructs local updates from this low-dimensional space with shared randomness to facilitate effective full-parameter global aggregation, ensuring fast convergence and competitive final performance. Our rigorous theoretical analyses and insights, along with extensive experiments, show that Ferret significantly enhances the scalability of existing federated full-parameter tuning approaches by achieving high computational efficiency, reduced communication overhead, and fast convergence, all while maintaining competitive model accuracy. Our implementation is available at https://github.com/allen4747/Ferret.

Submitted 10 September, 2024; v1 submitted 10 September, 2024; originally announced September 2024.

arXiv:2409.05440 [pdf, other]

First determination of the spin-parity of $Ξ_{c}(3055)^{+,0}$ baryons

Authors: LHCb collaboration, R. Aaij, A. S. W. Abdelmotteleb, C. Abellan Beteta, F. Abudinén, T. Ackernley, A. A. Adefisoye, B. Adeva, M. Adinolfi, P. Adlarson, C. Agapopoulou, C. A. Aidala, Z. Ajaltouni, S. Akar, K. Akiba, P. Albicocco, J. Albrecht, F. Alessio, M. Alexander, Z. Aliouche, P. Alvarez Cartelle, R. Amalric, S. Amato, J. L. Amey, Y. Amhis, et al. (1109 additional authors not shown)

Abstract: The ${Ξ_{b}^{0(-)}\toΞ_{c}(3055)^{+(0)}(\to D^{+(0)}Λ)π^{-}}$ decay chains are observed, and the spin-parity of $Ξ_{c}(3055)^{+(0)}$ baryons is determined for the first time. The measurement is performed using proton-proton collision data at a center-of-mass energy of $\sqrt{s}=13\,\text{TeV}$, corresponding to an integrated luminosity of $5.4\,\text{fb}^{-1}$, recorded by the LHCb experiment between 2016 and 2018. The spin-parity of the $Ξ_{c}(3055)^{+(0)}$ baryons is determined to be $3/2^{+}$ with a significance of more than $6.5σ$ ($3.5σ$) compared to all other tested hypotheses. The up-down asymmetries of the ${Ξ_{b}^{0(-)}\toΞ_{c}(3055)^{+(0)}π^{-}}$ transitions are measured to be $-0.92\pm0.10\pm0.05$ ($-0.92\pm0.16\pm0.22$), consistent with maximal parity violation, where the first uncertainty is statistical and the second is systematic. These results support the hypothesis that the $Ξ_{c}(3055)^{+(0)}$ baryons correspond to the first $D$-wave $λ$-mode excitation of the $Ξ_{c}$ flavor triplet.

Submitted 9 September, 2024; originally announced September 2024.

Comments: All figures and tables, along with any supplementary material and additional information, are available at https://lbfence.cern.ch/alcm/public/analysis/full-details/1603 (LHCb public pages)

Report number: LHCb-PAPER-2024-018, CERN-EP-2024-215

arXiv:2409.05001 [pdf, other]

A Pair Programming Framework for Code Generation via Multi-Plan Exploration and Feedback-Driven Refinement

Authors: Huan Zhang, Wei Cheng, Yuhan Wu, Wei Hu

Abstract: Large language models (LLMs) have achieved impressive performance on code generation. Although prior studies enhanced LLMs with prompting techniques and code refinement, they still struggle with complex programming problems due to rigid solution plans. In this paper, we draw on pair programming practices to propose PairCoder, a novel LLM-based framework for code generation. PairCoder incorporates two collaborative LLM agents, namely a Navigator agent for high-level planning and a Driver agent for specific implementation. The Navigator is responsible for proposing promising solution plans, selecting the current optimal plan, and directing the next iteration round based on execution feedback. The Driver follows the guidance of Navigator to undertake initial code generation, code testing, and refinement. This interleaved and iterative workflow involves multi-plan exploration and feedback-based refinement, which mimics the collaboration of pair programmers. We evaluate PairCoder with both open-source and closed-source LLMs on various code generation benchmarks. Extensive experimental results demonstrate the superior accuracy of PairCoder, achieving relative pass@1 improvements of 12.00%-162.43% compared to prompting LLMs directly.

Submitted 8 September, 2024; originally announced September 2024.

Comments: Accepted in the 39th IEEE/ACM International Conference on Automated Software Engineering (ASE 2024)

arXiv:2409.03496 [pdf, other]

Measurement of exclusive $J/ψ$ and $ψ(2S)$ production at $\sqrt{s}=13$ TeV

Authors: LHCb collaboration, R. Aaij, A. S. W. Abdelmotteleb, C. Abellan Beteta, F. Abudinén, T. Ackernley, A. A. Adefisoye, B. Adeva, M. Adinolfi, P. Adlarson, C. Agapopoulou, C. A. Aidala, Z. Ajaltouni, S. Akar, K. Akiba, P. Albicocco, J. Albrecht, F. Alessio, M. Alexander, Z. Aliouche, P. Alvarez Cartelle, R. Amalric, S. Amato, J. L. Amey, Y. Amhis, et al. (1072 additional authors not shown)

Abstract: Measurements are presented of the cross-section for the central exclusive production of $J/ψ\toμ^+μ^-$ and $ψ(2S)\toμ^+μ^-$ processes in proton-proton collisions at $\sqrt{s} = 13 $ TeV with 2016-2018 data. They are performed by requiring both muons to be in the LHCb acceptance (with pseudorapidity $2<η_{μ^\pm} < 4.5$) and mesons in the rapidity range $2.0 < y < 4.5$. The integrated cross-section results are \begin{equation*} σ_{J/ψ\toμ^+μ^-}(2.0<y_{J/ψ}<4.5,2.0<η_{μ^\pm} < 4.5) = 400 \pm 2 \pm 5 \pm 12 \,{\rm pb}\,, \end{equation*} \begin{equation*} σ_{ψ(2S)\toμ^+μ^-}(2.0<y_{ψ(2S)}<4.5,2.0<η_{μ^\pm} < 4.5) = 9.40 \pm 0.15 \pm 0.13 \pm 0.27 \,{\rm pb}\,, \end{equation*} where the uncertainties are statistical, systematic and due to the luminosity determination. In addition, a measurement of the ratio of $ψ(2S)$ and $J/ψ$ cross-sections, at an average photon-proton centre-of-mass energy of 1 TeV, is performed, giving \begin{equation*} \frac{σ_{ψ(2S)}}{σ_{J/ψ}} = 0.1763 \pm 0.0029 \pm 0.0008 \pm 0.0039 \,, \end{equation*} where the first uncertainty is statistical, the second systematic and the third due to the knowledge of the involved branching fractions. For the first time, the dependence of the $J/ψ$ and $ψ(2S)$ cross-sections on the total transverse momentum transfer is determined in $pp$ collisions and is found consistent with the behaviour observed in electron-proton collisions.

Submitted 11 September, 2024; v1 submitted 5 September, 2024; originally announced September 2024.

Comments: All figures and tables, along with machine-readable versions and any supplementary material and additional information, are available at https://lbfence.cern.ch/alcm/public/analysis/full-details/1801

Report number: LHCb-PAPER-2024-012, CERN-EP-2024-213

arXiv:2409.03009 [pdf, other]

Measurement of $CP$ violation in ${B^0}\rightarrow{D^{+}D^{-}}$ and ${B^{0}_{s}}\rightarrow{D^{+}_{s}D^{-}_{s}}$ decays

Authors: LHCb collaboration, R. Aaij, A. S. W. Abdelmotteleb, C. Abellan Beteta, F. Abudinén, T. Ackernley, A. A. Adefisoye, B. Adeva, M. Adinolfi, P. Adlarson, C. Agapopoulou, C. A. Aidala, Z. Ajaltouni, S. Akar, K. Akiba, P. Albicocco, J. Albrecht, F. Alessio, M. Alexander, Z. Aliouche, P. Alvarez Cartelle, R. Amalric, S. Amato, J. L. Amey, Y. Amhis, et al. (1115 additional authors not shown)

Abstract: A time-dependent, flavour-tagged measurement of $CP$ violation is performed with ${B^0}\rightarrow{D^{+}D^{-}}$ and ${B^{0}_{s}}\rightarrow{D^{+}_{s}D^{-}_{s}}$ decays, using data collected by the LHCb detector in proton-proton collisions at a centre-of-mass energy of 13 TeV corresponding to an integrated luminosity of 6 fb$^{-1}$. In ${B^0}\rightarrow{D^{+}D^{-}}$ decays the $CP$-violation parameters are measured to be \begin{align} S_{D^{+}D^{-}} & = -0.552 \pm 0.100\,\text{(stat)} \pm 0.010\,\text{(syst)}, \nonumber \newline C_{D^{+}D^{-}} & = \phantom{-}0.128 \pm0.103\,\text{(stat)} \pm 0.010\,\text{(syst)}. \nonumber \end{align} In $B^{0}_{s} \rightarrow D^{+}_{s}D^{-}_{s}$ decays the $CP$-violating parameter formulation in terms of $φ_{s}$ and $|λ|$ results in \begin{align} φ_{s} & = -0.086 \pm 0.106 \,\text{(stat)} \pm 0.028\,\text{(syst)} \,\text{rad}, \nonumber \newline |λ_{D^{+}_{s}D^{-}_{s}}| & = \phantom{-}1.145 \pm 0.126\,\text{(stat)} \pm 0.031\,\text{(syst)}. \nonumber \end{align} These results represent the most precise single measurement of the $CP$-violation parameters in their respective channels. For the first time in a single measurement, $CP$ symmetry is observed to be violated in ${B^0}\rightarrow{D^{+}D^{-}}$ decays with a significance exceeding six standard deviations.

Submitted 4 September, 2024; originally announced September 2024.

Comments: All figures and tables, along with machine-readable versions and any supplementary material and additional information, are available at https://lbfence.cern.ch/alcm/public/analysis/full-details/3262/ (LHCb public pages)

Report number: LHCb-PAPER-2024-027, CERN-EP-2024-217

arXiv:2409.02759 [pdf, other]

Measurement of $\itΛ_\it{b}^0$, $\itΛ_\it{c}^+$ and $\itΛ$ decay parameters using $\itΛ_\it{b}^0 \to \itΛ_\it{c}^+ h^-$ decays

Authors: LHCb collaboration, R. Aaij, A. S. W. Abdelmotteleb, C. Abellan Beteta, F. Abudinén, T. Ackernley, A. A. Adefisoye, B. Adeva, M. Adinolfi, P. Adlarson, C. Agapopoulou, C. A. Aidala, Z. Ajaltouni, S. Akar, K. Akiba, P. Albicocco, J. Albrecht, F. Alessio, M. Alexander, Z. Aliouche, P. Alvarez Cartelle, R. Amalric, S. Amato, J. L. Amey, Y. Amhis, et al. (1103 additional authors not shown)

Abstract: A comprehensive study of the angular distributions in the bottom-baryon decays $\itΛ^\mathrm{0}_b\to\itΛ_c^+ h^-(h=π, K)$, followed by $\itΛ_c^+\to\itΛ h^+$ with $\itΛ\to \it{p} π^-$ or $\itΛ_c^+\to\it{p}\it{K}^0_\mathrm{S}$ decays, is performed using a data sample of proton-proton collisions corresponding to an integrated luminosity of $9~\mathrm{fb}^{-1}$ collected by the LHCb experiment at center-of-mass energies of 7, 8 and 13 $\mathrm{Te\kern -0.1em V}$. The decay parameters and the associated charge-parity ($C\!P$) asymmetries are measured, with no significant $C\!P$ violation observed. For the first time, the $\itΛ^\mathrm{0}_b \to \itΛ_c^+ h^-$ decay parameters are measured. The most precise measurements of the decay parameters $α, β$ and $γ$ are obtained for $\itΛ_c^+$ decays and an independent measurement of the decay parameters for the strange-baryon $\itΛ$ decay is provided. The results deepen our understanding of weak decay dynamics in baryon decays.

Submitted 4 September, 2024; originally announced September 2024.

Comments: All figures and tables, along with machine-readable versions and any supplementary material and additional information, are available at https://cern.ch/lhcbproject/Publications/p/LHCb-PAPER-2024-017.html (LHCb public pages)

Report number: LHCb-PAPER-2024-017, CERN-EP-2024-200

arXiv:2409.02095 [pdf, other]

DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos

Authors: Wenbo Hu, Xiangjun Gao, Xiaoyu Li, Sijie Zhao, Xiaodong Cun, Yong Zhang, Long Quan, Ying Shan

Abstract: Despite significant advancements in monocular depth estimation for static images, estimating video depth in the open world remains challenging, since open-world videos are extremely diverse in content, motion, camera movement, and length. We present DepthCrafter, an innovative method for generating temporally consistent long depth sequences with intricate details for open-world videos, without requiring any supplementary information such as camera poses or optical flow. DepthCrafter achieves generalization ability to open-world videos by training a video-to-depth model from a pre-trained image-to-video diffusion model, through our meticulously designed three-stage training strategy with the compiled paired video-depth datasets. Our training approach enables the model to generate depth sequences with variable lengths at one time, up to 110 frames, and harvest both precise depth details and rich content diversity from realistic and synthetic datasets. We also propose an inference strategy that processes extremely long videos through segment-wise estimation and seamless stitching. Comprehensive evaluations on multiple datasets reveal that DepthCrafter achieves state-of-the-art performance in open-world video depth estimation under zero-shot settings. Furthermore, DepthCrafter facilitates various downstream applications, including depth-based visual effects and conditional video generation.

Submitted 3 September, 2024; originally announced September 2024.

Comments: Project webpage: https://depthcrafter.github.io

arXiv:2409.02048 [pdf, other]

ViewCrafter: Taming Video Diffusion Models for High-fidelity Novel View Synthesis

Authors: Wangbo Yu, Jinbo Xing, Li Yuan, Wenbo Hu, Xiaoyu Li, Zhipeng Huang, Xiangjun Gao, Tien-Tsin Wong, Ying Shan, Yonghong Tian

Abstract: Despite recent advancements in neural 3D reconstruction, the dependence on dense multi-view captures restricts their broader applicability. In this work, we propose ViewCrafter, a novel method for synthesizing high-fidelity novel views of generic scenes from single or sparse images using the prior of a video diffusion model. Our method takes advantage of the powerful generation capabilities of the video diffusion model and the coarse 3D clues offered by a point-based representation to generate high-quality video frames with precise camera pose control. To further enlarge the generation range of novel views, we tailor an iterative view synthesis strategy together with a camera trajectory planning algorithm to progressively extend the 3D clues and the areas covered by the novel views. With ViewCrafter, we can facilitate various applications, such as immersive experiences with real-time rendering by efficiently optimizing a 3D-GS representation using the reconstructed 3D points and the generated novel views, and scene-level text-to-3D generation for more imaginative content creation. Extensive experiments on diverse datasets demonstrate the strong generalization capability and superior performance of our method in synthesizing high-fidelity and consistent novel views.

Submitted 3 September, 2024; originally announced September 2024.

Comments: Project page: https://drexubery.github.io/ViewCrafter/

arXiv:2409.01676 [pdf, other]

Classifier-Free Diffusion-Based Weakly-Supervised Approach for Health Indicator Derivation in Rotating Machines: Advancing Early Fault Detection and Condition Monitoring

Authors: Wenyang Hu, Gaetan Frusque, Tianyang Wang, Fulei Chu, Olga Fink

Abstract: Deriving health indicators of rotating machines is crucial for their maintenance. However, this process is challenging for the prevalently adopted intelligent methods since they may take the whole data distributions, not only introducing noise interference but also lacking explainability. To address these issues, we propose a diffusion-based weakly-supervised approach for deriving health indicators of rotating machines, enabling early fault detection and continuous monitoring of condition evolution. This approach relies on a classifier-free diffusion model trained using healthy samples and a few anomalies. This model generates healthy samples, and by comparing the differences between the original samples and the generated ones in the envelope spectrum, we construct an anomaly map that clearly identifies faults. Health indicators are then derived, which can explain the fault types and mitigate noise interference. Comparative studies on two cases demonstrate that the proposed method offers superior health monitoring effectiveness and robustness compared to baseline models.

Submitted 3 September, 2024; originally announced September 2024.

arXiv:2409.01414 [pdf, other]

Measurement of $C\!P$ violation observables in $D^+\rightarrow K^-K^+π^+$ decays

Authors: LHCb collaboration, R. Aaij, A. S. W. Abdelmotteleb, C. Abellan Beteta, F. Abudinén, T. Ackernley, A. A. Adefisoye, B. Adeva, M. Adinolfi, P. Adlarson, C. Agapopoulou, C. A. Aidala, Z. Ajaltouni, S. Akar, K. Akiba, P. Albicocco, J. Albrecht, F. Alessio, M. Alexander, Z. Aliouche, P. Alvarez Cartelle, R. Amalric, S. Amato, J. L. Amey, Y. Amhis, et al. (1109 additional authors not shown)

Abstract: A search for violation of the charge-parity $C\!P$ symmetry in the $D^+\rightarrow K^-K^+π^+$ decay is presented, with proton-proton collision data corresponding to an integrated luminosity of 5.4 fb$^{-1}$, collected at a center-of-mass energy of $13$ TeV with the LHCb detector. A novel model-independent technique is used to compare the $D^+$ and $D^-$ phase-space distributions, with instrumental asymmetries subtracted using the $D^+_{s}\rightarrow K^-K^+π^+$ decay as a control channel. The $p$-value for the hypothesis of $C\!P$ conservation is $8.1\%$. The $C\!P$ asymmetry observables $A_{C\!P|S}^{φπ^+} = (0.95 \pm 0.43_{stat} \pm 0.26_{syst})\times 10^{-3}$ and $A_{C\!P|S}^{\overline{K}^{*0}K^+} = (-0.26 \pm 0.56_{ stat} \pm 0.18_{syst})\times 10^{-3}$ are also measured. These results show no evidence of $C\!P$ violation and represent the most sensitive search performed through the phase space of a multibody decay.

Submitted 2 September, 2024; originally announced September 2024.

Comments: All figures and tables, along with machine-readable versions and any supplementary material and additional information, are available at https://lbfence.cern.ch/alcm/public/analysis/full-details/1616 (LHCb public pages)

Report number: CERN-EP-2024-204, LHCb-PAPER-2024-019

arXiv:2409.01212 [pdf, other]

MobileIQA: Exploiting Mobile-level Diverse Opinion Network For No-Reference Image Quality Assessment Using Knowledge Distillation

Authors: Zewen Chen, Sunhan Xu, Yun Zeng, Haochen Guo, Jian Guo, Shuai Liu, Juan Wang, Bing Li, Weiming Hu, Dehua Liu, Hesong Li

Abstract: With the rising demand for high-resolution (HR) images, No-Reference Image Quality Assessment (NR-IQA) gains more attention, as it can evaluate image quality in real-time on mobile devices and enhance user experience. However, existing NR-IQA methods often resize or crop the HR images into small resolution, which leads to a loss of important details. Moreover, most of them have high computational complexity, which hinders their application on mobile devices due to limited computational resources. To address these challenges, we propose MobileIQA, a novel approach that utilizes lightweight backbones to efficiently assess image quality while preserving image details through high-resolution input. MobileIQA employs the proposed multi-view attention learning (MAL) module to capture diverse opinions, simulating subjective opinions provided by different annotators during the dataset annotation process. The model uses a teacher model to guide the learning of a student model through knowledge distillation. This method significantly reduces computational complexity while maintaining high performance. Experiments demonstrate that MobileIQA outperforms novel IQA methods on evaluation metrics and computational efficiency. The code is available at https://github.com/chencn2020/MobileIQA.

Submitted 2 September, 2024; originally announced September 2024.

Comments: Accepted by ECCV Workshop 2024

arXiv:2409.01158 [pdf]

Microwave Photonic Multi-Mode Injection-Locked Frequency Divider With a Wide Operational Range Based on an Optoelectronic Oscillator

Authors: Siyu Liu, Kaitao Lin, Weiye Hu, Zhenzhao Yi, Xinhuan Feng, Jianghai Wo, Jianping Yao

Abstract: We propose and implement a microwave photonic multi-mode injection-locked frequency divider (ILFD) with a wide frequency operational range based on an optoelectronic oscillator (OEO). In the OEO, a Mach-Zehnder modulator (MZM) and a photodetector (PD) are employed to construct a frequency multiplier to achieve an N-1 times frequency multiplication, which is then mixed with an external injection signal at an electrical mixer in the OEO loop. By adjusting the round-trip gain and time delay of the OEO loop, a radio frequency (RF) signal with a frequency that is 1/N that of the injection signal is generated, thus N times frequency division is achieved. Theoretical analysis and experimental verification are conducted to evaluate the effectiveness of the proposed ILFD. The results demonstrate that the system can divide an RF signal from 2.6 to 20.8 GHz to 1.3 to 1.95 GHz with different frequency division factors ranging from 2 to 13. A significant improvement in phase noise of 35.11 dB is also obtained at a frequency offset of 100 kHz when the frequency division factor is 13.

Submitted 2 September, 2024; originally announced September 2024.

arXiv:2408.16646 [pdf, other]

Study of the rare decay $J/ψ\to μ^+μ^-μ^+μ^-$

Authors: LHCb collaboration, R. Aaij, A. S. W. Abdelmotteleb, C. Abellan Beteta, F. Abudinén, T. Ackernley, A. A. Adefisoye, B. Adeva, M. Adinolfi, P. Adlarson, C. Agapopoulou, C. A. Aidala, Z. Ajaltouni, S. Akar, K. Akiba, P. Albicocco, J. Albrecht, F. Alessio, M. Alexander, Z. Aliouche, P. Alvarez Cartelle, R. Amalric, S. Amato, J. L. Amey, Y. Amhis, et al. (1096 additional authors not shown)

Abstract: The rare electromagnetic $J/ψ\to μ^+μ^-μ^+μ^-$ decay is observed with a significance greatly exceeding the discovery threshold, using proton-proton collision data collected by the LHCb experiment during 2016-2018 at a center-of-mass energy of 13 TeV, corresponding to an integrated luminosity of $5.4\,\text{fb}^{-1}$. The rate of this decay is measured relative to that of the $J/ψ\to μ^+μ^-$ mode. Using the QED model for the four-muon decay in the efficiency estimation, its branching fraction is determined to be \begin{equation*} {\mathcal{B}}(J/ψ\to μ^+μ^-μ^+μ^-) = (1.13\pm0.10\pm0.05\pm0.01)\times 10^{-6}, \end{equation*} where the uncertainties are statistical, systematic and due to the uncertainty on the branching fraction of the $J/ψ\to μ^+μ^-$ decay.

Submitted 29 August, 2024; originally announced August 2024.

Comments: All figures and tables, along with machine-readable versions and any supplementary material and additional information, are available at https://lbfence.cern.ch/alcm/public/analysis/full-details/3453 (LHCb public pages)

Report number: LHCb-PAPER-2024-016, CERN-EP-2024-201

arXiv:2408.14713 [pdf, other]

StyleSpeech: Parameter-efficient Fine Tuning for Pre-trained Controllable Text-to-Speech

Authors: Haowei Lou, Helen Paik, Wen Hu, Lina Yao

Abstract: This paper introduces StyleSpeech, a novel Text-to-Speech (TTS) system that enhances the naturalness and accuracy of synthesized speech. Building upon existing TTS technologies, StyleSpeech incorporates a unique Style Decorator structure that enables deep learning models to simultaneously learn style and phoneme features, improving adaptability and efficiency through the principles of Low-Rank Adaptation (LoRA). LoRA allows efficient adaptation of style features in pre-trained models. Additionally, we introduce a novel automatic evaluation metric, the LLM-Guided Mean Opinion Score (LLM-MOS), which employs large language models to offer an objective and robust protocol for automatically assessing TTS system performance. Extensive testing on benchmark datasets shows that our approach markedly outperforms existing state-of-the-art baseline methods in producing natural, accurate, and high-quality speech. These advancements not only push the boundaries of current TTS system capabilities, but also facilitate the application of TTS systems in more dynamic and specialized scenarios, such as interactive virtual assistants, adaptive audiobooks, and customized voice for gaming. Speech samples can be found at https://style-speech.vercel.app

Submitted 26 August, 2024; originally announced August 2024.

arXiv:2408.12374 [pdf]

Doping-free Janus homojunction solar cell with efficiency exceeding 23%

Authors: Lei Li, Zi-Xuan Yang, Tao Huang, Hui Wan, Wu-Yu Chen, Tao Zhang, Gui-Fang Huang, Wangyu Hu, Wei-Qing Huang

Abstract: Photovoltaic solar cells are one of the main renewable energy sources, and their power conversion efficiency (PCE) is improved by employing doping or heterojunctions to reduce photogenerated carrier recombination. Here, we propose a doping-free homojunction solar cell utilizing two-dimensional Janus semiconductors to achieve high PCE. Thanks to the intrinsic dipole of the Janus structure, the doping-free Janus homojunction naturally has not only a type-II band alignment to promote photoexciton dissociation, but also a smaller effective bandgap to enhance light absorption. More importantly, the intrinsic electric field across the Janus structure will drive photoinduced electron and hole transfer from the interface to the opposite transport layers respectively, significantly enhancing the efficiency of carrier separation and transport. We illustrate the concept in titanium-based Janus monolayer homojunctions, where the theoretical PCE of the TiSSe homojunction reaches 23.22%. Our work opens a novel avenue to design low-cost, high-efficiency solar cells.

Submitted 22 August, 2024; originally announced August 2024.

Comments: 16 pages, 5 figures

arXiv:2408.11850 [pdf, other]

Parallel Speculative Decoding with Adaptive Draft Length

Authors: Tianyu Liu, Yun Li, Qitan Lv, Kai Liu, Jianchen Zhu, Winston Hu

Abstract: Speculative decoding (SD), where an extra draft model is employed to provide multiple draft tokens first and then the original target model verifies these tokens in parallel, has shown great power for LLM inference acceleration. However, existing SD methods suffer from the mutual waiting problem, i.e., the target model gets stuck when the draft model is guessing tokens, and vice versa. This problem is directly incurred by the asynchronous execution of the draft model and the target model, and is exacerbated due to the fixed draft length in speculative decoding. To address these challenges, we propose a conceptually simple, flexible, and general framework to boost speculative decoding, namely Parallel spEculative decoding with Adaptive dRaft Length (PEARL). Specifically, PEARL proposes pre-verify to verify the first draft token in advance during the drafting phase, and post-verify to generate more draft tokens during the verification phase. PEARL parallelizes the drafting phase and the verification phase by applying the two strategies, and achieves adaptive draft length for different scenarios, which effectively alleviates the mutual waiting problem. Moreover, we theoretically demonstrate that the mean number of accepted tokens of PEARL is greater than that of existing draft-then-verify works. Experiments on various text generation benchmarks demonstrate the effectiveness of PEARL, leading to a superior speedup of up to 3.79$\times$ and 1.52$\times$, compared to auto-regressive decoding and vanilla speculative decoding, respectively.

Submitted 4 September, 2024; v1 submitted 13 August, 2024; originally announced August 2024.

arXiv:2408.11774 [pdf, other]

Testing Gravity with Realistic Gravitational Waveforms in Pulsar Timing Arrays

Authors: Wayne Hu, Qiuyue Liang, Meng-Xiang Lin, Mark Trodden

Abstract: We consider the effects of relaxing the assumption that gravitational waves composing the stochastic gravitational wave background (SGWB) are uncorrelated between frequencies in analyses of the data from Pulsar Timing Arrays (PTAs). While individual monochromatic plane waves are often a good approximation, a background composed of astrophysical sources cannot be monochromatic since an infinite plane wave carries no signal. We consider how relaxing this assumption allows us to extract potential information about modified dispersion relations and other fundamental physics questions, as both the group and phase velocity of waves become relevant. After developing the formalism we carry out simple Gaussian wavepacket examples and then consider more realistic waveforms, such as that from binary inspirals. When the frequency evolves only slowly across the PTA temporal baseline, the monochromatic assumption at an effective mean frequency remains a good approximation and we provide scaling relations that characterize its accuracy.

Submitted 21 August, 2024; originally announced August 2024.

Comments: 30 pages, 6 figures

arXiv:2408.09526 [pdf, other]

Fine-grained air quality inference based on low-quality sensing data using self-supervised learning

Authors: Meng Xu, Ke Han, Weijian Hu, Wen Ji

Abstract: Fine-grained air quality (AQ) mapping is made possible by the proliferation of cheap AQ micro-stations (MSs). However, their measurements are often inaccurate and sensitive to local disturbances, in contrast to standardized stations (SSs) that provide accurate readings but fall short in number. To simultaneously address the issues of low data quality (MSs) and high label sparsity (SSs), a multi-task spatio-temporal network (MTSTN) is proposed, which employs self-supervised learning to utilize massive unlabeled data, aided by seasonal and trend decomposition of MS data offering reliable information as features. The MTSTN is applied to infer NO$_2$, O$_3$ and PM$_{2.5}$ concentrations in a 250 km$^2$ area in Chengdu, China, at a resolution of 500 m $\times$ 500 m $\times$ 1 hr. Data from 55 SSs and 323 MSs were used, along with meteorological, traffic, geographic and timestamp data as features. The MTSTN excels in accuracy compared to several benchmarks, and its performance is greatly enhanced by utilizing low-quality MS data. A series of ablation and pressure tests demonstrate the results' robustness and interpretability, showcasing the MTSTN's practical value for accurate and affordable AQ inference.

Submitted 18 August, 2024; originally announced August 2024.

Comments: 17 pages

arXiv:2408.09106 [pdf, other]

Fragment-Masked Molecular Optimization

Authors: Kun Li, Xiantao Cai, Jia Wu, Bo Du, Wenbin Hu

Abstract: Molecular optimization is a crucial aspect of drug discovery, aimed at refining molecular structures to enhance drug efficacy and minimize side effects, ultimately accelerating the overall drug development process. Many target-based molecular optimization methods have been proposed, significantly advancing drug discovery. These methods primarily rely on understanding the specific drug target structures or their hypothesized roles in combating diseases. However, challenges such as a limited number of available targets and the difficulty of capturing clear structures hinder innovative drug development. In contrast, phenotypic drug discovery (PDD) does not depend on clear target structures and can identify hits with novel and unbiased polypharmacology signatures. As a result, PDD-based molecular optimization can reduce potential safety risks while optimizing phenotypic activity, thereby increasing the likelihood of clinical success. Therefore, we propose a fragment-masked molecular optimization method based on PDD (FMOP). FMOP employs a regression-free diffusion model to conditionally optimize the molecular masked regions without training, effectively generating new molecules with similar scaffolds. On the large-scale drug response dataset GDSCv2, we optimize the potential molecules across all 945 cell lines. The overall experiments demonstrate that the in-silico optimization success rate reaches 94.4%, with an average efficacy increase of 5.3%. Additionally, we conduct extensive ablation and visualization experiments, confirming that FMOP is an effective and robust molecular optimization method. The code is available at: https://anonymous.4open.science/r/FMOP-98C2.

Submitted 17 August, 2024; originally announced August 2024.

Comments: 11 pages, 5 figures, 2 tables

Showing 1–50 of 1,768 results for author: Hu, W