The impact of the local stellar radiation on the formation and evolution of dwarfs in and near Milky Way analogue

Abstract: We explore the effect of local stellar radiation on the formation and evolution of dwarf galaxies near Milky Way (MW) analogues. Using five simulations from the Auriga project, both with and without local stellar radiation, we find that the local stellar radiation, as a pre-reionization source, is quite effective at photoionizing and heating the gas around the proto-MW analogues. As a result, the formation of surrounding dwarf galaxies in dark matter halos with halo masses below approximately $10^{9.5}\,\mathrm{M_{\odot}}$ is significantly suppressed. After reionization, the intensity of the local stellar radiation eventually becomes comparable to that of the ultraviolet background (UVB); consequently, the impact of local stellar radiation on the surrounding dwarf galaxy formation decreases with decreasing redshift and almost vanishes after redshift $z=4$. At the present day, the bright satellite population in the simulations with and without local stellar radiation is nearly identical. While our simulations do not have enough resolution to resolve the faintest satellite galaxies, which are most prone to the local stellar radiation, we use the accreted galaxy mass function to assess the impact, and find that the reduction in the faintest satellites is around $13$ percent in the case of local stellar radiation, a factor that is not negligible when constraining dark matter models using the precise abundance of MW satellite galaxies.

Submitted 21 October, 2024; originally announced October 2024.

Comments: 9 pages, 9 figures, submitted to ApJ

arXiv:2410.13720 [pdf, other]

Movie Gen: A Cast of Media Foundation Models

Authors: Adam Polyak, Amit Zohar, Andrew Brown, Andros Tjandra, Animesh Sinha, Ann Lee, Apoorv Vyas, Bowen Shi, Chih-Yao Ma, Ching-Yao Chuang, David Yan, Dhruv Choudhary, Dingkang Wang, Geet Sethi, Guan Pang, Haoyu Ma, Ishan Misra, Ji Hou, Jialiang Wang, Kiran Jagadeesh, Kunpeng Li, Luxin Zhang, Mannat Singh, Mary Williamson, Matt Le, et al. (63 additional authors not shown)

Abstract: We present Movie Gen, a cast of foundation models that generates high-quality, 1080p HD videos with different aspect ratios and synchronized audio. We also show additional capabilities such as precise instruction-based video editing and generation of personalized videos based on a user's image. Our models set a new state-of-the-art on multiple tasks: text-to-video synthesis, video personalization, video editing, video-to-audio generation, and text-to-audio generation. Our largest video generation model is a 30B parameter transformer trained with a maximum context length of 73K video tokens, corresponding to a generated video of 16 seconds at 16 frames-per-second. We show multiple technical innovations and simplifications on the architecture, latent spaces, training objectives and recipes, data curation, evaluation protocols, parallelization techniques, and inference optimizations that allow us to reap the benefits of scaling pre-training data, model size, and training compute for training large scale media generation models. We hope this paper helps the research community to accelerate progress and innovation in media generation models. All videos from this paper are available at https://go.fb.me/MovieGenResearchVideos.

Submitted 17 October, 2024; originally announced October 2024.

arXiv:2410.12177 [pdf, other]

Towards Large Scale Atomic Manufacturing: Heterodyne Grating Interferometer with Zero Dead-Zone

Authors: Can Cui, Lvye Gao, Pengbo Zhao, Menghan Yang, Lifu Liu, Yu Ma, Guangyao Huang, Shengtong Wang, Linbin Luo, Xinghui Li

Abstract: This paper presents a novel heterodyne grating interferometer designed to meet the precise measurement requirements of next-generation lithography systems and large-scale atomic-level manufacturing. Utilizing a dual-frequency light source, the interferometer enables simultaneous measurement of three degrees of freedom. Key advancements include a compact zero Dead-Zone optical path configuration, significantly enhancing measurement reliability by mitigating the impact of light source fluctuations and air refractive index variations. A comprehensive crosstalk error analysis was conducted, resulting in a robust correction algorithm that reduces errors to below 5%. Performance testing of the prototype, which measures 90 mm × 90 mm × 40 mm, demonstrated exceptional resolution (0.25 nm in the XY-axis and 0.3 nm in the Z-axis), superior linearity (6.9e-5, 8.1e-5 and 16.2e-5 for the X, Y, and Z axes, respectively), high repeatability (0.8 nm/1000 nm for the three axes) and stability (20 nm for the XY-axis and 60 nm for the Z-axis over 1000 seconds). Comparative analysis with existing measurement sensors highlights the proposed method's significant advantages in integration and multidimensional capability, and the method is expected to be widely used in fields such as integrated circuits, atomic-level manufacturing and aerospace technology.

Submitted 15 October, 2024; originally announced October 2024.

Comments: 8 pages, 11 figures

arXiv:2410.09141 [pdf, other]

ACER: Automatic Language Model Context Extension via Retrieval

Authors: Luyu Gao, Yunyi Zhang, Jamie Callan

Abstract: Long-context modeling is one of the critical capabilities of language AI for digesting and reasoning over complex information pieces. In practice, long-context capabilities are typically built into a pre-trained language model (LM) through a carefully designed context extension stage, with the goal of producing generalist long-context capabilities. In our preliminary experiments, however, we discovered that the current open-weight generalist long-context models are still lacking in practical long-context processing tasks. While this means perfectly effective long-context modeling demands task-specific data, the cost can be prohibitive. In this paper, we draw inspiration from how humans process a large body of information: a lossy retrieval stage ranks a large set of documents while the reader ends up reading deeply only the top candidates. We build an automatic data synthesis pipeline that mimics this process using short-context LMs. The short-context LMs are further tuned using these self-generated data to obtain task-specific long-context capabilities. Similar to how pre-training learns from imperfect data, we hypothesize and further demonstrate that the short-context model can bootstrap over the synthetic data, outperforming not only long-context generalist models but also the retrieval and read pipeline used to synthesize the training data in real-world tasks such as long-context retrieval augmented generation.

Submitted 11 October, 2024; originally announced October 2024.

arXiv:2410.07658 [pdf, other]

SeMv-3D: Towards Semantic and Mutil-view Consistency simultaneously for General Text-to-3D Generation with Triplane Priors

Authors: Xiao Cai, Pengpeng Zeng, Lianli Gao, Junchen Zhu, Jiaxin Zhang, Sitong Su, Heng Tao Shen, Jingkuan Song

Abstract: Recent advancements in generic 3D content generation from text prompts have been remarkable, achieved by fine-tuning text-to-image diffusion (T2I) models or employing these T2I models as priors to learn a general text-to-3D model. While fine-tuning-based methods ensure great alignment between text and generated views, i.e., semantic consistency, their ability to achieve multi-view consistency is hampered by the absence of 3D constraints, even in limited view. In contrast, prior-based methods focus on regressing 3D shapes with any view that maintains uniformity and coherence across views, i.e., multi-view consistency, but such approaches inevitably compromise visual-textual alignment, leading to a loss of semantic details in the generated objects. To achieve semantic and multi-view consistency simultaneously, we propose SeMv-3D, a novel framework for general text-to-3D generation. Specifically, we propose a Triplane Prior Learner (TPL) that learns triplane priors with 3D spatial features to maintain consistency among different views at the 3D level, e.g., geometry and texture. Moreover, we design a Semantic-aligned View Synthesizer (SVS) that preserves the alignment between 3D spatial features and textual semantics in latent space. In SVS, we devise a simple yet effective batch sampling and rendering strategy that can generate arbitrary views in a single feed-forward inference. Extensive experiments show that SeMv-3D outperforms state-of-the-art methods in semantic and multi-view consistency in any view.
Our code and more visual results are available at https://anonymous.4open.science/r/SeMv-3D-6425.

Submitted 10 October, 2024; originally announced October 2024.

arXiv:2410.04425 [pdf, other]

LHAASO detection of very-high-energy gamma-ray emission surrounding PSR J0248+6021

Authors: Zhen Cao, F. Aharonian, Q. An, Axikegu, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, J. T. Cai, Q. Cao, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, Liang Chen, Lin Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. H. Chen, S. Z. Chen, et al. (255 additional authors not shown)

Abstract: We report the detection of an extended very-high-energy (VHE) gamma-ray source coincident with the location of the middle-aged (62.4 kyr) pulsar PSR J0248+6021, using 796 days of live LHAASO-WCDA data and 1216 days of live LHAASO-KM2A data. A significant excess of gamma-ray induced showers is observed both by WCDA in the 1-25 TeV energy band and by KM2A in the $>$25 TeV energy band, with significances of 7.3$σ$ and 13.5$σ$, respectively. The best-fit position derived from the WCDA data is R.A. = 42.06$^\circ \pm$ 0.12$^\circ$ and Dec. = 60.24$^\circ \pm$ 0.13$^\circ$ with an extension of 0.69$^\circ \pm$ 0.15$^\circ$, and that from the KM2A data is R.A. = 42.29$^\circ \pm$ 0.13$^\circ$ and Dec. = 60.38$^\circ \pm$ 0.07$^\circ$ with an extension of 0.37$^\circ \pm$ 0.07$^\circ$. No clear extended multiwavelength counterpart of this LHAASO source has been found from the radio band to the GeV band. The most plausible explanation of the VHE gamma-ray emission is the inverse Compton process of highly relativistic electrons and positrons injected by the pulsar. These electrons/positrons are hypothesized to be either confined within the pulsar wind nebula or to have already escaped into the interstellar medium, forming a pulsar halo.

Submitted 6 October, 2024; originally announced October 2024.

Comments: 12 pages, 10 figures, Accepted by Sci. China-Phys. Mech. Astron

arXiv:2410.02138 [pdf, other]

Study of magnetic reconnection at low-$β$ using laser-powered capacitor coils

Authors: H. Ji, L. Gao, G. Pomraning, K. Sakai, F. Guo, X. Li, A. Stanier, A. Milder, R. F. Follett, G. Fiksel, E. G. Blackman, A. Chien, S. Zhang

Abstract: Magnetic reconnection is a ubiquitous fundamental process in space and astrophysical plasmas that rapidly converts magnetic energy into some combination of flow energy, thermal energy, and non-thermal energetic particles. Over the past decade, a new experimental platform has been developed to study magnetic reconnection using strong coil currents powered by high power lasers at low plasma beta, typical conditions under which reconnection is energetically important in astrophysics. kJ-class lasers were used to drive parallel currents to reconnect MG-level magnetic fields in a quasi-axisymmetric geometry, similar to the Magnetic Reconnection Experiment or MRX, and thus this platform is named micro-MRX. This presentation summarizes two major findings from micro-MRX: direct measurement of accelerated electrons and observation of ion acoustic waves during anti-parallel reconnection. The angular dependence of the measured electron energy spectrum and the resulting accelerated energies, supported by particle-in-cell simulations, indicate that direct acceleration by the out-of-plane reconnection electric field is at work. Furthermore, a sudden onset of ion acoustic bursts has been measured by collective Thomson scattering in the exhaust of magnetic reconnection, followed by electron acoustic bursts with electron heating and bulk acceleration.
These results demonstrate that the micro-MRX platform offers a novel and unique approach to study magnetic reconnection in the laboratory in addition to the capabilities provided by traditional magnetized plasma experiments such as MRX and the upcoming FLARE (Facility for Laboratory Reconnection experiments). Future approaches to study other particle acceleration mechanisms and ion acoustic waves from magnetic reconnection are also discussed.

Submitted 2 October, 2024; originally announced October 2024.

Comments: 16 pages, 13 figures, 89 references, accepted for publication in Physics of Plasmas

arXiv:2410.01944 [pdf, other]

One-step Noisy Label Mitigation

Authors: Hao Li, Jiayang Gu, Jingkuan Song, An Zhang, Lianli Gao

Abstract: Mitigating the detrimental effects of noisy labels on the training process has become increasingly critical, as obtaining entirely clean or human-annotated samples for large-scale pre-training tasks is often impractical. Nonetheless, existing noise mitigation methods often encounter limitations in practical applications due to their task-specific design, model dependency, and significant computational overhead. In this work, we exploit the properties of high-dimensional orthogonality to identify a robust and effective boundary in cone space for separating clean and noisy samples. Building on this, we propose One-step Anti-Noise (OSA), a model-agnostic noisy label mitigation paradigm that employs an estimator model and a scoring function to assess the noise level of input pairs through just one-step inference, a cost-efficient process. We empirically demonstrate the superiority of OSA, highlighting its enhanced training robustness, improved task transferability, ease of deployment, and reduced computational costs across various benchmarks, models, and tasks. Our code is released at https://github.com/leolee99/OSA.

Submitted 2 October, 2024; originally announced October 2024.

Comments: 20 pages, 4 figures, 11 tables

arXiv:2410.00776 [pdf, other]

Large photo-induced tuning of ferroelectricity in sliding ferroelectrics

Authors: Lingyuan Gao, Laurent Bellaiche

Abstract: Stacking nonpolar, monolayer materials has emerged as an effective strategy to harvest ferroelectricity in two-dimensional (2D) van der Waals (vdW) materials. At a particular stacking sequence, interlayer charge transfer allows for the generation of out-of-plane dipole components, and the polarization magnitude and direction can be altered by an interlayer sliding. In this work, we use ab initio calculations and demonstrate that in the prototype sliding ferroelectric 3R-stacked bilayer transition metal dichalcogenide MoS$_2$, the out-of-plane electric polarization can be robustly tuned by photoexcitation in a large range for a given sliding. Such tuning is associated with both a structural origin, i.e., photoinduced structural distortion, and a charge origin, namely the distribution of photoexcited carriers. We elucidate different roles that photoexcitation plays in modulating sliding ferroelectricity under different light intensities, and we highlight the pivotal role of light in manipulating polarization of 2D vdW materials.

Submitted 1 October, 2024; originally announced October 2024.

arXiv:2409.19720 [pdf, other]

FAST: A Dual-tier Few-Shot Learning Paradigm for Whole Slide Image Classification

Authors: Kexue Fu, Xiaoyuan Luo, Linhao Qu, Shuo Wang, Ying Xiong, Ilias Maglogiannis, Longxiang Gao, Manning Wang

Abstract: The expensive fine-grained annotation and data scarcity have become the primary obstacles for the widespread adoption of deep learning-based Whole Slide Images (WSI) classification algorithms in clinical practice. Unlike few-shot learning methods in natural images that can leverage the labels of each image, existing few-shot WSI classification methods only utilize a small number of fine-grained labels or weakly supervised slide labels for training in order to avoid expensive fine-grained annotation. They lack sufficient mining of available WSIs, severely limiting WSI classification performance. To address the above issues, we propose a novel and efficient dual-tier few-shot learning paradigm for WSI classification, named FAST. FAST consists of a dual-level annotation strategy and a dual-branch classification framework. Firstly, to avoid expensive fine-grained annotation, we collect a very small number of WSIs at the slide level, and annotate an extremely small number of patches. Then, to fully mine the available WSIs, we use all the patches and available patch labels to build a cache branch, which utilizes the labeled patches to learn the labels of unlabeled patches and performs patch classification through knowledge retrieval. In addition to the cache branch, we also construct a prior branch that includes learnable prompt vectors, using the text encoder of visual-language models for patch classification. Finally, we integrate the results from both branches to achieve WSI classification.
Extensive experiments on binary and multi-class datasets demonstrate that our proposed method significantly surpasses existing few-shot classification methods and approaches the accuracy of fully supervised methods with only 0.22% annotation costs. All code and models will be publicly available at https://github.com/fukexue/FAST.

Submitted 29 September, 2024; originally announced September 2024.

Comments: Accepted to NeurIPS 2024

arXiv:2409.16202 [pdf, other]

CJEval: A Benchmark for Assessing Large Language Models Using Chinese Junior High School Exam Data

Authors: Qian-Wen Zhang, Haochen Wang, Fang Li, Siyu An, Lingfeng Qiao, Liangcai Gao, Di Yin, Xing Sun

Abstract: Online education platforms have significantly transformed the dissemination of educational resources by providing a dynamic and digital infrastructure. With the further enhancement of this transformation, the advent of Large Language Models (LLMs) has elevated the intelligence levels of these platforms. However, current academic benchmarks provide limited guidance for real-world industry scenarios. This limitation arises because educational applications require more than mere test question responses. To bridge this gap, we introduce CJEval, a benchmark based on Chinese Junior High School Exam Evaluations. CJEval consists of 26,136 samples across four application-level educational tasks covering ten subjects. These samples include not only questions and answers but also detailed annotations such as question types, difficulty levels, knowledge concepts, and answer explanations. By utilizing this benchmark, we assessed LLMs' potential applications and conducted a comprehensive analysis of their performance by fine-tuning on various educational tasks. Extensive experiments and discussions have highlighted the opportunities and challenges of applying LLMs in the field of education.

Submitted 24 September, 2024; v1 submitted 24 September, 2024; originally announced September 2024.

arXiv:2409.15520 [pdf, other]

Enabling Resource-Efficient On-Device Fine-Tuning of LLMs Using Only Inference Engines

Authors: Lei Gao, Amir Ziashahabi, Yue Niu, Salman Avestimehr, Murali Annavaram

Abstract: Large Language Models (LLMs) have demonstrated exceptional performance in automating various tasks, such as text generation and summarization. Currently, LLMs are trained and fine-tuned on large cloud servers. Deploying and fine-tuning these models on resource-constrained edge devices remains a significant challenge due to their substantial memory and computational requirements. This paper introduces a resource-efficient zeroth-order optimization approach that lowers the barriers for fine-tuning LLMs in such constrained environments. Our method features a parallelized randomized gradient estimation (P-RGE) technique, which performs gradient estimation with high parallel efficiency. P-RGE leverages outer-loop and inner-loop parallelization to perform multiple function queries and forward passes in parallel, reducing the wall-clock end-to-end training time. By integrating this technique with parameter-efficient fine-tuning methods (e.g., LoRA) and on-device inference engines (e.g., ExecuTorch), we demonstrate efficient fine-tuning of LLMs on both server-side and edge devices. Experiments show that P-RGE achieves significant runtime speedups and memory savings while maintaining fine-tuning accuracy, which paves the way for more practical deployment of LLMs in real-time, on-device applications.
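The abstract above does not spell out the estimator, so the following is only a minimal sketch of generic randomized gradient estimation, the zeroth-order idea that P-RGE parallelizes. It assumes Gaussian perturbation directions and a two-sided finite difference; the function names, the toy quadratic loss, and the parameter values are hypothetical, and the queries run sequentially here rather than in parallel as in the paper.

```python
import random


def rge_gradient(f, x, num_queries=8, eps=1e-3, rng=None):
    """Estimate the gradient of f at x using only forward evaluations of f.

    Assumption (not from the abstract): Gaussian directions and a
    central finite difference, averaged over num_queries directions.
    """
    rng = rng or random.Random()
    dim = len(x)
    grad = [0.0] * dim
    for _ in range(num_queries):
        # Draw a random perturbation direction u ~ N(0, I).
        u = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        x_plus = [xi + eps * ui for xi, ui in zip(x, u)]
        x_minus = [xi - eps * ui for xi, ui in zip(x, u)]
        # Two forward passes give the directional derivative along u.
        slope = (f(x_plus) - f(x_minus)) / (2.0 * eps)
        # Accumulate the average of slope * u over all queries.
        for i in range(dim):
            grad[i] += slope * u[i] / num_queries
    return grad


def loss(x):
    """Toy quadratic loss; its true gradient at x is 2 * x."""
    return sum(xi * xi for xi in x)


# Hypothetical usage: many queries so the noisy estimate settles
# near the true gradient [2.0, 4.0, 6.0].
estimate = rge_gradient(loss, [1.0, 2.0, 3.0], num_queries=20000,
                        rng=random.Random(0))
print(estimate)
```

Because each query needs only forward evaluations of the loss, the queries are independent of one another, which is what makes this kind of estimator a natural fit for inference-only engines and for parallel execution.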

Submitted 23 September, 2024; originally announced September 2024.

arXiv:2409.15149 [pdf, other]

Joint State-Channel Decoupling and One-Shot Quantum Coding Theorem

Authors: Hao-Chung Cheng, Frédéric Dupuis, Li Gao

Abstract: In this work, we consider decoupling a bipartite quantum state via a general quantum channel. We propose a joint state-channel decoupling approach to obtain a one-shot error exponent bound without smoothing, in which trace distance is used to measure how good the decoupling is. The established exponent is expressed in terms of a sum of two sandwiched Rényi entropies, one quantifying the amount of initial correlation between the state and environment, and the other characterizing the effectiveness of the quantum channel. This gives an explicit exponential decay of the decoupling error in the whole achievable region, which was missing in the previous results [Commun. Math. Phys. 328, 2014]. Moreover, it strengthens the error exponent bound obtained in a recent work [IEEE Trans. Inf. Theory, 69(12), 2023], for exponent from the channel part. As an application, we establish a one-shot error exponent bound for quantum channel coding given by a sandwiched Rényi coherent information.

Submitted 23 September, 2024; originally announced September 2024.

Comments: 25 pages, 2 figures. Presented at QIP 2023. Comments are very welcome

arXiv:2409.14337 [pdf, other]

MobileViews: A Large-Scale Mobile GUI Dataset

Authors: Longxi Gao, Li Zhang, Shihe Wang, Shangguang Wang, Yuanchun Li, Mengwei Xu

Abstract: Mobile screen assistants help smartphone users by interpreting mobile screens and responding to user requests. The excessive private information on mobile screens necessitates small, on-device models to power these assistants. However, there is a lack of a comprehensive and large-scale mobile screen dataset with high diversity to train and enhance these models. To efficiently construct such a dataset, we utilize an LLM-enhanced automatic app traversal tool to minimize human intervention. We then employ two SoC clusters to provide high-fidelity mobile environments, including more than 200 Android instances to parallelize app interactions. By utilizing the system to collect mobile screens over 81,600 device-hours, we introduce MobileViews, the largest mobile screen dataset, which includes over 600K screenshot-view hierarchy pairs from more than 20K modern Android apps. We demonstrate the effectiveness of MobileViews by training SOTA multimodal LLMs that power mobile screen assistants on it and the Rico dataset, which was introduced seven years ago. Evaluation results on mobile screen tasks show that the scale and quality of mobile screens in MobileViews demonstrate significant advantages over Rico in augmenting mobile screen assistants.

Submitted 26 September, 2024; v1 submitted 22 September, 2024; originally announced September 2024.

Comments: Dataset: https://huggingface.co/datasets/mllmTeam/MobileViews

arXiv:2409.12929 [pdf, other]

LogicPro: Improving Complex Logical Reasoning via Program-Guided Learning

Authors: Jin Jiang, Yuchen Yan, Yang Liu, Yonggang Jin, Shuai Peng, Mengdi Zhang, Xunliang Cai, Yixin Cao, Liangcai Gao, Zhi Tang

Abstract: In this paper, we present a novel approach, called LogicPro, to enhance the complex logical reasoning of Large Language Models (LLMs) through Program Examples. We do this effectively by simply utilizing widely available algorithmic problems and their code solutions. First, we constructed diverse test sample inputs based on algorithmic questions and code solutions. Then, we designed different complex reasoning questions based on algorithmic problems and test samples. Finally, combining the intermediate variable outputs of the code solutions and the complex reasoning questions, we derived the reasoning process and the final answer. With this approach, we can construct a dataset that is sufficiently difficult (all models are ineffective), diverse (synthesized from 2,360 different algorithmic questions), and scalable (building different test samples and collecting more algorithmic questions). In addition, we obtain a high-quality reasoning process guided by the values of intermediate variables. As a result, our approach achieves significant improvements in multiple models for the BBH$^{27}$, GSM8K, HellaSwag, LogiQA, ReClor, and RTE datasets, outperforming a wide range of existing reasoning datasets.

Submitted 19 September, 2024; originally announced September 2024.

arXiv:2409.11497 [pdf, other]

Decomposing Gaussians with Unknown Covariance

Authors: Ameer Dharamshi, Anna Neufeld, Lucy L. Gao, Jacob Bien, Daniela Witten

Abstract: Common workflows in machine learning and statistics rely on the ability to partition the information in a data set into independent portions. Recent work has shown that this may be possible even when conventional sample splitting is not (e.g., when the number of samples $n=1$, or when observations are not independent and identically distributed). However, the approaches that are currently available to decompose multivariate Gaussian data require knowledge of the covariance matrix. In many important problems (such as in spatial or longitudinal data analysis, and graphical modeling), the covariance matrix may be unknown and even of primary interest. Thus, in this work we develop new approaches to decompose Gaussians with unknown covariance. First, we present a general algorithm that encompasses all previous decomposition approaches for Gaussian data as special cases, and can further handle the case of an unknown covariance. It yields a new and more flexible alternative to sample splitting when $n>1$. When $n=1$, we prove that it is impossible to partition the information in a multivariate Gaussian into independent portions without knowing the covariance matrix. Thus, we use the general algorithm to decompose a single multivariate Gaussian with unknown covariance into dependent parts with tractable conditional distributions, and demonstrate their use for inference and validation. The proposed decomposition strategy extends naturally to Gaussian processes.
In simulation and on electroencephalography data, we apply these decompositions to the tasks of model selection and post-selection inference in settings where alternative strategies are unavailable.

Submitted 17 September, 2024; originally announced September 2024.

arXiv:2409.11273 [pdf, ps, other]

Several families of entanglement criteria for multipartite quantum systems based on generalized Wigner-Yanase skew information and variance

Authors: Yan Hong, Xinlan Hao, Limin Gao

Abstract: Quantum entanglement plays a critical role in many quantum applications, but detecting entanglement, especially in multipartite or high-dimensional quantum systems, remains a challenge. In this paper, we propose several families of entanglement criteria for detecting entanglement in multipartite or high-dimensional quantum states by the generalized Wigner-Yanase skew information $I^s(ρ,X)$ for $-1\leq s\leq0$ and variance. We also reveal a complementary character between the criteria based on the generalized Wigner-Yanase skew information and an alternative one based on variance through specific examples. We illustrate the merits of these criteria and show that the combination of the entanglement criteria has a stronger detection capability, as it is capable of detecting entangled states that remain unrecognized by other criteria.

Submitted 12 October, 2024; v1 submitted 17 September, 2024; originally announced September 2024.

Comments: 15 pages

arXiv:2409.11198 [pdf, ps, other]

Quantifying nonclassical correlation via the generalized Wigner-Yanase skew information

Authors: Yan Hong, Xinlan Hao, Limin Gao

Abstract: Nonclassical correlation is an important concept in quantum information theory, referring to a special type of correlation that exists between quantum systems, which surpasses the scope of classical physics. In this paper, we introduce the concept of a family of information with important properties, namely the generalized Wigner-Yanase skew information, of which the famous quantum Fisher information and Wigner-Yanase skew information are special cases. We classify the local observables in the generalized Wigner-Yanase skew information into two categories (i.e., orthonormal bases and a Hermitian operator with a fixed nondegenerate spectrum), and based on this, we propose two different forms of indicators to quantify nonclassical correlations of bipartite quantum states. We have not only investigated some important properties of these two kinds of indicators but also illustrated through specific examples that they can indeed capture some nonclassical correlations. Furthermore, we find that these two types of indicators reduce to an entanglement measure for bipartite pure states. Specifically, we also derive the relationship between these two indicators and the entanglement measure $I$-concurrence.

Submitted 17 September, 2024; originally announced September 2024.

Comments: 10 pages

arXiv:2409.11088 [pdf, other]

Prospects for detecting cosmic filaments in Lyman-alpha emission across redshifts $z=2-5$

Authors: Yizhou Liu, Liang Gao, Shihong Liao, Kai Zhu

Abstract: The standard $\rm Λ$CDM cosmological model predicts that a large amount of diffuse neutral hydrogen is distributed in cosmic filaments, which could be mapped through Lyman-alpha (Ly$α$) emission observations. We use the hydrodynamical simulation Illustris-TNG50 to investigate the evolution of surface brightness and detectability of neutral hydrogen in cosmic filaments across redshifts $z=2-5$. While the HI column density of cosmic filaments decreases with redshift, due to the rising temperature with cosmic time in filaments, the surface brightness of Ly$α$ emission in filaments is brighter at lower redshifts, suggesting that the detection of cosmic filaments is more feasible at lower redshifts. However, most of the Ly$α$ emission from cosmic filaments is around $10^{-21}$ $\rm erg\ s^{-1}cm^{-2}arcsec^{-2}$, making it extremely challenging to detect with current observational instruments. We further generate mock images using the Multi-Unit Spectroscopic Explorer (MUSE) spectrograph installed on both the Very Large Telescope (VLT) and the upcoming Extremely Large Telescope (ELT). Our findings indicate that while the VLT can only detect filamentary structures made of dense gas in galactic centers, the ELT is expected to reveal much finer filamentary structures from diffuse neutral hydrogen outside of galaxies. Compared to the VLT, both the number density and the longest length of filaments are greatly boosted with the ELT. Hence the forthcoming ELT is highly promising to provide a clearer view of cosmic filaments in Ly$α$ emission.

Submitted 17 September, 2024; originally announced September 2024.

arXiv:2409.10259 [pdf, other]

Self-Updating Vehicle Monitoring Framework Employing Distributed Acoustic Sensing towards Real-World Settings

Authors: Xi Wang, Xin Liu, Songming Zhu, Zhanwen Li, Lina Gao

Abstract: The recent emergence of Distributed Acoustic Sensing (DAS) technology has facilitated the effective capture of traffic-induced seismic data. The traffic-induced seismic wave is a prominent contributor to urban vibrations and contains crucial information to advance urban exploration and governance. However, identifying vehicular movements within massive noisy data poses a significant challenge. In this study, we introduce a real-time semi-supervised vehicle monitoring framework tailored to urban settings. It requires only a small fraction of manual labels for initial training and exploits unlabeled data for model improvement. Additionally, the framework can autonomously adapt to newly collected unlabeled data. Before DAS data undergo object detection as two-dimensional images to preserve spatial information, we leverage comprehensive one-dimensional signal preprocessing to mitigate noise. Furthermore, we propose a novel prior loss that incorporates the shapes of vehicular traces to track a single vehicle with varying speeds. To evaluate our model, we conducted experiments with seismic data from the Stanford 2 DAS Array. The results showed that our model outperformed the baseline model Efficient Teacher and its supervised counterpart, YOLO (You Only Look Once), in both accuracy and robustness. With only 35 labeled images, our model surpassed YOLO's mAP 0.5:0.95 criterion by 18% and showed a 7% increase over Efficient Teacher. We conducted comparative experiments with multiple update strategies for self-updating and identified an optimal approach. This approach surpasses the performance of non-overfitting training conducted with all data in a single pass.

Submitted 16 September, 2024; originally announced September 2024.

arXiv:2409.08811 [pdf, other]

Mutual Theory of Mind in Human-AI Collaboration: An Empirical Study with LLM-driven AI Agents in a Real-time Shared Workspace Task

Authors: Shao Zhang, Xihuai Wang, Wenhao Zhang, Yongshan Chen, Landi Gao, Dakuo Wang, Weinan Zhang, Xinbing Wang, Ying Wen

Abstract: Theory of Mind (ToM) significantly impacts human collaboration and communication as a crucial capability to understand others. When AI agents with ToM capability collaborate with humans, Mutual Theory of Mind (MToM) arises in such human-AI teams (HATs). The MToM process, which involves interactive communication and ToM-based strategy adjustment, affects the team's performance and collaboration process. To explore the MToM process, we conducted a mixed-design experiment using a large language model-driven AI agent with ToM and communication modules in a real-time shared-workspace task. We find that the agent's ToM capability does not significantly impact team performance but enhances human understanding of the agent and the feeling of being understood. Most participants in our study believe verbal communication increases human burden, and the results show that bidirectional communication leads to lower HAT performance. We discuss the results' implications for designing AI agents that collaborate with humans in real-time shared workspace tasks.

Submitted 13 September, 2024; originally announced September 2024.

Comments: 34 pages, Preprint Under Review

arXiv:2409.05840 [pdf, other]

MMEvol: Empowering Multimodal Large Language Models with Evol-Instruct

Authors: Run Luo, Haonan Zhang, Longze Chen, Ting-En Lin, Xiong Liu, Yuchuan Wu, Min Yang, Minzheng Wang, Pengpeng Zeng, Lianli Gao, Heng Tao Shen, Yunshui Li, Xiaobo Xia, Fei Huang, Jingkuan Song, Yongbin Li

Abstract: The development of Multimodal Large Language Models (MLLMs) has seen significant advancements with increasing demands in various fields (e.g., multimodal agents, embodied intelligence). While model-driven approaches attempt to enhance MLLMs' capabilities through diverse architectures, the gains have become increasingly marginal. Conversely, data-driven methods, which scale up image-text instruction data, are more effective but face limited data diversity and complexity challenges. The absence of high-quality data constitutes a significant development barrier for MLLMs. To address the data quality bottleneck, we propose MMEvol, a novel multimodal instruction data evolution framework. This framework iteratively improves data quality through a refined combination of fine-grained perception, cognitive reasoning, and interaction evolution, generating a more complex and diverse image-text instruction dataset that empowers MLLMs with enhanced capabilities. Beginning with an initial set of instructions, SEED-163K, we utilize MMEvol to systematically broaden the diversity of instruction types, extend visual reasoning steps to improve cognitive reasoning abilities, and thoroughly explore fine-grained information within images to enhance visual understanding and robustness. To comprehensively evaluate the effectiveness of our approach, we conduct extensive qualitative analysis and quantitative experiments across 13 vision-language tasks. Compared to baseline models trained with the initial seed data, the results demonstrate that our method achieves an average accuracy improvement of 3.1 percentage points. Furthermore, our approach reaches state-of-the-art (SOTA) performance in nine tasks using significantly less data compared to state-of-the-art models.

Submitted 19 September, 2024; v1 submitted 9 September, 2024; originally announced September 2024.

arXiv:2409.03069 [pdf, other]

Discussion of "Data fission: splitting a single data point"

Authors: Anna Neufeld, Ameer Dharamshi, Lucy L. Gao, Daniela Witten, Jacob Bien

Abstract: Leiner et al. [2023] introduce an important generalization of sample splitting, which they call data fission. They consider two cases of data fission: P1 fission and P2 fission. While P1 fission is extremely useful and easy to use, Leiner et al. [2023] provide P1 fission operations only for the Gaussian and the Poisson distributions. They provide little guidance on how to apply P2 fission operations in practice, leaving the reader unsure of how to apply data fission outside of the Gaussian and Poisson settings. In this discussion, we describe how our own work provides P1 fission operations in a wide variety of families and offers insight into when P1 fission is possible. We also provide guidance on how to actually apply P2 fission in practice, with a special focus on logistic regression. Finally, we interpret P2 fission as a remedy for distributional misspecification when carrying out P1 fission operations.

Submitted 4 September, 2024; originally announced September 2024.

Comments: 18 pages, 1 figure

arXiv:2409.00224 [pdf, ps, other]

Geometric influences on quantum Boolean cubes

Authors: David P. Blecher, Li Gao, Bang Xu

Abstract: In this work, we study three problems related to the $L_1$-influence on quantum Boolean cubes. In the first place, we obtain a dimension free bound for $L_1$-influence, which implies the quantum $L^1$-KKL Theorem result obtained by Rouze, Wirth and Zhang. Beyond that, we also obtain a high order quantum Talagrand inequality and quantum $L^1$-KKL theorem. Lastly, we prove a quantitative relation between the noise stability and $L^1$-influence. To this end, our technique involves the random restrictions method as well as semigroup theory.

Submitted 30 August, 2024; originally announced September 2024.

Comments: 36 pages

arXiv:2409.00147 [pdf, other]

MultiMath: Bridging Visual and Mathematical Reasoning for Large Language Models

Authors: Shuai Peng, Di Fu, Liangcai Gao, Xiuqin Zhong, Hongguang Fu, Zhi Tang

Abstract: The rapid development of large language models (LLMs) has spurred extensive research into their domain-specific capabilities, particularly mathematical reasoning. However, most open-source LLMs focus solely on mathematical reasoning, neglecting the integration with visual injection, despite the fact that many mathematical tasks rely on visual inputs such as geometric diagrams, charts, and function plots. To fill this gap, we introduce MultiMath-7B, a multimodal large language model that bridges the gap between math and vision. MultiMath-7B is trained through a four-stage process, focusing on vision-language alignment, visual and math instruction-tuning, and process-supervised reinforcement learning. We also construct a novel, diverse and comprehensive multimodal mathematical dataset, MultiMath-300K, which spans K-12 levels with image captions and step-wise solutions. MultiMath-7B achieves state-of-the-art (SOTA) performance among open-source models on existing multimodal mathematical benchmarks and also excels on text-only mathematical benchmarks. Our model and dataset are available at https://github.com/pengshuai-rin/MultiMath.

Submitted 30 August, 2024; originally announced September 2024.

arXiv:2408.17062 [pdf, other]

Vote&Mix: Plug-and-Play Token Reduction for Efficient Vision Transformer

Authors: Shuai Peng, Di Fu, Baole Wei, Yong Cao, Liangcai Gao, Zhi Tang

Abstract: Despite the remarkable success of Vision Transformers (ViTs) in various visual tasks, they are often hindered by substantial computational cost. In this work, we introduce Vote&Mix (VoMix), a plug-and-play and parameter-free token reduction method, which can be readily applied to off-the-shelf ViT models without any training. VoMix tackles the computational redundancy of ViTs by identifying tokens with high homogeneity through a layer-wise token similarity voting mechanism. Subsequently, the selected tokens are mixed into the retained set, thereby preserving visual information. Experiments demonstrate VoMix significantly improves the speed-accuracy tradeoff of ViTs on both images and videos. Without any training, VoMix achieves a 2$\times$ increase in throughput of existing ViT-H on ImageNet-1K and a 2.4$\times$ increase in throughput of existing ViT-L on Kinetics-400 video dataset, with a mere 0.3% drop in top-1 accuracy.

Submitted 30 August, 2024; originally announced August 2024.

arXiv:2408.16229 [pdf, ps, other]

Upgrading the existing Haloscope-type detector for sensitive axion detection

Authors: L. Gao, H. Zheng, X. N. Feng, L. B. Zhao, L. F. Wei

Abstract: The haloscope is one of the typical installations used to detect the electromagnetic responses (EMRs) of the axion field in the radio-frequency (rf) band. Because detection with the existing haloscope-type detector (HTD), biased only by a high stationary magnetic field, relies on just the second-order axion-photon energy conversion effect, the detectable signal is still very weak. Here we propose a feasible approach to upgrade the existing HTD by additionally applying a transverse rf-modulated magnetic field to generate the first-order axion-photon energy conversion signal. Accordingly, we argue that the detection sensitivity of the upgraded HTD (UHTD) could be enhanced by a few orders of magnitude compared with that achieved by the existing HTDs. The feasibility of the proposed UHTD is also discussed.

Submitted 28 August, 2024; originally announced August 2024.

Comments: 22 pages, 3 figures

arXiv:2408.15650 [pdf, other]

Harnessing the Intrinsic Knowledge of Pretrained Language Models for Challenging Text Classification Settings

Authors: Lingyu Gao

Abstract: Text classification is crucial for applications such as sentiment analysis and toxic text filtering, but it still faces challenges due to the complexity and ambiguity of natural language. Recent advancements in deep learning, particularly transformer architectures and large-scale pretraining, have achieved inspiring success in NLP fields. Building on these advancements, this thesis explores three challenging settings in text classification by leveraging the intrinsic knowledge of pretrained language models (PLMs). Firstly, to address the challenge of selecting misleading yet incorrect distractors for cloze questions, we develop models that utilize features based on contextualized word representations from PLMs, achieving performance that rivals or surpasses human accuracy. Secondly, to enhance model generalization to unseen labels, we create small finetuning datasets with domain-independent task label descriptions, improving model performance and robustness. Lastly, we tackle the sensitivity of large language models to in-context learning prompts by selecting effective demonstrations, focusing on misclassified examples and resolving model ambiguity regarding test example labels.

Submitted 28 August, 2024; originally announced August 2024.

Comments: PhD thesis

arXiv:2408.13289 [pdf]

doi 10.1109/TSTE.2024.3449909

Optimal Dispatch Strategy for a Multi-microgrid Cooperative Alliance Using a Two-Stage Pricing Mechanism

Authors: Yonghui Nie, Zhi Li, Jie Zhang, Lei Gao, Yang Li, Hengyu Zhou

Abstract: To coordinate resources among multi-level stakeholders and enhance the integration of electric vehicles (EVs) into multi-microgrids, this study proposes an optimal dispatch strategy within a multi-microgrid cooperative alliance using a nuanced two-stage pricing mechanism. Initially, the strategy assesses electric energy interactions between microgrids and distribution networks to establish a foundation for collaborative scheduling. The two-stage pricing mechanism initiates with a leader-follower game, wherein the microgrid operator acts as the leader and users as followers. Subsequently, it adjusts EV tariffs based on the game's equilibrium, taking into account factors such as battery degradation and travel needs to optimize EVs' electricity consumption. Furthermore, a bi-level optimization model refines power interactions and pricing strategies across the network, significantly enhancing demand response capabilities and economic outcomes. Simulation results demonstrate that this strategy not only increases renewable energy consumption but also reduces energy costs, thereby improving the overall efficiency and sustainability of the system.

Submitted 23 August, 2024; originally announced August 2024.

Comments: Accepted by IEEE Transactions on Sustainable Energy, Paper no. TSTE-00122-2024

arXiv:2408.13280 [pdf, other]

Estimation of the pseudoscalar glueball mass based on a modified Transformer

Authors: Lin Gao

Abstract: A modified Transformer model is introduced for estimating the mass of pseudoscalar glueball in lattice QCD. The model takes as input a sequence of floating-point numbers with lengths ranging from 30 to 35 and produces a two-dimensional vector output. It integrates floating-point embeddings and positional encoding, and is trained using binary cross-entropy loss. The paper provides a detailed description of the model's components and training methods, and compares the performance of the traditional least squares method, the previously used deep neural network, and the modified Transformer in mass estimation. The results show that the modified Transformer model achieves greater accuracy in mass estimation than the traditional least squares method. Additionally, compared to the deep neural network, this model utilizes positional encoding and can handle input sequences of varying lengths, offering enhanced adaptability.

Submitted 22 August, 2024; originally announced August 2024.

arXiv:2408.11900 [pdf, other]

Quantum highway: Observation of minimal and maximal speed limits for few and many-body states

Authors: Zitian Zhu, Lei Gao, Zehang Bao, Liang Xiang, Zixuan Song, Shibo Xu, Ke Wang, Jiachen Chen, Feitong Jin, Xuhao Zhu, Yu Gao, Yaozu Wu, Chuanyu Zhang, Ning Wang, Yiren Zou, Ziqi Tan, Aosai Zhang, Zhengyi Cui, Fanhao Shen, Jiarun Zhong, Tingting Li, Jinfeng Deng, Xu Zhang, Hang Dong, Pengfei Zhang, et al. (8 additional authors not shown)

Abstract: Tracking the time evolution of a quantum state allows one to verify the thermalization rate or the propagation speed of correlations in generic quantum systems. Inspired by the energy-time uncertainty principle, bounds have been demonstrated on the maximal speed at which a quantum state can change, resulting in immediate and practical tasks. Based on a programmable superconducting quantum processor, we test the dynamics of various emulated quantum mechanical systems encompassing single- and many-body states. We show that one can test the known quantum speed limits and that modifying a single Hamiltonian parameter allows the observation of the crossover of the different bounds on the dynamics. We also unveil the observation of minimal quantum speed limits in addition to more common maximal ones, i.e., the lowest rate of change of a unitarily evolved quantum state. Our results establish a comprehensive experimental characterization of quantum speed limits and pave the way for their subsequent study in engineered non-unitary conditions.

Submitted 21 August, 2024; originally announced August 2024.

Comments: 9 pages, 4 figures + supplementary information

arXiv:2408.11490 [pdf, other]

DocTabQA: Answering Questions from Long Documents Using Tables

Authors: Haochen Wang, Kai Hu, Haoyu Dong, Liangcai Gao

Abstract: We study a new problem setting of question answering (QA), referred to as DocTabQA. Within this setting, given a long document, the goal is to respond to questions by organizing the answers into structured tables derived directly from the document's content. Unlike traditional QA approaches which predominantly rely on unstructured text to formulate responses, DocTabQA aims to leverage structured tables as answers to convey information clearly and systematically, thereby enhancing user comprehension and highlighting relationships between data points. To the best of our knowledge, this problem has not been previously explored. In this paper, we introduce the QTabA dataset, encompassing 300 financial documents, accompanied by manually annotated 1.5k question-table pairs. Initially, we leverage Large Language Models (LLMs) such as GPT-4 to establish a baseline. However, it is widely acknowledged that LLMs encounter difficulties when tasked with generating intricate, structured outputs from long input sequences. To overcome these challenges, we present a two-stage framework, called DocTabTalk, which initially retrieves relevant sentences from extensive documents and subsequently generates hierarchical tables based on these identified sentences. DocTabTalk incorporates two key technological innovations: AlignLLaMA and TabTalk, which are specifically tailored to assist GPT-4 in tackling DocTabQA, enabling it to generate well-structured, hierarchical tables with improved organization and clarity. Comprehensive experimental evaluations conducted on both QTabA and RotoWire datasets demonstrate that our DocTabTalk significantly enhances the performance of GPT-4 in our proposed DocTabQA task and the table generation task. The code and dataset are available at https://github.com/SmileWHC/DocTabQA for further research.

Submitted 21 August, 2024; originally announced August 2024.

Comments: 18 pages, 5 figures

arXiv:2408.10607 [pdf]

doi 10.1038/s41377-024-01606-y

Generation of squeezed vacuum state in the millihertz frequency band

Authors: Li Gao, Li-ang Zheng, Bo Lu, Shaoping Shi, Long Tian, Yaohui Zheng

Abstract: The detection of gravitational waves has ushered in a new era of observing the universe. Quantum resource advantages offer significant enhancements to the sensitivity of gravitational wave observatories. While squeezed states for ground-based gravitational wave detection have received marked attention, the generation of squeezed states suitable for mid-to-low-frequency detection has remained unexplored. To address the gap in squeezed state optical fields at ultra-low frequencies, we report on the first direct observation of a squeezed vacuum field down to a Fourier frequency of 4 millihertz with a quantum noise reduction of up to 8 dB, by employing a multiple noise suppression scheme. Our work provides quantum resources for future gravitational wave observatories, facilitating the development of quantum precision measurement.

Submitted 20 August, 2024; originally announced August 2024.

Comments: 17 pages, 4 figures, 1 table and 50 references. To be published in Light: Science & Applications

Journal ref: Light: Science & Applications 13, 294 (2024)

arXiv:2408.09784 [pdf, other]

Apostle--Auriga: Effects of stellar feedback subgrid models on the evolution of angular momentum in disc galaxies

Authors: Hang Yang, Shihong Liao, Azadeh Fattahi, Carlos S. Frenk, Liang Gao, Qi Guo, Shi Shao, Lan Wang, Ruby J. Wright, Guangquan Zeng

Abstract: Utilizing the Apostle--Auriga simulations, which start from the same zoom-in initial conditions of Local Group-like systems but run with different galaxy formation subgrid models and hydrodynamic solvers, we study the impact of stellar feedback models on the evolution of angular momentum in disc galaxies. At $z = 0$, Auriga disc galaxies tend to exhibit higher specific angular momenta compared to their cross-matched Apostle counterparts. By tracing the evolution history of the Lagrangian mass tracers of the in-situ star particles in the $z = 0$ galaxies, we find that the specific angular momentum distributions of the gas tracers from the two simulations at the halo accretion time are relatively similar. The present-day angular momentum difference is mainly driven by the physical processes occurring inside dark matter haloes, especially galactic fountains. Due to the different subgrid implementations of stellar feedback processes, Auriga galaxies contain a high fraction of gas that has gone through the recycled fountain process (${\sim} 65$ per cent), which could acquire angular momentum through mixing with the high angular momentum circumgalactic medium (CGM). In Apostle, however, the fraction of gas that has undergone the recycled fountain process is significantly lower (down to ${\sim} 20$ per cent for Milky Way-sized galaxies) and the angular momentum acquisition from the CGM is marginal. As a result, the present-day Auriga galaxies overall have higher specific angular momenta.

Submitted 19 October, 2024; v1 submitted 19 August, 2024; originally announced August 2024.

Comments: Revised version, accepted for publication in MNRAS

arXiv:2408.08623 [pdf, other]

SketchRef: A Benchmark Dataset and Evaluation Metrics for Automated Sketch Synthesis

Authors: Xingyue Lin, Xingjian Hu, Shuai Peng, Jianhua Zhu, Liangcai Gao

Abstract: Sketch, a powerful artistic technique to capture essential visual information about real-world objects, is increasingly gaining attention in the image synthesis field. However, evaluating the quality of synthesized sketches presents unique unsolved challenges. Current evaluation methods for sketch synthesis are inadequate due to the lack of a unified benchmark dataset, over-reliance on classification accuracy for recognizability, and unfair evaluation of sketches with different levels of simplification. To address these issues, we introduce SketchRef, a benchmark dataset comprising 4 categories of reference photos--animals, human faces, human bodies, and common objects--alongside novel evaluation metrics. Considering that classification accuracy is insufficient to measure the structural consistency between a sketch and its reference photo, we propose the mean Object Keypoint Similarity (mOKS) metric, utilizing pose estimation to assess structure-level recognizability. To ensure fair evaluation of sketches with different simplification levels, we propose a recognizability calculation method constrained by simplicity. We also collect 8K responses from art enthusiasts, validating the effectiveness of our proposed evaluation methods. We hope this work can provide a comprehensive evaluation of sketch synthesis algorithms, thereby aligning their performance more closely with human understanding.

Submitted 16 August, 2024; originally announced August 2024.

arXiv:2408.08578 [pdf, other]

TAMER: Tree-Aware Transformer for Handwritten Mathematical Expression Recognition

Authors: Jianhua Zhu, Wenqi Zhao, Yu Li, Xingjian Hu, Liangcai Gao

Abstract: Handwritten Mathematical Expression Recognition (HMER) has extensive applications in automated grading and office automation. However, existing sequence-based decoding methods, which directly predict $\LaTeX$ sequences, struggle to understand and model the inherent tree structure of $\LaTeX$ and often fail to ensure syntactic correctness in the decoded results. To address these challenges, we propose a novel model named TAMER (Tree-Aware Transformer) for handwritten mathematical expression recognition. TAMER introduces an innovative Tree-aware Module while maintaining the flexibility and efficient training of Transformer. TAMER combines the advantages of both sequence decoding and tree decoding models by jointly optimizing sequence prediction and tree structure prediction tasks, which enhances the model's understanding and generalization of complex mathematical expression structures. During inference, TAMER employs a Tree Structure Prediction Scoring Mechanism to improve the structural validity of the generated $\LaTeX$ sequences. Experimental results on CROHME datasets demonstrate that TAMER outperforms traditional sequence decoding and tree decoding models, especially in handling complex mathematical structures, achieving state-of-the-art (SOTA) performance.

Submitted 16 August, 2024; originally announced August 2024.

arXiv:2408.05400 [pdf, other]

doi 10.1093/mnras/stae1926

Assembly History and Internal Structure of Cluster Cold Dark Matter Haloes

Authors: Qingxiang Chen, Shihong Liao, Jie Wang, Liang Gao

Abstract: We use the Phoenix simulations to study the mass assembly history and internal structures of cluster dark matter haloes ($M_{200} \gtrsim 5\times 10^{14} h^{-1}{\rm M}_\odot$). We confirm that cluster haloes grow inside-out, similar to galactic haloes. Major merger events dominate the growth of the internal region and minor mergers/diffuse accretion shape the outskirts. However, compared to galactic haloes, cluster haloes tend to have a younger and more actively evolving inner region. On average, the majority of mass (> 80%) in the inner region ($R< 0.1 r_{200}$) of Phoenix haloes is accreted after $z = 3$, while for galactic haloes, most mass in the central region has already been accreted before $z=6$. The density profiles of cluster haloes are less stable than those of galactic haloes over different radii. The enclosed mass within $50$ or $150$ kpc of all Phoenix haloes evolves substantially in the past ${\sim} 7$ Gyr, while galactic haloes remained stable during the same period. We suggest that the relatively younger and more active state explains the various observations of cluster haloes, especially in central regions.

Submitted 9 August, 2024; originally announced August 2024.

Comments: 12 pages, 11 figures, accepted for publication in MNRAS

arXiv:2408.02900 [pdf, other]

MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine

Authors: Yunfei Xie, Ce Zhou, Lang Gao, Juncheng Wu, Xianhang Li, Hong-Yu Zhou, Sheng Liu, Lei Xing, James Zou, Cihang Xie, Yuyin Zhou

Abstract: This paper introduces MedTrinity-25M, a comprehensive, large-scale multimodal dataset for medicine, covering over 25 million images across 10 modalities, with multigranular annotations for more than 65 diseases. These enriched annotations encompass both global textual information, such as disease/lesion type, modality, region-specific descriptions, and inter-regional relationships, as well as detailed local annotations for regions of interest (ROIs), including bounding boxes and segmentation masks. Unlike existing approaches, which are limited by the availability of image-text pairs, we have developed the first automated pipeline that scales up multimodal data by generating multigranular visual and textual annotations (in the form of image-ROI-description triplets) without the need for any paired text descriptions. Specifically, data from over 90 different sources have been collected, preprocessed, and grounded using domain-specific expert models to identify ROIs related to abnormal regions. We then build a comprehensive knowledge base and prompt multimodal large language models to perform retrieval-augmented generation with the identified ROIs as guidance, resulting in multigranular textual descriptions. Compared to existing datasets, MedTrinity-25M provides the most enriched annotations, supporting a comprehensive range of multimodal tasks such as captioning and report generation, as well as vision-centric tasks like classification and segmentation. Pretraining on MedTrinity-25M, our model achieves state-of-the-art performance on VQA-RAD and PathVQA, surpassing both multimodal large language models and other representative SoTA approaches. This dataset can also be utilized to support large-scale pre-training of multimodal medical AI models, contributing to the development of future foundation models in the medical domain.

Submitted 5 August, 2024; originally announced August 2024.

Comments: The project page is at https://yunfeixie233.github.io/MedTrinity-25M

arXiv:2407.20570 [pdf, other]

Fine-Tuned Large Language Model for Visualization System: A Study on Self-Regulated Learning in Education

Authors: Lin Gao, Jing Lu, Zekai Shao, Ziyue Lin, Shengbin Yue, Chiokit Ieong, Yi Sun, Rory James Zauner, Zhongyu Wei, Siming Chen

Abstract: Large Language Models (LLMs) have shown great potential in intelligent visualization systems, especially for domain-specific applications. Integrating LLMs into visualization systems presents challenges, and we categorize these challenges into three alignments: domain problems with LLMs, visualization with LLMs, and interaction with LLMs. To achieve these alignments, we propose a framework and outline a workflow to guide the application of fine-tuned LLMs to enhance visual interactions for domain-specific tasks. These alignment challenges are critical in education because of the need for an intelligent visualization system to support beginners' self-regulated learning. Therefore, we apply the framework to education and introduce Tailor-Mind, an interactive visualization system designed to facilitate self-regulated learning for artificial intelligence beginners. Drawing on insights from a preliminary study, we identify self-regulated learning tasks and fine-tuning objectives to guide visualization design and tuning data construction. Our focus on aligning visualization with fine-tuned LLM makes Tailor-Mind more like a personalized tutor. Tailor-Mind also supports interactive recommendations to help beginners better achieve their learning goals. Model performance evaluations and user studies confirm that Tailor-Mind improves the self-regulated learning experience, effectively validating the proposed framework.

Submitted 30 July, 2024; originally announced July 2024.

arXiv:2407.20118 [pdf, other]

Impact of Parameters in the Blazar Jet Magnetic Field Model on Axion-Like Particle Constraints

Authors: Lin-Qing Gao, Xiao-Jun Bi, Jun Li, Peng-Fei Yin

Abstract: The interaction between axion-like particles (ALPs) and photons induces ALP-photon oscillations in astrophysical magnetic fields, leading to spectral distortions in the $\gamma$-ray spectrum of blazars. The primary uncertainty of this phenomenon may originate from the magnetic field within the jet of the blazar. While many studies have explored the effects of ALP-photon oscillations using typical values for jet magnetic field parameters, it is important to recognize that these parameters can be constrained by multi-wavelength observations. In this study, we utilize the high energy $\gamma$-ray spectrum of Mrk 421 obtained from MAGIC and Fermi-LAT observations. By employing multi-wavelength fitting with a one-zone synchrotron self-Compton model, we derive the parameters characterizing the magnetic field model within the jet, and investigate their impacts on the ALP constraints.

Submitted 29 July, 2024; originally announced July 2024.

Comments: 9 pages, 7 figures

arXiv:2407.19838 [pdf, other]

RNACG: A Universal RNA Sequence Conditional Generation model based on Flow-Matching

Authors: Letian Gao, Zhi John Lu

Abstract: RNA plays a crucial role in diverse life processes. In contrast to the rapid advancement of protein design methods, the work related to RNA is more demanding. Most current RNA design approaches concentrate on specified target attributes and rely on extensive experimental searches. However, these methods remain costly and inefficient due to practical limitations. In this paper, we characterize all sequence design issues as conditional generation tasks and offer parameterized representations for multiple problems. For these problems, we have developed a universal RNA sequence generation model based on flow matching, namely RNACG. RNACG can accommodate various conditional inputs and is portable, enabling users to customize the encoding network for conditional inputs as per their requirements and integrate it into the generation network. We evaluated RNACG in RNA 3D structure inverse folding, 2D structure inverse folding, family-specific sequence generation, and 5'UTR translation efficiency prediction. RNACG attains superior or competitive performance on these tasks compared with other methods. RNACG exhibits extensive applicability in sequence generation and property prediction tasks, providing a novel approach to RNA sequence design and potential methods for simulation experiments with large-scale RNA sequence data.

Submitted 29 July, 2024; originally announced July 2024.

arXiv:2407.17617 [pdf, other]

Adaptive Robot Detumbling of a Non-Rigid Satellite

Authors: Longsen Gao, Claus Danielson, Rafael Fierro

Abstract: The challenge of stabilizing satellites, particularly those with uncertain flexible dynamics, has become a pressing concern in control and robotics. These uncertainties, especially the dynamics of a third-party client satellite, significantly complicate the stabilization task. This paper introduces a novel adaptive detumbling method to handle non-rigid satellites with unknown motion dynamics (translation and rotation). The distinctive feature of our approach is that we model the non-rigid tumbling satellite as a two-link serial chain with unknown stiffness and damping, in contrast to previous detumbling research, which treats the satellite as a rigid body. We develop a novel adaptive robotics approach to detumble the satellite by using two space tugs as servicers despite the uncertain dynamics in the post-capture case. Notably, the stiffness properties and other physical parameters, including the mass and inertia of the two links, remain unknown to the servicer. Our proposed method addresses the challenges in detumbling tasks and paves the way for advanced manipulation of non-rigid satellites with uncertain dynamics.

Submitted 15 September, 2024; v1 submitted 24 July, 2024; originally announced July 2024.

Comments: This paper has been accepted by the 63rd IEEE Conference on Decision and Control (CDC 2024) as a regular paper

arXiv:2407.15435 [pdf, other]

Enhancement of 3D Gaussian Splatting using Raw Mesh for Photorealistic Recreation of Architectures

Authors: Ruizhe Wang, Chunliang Hua, Tomakayev Shingys, Mengyuan Niu, Qingxin Yang, Lizhong Gao, Yi Zheng, Junyan Yang, Qiao Wang

Abstract: The photorealistic reconstruction and rendering of architectural scenes have extensive applications in industries such as film, games, and transportation. They also play an important role in urban planning, architectural design, and city promotion, especially in protecting historical and cultural relics. 3D Gaussian Splatting, owing to its better performance than NeRF, has become a mainstream technology in 3D reconstruction. Its only input is a set of images, but it relies heavily on geometric parameters computed by the SfM process. At the same time, there is an abundance of existing raw 3D models that could inform the structural perception of certain buildings but cannot be applied. In this paper, we propose a straightforward method to harness these raw 3D models to guide 3D Gaussians in capturing the basic shape of the building and improve the visual quality of textures and details when photos are captured non-systematically. This exploration opens up new possibilities for improving the effectiveness of 3D reconstruction techniques in the field of architectural design.

Submitted 25 September, 2024; v1 submitted 22 July, 2024; originally announced July 2024.

arXiv:2407.15225 [pdf]

An electro-optically tunable arrayed waveguide grating fabricated on thin film lithium niobate

Authors: Zhe Wang, Zhiwei Fang, Yiran Zhu, Jian Liu, Lang Gao, Jianping Yu, Haisu Zhang, Min Wang, Ya Cheng

Abstract: We design and fabricate an 8-channel thin film lithium niobate (TFLN) arrayed-waveguide grating (AWG) and demonstrate the electro-optical tunability of the device. The monolithically integrated microelectrodes are designed for waveguide phase modulation and wavelength tuning. Experiments show that the fabricated electro-optically controlled TFLN AWG has a channel spacing of 200 GHz and a wavelength tuning efficiency of 10 pm/V.

Submitted 21 July, 2024; originally announced July 2024.

arXiv:2407.13168 [pdf, other]

SciCode: A Research Coding Benchmark Curated by Scientists

Authors: Minyang Tian, Luyu Gao, Shizhuo Dylan Zhang, Xinan Chen, Cunwei Fan, Xuefei Guo, Roland Haas, Pan Ji, Kittithat Krongchon, Yao Li, Shengyan Liu, Di Luo, Yutao Ma, Hao Tong, Kha Trinh, Chenyu Tian, Zihan Wang, Bohao Wu, Yanyu Xiong, Shengzhu Yin, Minhui Zhu, Kilian Lieret, Yanxin Lu, Genglin Liu, Yufeng Du, et al. (5 additional authors not shown)

Abstract: Since language models (LMs) now outperform average humans on many challenging tasks, it has become increasingly difficult to develop challenging, high-quality, and realistic evaluations. We address this issue by examining LMs' capabilities to generate code for solving real scientific research problems. Incorporating input from scientists and AI researchers in 16 diverse natural science sub-fields, including mathematics, physics, chemistry, biology, and materials science, we created a scientist-curated coding benchmark, SciCode. The problems in SciCode naturally factorize into multiple subproblems, each involving knowledge recall, reasoning, and code synthesis. In total, SciCode contains 338 subproblems decomposed from 80 challenging main problems. It offers optional descriptions specifying useful scientific background information and scientist-annotated gold-standard solutions and test cases for evaluation. Claude3.5-Sonnet, the best-performing model among those tested, can solve only 4.6% of the problems in the most realistic setting. We believe that SciCode both demonstrates contemporary LMs' progress towards becoming helpful scientific assistants and sheds light on the development and evaluation of scientific AI in the future.

Submitted 18 July, 2024; originally announced July 2024.

Comments: 25 pages, 9 figures, 7 tables

arXiv:2407.12684 [pdf, other]

4Dynamic: Text-to-4D Generation with Hybrid Priors

Authors: Yu-Jie Yuan, Leif Kobbelt, Jiwen Liu, Yuan Zhang, Pengfei Wan, Yu-Kun Lai, Lin Gao

Abstract: Due to the fascinating generative performance of text-to-image diffusion models, a growing number of text-to-3D generation works explore distilling the 2D generative priors into 3D, using the score distillation sampling (SDS) loss, to bypass the data scarcity problem. The existing text-to-3D methods have achieved promising results in realism and 3D consistency, but text-to-4D generation still faces challenges, including lack of realism and insufficient dynamic motions. In this paper, we propose a novel method for text-to-4D generation, which ensures the dynamic amplitude and authenticity through direct supervision provided by a video prior. Specifically, we adopt a text-to-video diffusion model to generate a reference video and divide 4D generation into two stages: static generation and dynamic generation. The static 3D generation is achieved under the guidance of the input text and the first frame of the reference video, while in the dynamic generation stage, we introduce a customized SDS loss to ensure multi-view consistency, a video-based SDS loss to improve temporal consistency, and most importantly, direct priors from the reference video to ensure the quality of geometry and texture. Moreover, we design a prior-switching training strategy to avoid conflicts between different priors and fully leverage the benefits of each prior. In addition, to enrich the generated motion, we further introduce a dynamic modeling representation composed of a deformation network and a topology network, which ensures dynamic continuity while modeling topological changes. Our method not only supports text-to-4D generation but also enables 4D generation from monocular videos. The comparison experiments demonstrate the superiority of our method compared to existing methods.

Submitted 17 July, 2024; originally announced July 2024.

arXiv:2407.12292 [pdf, other]

Any Target Can be Offense: Adversarial Example Generation via Generalized Latent Infection

Authors: Youheng Sun, Shengming Yuan, Xuanhan Wang, Lianli Gao, Jingkuan Song

Abstract: Targeted adversarial attack, which aims to mislead a model to recognize any image as a target object by imperceptible perturbations, has become a mainstream tool for vulnerability assessment of deep neural networks (DNNs). Since existing targeted attackers only learn to attack known target classes, they cannot generalize well to unknown classes. To tackle this issue, we propose $\bf{G}$eneralized $\bf{A}$dversarial attac$\bf{KER}$ ($\bf{GAKer}$), which is able to construct adversarial examples for any target class. The core idea behind GAKer is to craft a latently infected representation during adversarial example generation. To this end, the extracted latent representations of the target object are first injected into intermediate features of an input image in an adversarial generator. Then, the generator is optimized to ensure visual consistency with the input image while being close to the target object in the feature space. Since the GAKer is class-agnostic yet model-agnostic, it can be regarded as a general tool that not only reveals the vulnerability of more DNNs but also identifies deficiencies of DNNs in a wider range of classes. Extensive experiments have demonstrated the effectiveness of our proposed method in generating adversarial examples for both known and unknown classes. Notably, compared with other generative methods, our method achieves an approximately $14.13\%$ higher attack success rate for unknown classes and an approximately $4.23\%$ higher success rate for known classes. Our code is available at https://github.com/VL-Group/GAKer.

Submitted 16 July, 2024; originally announced July 2024.

Comments: ECCV 2024

arXiv:2407.12010 [pdf, ps, other]

Study of the mass of pseudoscalar glueball with a deep neural network

Authors: Lin Gao

Abstract: A deep neural network (DNN) is utilized to study the mass of the pseudoscalar glueball in lattice QCD based on Monte Carlo simulations. To obtain an accurate and stable mass value, I constructed a new network. The results show that this DNN provides a more precise and stable mass estimate compared to the traditional least squares method.

Submitted 9 August, 2024; v1 submitted 22 June, 2024; originally announced July 2024.

Comments: 5 figures

arXiv:2407.09289 [pdf, other]

How buildings change the fundamental allometry

Authors: Fabiano L. Ribeiro, Peiran Zhang, Liang Gao, Diego Rybski

Abstract: We demonstrate that the original fundamental allometry alone cannot accurately describe the relationship between urban area and population size. Instead, building height is a third factor that interplays with area and population. To illustrate this, we propose a straightforward model based on the idea that city area is the result of people's desire to live close to one another while also having sufficient living space. This leads to a more general form of fundamental allometry (relating area, population, and building height). Our argument is supported by empirical data from different countries.

Submitted 12 July, 2024; originally announced July 2024.

arXiv:2407.08084 [pdf, other]

Decentralized Adaptive Aerospace Transportation of Unknown Loads Using A Team of Robots

Authors: Longsen Gao, Kevin Aubert, David Saldana, Claus Danielson, Rafael Fierro

Abstract: Transportation missions in aerospace are limited by the capability of each aerospace robot and the properties of the transported object, such as mass, inertia, and grasping locations. We present a novel decentralized adaptive controller design for multiple robots that can be implemented in different kinds of aerospace robots. Our controller adapts to unknown objects in different gravity environments. We validate our method in an aerial scenario using multiple fully actuated hexarotors with grasping capabilities, and a space scenario using a group of space tugs. In both scenarios, the robots transport a payload cooperatively through desired three-dimensional trajectories. We show that our method can adapt to unexpected changes that include the loss of robots during the transportation mission.

Submitted 30 August, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

Comments: This paper has been accepted by the DARS 2024 conference. Permission to post the preprint version on arXiv has been granted by the DARS 2024 Committee and Springer Press

Showing 1–50 of 1,155 results for author: Gao, L