-
Discrete empirical interpolation in the tensor t-product framework
Authors:
Sridhar Chellappa,
Lihong Feng,
Peter Benner
Abstract:
The discrete empirical interpolation method (DEIM) is a well-established approach, widely used for state reconstruction using sparse sensor/measurement data, nonlinear model reduction, and interpretable feature selection. We introduce the tensor t-product Q-DEIM (t-Q-DEIM), an extension of the DEIM framework for dealing with tensor-valued data. The proposed approach seeks to overcome one of the key drawbacks of DEIM, viz., the need for matricizing the data, which can distort any structural and/or geometric information. Our method leverages the recently developed tensor t-product algebra to avoid reshaping the data. In analogy with the standard DEIM, we formulate and solve a tensor-valued least-squares problem, whose solution is achieved through an interpolatory projection. We develop a rigorous, computable upper bound for the error resulting from the t-Q-DEIM approximation. Using five different tensor-valued datasets, we numerically illustrate the better approximation properties of t-Q-DEIM and the significant computational cost reduction it offers.
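For orientation, the classical matrix DEIM that t-Q-DEIM generalizes can be written as below. This is the standard matrix form, not the tensor formulation of the paper. The symbols are notation assumed here rather than taken from the abstract: $U \in \mathbb{R}^{n \times r}$ is a basis with orthonormal columns, $S \in \mathbb{R}^{n \times r}$ is a selection matrix whose columns are unit vectors picking $r$ sensor rows, and $x$ is the state to reconstruct.

```latex
\hat{x} = U \left(S^{\top} U\right)^{-1} S^{\top} x,
\qquad
\left\| x - \hat{x} \right\|_2
\le \left\| \left(S^{\top} U\right)^{-1} \right\|_2
\left\| \left(I - U U^{\top}\right) x \right\|_2 .
```

In the matrix Q-DEIM variant, the rows picked by $S$ come from a column-pivoted QR factorization of $U^{\top}$, which keeps the factor $\|(S^{\top} U)^{-1}\|_2$ in the bound moderate.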
Submitted 18 October, 2024;
originally announced October 2024.
-
Adaptive L-statistics for high dimensional test problem
Authors:
Huifang Ma,
Long Feng,
Zhaojun Wang
Abstract:
In this study, we focus on applying L-statistics to the high-dimensional one-sample location test problem. Intuitively, an L-statistic with $k$ parameters tends to perform optimally when the sparsity level of the alternative hypothesis matches $k$. We begin by deriving the limiting distributions for both L-statistics with fixed parameters and those with diverging parameters. To ensure robustness across varying sparsity levels of alternative hypotheses, we first establish the asymptotic independence between L-statistics with fixed and diverging parameters. Building on this, we propose a Cauchy combination test that integrates L-statistics with different parameters. Both simulation results and real-data applications highlight the advantages of our proposed methods.
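A minimal sketch of the generic Cauchy combination rule for p-values may help make the last step concrete. It assumes equal weights by default and takes the individual p-values as given; the function name `cauchy_combination` and the weighting are illustrative assumptions, not the authors' implementation.

```python
import math


def cauchy_combination(p_values, weights=None):
    """Combine p-values with the generic Cauchy combination rule (sketch)."""
    n = len(p_values)
    if weights is None:
        # Equal weights summing to one: an assumption of this sketch.
        weights = [1.0 / n] * n
    # Under the null, tan((0.5 - p) * pi) is a standard Cauchy variate.
    t = sum(w * math.tan((0.5 - p) * math.pi)
            for w, p in zip(weights, p_values))
    # Upper-tail probability of the standard Cauchy distribution at t.
    return 0.5 - math.atan(t) / math.pi


if __name__ == "__main__":
    # Hypothetical p-values from L-statistics with different parameters k.
    print(cauchy_combination([0.001, 0.8, 0.3]))
```

With weights that sum to one, combining identical p-values returns that same value, and a single very small p-value dominates the combined result. That behaviour is what makes the rule attractive when the sparsity level of the alternative is unknown.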
Submitted 18 October, 2024;
originally announced October 2024.
-
BLEND: Behavior-guided Neural Population Dynamics Modeling via Privileged Knowledge Distillation
Authors:
Zhengrui Guo,
Fangxu Zhou,
Wei Wu,
Qichen Sun,
Lishuang Feng,
Jinzhuo Wang,
Hao Chen
Abstract:
Modeling the nonlinear dynamics of neuronal populations represents a key pursuit in computational neuroscience. Recent research has increasingly focused on jointly modeling neural activity and behavior to unravel their interconnections. Despite significant efforts, these approaches often necessitate either intricate model designs or oversimplified assumptions. Given the frequent absence of perfectly paired neural-behavioral datasets in real-world scenarios when deploying these models, a critical yet understudied research question emerges: how to develop a model that performs well using only neural activity as input at inference, while benefiting from the insights gained from behavioral signals during training?
To this end, we propose BLEND, the behavior-guided neural population dynamics modeling framework via privileged knowledge distillation. By considering behavior as privileged information, we train a teacher model that takes both behavior observations (privileged features) and neural activities (regular features) as inputs. A student model is then distilled using only neural activity. Unlike existing methods, our framework is model-agnostic and avoids making strong assumptions about the relationship between behavior and neural activity. This allows BLEND to enhance existing neural dynamics modeling architectures without developing specialized models from scratch. Extensive experiments across neural population activity modeling and transcriptomic neuron identity prediction tasks demonstrate strong capabilities of BLEND, reporting over 50% improvement in behavioral decoding and over 15% improvement in transcriptomic neuron identity prediction after behavior-guided distillation. Furthermore, we empirically explore various behavior-guided distillation strategies within the BLEND framework and present a comprehensive analysis of effectiveness and implications for model performance.
Submitted 2 October, 2024;
originally announced October 2024.
-
Data-Augmented Predictive Deep Neural Network: Enhancing the extrapolation capabilities of non-intrusive surrogate models
Authors:
Shuwen Sun,
Lihong Feng,
Peter Benner
Abstract:
Numerically solving a large parametric nonlinear dynamical system is challenging due to its high complexity and high computational cost. In recent years, machine-learning-aided surrogates have been actively researched. However, many methods fail to generalize accurately over the entire time interval $[0, T]$ when the training data is available only in a training time interval $[0, T_0]$, with $T_0<T$.
To improve the extrapolation capabilities of the surrogate models in the entire time domain, we propose a new deep learning framework, where kernel dynamic mode decomposition (KDMD) is employed to evolve the dynamics of the latent space generated by the encoder part of a convolutional autoencoder (CAE). After adding the KDMD-decoder-extrapolated data into the original data set, we train the CAE along with a feed-forward deep neural network using the augmented data. The trained network can predict future states outside the training time interval at any out-of-training parameter samples. The proposed method is tested on two numerical examples: a FitzHugh-Nagumo model and a model of incompressible flow past a cylinder. Numerical results show accurate and fast prediction performance in both the time and the parameter domain.
Submitted 17 October, 2024;
originally announced October 2024.
-
Shining Light on the Dark Sector: Search for Axion-like Particles and Other New Physics in Photonic Final States with FASER
Authors:
FASER collaboration,
Roshan Mammen Abraham,
Xiaocong Ai,
John Anders,
Claire Antel,
Akitaka Ariga,
Tomoko Ariga,
Jeremy Atkinson,
Florian U. Bernlochner,
Emma Bianchi,
Tobias Boeckh,
Jamie Boyd,
Lydia Brenner,
Angela Burger,
Franck Cadoux,
Roberto Cardella,
David W. Casper,
Charlotte Cavanagh,
Xin Chen,
Eunhyung Cho,
Dhruv Chouhan,
Andrea Coccaro,
Stephane Débieux,
Monica D'Onofrio,
Ansh Desai
, et al. (83 additional authors not shown)
Abstract:
The first FASER search for a light, long-lived particle decaying into a pair of photons is reported. The search uses LHC proton-proton collision data at $\sqrt{s}=13.6~\text{TeV}$ collected in 2022 and 2023, corresponding to an integrated luminosity of $57.7\text{fb}^{-1}$. A model with axion-like particles (ALPs) dominantly coupled to weak gauge bosons is the primary target. Signal events are characterised by high-energy deposits in the electromagnetic calorimeter and no signal in the veto scintillators. One event is observed, compared to a background expectation of $0.44 \pm 0.39$ events, which is entirely dominated by neutrino interactions. World-leading constraints on ALPs are obtained for masses up to $300~\text{MeV}$ and couplings to the Standard Model W gauge boson, $g_{aWW}$, around $10^{-4}$ GeV$^{-1}$, testing a previously unexplored region of parameter space. Other new particle models that lead to the same experimental signature, including ALPs coupled to gluons or photons, U(1)$_B$ gauge bosons, up-philic scalars, and a Type-I two-Higgs doublet model, are also considered for interpretation, and new constraints on previously viable parameter space are presented in this paper.
Submitted 14 October, 2024;
originally announced October 2024.
-
Text4Seg: Reimagining Image Segmentation as Text Generation
Authors:
Mengcheng Lan,
Chaofeng Chen,
Yue Zhou,
Jiaxing Xu,
Yiping Ke,
Xinjiang Wang,
Litong Feng,
Wayne Zhang
Abstract:
Multimodal Large Language Models (MLLMs) have shown exceptional capabilities in vision-language tasks; however, effectively integrating image segmentation into these models remains a significant challenge. In this paper, we introduce Text4Seg, a novel text-as-mask paradigm that casts image segmentation as a text generation problem, eliminating the need for additional decoders and significantly simplifying the segmentation process. Our key innovation is semantic descriptors, a new textual representation of segmentation masks where each image patch is mapped to its corresponding text label. This unified representation allows seamless integration into the auto-regressive training pipeline of MLLMs for easier optimization. We demonstrate that representing an image with $16\times16$ semantic descriptors yields competitive segmentation performance. To enhance efficiency, we introduce the Row-wise Run-Length Encoding (R-RLE), which compresses redundant text sequences, reducing the length of semantic descriptors by 74% and accelerating inference by $3\times$, without compromising performance. Extensive experiments across various vision tasks, such as referring expression segmentation and comprehension, show that Text4Seg achieves state-of-the-art performance on multiple datasets by fine-tuning different MLLM backbones. Our approach provides an efficient, scalable solution for vision-centric tasks within the MLLM framework.
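The row-wise run-length idea can be sketched as follows. The text format used here (`label*count` runs joined by `", "`, one line per row) and the function names `rrle_encode` and `rrle_decode` are hypothetical choices for illustration; the abstract does not specify the exact serialization used by Text4Seg.

```python
from itertools import groupby


def rrle_encode(grid):
    """Compress each row of patch labels into 'label*count' runs (sketch)."""
    rows = []
    for row in grid:
        # groupby merges consecutive equal labels within a single row only.
        runs = [f"{label}*{len(list(group))}" for label, group in groupby(row)]
        rows.append(", ".join(runs))
    return "\n".join(rows)


def rrle_decode(text):
    """Invert rrle_encode; assumes labels contain neither ', ' nor newlines."""
    grid = []
    for line in text.split("\n"):
        row = []
        for run in line.split(", "):
            label, count = run.rsplit("*", 1)
            row.extend([label] * int(count))
        grid.append(row)
    return grid


if __name__ == "__main__":
    # A toy 2x4 grid of semantic descriptors (hypothetical labels).
    grid = [["sky", "sky", "sky", "tree"],
            ["sky", "dog", "dog", "tree"]]
    encoded = rrle_encode(grid)
    print(encoded)
    assert rrle_decode(encoded) == grid
```

Because runs never cross a row boundary, the two-dimensional layout of the patches is preserved in the text, and decoding needs no knowledge of the grid width.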
Submitted 13 October, 2024;
originally announced October 2024.
-
Revisiting the Galactic Winds in M82 II: Development of Multiphase Outflows in Simulations
Authors:
Xue-Fu Li,
Weishan Zhu,
Tian-Rui Wang,
Long-Long Feng
Abstract:
We performed a suite of three-dimensional hydrodynamical simulations with a resolution of $\sim10$ parsecs to investigate the development of multiphase galactic wind in M82. The star formation and related feedback processes are solved self-consistently using a sink particle method, rather than relying on various assumptions that were used in previous studies. Our simulations produce a starburst event lasting around 25 Myr, which has a total stellar mass of 1.62-3.34 $\times 10^8\, \rm{M_{\odot}}$, consistent with observational estimates. The total injected supernova energy is between $1.14\times 10^{57}$ and $2.4\times 10^{57}\, \rm{erg}$. Supernova (SN) feedback heats portions of the cool gas in the central disc to warm and hot phases, and then drives the gas in all three phases out, eventually forming multiphase outflows. These outflows can replicate key properties of the winds observed in M82, such as morphology, mass outflow rate, and X-ray emission flux, provided the gas return from star-forming clumps to the interstellar medium is implemented appropriately. The maximum mass outflow rate of all gas (hot gas) is about 6-12 (2-3) $\rm{M_{\odot}/yr}$ at $r\sim4.0$ kpc, corresponding to a mass loading factor of 2-4. However, the outflow velocities in our simulations are slower than observational estimates by $\sim 20\%-60\%$. The gas return process significantly influences the outflow properties, while the initial gas distribution in the nuclear region has a moderate effect. However, our results face some challenges in achieving convergence as the resolution increases. We discuss potential improvements to address these issues in future work.
Submitted 13 October, 2024;
originally announced October 2024.
-
Prototype-based Optimal Transport for Out-of-Distribution Detection
Authors:
Ao Ke,
Wenlong Chen,
Chuanwen Feng,
Yukun Cao,
Xike Xie,
S. Kevin Zhou,
Lei Feng
Abstract:
Detecting Out-of-Distribution (OOD) inputs is crucial for improving the reliability of deep neural networks in the real-world deployment. In this paper, inspired by the inherent distribution shift between ID and OOD data, we propose a novel method that leverages optimal transport to measure the distribution discrepancy between test inputs and ID prototypes. The resulting transport costs are used to quantify the individual contribution of each test input to the overall discrepancy, serving as a desirable measure for OOD detection. To address the issue that solely relying on the transport costs to ID prototypes is inadequate for identifying OOD inputs closer to ID data, we generate virtual outliers to approximate the OOD region via linear extrapolation. By combining the transport costs to ID prototypes with the costs to virtual outliers, the detection of OOD data near ID data is emphasized, thereby enhancing the distinction between ID and OOD inputs. Experiments demonstrate the superiority of our method over state-of-the-art methods.
Submitted 10 October, 2024;
originally announced October 2024.
-
Inverse Design of Photonic Crystal Waveguides Using Neural Networks and Dispersion Optimization
Authors:
Lucian Feng
Abstract:
Photonic crystal waveguides (PCWs) play a critical role in precisely controlling light propagation, enabling high-performance functions in applications such as optical communication and integrated photonics. The design of PCWs traditionally relies on complex numerical methods, including finite-difference time-domain (FDTD) and plane-wave expansion (PWE) methods, which are often inefficient when dealing with high-dimensional parameter spaces, particularly for subwavelength structures. To overcome these challenges, a convolutional neural network (CNN)-based inverse design method is introduced to optimize the structural parameters of PCWs. By simulating band structures under varying line defect widths and air hole radii using MIT Photonic Bands (MPB), a large dataset was generated, mapping structural parameters to corresponding band characteristics. Backpropagation neural networks (BPNN) and CNN models were trained on this dataset to predict key PCW structural parameters. The CNN model demonstrated superior performance in predicting complex geometries, maintaining high accuracy even when extrapolating beyond the training dataset, with precision up to four decimal places. In contrast, the BPNN model exhibited faster training times and higher computational efficiency on smaller datasets, though it performed less effectively on larger datasets. Cross-validation using MPB confirmed the generalization capability and reliability of both models. This study highlights the potential of deep learning techniques in photonic device design, particularly for advancing the development of high-efficiency, low-loss components in integrated photonics and optical communication systems.
Submitted 8 October, 2024;
originally announced October 2024.
-
LHAASO detection of very-high-energy gamma-ray emission surrounding PSR J0248+6021
Authors:
Zhen Cao,
F. Aharonian,
Q. An,
Axikegu,
Y. X. Bai,
Y. W. Bao,
D. Bastieri,
X. J. Bi,
Y. J. Bi,
J. T. Cai,
Q. Cao,
W. Y. Cao,
Zhe Cao,
J. Chang,
J. F. Chang,
A. M. Chen,
E. S. Chen,
Liang Chen,
Lin Chen,
Long Chen,
M. J. Chen,
M. L. Chen,
Q. H. Chen,
S. H. Chen,
S. Z. Chen
, et al. (255 additional authors not shown)
Abstract:
We report the detection of an extended very-high-energy (VHE) gamma-ray source coincident with the location of the middle-aged (62.4 kyr) pulsar PSR J0248+6021, using 796 days of live-time LHAASO-WCDA data and 1216 days of live-time LHAASO-KM2A data. A significant excess of gamma-ray-induced showers is observed by both WCDA in the 1-25 TeV energy band and KM2A in the $>$25 TeV energy band, with significances of 7.3$σ$ and 13.5$σ$, respectively. The best-fit position derived from the WCDA data is R.A. = 42.06$^\circ \pm$ 0.12$^\circ$ and Dec. = 60.24$^\circ \pm$ 0.13$^\circ$ with an extension of 0.69$^\circ\pm$0.15$^\circ$, and that from the KM2A data is R.A. = 42.29$^\circ \pm$ 0.13$^\circ$ and Dec. = 60.38$^\circ \pm$ 0.07$^\circ$ with an extension of 0.37$^\circ\pm$0.07$^\circ$. No clear extended multiwavelength counterpart of this LHAASO source has been found from the radio band to the GeV band. The most plausible explanation of the VHE gamma-ray emission is the inverse Compton process of highly relativistic electrons and positrons injected by the pulsar. These electrons/positrons are hypothesized to be either confined within the pulsar wind nebula or to have already escaped into the interstellar medium, forming a pulsar halo.
Submitted 6 October, 2024;
originally announced October 2024.
-
Artificial intelligence inspired freeform optics design: a review
Authors:
Lei Feng,
Jingxing Liao,
Jingna Yang
Abstract:
Integrating artificial intelligence (AI) techniques such as machine learning and deep learning into freeform optics design has significantly enhanced design efficiency, expanded the design space, and led to innovative solutions. This article reviews the latest developments in AI applications within this field, highlighting their roles in initial design generation, optimization, and performance prediction. It also addresses the benefits of AI, such as improved accuracy and performance, alongside challenges like data requirements, model interpretability, and computational complexity. Despite these challenges, the future of AI in freeform optics design looks promising, with potential advancements in hybrid design methods, interpretable AI, AI-driven manufacturing, and targeted research for specific applications. Collaboration among researchers, engineers, and designers is essential to fully harness AI's potential and drive innovation in optics.
Submitted 17 September, 2024;
originally announced October 2024.
-
Non-Hermitian gauged reciprocity and symmetry
Authors:
Jiecheng Lyu,
Zihe Gao,
Liang Feng,
Li Ge
Abstract:
The Lorentz reciprocity is a fundamental property in electromagnetism and well known to break down due to an external magnetic field. With a fictitious or imaginary vector potential, however, its behavior is largely unknown. Here we show that in systems with an imaginary vector potential and displaying the non-Hermitian skin effect, the Lorentz reciprocity is broken but still governed by a rigorous mathematical relation, which we term non-Hermitian gauged reciprocity. When mimicking an imaginary vector potential using just linear integrated photonic elements, however, the conditions that lead to the Lorentz reciprocity are still satisfied and hence the latter cannot be broken. Nevertheless, we show that the non-Hermitian gauged reciprocity can still be observed with a proper choice of inputs and outputs, alongside the Lorentz reciprocity. In addition, we also reveal another equal-amplitude response in the same system, which we attribute to a non-Hermitian gauged symmetry. Furthermore, we show that light propagation is not impinged by the non-Hermitian topological funnel effect, highlighting an underappreciated difference between coherently driven and non-driven systems. These findings are confirmed using a tight-binding model and full-wave simulations of coupled optical micro-ring resonators, providing a valuable extension of the Lorentz reciprocity in the non-Hermitian domain.
Submitted 2 October, 2024;
originally announced October 2024.
-
Auto-Demo Prompting: Leveraging Generated Outputs as Demonstrations for Enhanced Batch Prompting
Authors:
Longyu Feng,
Mengze Hong,
Chen Jason Zhang
Abstract:
Batch prompting is a common technique in large language models (LLMs) used to process multiple inputs simultaneously, aiming to improve computational efficiency. However, as batch sizes increase, performance degradation often occurs due to the model's difficulty in handling lengthy context inputs. Existing methods that attempt to mitigate these issues rely solely on batch data arrangement and majority voting rather than improving the design of the batch prompt itself. In this paper, we address these limitations by proposing "Auto-Demo Prompting," a novel approach that leverages the question-output pairs from earlier questions within a batch as demonstrations for subsequent answer inference. We provide a formal theoretical analysis of how Auto-Demo Prompting functions within the autoregressive generation process of LLMs, illustrating how it utilizes prior outputs to optimize the model's internal representations. Our method effectively bridges the gap between batch prompting and few-shot prompting, enhancing performance with only a slight compromise in token usage. Experimental results across five NLP tasks demonstrate its effectiveness in mitigating performance degradation and occasionally outperforming single prompts. Furthermore, it opens new avenues for applying few-shot learning techniques, such as demonstration selection, within batch prompting, making it a robust solution for real-world applications.
Submitted 2 October, 2024;
originally announced October 2024.
-
Adaptive teachers for amortized samplers
Authors:
Minsu Kim,
Sanghyeok Choi,
Taeyoung Yun,
Emmanuel Bengio,
Leo Feng,
Jarrid Rector-Brooks,
Sungsoo Ahn,
Jinkyoo Park,
Nikolay Malkin,
Yoshua Bengio
Abstract:
Amortized inference is the task of training a parametric model, such as a neural network, to approximate a distribution with a given unnormalized density where exact sampling is intractable. When sampling is implemented as a sequential decision-making process, reinforcement learning (RL) methods, such as generative flow networks, can be used to train the sampling policy. Off-policy RL training facilitates the discovery of diverse, high-reward candidates, but existing methods still face challenges in efficient exploration. We propose to use an adaptive training distribution (the Teacher) to guide the training of the primary amortized sampler (the Student) by prioritizing high-loss regions. The Teacher, an auxiliary behavior model, is trained to sample high-error regions of the Student and can generalize across unexplored modes, thereby enhancing mode coverage by providing an efficient training curriculum. We validate the effectiveness of this approach in a synthetic environment designed to present an exploration challenge, two diffusion-based sampling tasks, and four biochemical discovery tasks demonstrating its ability to improve sample efficiency and mode coverage.
Submitted 2 October, 2024;
originally announced October 2024.
-
Were RNNs All We Needed?
Authors:
Leo Feng,
Frederick Tung,
Mohamed Osama Ahmed,
Yoshua Bengio,
Hossein Hajimirsadegh
Abstract:
The scalability limitations of Transformers regarding sequence length have renewed interest in recurrent sequence models that are parallelizable during training. As a result, many novel recurrent architectures, such as S4, Mamba, and Aaren, have been proposed that achieve comparable performance. In this work, we revisit traditional recurrent neural networks (RNNs) from over a decade ago: LSTMs (1997) and GRUs (2014). These models were slow to train because they required backpropagation through time (BPTT). We show that by removing the hidden state dependencies from their input, forget, and update gates, LSTMs and GRUs no longer need BPTT and can be efficiently trained in parallel. Building on this, we introduce minimal versions (minLSTMs and minGRUs) that (1) use significantly fewer parameters than their traditional counterparts and (2) are fully parallelizable during training (175x faster for a sequence of length 512). Lastly, we show that these stripped-down versions of decade-old RNNs match the empirical performance of recent sequence models.
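A scalar sketch can illustrate why input-only gates make the recurrence parallelizable. It assumes a minGRU-style update $h_t = (1 - z_t)\,h_{t-1} + z_t\,\tilde{h}_t$ in which $z_t$ and $\tilde{h}_t$ depend only on $x_t$. The scalar weights `w_z` and `w_h` and all function names are hypothetical, and this is not the authors' implementation.

```python
import math
from itertools import accumulate


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gate_terms(xs, w_z, w_h):
    """Per-step coefficients (a_t, b_t) of h_t = a_t * h_{t-1} + b_t."""
    terms = []
    for x in xs:
        z = sigmoid(w_z * x)   # update gate depends on the input only
        h_tilde = w_h * x      # candidate state depends on the input only
        terms.append((1.0 - z, z * h_tilde))
    return terms


def sequential(terms, h0=0.0):
    """Reference recurrence, evaluated one step at a time."""
    hs, h = [], h0
    for a, b in terms:
        h = a * h + b
        hs.append(h)
    return hs


def compose(first, second):
    """Compose affine maps h -> a*h + b (first, then second); associative."""
    a1, b1 = first
    a2, b2 = second
    return (a1 * a2, a2 * b1 + b2)


def scanned(terms, h0=0.0):
    # accumulate runs left to right here; because compose is associative,
    # a parallel prefix scan would yield the same prefixes.
    return [a * h0 + b for a, b in accumulate(terms, compose)]


if __name__ == "__main__":
    terms = gate_terms([0.5, -1.0, 2.0, 0.1], w_z=0.7, w_h=1.3)
    seq, scan = sequential(terms, 0.3), scanned(terms, 0.3)
    assert all(abs(s - p) < 1e-12 for s, p in zip(seq, scan))
```

Once the gates no longer read $h_{t-1}$, every coefficient pair can be computed up front, so the whole sequence reduces to a prefix scan over an associative operator instead of a strictly sequential loop.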
Submitted 4 October, 2024; v1 submitted 1 October, 2024;
originally announced October 2024.
-
Search for proton decay via $p\rightarrow{e^+η}$ and $p\rightarrow{μ^+η}$ with a 0.37 Mton-year exposure of Super-Kamiokande
Authors:
Super-Kamiokande Collaboration,
N. Taniuchi,
K. Abe,
S. Abe,
Y. Asaoka,
C. Bronner,
M. Harada,
Y. Hayato,
K. Hiraide,
K. Hosokawa,
K. Ieki,
M. Ikeda,
J. Kameda,
Y. Kanemura,
R. Kaneshima,
Y. Kashiwagi,
Y. Kataoka,
S. Miki,
S. Mine,
M. Miura,
S. Moriyama,
M. Nakahata,
S. Nakayama,
Y. Noguchi
, et al. (267 additional authors not shown)
Abstract:
A search for proton decay into $e^+/μ^+$ and an $η$ meson has been performed using data from a 0.373 Mton$\cdot$year exposure (6050.3 live days) of Super-Kamiokande. Compared to previous searches, this work introduces an improved model of the intranuclear $η$ interaction cross section, resulting in a factor of two reduction in uncertainties from this source and a $\sim$10\% increase in signal efficiency. No significant data excess was found above the expected number of atmospheric neutrino background events, so there is no indication of proton decay into either mode. Lower limits on the proton partial lifetime of $1.4\times\mathrm{10^{34}~years}$ for $p\rightarrow e^+η$ and $7.3\times\mathrm{10^{33}~years}$ for $p\rightarrow μ^+η$ at the 90$\%$ C.L. were set. These limits are around 1.5 times longer than those of our previous study and are the most stringent to date.
Submitted 29 September, 2024;
originally announced September 2024.
-
HM3: Hierarchical Multi-Objective Model Merging for Pretrained Models
Authors:
Yu Zhou,
Xingyu Wu,
Jibin Wu,
Liang Feng,
Kay Chen Tan
Abstract:
Model merging is a technique that combines multiple large pretrained models into a single model with enhanced performance and broader task adaptability. It has gained popularity in large pretrained model development due to its ability to bypass the need for original training data and further training processes. However, most existing model merging approaches focus solely on exploring the parameter space, merging models with identical architectures. Merging within the architecture space, despite its potential, remains in its early stages due to the vast search space and the challenges of layer compatibility. This paper marks a significant advance toward more flexible and comprehensive model merging techniques by modeling the architecture-space merging process as a reinforcement learning task. We train policy and value networks using offline sampling of weight vectors, which are then employed for the online optimization of merging strategies. Moreover, a multi-objective optimization paradigm is introduced to accommodate users' diverse task preferences, learning the Pareto front of optimal models to offer customized merging suggestions. Experimental results across multiple tasks, including text translation, mathematical reasoning, and code generation, validate the effectiveness and superiority of the proposed framework in model merging. The code will be made publicly available after the review process.
Submitted 27 September, 2024;
originally announced September 2024.
-
Safe Navigation for Robotic Digestive Endoscopy via Human Intervention-based Reinforcement Learning
Authors:
Min Tan,
Yushun Tao,
Boyun Zheng,
GaoSheng Xie,
Lijuan Feng,
Zeyang Xia,
Jing Xiong
Abstract:
With the increasing application of automated robotic digestive endoscopy (RDE), ensuring safe and efficient navigation in the unstructured and narrow digestive tract has become a critical challenge. Existing automated reinforcement learning navigation algorithms often result in potentially risky collisions due to the absence of essential human intervention, which significantly limits the safety and effectiveness of RDE in actual clinical practice. To address this limitation, we propose a Human Intervention (HI)-based Proximal Policy Optimization (PPO) framework, dubbed HI-PPO, which incorporates expert knowledge to enhance RDE's safety. Specifically, we introduce an Enhanced Exploration Mechanism (EEM) to address the low exploration efficiency of the standard PPO. Additionally, a reward-penalty adjustment (RPA) is implemented to penalize unsafe actions during initial interventions. Furthermore, Behavior Cloning Similarity (BCS) is included as an auxiliary objective to ensure the agent emulates expert actions. Comparative experiments conducted in a simulated platform across various anatomical colon segments demonstrate that our model effectively and safely guides RDE.
Submitted 23 September, 2024;
originally announced September 2024.
-
Spatial Sign based Principal Component Analysis for High Dimensional Data
Authors:
Long Feng
Abstract:
This article focuses on the robust principal component analysis (PCA) of high-dimensional data with elliptical distributions. We investigate the PCA of the sample spatial-sign covariance matrix in both nonsparse and sparse contexts, referring to them as SPCA and SSPCA, respectively. We present both nonasymptotic and asymptotic analyses to quantify the theoretical performance of SPCA and SSPCA. In sparse settings, we demonstrate that SSPCA, implemented through a combinatoric program, achieves the optimal rate of convergence. Our proposed SSPCA method is computationally efficient and exhibits robustness against heavy-tailed distributions compared to existing methods. Simulation studies and real-world data applications further validate the superiority of our approach.
Submitted 20 September, 2024;
originally announced September 2024.
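As an illustration of the spatial-sign idea in the abstract above, the following is a minimal sketch of PCA on a sample spatial-sign covariance matrix. It is not the authors' implementation: the toy data, the assumption that the centre is known to be the origin, and the helper names (`spatial_sign`, `spatial_sign_covariance`, `leading_eigenvector`) are all hypothetical, and plain power iteration stands in for a proper eigensolver.

```python
import math

def spatial_sign(x):
    """Project a vector onto the unit sphere (zero vector maps to zero)."""
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x] if norm > 0 else [0.0] * len(x)

def spatial_sign_covariance(rows, center):
    """Average outer product of the spatial signs of the centred rows."""
    p, n = len(center), len(rows)
    S = [[0.0] * p for _ in range(p)]
    for row in rows:
        u = spatial_sign([a - c for a, c in zip(row, center)])
        for i in range(p):
            for j in range(p):
                S[i][j] += u[i] * u[j] / n
    return S

def leading_eigenvector(S, iters=200):
    """Power iteration for the dominant eigenvector of a symmetric matrix."""
    p = len(S)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        w = [sum(S[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Toy data (assumed): four points spread along axis 0 plus one
# heavy-tailed outlier far out along axis 1.
rows = [(3.0, 0.2), (-3.0, -0.1), (2.0, 0.1), (-2.0, -0.2), (0.1, 50.0)]
S = spatial_sign_covariance(rows, center=(0.0, 0.0))
v = leading_eigenvector(S)
print(v)
```

Because every observation is rescaled to unit length, the single outlier contributes no more than any other point, so the leading direction stays close to axis 0. The ordinary sample covariance of the same toy data would instead be dominated by the outlier's second coordinate.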
-
THz Second and Third Harmonic Generation in PdCoO$_2$ Thin Films
Authors:
T. Priessnitz,
L. Feng,
T. V. A. G. de Oliveira,
G. Baker,
I. Ilyakov,
A. Ponomaryov,
A. Arshad,
G. L. Prajapati,
J. -C. Deinert,
S. Kovalev,
B. Keimer,
S. Kaiser
Abstract:
Terahertz high harmonic generation (THz HHG) is a common property of nonlinear systems. Recently it has been used to investigate fundamental principles that govern transport and nonlinear dynamics in novel quantum materials like graphene, Dirac semimetals or high-temperature superconductors. Here, we report on the observation of exceptionally large THz second harmonic and third harmonic generation in thin films of the highly conducting delafossite PdCoO$_2$ down to low temperatures. The growth of this material on offcut substrate allows for a significant enhancement of the third harmonic intensity compared to ordinary $c$-axis grown thin films. Furthermore, it appears to be a necessity for the observation of THz second harmonic generation. We model the temperature dependence of the third harmonic generation by means of Boltzmann transport theory and provide an explanation for the second harmonic generation by comparing the system to the electric field induced second harmonic generation. The present investigation thus provides an important contribution to the ongoing discussion of low temperature origins of THz HHG and might serve as a new platform for THz high harmonic applications.
Submitted 12 September, 2024;
originally announced September 2024.
-
GateAttentionPose: Enhancing Pose Estimation with Agent Attention and Improved Gated Convolutions
Authors:
Liang Feng,
Zhixuan Shen,
Lihua Wen,
Shiyao Li,
Ming Xu
Abstract:
This paper introduces GateAttentionPose, an innovative approach that enhances the UniRepLKNet architecture for pose estimation tasks. We present two key contributions: the Agent Attention module and the Gate-Enhanced Feedforward Block (GEFB). The Agent Attention module replaces large kernel convolutions, significantly improving computational efficiency while preserving global context modeling. The GEFB augments feature extraction and processing capabilities, particularly in complex scenes. Extensive evaluations on COCO and MPII datasets demonstrate that GateAttentionPose outperforms existing state-of-the-art methods, including the original UniRepLKNet, achieving superior or comparable results with improved efficiency. Our approach offers a robust solution for pose estimation across diverse applications, including autonomous driving, human motion capture, and virtual reality.
Submitted 12 September, 2024;
originally announced September 2024.
-
GatedUniPose: A Novel Approach for Pose Estimation Combining UniRepLKNet and Gated Convolution
Authors:
Liang Feng,
Ming Xu,
Lihua Wen,
Zhixuan Shen
Abstract:
Pose estimation is a crucial task in computer vision, with wide applications in autonomous driving, human motion capture, and virtual reality. However, existing methods still face challenges in achieving high accuracy, particularly in complex scenes. This paper proposes a novel pose estimation method, GatedUniPose, which combines UniRepLKNet and Gated Convolution and introduces the GLACE module for embedding. Additionally, we enhance the feature map concatenation method in the head layer by using DySample upsampling. Compared to existing methods, GatedUniPose excels in handling complex scenes and occlusion challenges. Experimental results on the COCO, MPII, and CrowdPose datasets demonstrate that GatedUniPose achieves significant performance improvements with a relatively small number of parameters, yielding better or comparable results to models with similar or larger parameter sizes.
Submitted 12 September, 2024;
originally announced September 2024.
-
Prospects for searching for sterile neutrinos with gravitational wave and $γ$-ray burst joint observations
Authors:
Lu Feng,
Tao Han,
Jing-Fei Zhang,
Xin Zhang
Abstract:
Sterile neutrinos can influence the evolution of the universe, and thus cosmological observations can be used to detect them. Future gravitational wave (GW) observations can precisely measure absolute cosmological distances, helping to break parameter degeneracies generated by traditional cosmological observations. This advancement can lead to much tighter constraints on sterile neutrino parameters. This work provides a preliminary forecast for detecting sterile neutrinos using third-generation GW detectors in combination with future short $γ$-ray burst observations from a THESEUS-like telescope, an approach not previously explored in the literature. Both massless and massive sterile neutrinos are considered within the $Λ$CDM cosmology. We find that using GW data can greatly enhance the detection capability for massless sterile neutrinos, reaching 3$σ$ level. For massive sterile neutrinos, GW data can also greatly assist in improving the parameter constraints, but it seems that effective detection is still not feasible.
Submitted 26 August, 2024;
originally announced September 2024.
-
Holographic Air-quality Monitor (HAM)
Authors:
Nicholas Bravo-Frank,
Lei Feng,
Jiarong Hong
Abstract:
We introduce the holographic air-quality monitor (HAM) system, uniquely tailored for monitoring large particulate matter (PM) over 10 µm in diameter, i.e., particles critical for disease transmission and public health but overlooked by most commercial PM sensors. The HAM system utilizes a lensless digital inline holography (DIH) sensor combined with a deep learning model, enabling real-time detection of PMs, with greater than 97% true positive rate at less than 0.6% false positive rate, and analysis of PMs by size and morphology at a sampling rate of 26 liters per minute (LPM), for a wide range of particle concentrations up to 4000 particles/L. Such throughput not only significantly outperforms traditional imaging-based sensors but also rivals some lower-fidelity, non-imaging sensors. Additionally, the HAM system is equipped with additional sensors for smaller PMs and various air quality conditions, ensuring a comprehensive assessment of indoor air quality. The performance of the DIH sensor within the HAM system was evaluated through comparison with brightfield microscopy, showing high concordance in size measurements. The efficacy of the DIH sensor was also demonstrated in two two-hour experiments under different environments simulating practical conditions with one involving distinct PM-generating events. These tests highlighted the HAM system's advanced capability to differentiate PM events from background noise and its exceptional sensitivity to irregular, large-sized PMs of low concentration.
Submitted 6 September, 2024;
originally announced September 2024.
-
Advancing Automated Knowledge Transfer in Evolutionary Multitasking via Large Language Models
Authors:
Yuxiao Huang,
Xuebin Lv,
Shenghao Wu,
Jibin Wu,
Liang Feng,
Kay Chen Tan
Abstract:
Evolutionary Multi-task Optimization (EMTO) is a paradigm that leverages knowledge transfer across simultaneously optimized tasks for enhanced search performance. To facilitate EMTO's performance, various knowledge transfer models have been developed for specific optimization tasks. However, designing these models often requires substantial expert knowledge. Recently, large language models (LLMs) have achieved remarkable success in autonomous programming, aiming to produce effective solvers for specific problems. In this work, an LLM-based optimization paradigm is introduced to establish an autonomous model factory for generating knowledge transfer models, ensuring effective and efficient knowledge transfer across various optimization tasks. To evaluate the performance of the proposed method, we conducted comprehensive empirical studies comparing the knowledge transfer model generated by the LLM with existing state-of-the-art knowledge transfer methods. The results demonstrate that the generated model is able to achieve superior or competitive performance against hand-crafted knowledge transfer models in terms of both efficiency and effectiveness.
Submitted 6 September, 2024;
originally announced September 2024.
-
The Spatial Distribution of $\rm CH_4$ and $\rm CO_2$ Ice around Protostars IRAS 16253-2429 and IRAS 23385+6053
Authors:
Lei Lei,
Lei Feng,
Yi-Zhong Fan
Abstract:
The origin and evolution of organic molecules represent a pivotal issue in the fields of astrobiology and astrochemistry, potentially shedding light on the origins of life. The James Webb Space Telescope (JWST), with its exceptional sensitivity and spectral resolution, is well suited to observing molecules such as methane ($\rm CH_4$). Our analysis focused on the distribution of $\rm CH_4$, $\rm CO_2$, $\rm H_2O$, $\rm{CH_3OH+NH_4^+}$ ice and silicate absorption dips at approximately 7.7, 15.0, 6.0, 6.7 and 10.0 micrometres in two protostars: IRAS 16253-2429 and IRAS 23385+6053. We extract the $\rm CH_4$, $\rm CO_2$, $\rm H_2O$, $\rm{CH_3OH+NH_4^+}$ ice equivalent width (EW) maps and silicate extinction maps of the two sources. Our results reveal that the spatial distribution of $\rm CH_4$ in the protostellar system IRAS 16253-2429 closely mirrors that of its $\rm CO_2$ ice, forming a distribution that encircles the central protostar. This alignment suggests a common formation mechanism and subsequent trapping within the protostellar envelope, which is consistent with the "Classical" dark-cloud chemistry with ion-molecule reaction. In contrast, the spatial distributions of various molecules in the system IRAS 23385+6053 show little similarity, which may be attributed to the dynamic influences of outflows or accretion processes. These discrepancies highlight the complex interplay between physical processes and chemical evolution in protostellar environments.
Submitted 6 September, 2024;
originally announced September 2024.
-
LongCite: Enabling LLMs to Generate Fine-grained Citations in Long-context QA
Authors:
Jiajie Zhang,
Yushi Bai,
Xin Lv,
Wanjun Gu,
Danqing Liu,
Minhao Zou,
Shulin Cao,
Lei Hou,
Yuxiao Dong,
Ling Feng,
Juanzi Li
Abstract:
Though current long-context large language models (LLMs) have demonstrated impressive capacities in answering user questions based on extensive text, the lack of citations in their responses makes user verification difficult, leading to concerns about their trustworthiness due to their potential hallucinations. In this work, we aim to enable long-context LLMs to generate responses with fine-grained sentence-level citations, improving their faithfulness and verifiability. We first introduce LongBench-Cite, an automated benchmark for assessing current LLMs' performance in Long-Context Question Answering with Citations (LQAC), revealing considerable room for improvement. To this end, we propose CoF (Coarse to Fine), a novel pipeline that utilizes off-the-shelf LLMs to automatically generate long-context QA instances with precise sentence-level citations, and leverage this pipeline to construct LongCite-45k, a large-scale SFT dataset for LQAC. Finally, we train LongCite-8B and LongCite-9B using the LongCite-45k dataset, successfully enabling their generation of accurate responses and fine-grained sentence-level citations in a single output. The evaluation results on LongBench-Cite show that our trained models achieve state-of-the-art citation quality, surpassing advanced proprietary models including GPT-4o.
Submitted 10 September, 2024; v1 submitted 4 September, 2024;
originally announced September 2024.
-
FPF@FCC: Neutrino, QCD, and BSM Physics Opportunities with Far-Forward Experiments at a 100 TeV Proton Collider
Authors:
Roshan Mammen Abraham,
Jyotismita Adhikary,
Jonathan L. Feng,
Max Fieg,
Felix Kling,
Jinmian Li,
Junle Pei,
Tanjona R. Rabemananjara,
Juan Rojo,
Sebastian Trojanowski
Abstract:
Proton-proton collisions at energy-frontier facilities produce an intense flux of high-energy light particles, including neutrinos, in the forward direction. At the LHC, these particles are currently being studied with the far-forward experiments FASER/FASER$ν$ and SND@LHC, while new dedicated experiments have been proposed in the context of a Forward Physics Facility (FPF) operating at the HL-LHC. Here we present a first quantitative exploration of the reach for neutrino, QCD, and BSM physics of far-forward experiments integrated within the proposed Future Circular Collider (FCC) project as part of its proton-proton collision program (FCC-hh) at $\sqrt{s} \simeq 100$ TeV. We find that $10^9$ electron/muon neutrinos and $10^7$ tau neutrinos could be detected, an increase of several orders of magnitude compared to (HL-)LHC yields. We study the impact of neutrino DIS measurements at the FPF@FCC to constrain the unpolarised and spin partonic structure of the nucleon and assess their sensitivity to nuclear dynamics down to $x \sim 10^{-9}$ with neutrinos produced in proton-lead collisions. We demonstrate that the FPF@FCC could measure the neutrino charge radius for $ν_{e}$ and $ν_μ$ and reach down to five times the SM value for $ν_τ$. We fingerprint the BSM sensitivity of the FPF@FCC for a variety of models, including dark Higgs bosons, relaxion-type scenarios, quirks, and millicharged particles, finding that these experiments would be able to discover LLPs with masses as large as 50 GeV and couplings as small as $10^{-8}$, and quirks with masses up to 10 TeV. Our study highlights the remarkable opportunities made possible by integrating far-forward experiments into the FCC project, and it provides new motivation for the FPF at the HL-LHC as an essential precedent to optimize the forward physics experiments that will enable the FCC to achieve its full physics potential.
Submitted 3 September, 2024;
originally announced September 2024.
-
Hadronic cross section measurements with the DAMPE space mission using 20GeV-10TeV cosmic-ray protons and $^4$He
Authors:
F. Alemanno,
Q. An,
P. Azzarello,
F. C. T. Barbato,
P. Bernardini,
X. J. Bi,
I. Cagnoli,
M. S. Cai,
E. Casilli,
E. Catanzani,
J. Chang,
D. Y. Chen,
J. L. Chen,
Z. F. Chen,
P. Coppin,
M. Y. Cui,
T. S. Cui,
Y. X. Cui,
H. T. Dai,
A. De Benedittis,
I. De Mitri,
F. de Palma,
A. Di Giovanni,
Q. Ding,
T. K. Dong
, et al. (126 additional authors not shown)
Abstract:
Precise direct cosmic-ray (CR) measurements provide an important probe to study the energetic particle sources in our Galaxy, and the interstellar environment through which these particles propagate. Uncertainties on hadronic models, ion-nucleon cross sections in particular, are currently the limiting factor towards obtaining more accurate CR ion flux measurements with calorimetric space-based experiments. We present an energy-dependent measurement of the inelastic cross section of protons and helium-4 nuclei (alpha particles) on a Bi$_4$Ge$_3$O$_{12}$ target, using 88 months of data collected by the DAMPE space mission. The kinetic energy range per nucleon of the measurement points ranges from 18 GeV to 9 TeV for protons, and from 5 GeV/n to 3 TeV/n for helium-4 nuclei. Our results lead to a significant improvement of the CR flux normalisation. In the case of helium-4, these results correspond to the first cross section measurements on a heavy target material at energies above 10 GeV/n.
Submitted 30 August, 2024;
originally announced August 2024.
-
From Model Explanation to Data Misinterpretation: Uncovering the Pitfalls of Post Hoc Explainers in Business Research
Authors:
Ronilo Ragodos,
Tong Wang,
Lu Feng,
Yu,
Hu
Abstract:
Machine learning models have been increasingly used in business research. However, most state-of-the-art machine learning models, such as deep neural networks and XGBoost, are black boxes in nature. Therefore, post hoc explainers that provide explanations for machine learning models by, for example, estimating numerical importance of the input features, have been gaining wide usage. Despite the intended use of post hoc explainers being explaining machine learning models, we found a growing trend in business research where post hoc explanations are used to draw inferences about the data. In this work, we investigate the validity of such use. Specifically, we investigate with extensive experiments whether the explanations obtained by the two most popular post hoc explainers, SHAP and LIME, provide correct information about the true marginal effects of X on Y in the data, which we call data-alignment. We then identify what factors influence the alignment of explanations. Finally, we propose a set of mitigation strategies to improve the data-alignment of explanations and demonstrate their effectiveness with real-world data in an econometric context. In spite of this effort, we nevertheless conclude that it is often not appropriate to infer data insights from post hoc explanations. We articulate appropriate alternative uses, the most important of which is to facilitate the proposition and subsequent empirical investigation of hypotheses. The ultimate goal of this paper is to caution business researchers against translating post hoc explanations of machine learning models into potentially false insights and understanding of data.
Submitted 29 August, 2024;
originally announced August 2024.
-
Pair Counting without Binning -- A New Approach to Correlation Functions in Clustering Statistics
Authors:
Shiyu Yue,
Longlong Feng,
Wenjie Ju,
Jun Pan,
Zhiqi Huang,
Feng Fang,
Zhuoyang Li,
Yan-Chuan Cai,
Weishan Zhu
Abstract:
This paper presents a novel perspective on correlation functions in the clustering analysis of the large-scale structure of the universe. We first recognise that pair counting in bins of radial separation is equivalent to evaluating counts-in-cells (CIC), which can be modelled using a filtered density field with a binning-window function. This insight leads to an in situ expression for the two-point correlation function (2PCF). Essentially, the core idea underlying our method is to introduce a window function to define the binning scheme, enabling pair-counting without binning. This approach develops a concept of generalised 2PCF, which extends beyond conventional discrete pair counting by accommodating non-sharp-edged window functions. To extend this framework to N-point correlation functions (NPCF) using current optimal edge-corrected estimators, we developed a binning scheme independent of the specific parameterisation of polyhedral configurations. In particular, we demonstrate a fast algorithm for the three-point correlation function (3PCF), where triplet counting is accomplished by assigning either a spherical tophat or a Gaussian filter to each vertex of triangles. Additionally, we derive analytical expressions for the 3PCF using a multipole expansion in Legendre polynomials, accounting for filtered field (binning) corrections. Numerical tests using several suites of N-body simulation samples show that our approach aligns remarkably well with the theoretical predictions. Our method provides an exact solution for quantifying binning effects in practical measurements and offers a high-speed algorithm, enabling high-order clustering analysis in extremely large datasets from ongoing and upcoming surveys such as Euclid, LSST, and DESI.
Submitted 29 August, 2024;
originally announced August 2024.
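The equivalence the abstract above describes, between counting pairs in a radial bin and weighting every pair by a binning-window function, can be sketched with a brute-force sum over pairs. This is only an illustration under stated assumptions: it is an O(N^2) loop rather than the filtered-density-field algorithm of the paper, and the point set, bin edges, window widths, and function names (`tophat_window`, `gaussian_window`, `weighted_pair_count`) are all hypothetical.

```python
import math
from itertools import combinations

def tophat_window(r_lo, r_hi):
    """Sharp-edged window: 1 inside [r_lo, r_hi), 0 outside."""
    return lambda r: 1.0 if r_lo <= r < r_hi else 0.0

def gaussian_window(r_centre, sigma):
    """Smooth window centred on r_centre (no sharp bin edges)."""
    return lambda r: math.exp(-0.5 * ((r - r_centre) / sigma) ** 2)

def weighted_pair_count(points, window):
    """Sum the window weight over all unordered pairs of points."""
    return sum(window(math.dist(p, q)) for p, q in combinations(points, 2))

# Toy point set (assumed); pair separations are 1, 2, 3, sqrt(5),
# sqrt(10) and sqrt(13).
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0)]

# With a tophat window this is the ordinary binned pair count.
binned = weighted_pair_count(points, tophat_window(0.5, 2.5))

# With a Gaussian window every pair contributes a fractional weight.
smooth = weighted_pair_count(points, gaussian_window(1.5, 0.5))

print(binned, smooth)
```

With the tophat window the weighted sum reduces to the conventional count of pairs in the bin (three pairs here), while the Gaussian window gives a non-sharp-edged generalisation of the same quantity.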
-
Cost-Aware Uncertainty Reduction in Schema Matching with GPT-4: The Prompt-Matcher Framework
Authors:
Longyu Feng,
Huahang Li,
Chen Jason Zhang
Abstract:
Schema matching is the process of identifying correspondences between the elements of two given schemata, essential for database management systems, data integration, and data warehousing. The inherent uncertainty of current schema matching algorithms leads to the generation of a set of candidate matches. Storing these results necessitates the use of databases and systems capable of handling probabilistic queries. This complicates the querying process and increases the associated storage costs. Motivated by GPT-4's outstanding performance, we explore its potential to reduce uncertainty. Our proposal is to supplant the role of crowdworkers with GPT-4 for querying the set of candidate matches. To get more precise correspondence verification responses from GPT-4, we have crafted Semantic-match and Abbreviation-match prompts for GPT-4, achieving state-of-the-art results on two benchmark datasets: DeepMDatasets 100% (+0.0) and Fabricated-Datasets 91.8% (+2.2) recall rate. To optimise budget utilisation, we have devised a cost-aware solution. Within the constraints of the budget, our solution delivers favourable outcomes with minimal time expenditure.
We introduce a novel framework, Prompt-Matcher, to reduce the uncertainty in the process of integration of multiple automatic schema matching algorithms and the selection of complex parameterization. It assists users in diminishing the uncertainty associated with candidate schema match results and in optimally ranking the most promising matches. We formally define the Correspondence Selection Problem (CSP), aiming to optimise the revenue within the confines of the GPT-4 budget. We demonstrate that CSP is NP-Hard and propose an approximation algorithm with minimal time expenditure. Ultimately, we demonstrate the efficacy of Prompt-Matcher through rigorous experiments.
Submitted 24 August, 2024;
originally announced August 2024.
-
Automatic Dataset Construction (ADC): Sample Collection, Data Curation, and Beyond
Authors:
Minghao Liu,
Zonglin Di,
Jiaheng Wei,
Zhongruo Wang,
Hengxiang Zhang,
Ruixuan Xiao,
Haoyu Wang,
Jinlong Pang,
Hao Chen,
Ankit Shah,
Hongxin Wei,
Xinlei He,
Zhaowei Zhao,
Haobo Wang,
Lei Feng,
Jindong Wang,
James Davis,
Yang Liu
Abstract:
Large-scale data collection is essential for developing personalized training data, mitigating the shortage of training data, and fine-tuning specialized models. However, creating high-quality datasets quickly and accurately remains a challenge due to annotation errors and the substantial time and costs associated with human labor. To address these issues, we propose Automatic Dataset Construction (ADC), an innovative methodology that automates dataset creation with negligible cost and high efficiency. Taking the image classification task as a starting point, ADC leverages LLMs for the detailed class design and code generation to collect relevant samples via search engines, significantly reducing the need for manual annotation and speeding up the data generation process. Despite these advantages, ADC also encounters real-world challenges such as label errors (label noise) and imbalanced data distributions (label bias). We provide open-source software that incorporates existing methods for label error detection and robust learning under noisy and biased data, ensuring higher-quality training data and a more robust model training procedure. Furthermore, we design three benchmark datasets focused on label noise detection, label noise learning, and class-imbalanced learning. These datasets are vital because there are few existing datasets specifically for label noise detection, despite its importance. Finally, we evaluate the performance of existing popular methods on these datasets, thereby facilitating further research in the field.
Submitted 21 August, 2024;
originally announced August 2024.
-
Design Principle Transfer in Neural Architecture Search via Large Language Models
Authors:
Xun Zhou,
Liang Feng,
Xingyu Wu,
Zhichao Lu,
Kay Chen Tan
Abstract:
Transferable neural architecture search (TNAS) has been introduced to design efficient neural architectures for multiple tasks, to enhance the practical applicability of NAS in real-world scenarios. In TNAS, architectural knowledge accumulated in previous search processes is reused to warm up the architecture search for new tasks. However, existing TNAS methods still search in an extensive search space, necessitating the evaluation of numerous architectures. To overcome this challenge, this work proposes a novel transfer paradigm, i.e., design principle transfer. In this work, the linguistic description of various structural components' effects on architectural performance is termed design principles. They are learned from established architectures and then can be reused to reduce the search space by discarding unpromising architectures. Searching in the refined search space can boost both the search performance and efficiency for new NAS tasks. To this end, a large language model (LLM)-assisted design principle transfer (LAPT) framework is devised. In LAPT, LLM is applied to automatically reason the design principles from a set of given architectures, and then a principle adaptation method is applied to refine these principles progressively based on the new search results. Experimental results show that LAPT can beat the state-of-the-art TNAS methods on most tasks and achieve comparable performance on others.
Submitted 21 August, 2024;
originally announced August 2024.
-
Surrogate-Assisted Search with Competitive Knowledge Transfer for Expensive Optimization
Authors:
Xiaoming Xue,
Yao Hu,
Liang Feng,
Kai Zhang,
Linqi Song,
Kay Chen Tan
Abstract:
Expensive optimization problems (EOPs) have attracted increasing research attention over the decades due to their ubiquity in a variety of practical applications. Despite many sophisticated surrogate-assisted evolutionary algorithms (SAEAs) that have been developed for solving such problems, most of them lack the ability to transfer knowledge from previously-solved tasks and always start their search from scratch, making them troubled by the notorious cold-start issue. A few preliminary studies that integrate transfer learning into SAEAs still face some issues, such as defective similarity quantification that is prone to underestimate promising knowledge, surrogate-dependency that makes the transfer methods not coherent with the state-of-the-art in SAEAs, etc. In light of the above, a plug and play competitive knowledge transfer method is proposed to boost various SAEAs in this paper. Specifically, both the optimized solutions from the source tasks and the promising solutions acquired by the target surrogate are treated as task-solving knowledge, enabling them to compete with each other to elect the winner for expensive evaluation, thus boosting the search speed on the target task. Moreover, the lower bound of the convergence gain brought by the knowledge competition is mathematically analyzed, which is expected to strengthen the theoretical foundation of sequential transfer optimization. Experimental studies conducted on a series of benchmark problems and a practical application from the petroleum industry verify the efficacy of the proposed method. The source code of the competitive knowledge transfer is available at https://github.com/XmingHsueh/SAS-CKT.
Submitted 20 August, 2024; v1 submitted 13 August, 2024;
originally announced August 2024.
-
Double Robust high dimensional alpha test for linear factor pricing model
Authors:
Ping Zhao,
Long Feng,
Hongfei Wang,
Zhaojun Wang
Abstract:
In this paper, we investigate alpha testing for high-dimensional linear factor pricing models. We propose a spatial sign-based max-type test to handle sparse alternative cases. Additionally, we prove that this test is asymptotically independent of the spatial-sign-based sum-type test proposed by Liu et al. (2023). Based on this result, we introduce a Cauchy Combination test procedure that combines both the max-type and sum-type tests. Simulation studies and real data applications demonstrate that the new proposed test procedure is robust not only for heavy-tailed distributions but also for the sparsity of the alternative hypothesis.
Submitted 14 September, 2024; v1 submitted 12 August, 2024;
originally announced August 2024.
-
Cryogenic nonlinear conversion processes in periodically-poled thin-film lithium niobate waveguides
Authors:
Yujie Cheng,
Xiaoting Li,
Lantian Feng,
Haochuan Li,
Wenzhao Sun,
Xinyu Song,
Yuyang Ding,
Guangcan Guo,
Cheng Wang,
Xifeng Ren
Abstract:
Periodically poled thin-film lithium niobate (TFLN) waveguides, which enable efficient quadratic nonlinear processes, serve as a crucial foundation for classical and quantum signal processing with photonic integrated circuits. To expand their application scope, we provide, to the best of our knowledge, the first investigation of nonlinear conversion processes in periodically poled TFLN waveguides under cryogenic conditions. Through systematic experimental characterization, we find that the periodically poled TFLN waveguide maintains consistent conversion efficiencies at both cryogenic and room temperatures for both classical second-harmonic generation and quantum photon-pair generation processes, demonstrating the significant potential of TFLN wavelength conversion devices for cryogenic applications. This breakthrough will foster future scalable quantum photonic systems and optical interfacing among different cryogenic platforms.
Submitted 11 August, 2024;
originally announced August 2024.
-
Various Features of the X-class White-light Flares in Super Active Region NOAA 13664
Authors:
Ying Li,
Xiaofeng Liu,
Zhichen Jing,
Wei Chen,
Qiao Li,
Yang Su,
De-Chao Song,
M. D. Ding,
Li Feng,
Hui Li,
Weiqun Gan
Abstract:
Super active region NOAA 13664 produced 12 X-class flares (including the largest one, an occulted X8.7 flare, in solar cycle 25 so far) during 2024 May 8-15, and 11 of them are identified as white-light flares. Here we present various features of these X-class white-light flares observed by the White-light Solar Telescope (WST) on board the Advanced Space-based Solar Observatory and the Helioseismic and Magnetic Imager (HMI) on board the Solar Dynamics Observatory. It is found that both the white-light emissions at WST 3600 Å (Balmer continuum) and HMI 6173 Å (Paschen continuum) show up in different regions of the sunspot group in these flares, including outside the sunspots and within the penumbra and umbra of the sunspots. They exhibit a point-, ribbon-, loop-, or ejecta-like shape, which can come from flare ribbons (or footpoints), flare loops, and plasma ejecta depending on the perspective view. The white-light duration and relative enhancement are measured, and both parameters for 3600 Å emission have greater values than those for 6173 Å emission. It is also found that these white-light emissions are well cospatial with the hard X-ray (HXR) sources in the on-disk flares but have some offsets from the HXR emissions in the off-limb flares. In addition, it is interesting that the 3600 and 6173 Å emissions show different correlations with the peak HXR fluxes, with the former being more sensitive to the HXR emission. All these greatly help us understand the white-light flares of a large magnitude from a super active region on the Sun and also provide important insights into superflares on Sun-like stars.
Submitted 11 August, 2024;
originally announced August 2024.
-
ProxyCLIP: Proxy Attention Improves CLIP for Open-Vocabulary Segmentation
Authors:
Mengcheng Lan,
Chaofeng Chen,
Yiping Ke,
Xinjiang Wang,
Litong Feng,
Wayne Zhang
Abstract:
Open-vocabulary semantic segmentation requires models to effectively integrate visual representations with open-vocabulary semantic labels. While Contrastive Language-Image Pre-training (CLIP) models shine in recognizing visual concepts from text, they often struggle with segment coherence due to their limited localization ability. In contrast, Vision Foundation Models (VFMs) excel at acquiring spatially consistent local visual representations, yet they fall short in semantic understanding. This paper introduces ProxyCLIP, an innovative framework designed to harmonize the strengths of both CLIP and VFMs, facilitating enhanced open-vocabulary semantic segmentation. ProxyCLIP leverages the spatial feature correspondence from VFMs as a form of proxy attention to augment CLIP, thereby inheriting the VFMs' robust local consistency and maintaining CLIP's exceptional zero-shot transfer capacity. We propose an adaptive normalization and masking strategy to get the proxy attention from VFMs, allowing for adaptation across different VFMs. Remarkably, as a training-free approach, ProxyCLIP significantly improves the average mean Intersection over Union (mIoU) across eight benchmarks from 40.3 to 44.4, showcasing its exceptional efficacy in bridging the gap between spatial precision and semantic richness for the open-vocabulary segmentation task.
Submitted 9 August, 2024;
originally announced August 2024.
-
Inflight Performance and Calibrations of the Lyman-alpha Solar Telescope on board the Advanced Space-based Solar Observatory
Authors:
Bo Chen,
Li Feng,
Guang Zhang,
Hui Li,
Lingping He,
Kefei Song,
Quanfeng Guo,
Ying Li,
Yu Huang,
Jingwei Li,
Jie Zhao,
Jianchao Xue,
Gen Li,
Guanglu Shi,
Dechao Song,
Lei Lu,
Beili Ying,
Haifeng Wang,
Shuang Dai,
Xiaodong Wang,
Shilei Mao,
Peng Wang,
Kun Wu,
Shuai Ren,
Liang Sun
, et al. (18 additional authors not shown)
Abstract:
The Lyman-alpha Solar Telescope (LST) on board the Advanced Space-based Solar Observatory (ASO-S) is the first payload to image the full solar disk and the solar corona in both white-light (WL) and ultraviolet (UV) H I Lya, extending up to 2.5 solar radii (Rs). Since the launch of the ASO-S on 9 October 2022, LST has captured various significant solar activities including flares, prominences, coronal mass ejections (CMEs). LST covers different passbands of 121.6 nm, 360 nm and 700 nm. The Lya Solar Disk Imager (SDI) has a field of view (FOV) of 38.4 arcmin and a spatial resolution of around 9.5 arcsec, while the White-Light Solar Telescope (WST) has a FOV of 38.43 arcmin and a spatial resolution of around 3.0 arcsec. The FOV of the Lya Solar Corona Imager (SCI) reaches 81.1 arcmin and its spatial resolution is 4.3 arcsec. The stray-light level in the 700 nm waveband is about 7.8e-6 MSB (mean solar brightness) at 1.1 Rs and 7.6e-7 MSB at 2.5 Rs, and in the Lya waveband it is around 4.3e-3 MSB at 1.1 Rs and 4.1e-4 MSB at 2.5 Rs. This article will detail the results from on-orbit tests and calibrations.
Submitted 4 August, 2024;
originally announced August 2024.
-
An evidence of pion condensation
Authors:
Wei Zhu,
Yu-Chen Tang,
Lei Feng
Abstract:
Pion condensation is a theoretical prediction, where pions form a special state of matter under certain extreme conditions in heavy ion collisions or neutron stars. However, there is currently no solid experimental evidence confirming the existence of pion condensation. We present near-direct evidence for the existence of pion condensation. In actively changing galactic processes, protons can be accelerated to very high energies and collide with the medium (protons or nuclei). The kinetic energy of the protons is mainly used to produce a large number of pions in the central region via gluons. When the collision energy exceeds a certain threshold, the huge amount of soft gluons condensed in protons pours into the central region, the number of pions increases abruptly to the saturation limit, and almost all the available collision energy is used to make pions, creating a dense, low-temperature pion condensation environment. Due to energy conservation and relativistic covariance, the gamma ray spectra produced by condensed pions exhibit a recognizable broken power law with the gluon condensation characteristics. We find that they are already present in many recorded gamma ray spectra. Our findings reveal a novel mechanism for the generation of pion condensation, which is prevalent in the formation of high-energy cosmic rays, and deepen our understanding of related topics in a variety of disciplines, including particle physics, astrophysics, condensed matter physics and nuclear physics.
Submitted 3 August, 2024;
originally announced August 2024.
-
A spectral Lovász-Simonovits theorem
Authors:
Yongtao Li,
Lihua Feng,
Yuejian Peng
Abstract:
A fundamental result in extremal graph theory is Mantel's theorem, which states that every graph on $n$ vertices with more than $\lfloor n^2/4 \rfloor$ edges contains a triangle. About half a century ago, Lovász and Simonovits (1975) established a supersaturation phenomenon, which asserts that for $q< n/2$, every graph with $\lfloor n^2/4 \rfloor +q$ edges contains at least $q\lfloor n/2 \rfloor$ triangles. This result solved a conjecture proposed by Erdős in 1962. In this paper, we establish a spectral version of the result of Lovász and Simonovits. Let $Y_{n,2,q}$ be the graph obtained from the bipartite Turán graph $T_{n,2}$ by embedding a matching with $q$ edges into the vertex part of size $\lceil n/2\rceil$. Using the supersaturation-stability method and classical spectral techniques, we first prove that for $n\ge 300q^2$, each graph $G$ on $n$ vertices with $λ(G) \ge λ(Y_{n,2,q})$ contains at least $q\lfloor n/2 \rfloor$ triangles. Moreover, let $T_{n,2,q}$ be the graph obtained from $T_{n,2}$ by embedding a star with $q$ edges into the vertex part of size $\lceil n/2\rceil$. Second, we show that $T_{n,2,q}$ is the unique spectral extremal graph that contains at most $q\lfloor n/2 \rfloor$ triangles and attains the maximum of the spectral radius. This result answers a spectral triangle counting problem due to Ning and Zhai (2023). Third, we present an asymptotic spectral stability result under a specific constraint on the triangle covering number. The third result can be regarded as a spectral extension of a recent result proved by Balogh and Clemen (2023), and independently by Liu and Mubayi (2022).
Submitted 3 August, 2024;
originally announced August 2024.
-
CommonUppRoad: A Framework of Formal Modelling, Verifying, Learning, and Visualisation of Autonomous Vehicles
Authors:
Rong Gu,
Kaige Tan,
Andreas Holck Høeg-Petersen,
Lei Feng,
Kim Guldstrand Larsen
Abstract:
Combining machine learning and formal methods (FMs) provides a possible solution to overcome the safety issue of autonomous driving (AD) vehicles. However, there are gaps to be bridged before this combination becomes practically applicable and useful. In an attempt to facilitate researchers in both FMs and AD areas, this paper proposes a framework that combines two well-known tools, namely CommonRoad and UPPAAL. On the one hand, CommonRoad can be enhanced by the rigorous semantics of models in UPPAAL, which enables a systematic and comprehensive understanding of the AD system's behaviour and thus strengthens the safety of the system. On the other hand, controllers synthesised by UPPAAL can be visualised by CommonRoad in real-world road networks, which facilitates AD vehicle designers greatly adopting formal models in system design. In this framework, we provide automatic model conversions between CommonRoad and UPPAAL. Therefore, users only need to program in Python and the framework takes care of the formal models, learning, and verification in the backend. We perform experiments to demonstrate the applicability of our framework in various AD scenarios, discuss the advantages of solving motion planning in our framework, and show the scalability limit and possible solutions.
Submitted 2 August, 2024;
originally announced August 2024.
-
Nanohertz gravitational waves from a quasar-based supermassive black hole binary population model as dark sirens
Authors:
Si-Ren Xiao,
Yue Shao,
Ling-Feng Wang,
Ji-Yu Song,
Lu Feng,
Jing-Fei Zhang,
Xin Zhang
Abstract:
Recently, several pulsar timing array (PTA) projects have detected evidence of the existence of a stochastic gravitational wave background (SGWB) in the nanohertz frequency band, providing confidence in detecting individual supermassive black hole binaries (SMBHBs) in the future. Nanohertz GWs emitted by inspiraling SMBHBs encode the luminosity distances of SMBHBs. They can serve as dark sirens to explore the cosmic expansion history via a statistical method to obtain the redshift information of GW sources' host galaxies using galaxy catalogs. The theoretical analysis of the dark siren method relies on the modeling of the population of SMBHBs. Using a population model consistent with the latest SGWB observations is essential, as the SGWB provides significant information about the distribution of SMBHBs. In this work, we employ a quasar-based model, which can self-consistently account for the SGWB amplitude, to estimate the population of SMBHBs. We constrain the Hubble constant using the mock GW data from different detection cases of PTAs in the future. Our results show that a PTA consisting of 100 pulsars with a white noise level of 20 ns could measure the Hubble constant with a precision close to $1\%$ over a 10-year observation period, and a PTA with 200 pulsars may achieve this goal over a 5-year observation period. The results indicate that modeling the SMBHB population significantly influences the analysis of dark sirens, and SMBHB dark sirens have the potential to be developed as a valuable cosmological probe.
Submitted 1 August, 2024;
originally announced August 2024.
-
GroupCDL: Interpretable Denoising and Compressed Sensing MRI via Learned Group-Sparsity and Circulant Attention
Authors:
Nikola Janjusevic,
Amirhossein Khalilian-Gourtani,
Adeen Flinker,
Li Feng,
Yao Wang
Abstract:
Nonlocal self-similarity within images has become an increasingly popular prior in deep-learning models. Despite their successful image restoration performance, such models remain largely uninterpretable due to their black-box construction. Our previous studies have shown that interpretable construction of a fully convolutional denoiser (CDLNet), with performance on par with state-of-the-art black-box counterparts, is achievable by unrolling a convolutional dictionary learning algorithm. In this manuscript, we seek an interpretable construction of a convolutional network with a nonlocal self-similarity prior that performs on par with black-box nonlocal models. We show that such an architecture can be effectively achieved by upgrading the L1 sparsity prior (soft-thresholding) of CDLNet to an image-adaptive group-sparsity prior (group-thresholding). The proposed learned group-thresholding makes use of nonlocal attention to perform spatially varying soft-thresholding on the latent representation. To enable effective training and inference on large images with global artifacts, we propose a novel circulant-sparse attention. We achieve competitive natural-image denoising performance compared to black-box nonlocal DNNs and transformers. The interpretable construction of our network allows for a straightforward extension to Compressed Sensing MRI (CS-MRI), yielding state-of-the-art performance. Lastly, we show robustness to noise-level mismatches between training and inference for denoising and CS-MRI reconstruction.
Submitted 19 October, 2024; v1 submitted 19 July, 2024;
originally announced July 2024.
-
AsyCo: An Asymmetric Dual-task Co-training Model for Partial-label Learning
Authors:
Beibei Li,
Yiyuan Zheng,
Beihong Jin,
Tao Xiang,
Haobo Wang,
Lei Feng
Abstract:
Partial-Label Learning (PLL) is a typical problem of weakly supervised learning, where each training instance is annotated with a set of candidate labels. Self-training PLL models achieve state-of-the-art performance but suffer from an error accumulation problem caused by mistakenly disambiguated instances. Although co-training can alleviate this issue by training two networks simultaneously and allowing them to interact with each other, most existing co-training methods train two structurally identical networks on the same task, i.e., they are symmetric, which makes it hard for the networks to correct each other because they share similar limitations. Therefore, in this paper, we propose an asymmetric dual-task co-training PLL model called AsyCo, which forces its two networks, i.e., a disambiguation network and an auxiliary network, to learn from different views explicitly by optimizing distinct tasks. Specifically, the disambiguation network is trained with a self-training PLL task to learn label confidence, while the auxiliary network is trained in a supervised learning paradigm to learn from the noisy pairwise similarity labels that are constructed according to the learned label confidence. Finally, the error accumulation problem is mitigated via information distillation and confidence refinement. Extensive experiments on both uniform and instance-dependent partially labeled datasets demonstrate the effectiveness of AsyCo. The code is available at https://github.com/libeibeics/AsyCo.
Submitted 20 July, 2024;
originally announced July 2024.
-
Time Series Generative Learning with Application to Brain Imaging Analysis
Authors:
Zhenghao Li,
Sanyou Wu,
Long Feng
Abstract:
This paper focuses on the analysis of sequential image data, particularly brain imaging data such as MRI, fMRI, CT, with the motivation of understanding the brain aging process and neurodegenerative diseases. To achieve this goal, we investigate image generation in a time series context. Specifically, we formulate a min-max problem derived from the $f$-divergence between neighboring pairs to learn a time series generator in a nonparametric manner. The generator enables us to generate future images by transforming prior lag-k observations and a random vector from a reference distribution. With a deep neural network learned generator, we prove that the joint distribution of the generated sequence converges to the latent truth under a Markov and a conditional invariance condition. Furthermore, we extend our generation mechanism to a panel data scenario to accommodate multiple samples. The effectiveness of our mechanism is evaluated by generating real brain MRI sequences from the Alzheimer's Disease Neuroimaging Initiative. These generated image sequences can be used as data augmentation to enhance the performance of further downstream tasks, such as Alzheimer's disease detection.
Submitted 18 July, 2024;
originally announced July 2024.
-
Asymmetric Hard X-ray Radiation of Two Ribbons in a Thermal-Dominated C-Class Flare
Authors:
Guanglu Shi,
Li Feng,
Jun Chen,
Beili Ying,
Shuting Li,
Qiao Li,
Hui Li,
Ying Li,
Kaifan Ji,
Yu Huang,
Weiqun Gan,
the LST team
Abstract:
The asymmetry in hard X-ray (HXR) emission at the footpoints (FPs) of flare loops is a ubiquitous feature closely associated with nonthermal electron transport. We analyze the asymmetric HXR radiation at two flare ribbons which is thermal-dominated during a long-duration C4.4 flare that occurred on March 20, 2023, combining multi-view and multi-waveband observations from the ASO-S, SolO, and SDO spacecraft. We find that the H I Ly$α$ emission captures similar features to the He II $λ$304 in both light curve and spatio-temporal evolution of a pair of conjugate flare ribbons. The spectra and imaging analysis of the HXR emission, detected by STIX in 4-18 keV, reveal that the two-ribbon flare radiation is thermal dominated by over 95%, and the radiation source mainly concentrates on the northern ribbon, leading to an asymmetric distribution. To understand the underlying reasons for the HXR radiation asymmetry, we extrapolate the magnetic field within the active region using the NLFFF model. For 78% of the magnetic field lines starting from the northern flare ribbon, their lengths from the loop-tops (LTs) to the northern FPs are shorter than those to the southern FPs. For 62% of the field lines, their magnetic field strengths at the southern FPs exceed those at the northern FPs. In addition, considering the larger density, $\approx1.0\times10^{10}$ cm$^{-3}$, of the low-lying flare loops (< 32 Mm), we find the shorter path from the LT to the northern FP enables more electrons to reach the northern FP more easily after collisions with the surrounding plasma. Therefore, in this thermal-dominated C-class flare, the asymmetric location of the flare LT relative to its two FPs plays a dominant role in the HXR radiation asymmetry, while such asymmetry is also slightly influenced by the magnetic mirror effect resulting in larger HXR radiation at the FPs with weaker magnetic strength.
Submitted 17 July, 2024;
originally announced July 2024.
-
ClearCLIP: Decomposing CLIP Representations for Dense Vision-Language Inference
Authors:
Mengcheng Lan,
Chaofeng Chen,
Yiping Ke,
Xinjiang Wang,
Litong Feng,
Wayne Zhang
Abstract:
Despite the success of large-scale pretrained Vision-Language Models (VLMs) especially CLIP in various open-vocabulary tasks, their application to semantic segmentation remains challenging, producing noisy segmentation maps with mis-segmented regions. In this paper, we carefully re-investigate the architecture of CLIP, and identify residual connections as the primary source of noise that degrades segmentation quality. With a comparative analysis of statistical properties in the residual connection and the attention output across different pretrained models, we discover that CLIP's image-text contrastive training paradigm emphasizes global features at the expense of local discriminability, leading to noisy segmentation results. In response, we propose ClearCLIP, a novel approach that decomposes CLIP's representations to enhance open-vocabulary semantic segmentation. We introduce three simple modifications to the final layer: removing the residual connection, implementing the self-self attention, and discarding the feed-forward network. ClearCLIP consistently generates clearer and more accurate segmentation maps and outperforms existing approaches across multiple benchmarks, affirming the significance of our discoveries.
Submitted 17 July, 2024;
originally announced July 2024.
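The three final-layer modifications named in the ClearCLIP abstract above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes a single attention head, omits layer normalization, and takes query-query attention as one possible reading of "self-self attention" (the abstract does not say which variant is used). The function name `final_layer_sketch` and the weight names `w_q`, `w_v`, `w_o` are hypothetical.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def final_layer_sketch(x, w_q, w_v, w_o):
    """Hypothetical single-head final layer with the three changes applied.

    x: (n_tokens, d) token features entering the last transformer block.
    Returns (n_tokens, d) features.
    """
    q = x @ w_q                      # queries
    v = x @ w_v                      # values
    # Self-self attention (assumed here to be query-query): tokens attend
    # according to similarity between their own queries, not query-key.
    scores = (q @ q.T) / np.sqrt(q.shape[-1])
    attn = softmax(scores, axis=-1)
    # No residual connection (x is not added back) and no feed-forward
    # network: the projected attention output is the layer output.
    return (attn @ v) @ w_o


rng = np.random.default_rng(0)
n_tokens, d = 5, 8
x = rng.standard_normal((n_tokens, d))
w_q, w_v, w_o = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
out = final_layer_sketch(x, w_q, w_v, w_o)
print(out.shape)
```

In a standard transformer block the output would instead be `x + attention(x)` followed by `h + ffn(h)`; the sketch drops both additions and the feed-forward step, keeping only the attention output.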
-
A $\sim 43$ GeV $γ$-ray line signature in the directions of a group of nearby massive galaxy clusters
Authors:
Yi-Zhong Fan,
Zhao-Qiang Shen,
Yun-Feng Liang,
Xiang Li,
Kai-Kai Duan,
Zi-Qing Xia,
Xiao-Yuan Huang,
Lei Feng,
Qiang Yuan
Abstract:
As the largest gravitationally bound objects in the Universe, galaxy clusters have provided the first piece of evidence for the presence of dark matter and may be suitable targets for indirect dark matter searches. Among various signals, the GeV-TeV $γ$-ray line has been taken as the smoking-gun signal of dark matter annihilation/decay, since no known astrophysical/physical process(es) could generate such a peculiar spectrum. With 15.5 years of publicly available Fermi-LAT P8R3 data, we search for $γ$-ray line emission in the directions of a group of 13 nearby massive galaxy clusters with an unbinned likelihood analysis. A $γ$-ray line signal at $\sim 43.2$ GeV has a net TS value of $\approx 30$ if we only take into account the data in the directions of the Virgo, Fornax, and Ophiuchus clusters, three massive clusters with the highest J-factors expected to generate the dark matter annihilation signal. The signal persists when the data from the other 10 nearby massive clusters are also included, though the TS value decreases to $\approx 21$, likely because of their lower signal-to-noise ratios. The absence of this signal in the inner Galaxy disfavors both an instrumental effect and the canonical dark matter annihilation interpretation, and a more sophisticated dark matter model or a very peculiar astrophysical scenario might be needed. This $γ$-ray line signal, if intrinsic, could be unambiguously verified by the Very Large Area $γ$-ray Space Telescope in its first two years of operation.
Submitted 16 July, 2024;
originally announced July 2024.